Rows and Columns
  • 16 Oct 2021
  • 2 Minutes to read
  • Dark
    Light

Rows and Columns

  • Dark
    Light

Article summary

At times it becomes necessary to manipulate the data in columns and rows to clean up data or make accessing the data more efficient.

Apply

The .apply() function applies a function to every row or column in a data frame. There are two main applications for using .apply():

  • inplace changes
  • new column creation

An inplace change generally involves cleaning up the data or converting the data type. For example, there is an extra comma found in the full_name column in the table below. It is possible to remove that comma using an inplace change.


extract.ageextract.patient_full_nameextract.patient_id
043Gail Hill,94451
143Gail Hill,94451
236Rhoda Sweet,92830
313Jescie Lyons,92721
470Lyle Cash,92570

To remove the comma, we can create a custom function and use .apply():

dataframe['extract.patient_full_name'] = dataframe['extract.patient_full_name'].apply(lambda value: value.replace(',', ''))

The code above is applying a lambda function, which removes commas from the extract.patient_full_name column and then overwrites the old extract.patient_full_name column.


extract.ageextract.patient_full_nameextract.patient_id
043Gail Hill94451
143Gail Hill94451
236Rhoda Sweet92830
313Jescie Lyons92721
470Lyle Cash92570

Another use for .apply is to convert columns to numeric, date, or object (string) types by including the line import pandas as pd at the beginning of the report. By converting strings to numeric objects, the values can be used in mathematical functions.

import pandas as pd
...
dataframe = dataframe.apply(pd.to_numeric, errors="ignore")

Splitting Columns

Splitting a column can be useful if a column contains multiple pieces of information or if a complex operation needs an intermediate calculation for simplicity or efficiency.

In the above table, it may be desirable to split the full name into two columns for the first and last names. To do so, use the .apply() function once again:

dataframe['extract.patient_first_name'] = dataframe['extract.patient_full_name'].apply(lambda value: value.split()[0])
dataframe['extract.patient_last_name'] = dataframe['extract.patient_full_name'].apply(lambda value: value.split()[1])

extract.ageextract.patient_full_nameextract.patient_idextract.patient_first_nameextract.patient_last_name
043Gail Hill94451GailHill
143Gail Hill94451GailHill
236Rhoda Sweet92830RhodaSweet
313Jescie Lyons92721JescieLyons
470Lyle Cash92570LyleCash

Combining Columns

It is also possible to combine columns. In the example below, the first name and patient name concatenate to become a single column:

dataframe['nameId'] = dataframe.apply(lambda row: row['extract.patient_first_name'] + str(row['extract.patient_id']), axis=1)

This time the .apply() function is working across the entire row so that multiple columns can be used.


extract.ageextract.patient_full_nameextract.patient_idextract.patient_first_nameextract.patient_last_namenameId
043Gail Hill94451GailHillGail94451
143Gail Hill94451GailHillGail94451
236Rhoda Sweet92830RhodaSweetRhoda92830
313Jescie Lyons92721JescieLyonsJescie92721
470Lyle Cash92570LyleCashLyle92570

Useful Column Functions

The list below contains some of the more useful pre-built functions available when using a data frame. Either assign a variable to these functions, dataframe_sum = dataframe['extract.age'].sum() or print the function directly, print(dataframe['extract.age'].sum()).

  • mean
    dataframe['extract.age'].mean() returns 41.0
  • sum
    dataframe['extract.age'].sum() returns 205
  • unique
    dataframe['extract.age'].unique() returns [43 36 13 70]
  • tolist
    dataframe['extract.age'].tolist() returns [43, 43, 36, 13, 70]
  • describe
    dataframe['extract.age'].describe()returns the following information:

    count     5.000000
    mean     41.000000
    std      20.359273
    min      13.000000
    25%      36.000000
    50%      43.000000
    75%      43.000000
    max      70.000000
    Name: extract.age, dtype: float64

    dataframe.describe() returns the following table:


    extract.ageextract.patient_id
    count5.0000005.000000
    mean41.00000093404.600000
    std20.359273959.679269
    min13.00000092570.000000
    25%36.00000092721.000000
    50%43.00000093830.000000
    75%43.00000094451.000000
    max70.00000094451.000000

Stack and Unstack

The .stack() function creates a multi-index based on the columns. To print the function, use print(dataframe.stack()), where dataframe is the variable assigned to the query.

0extract.age43

extract.patient_full_nameGail Hill

extract.patient_id94451

extract.patient_first_nameGail

extract.patient_last_nameHill

nameIdGail94451
1extract.age43

extract.patient_full_nameGail Hill

extract.patient_id94451

extract.patient_first_nameGail

extract.patient_last_nameHill

nameIdGail94451
2extract.age36

extract.patient_full_nameRhoda Sweet

extract.patient_id92830

extract.patient_first_nameRhoda

extract.patient_last_nameSweet

nameIdRhoda92830
3extract.age13

extract.patient_full_nameJescie Lyons

extract.patient_id92721

extract.patient_first_nameJescie

extract.patient_last_nameLyons

nameIdJescie92721
4extract.age70

extract.patient_full_nameLyle Cash

extract.patient_id92570

extract.patient_first_nameLyle

extract.patient_last_nameCash

nameIdLyle92570
dtype: object

dataframe.loc[0, 'extract.age'] returns 43

The .unstack() function produces the opposite effect of the .stack() function.


Transposing Columns to Rows

The transpose() function switches the rows and columns.


extract.ageextract.patient_full_nameextract.patient_idextract.patient_first_nameextract.patient_last_namenameId
043Gail Hill94451GailHillGail94451
143Gail Hill94451GailHillGail94451
236Rhoda Sweet92830RhodaSweetRhoda92830
313Jescie Lyons92721JescieLyonsJescie92721
470Lyle Cash92570LyleCashLyle92570

Using the code below transposes the table:

dataframe = dataframe.transpose()

01234
extract.age4343361370
extract.patient_full_nameGail HillGail HillRhoda SweetJescie LyonsLyle Cash
extract.patient_id9445194451928309272192570
extract.patient_first_nameGailGailRhodaJescieLyle
extract.patient_last_nameHillHillSweetLyonsCash
nameIdGail94451Gail94451Rhoda92830Jescie92721Lyle92570

Was this article helpful?