How to add a new column attribute in a pandas dataframe using another row that appears before as a h

Time:02-05

I have a DataFrame like this one:

import numpy as np
import pandas as pd

# Heading rows (AREA 1, AREA 2) have NaN in every numeric column.
df = pd.DataFrame({'Sl.No.': {0: np.nan, 1: 1.0, 2: 2.0, 3: 3.0,
                              4: np.nan, 5: 4.0, 6: 5.0, 7: 6.0},
                   'Location': {0: 'AREA 1', 1: 'Loc A', 2: 'Loc B', 3: 'Loc C',
                                4: 'AREA 2', 5: 'Loc D', 6: 'Loc E', 7: 'Loc F'},
                   'Total Depth': {0: np.nan, 1: 5.6, 2: 6.8, 3: 2.9,
                                   4: np.nan, 5: 4.3, 6: 6.0, 7: 5.3},
                   'Jan': {0: np.nan, 1: 2.3, 2: 4.2, 3: 2.1,
                           4: np.nan, 5: 2.9, 6: 5.2, 7: 4.3},
                   'Feb': {0: np.nan, 1: 1.9, 2: 3.8, 3: 2.0,
                           4: np.nan, 5: 2.5, 6: 4.6, 7: 3.2}})

This produces:

   Sl.No. Location  Total Depth  Jan  Feb
0     nan   AREA 1          nan  nan  nan
1       1    Loc A          5.6  2.3  1.9
2       2    Loc B          6.8  4.2  3.8
3       3    Loc C          2.9  2.1    2
4     nan   AREA 2          nan  nan  nan
5       4    Loc D          4.3  2.9  2.5
6       5    Loc E            6  5.2  4.6
7       6    Loc F          5.3  4.3  3.2

In the table above, AREA 1 and AREA 2 act as headings for the rows with Sl.No. 1-3 and 4-6 respectively.

I want to add another column called 'Area' to the DataFrame and then delete these two heading rows. That is, I want the following table:

   Sl.No.    Area Location  Total Depth  Jan  Feb
0       1  AREA 1    Loc A          5.6  2.3  1.9
1       2  AREA 1    Loc B          6.8  4.2  3.8
2       3  AREA 1    Loc C          2.9  2.1    2
3       4  AREA 2    Loc D          4.3  2.9  2.5
4       5  AREA 2    Loc E            6  5.2  4.6
5       6  AREA 2    Loc F          5.3  4.3  3.2

What is the best way to do this?

CodePudding user response:

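You can use the NaN values in the "Sl.No." column as a mask on the "Location" column: where keeps "Location" only on the heading rows (where "Sl.No." is NaN) and sets it to NaN everywhere else. Then ffill propagates each heading down to the rows beneath it. Finally, drop the heading rows, which are the only rows still containing NaN values: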

df['Area'] = df['Location'].where(df['Sl.No.'].isna()).ffill()
df = df.dropna()
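
To see what each step of the where/ffill chain produces, here is a minimal sketch on a smaller stand-in frame (the shortened data and the names is_heading, headings_only and result are illustrative assumptions, not part of the question). It also reorders the columns so that Area sits second, as in the requested table, and resets the index:

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the question's data:
# heading rows have NaN in 'Sl.No.'.
df = pd.DataFrame({
    'Sl.No.': [np.nan, 1.0, 2.0, np.nan, 3.0],
    'Location': ['AREA 1', 'Loc A', 'Loc B', 'AREA 2', 'Loc C'],
    'Jan': [np.nan, 2.3, 4.2, np.nan, 2.9],
})

# True on the heading rows only.
is_heading = df['Sl.No.'].isna()

# where() keeps Location where the mask is True and puts NaN elsewhere.
headings_only = df['Location'].where(is_heading)

# ffill() carries each heading down to the rows below it.
area = headings_only.ffill()

# Attach Area, drop the heading rows, put Area second, renumber the index.
result = (df.assign(Area=area)
            .loc[~is_heading, ['Sl.No.', 'Area', 'Location', 'Jan']]
            .reset_index(drop=True))
print(result)
```

Filtering with ~is_heading instead of dropna() only removes the heading rows, so a data row that happens to have a missing measurement would survive.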

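You can also create an area identifier using str.contains followed by cumsum: every row whose "Location" contains 'AREA' starts a new group. Then group by that identifier and apply a lambda that adds an "Area" column holding the first item of the "Location" column in each group (the heading), and drops the heading row: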

out = (df.groupby(df['Location'].str.contains('AREA').cumsum())
       .apply(lambda g: g.assign(Area=g['Location'].iat[0]).dropna())
       .reset_index(drop=True))

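Output (Area is appended as the last column; the first approach yields the same rows but keeps the original index labels unless you also call reset_index(drop=True)):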

   Sl.No. Location  Total Depth  Jan  Feb    Area
0     1.0    Loc A          5.6  2.3  1.9  AREA 1
1     2.0    Loc B          6.8  4.2  3.8  AREA 1
2     3.0    Loc C          2.9  2.1  2.0  AREA 1
3     4.0    Loc D          4.3  2.9  2.5  AREA 2
4     5.0    Loc E          6.0  5.2  4.6  AREA 2
5     6.0    Loc F          5.3  4.3  3.2  AREA 2
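
The group identifier behind the second approach can be inspected directly. The sketch below uses a shortened stand-in frame (an assumption for illustration) and, instead of apply with a lambda, swaps in groupby(...).transform('first') to broadcast each group's heading. That is an alternative to the answer's code, not the same method:

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the question's data.
df = pd.DataFrame({
    'Sl.No.': [np.nan, 1.0, 2.0, np.nan, 3.0],
    'Location': ['AREA 1', 'Loc A', 'Loc B', 'AREA 2', 'Loc C'],
    'Jan': [np.nan, 2.3, 4.2, np.nan, 2.9],
})

# True on heading rows; the running sum increases by one at every heading,
# so each heading and the rows under it share one group number.
is_heading = df['Location'].str.contains('AREA')
group_id = is_heading.cumsum()
print(group_id.tolist())

# Broadcast the first Location of each group (the heading) to all its rows.
area = df.groupby(group_id)['Location'].transform('first')

# Attach Area, then keep only the non-heading rows.
out = df.assign(Area=area)[~is_heading].reset_index(drop=True)
print(out)
```

Both variants rely on the heading being the first row of its group and on no ordinary location name containing the text 'AREA'.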