How to add a new column attribute in a pandas dataframe using another row that appears before as a h

I have a DataFrame like this one:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Sl.No.': {0: np.nan, 1: 1.0, 2: 2.0, 3: 3.0,
                              4: np.nan, 5: 4.0, 6: 5.0, 7: 6.0},
                   'Location': {0: 'AREA 1', 1: 'Loc A', 2: 'Loc B', 3: 'Loc C',
                                4: 'AREA 2', 5: 'Loc D', 6: 'Loc E', 7: 'Loc F'},
                   'Total Depth': {0: np.nan, 1: 5.6, 2: 6.8, 3: 2.9,
                                   4: np.nan, 5: 4.3, 6: 6.0, 7: 5.3},
                   'Jan': {0: np.nan, 1: 2.3, 2: 4.2, 3: 2.1,
                           4: np.nan, 5: 2.9, 6: 5.2, 7: 4.3},
                   'Feb': {0: np.nan, 1: 1.9, 2: 3.8, 3: 2.0,
                           4: np.nan, 5: 2.5, 6: 4.6, 7: 3.2}})

This produces:

	Sl.No.	Location	Total Depth	Jan	Feb
0	nan	AREA 1	nan	nan	nan
1	1	Loc A	5.6	2.3	1.9
2	2	Loc B	6.8	4.2	3.8
3	3	Loc C	2.9	2.1	2
4	nan	AREA 2	nan	nan	nan
5	4	Loc D	4.3	2.9	2.5
6	5	Loc E	6	5.2	4.6
7	6	Loc F	5.3	4.3	3.2

In the above table, AREA 1 and AREA 2 act as headings for the rows with Sl.No. 1-3 and 4-6 respectively.

I wish to add another column called 'Area' to the dataframe and then delete these two heading rows. That is, I want the following table:

	Sl.No.	Area	Location	Total Depth	Jan	Feb
0	1	AREA 1	Loc A	5.6	2.3	1.9
1	2	AREA 1	Loc B	6.8	4.2	3.8
2	3	AREA 1	Loc C	2.9	2.1	2
3	4	AREA 2	Loc D	4.3	2.9	2.5
4	5	AREA 2	Loc E	6	5.2	4.6
5	6	AREA 2	Loc F	5.3	4.3	3.2

What is the best way to do this?

CodePudding user response:

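You can use the NaN values in the "Sl.No." column as a mask on the "Location" column with where: this keeps the area names on the heading rows and sets every other row to NaN. Then use ffill to propagate each area name down to the rows below it. Finally, drop the heading rows, which are the only rows that still contain NaN values: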

df['Area'] = df['Location'].where(df['Sl.No.'].isna()).ffill()
df = df.dropna()
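
To see what each step contributes, here is a minimal self-contained sketch of the same mask-and-ffill idea on a smaller stand-in frame. It assumes the heading rows are exactly the rows with a missing "Sl.No."; the names is_header, area and result are only illustrative. Unlike the plain dropna() above, it filters with the mask, so data rows that happen to have a missing value elsewhere would be kept, and it uses insert to put "Area" in the second position as in the desired table.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with the same heading-row layout as the question.
df = pd.DataFrame({
    'Sl.No.': [np.nan, 1.0, 2.0, np.nan, 3.0],
    'Location': ['AREA 1', 'Loc A', 'Loc B', 'AREA 2', 'Loc C'],
    'Total Depth': [np.nan, 5.6, 6.8, np.nan, 4.3],
})

# Assumption: heading rows are the ones with no serial number.
is_header = df['Sl.No.'].isna()

# Keep Location only on heading rows (NaN elsewhere), then carry it downward.
area = df['Location'].where(is_header).ffill()

# Drop the heading rows, renumber the index, and place Area second.
result = df[~is_header].reset_index(drop=True)
result.insert(1, 'Area', area[~is_header].to_numpy())

print(result)
```

Filtering on the mask rather than on "any NaN in the row" is a design choice: it may be safer if real measurements can be missing, but it relies on "Sl.No." always being filled in for data rows.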

You can also build a group identifier with str.contains and cumsum, then group by it and apply a lambda that creates a new column "Area" from the first item in the Location column of each group and drops the heading row:

out = (df.groupby(df['Location'].str.contains('AREA').cumsum())
       .apply(lambda g: g.assign(Area=g['Location'].iat[0]).dropna())
       .reset_index(drop=True))

Output:

   Sl.No. Location  Total Depth  Jan  Feb    Area
0     1.0    Loc A          5.6  2.3  1.9  AREA 1
1     2.0    Loc B          6.8  4.2  3.8  AREA 1
2     3.0    Loc C          2.9  2.1  2.0  AREA 1
3     4.0    Loc D          4.3  2.9  2.5  AREA 2
4     5.0    Loc E          6.0  5.2  4.6  AREA 2
5     6.0    Loc F          5.3  4.3  3.2  AREA 2
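
For the grouping approach, it may help to look at the group identifier on its own. The sketch below is a variation rather than the exact code above: it uses groupby with transform('first') instead of apply, on a smaller stand-in frame, and assumes every heading row contains the text 'AREA' and no data row does. The names group_id and out are illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with the same heading-row layout as the question.
df = pd.DataFrame({
    'Sl.No.': [np.nan, 1.0, 2.0, np.nan, 3.0],
    'Location': ['AREA 1', 'Loc A', 'Loc B', 'AREA 2', 'Loc C'],
    'Total Depth': [np.nan, 5.6, 6.8, np.nan, 4.3],
})

# True on heading rows; the running sum gives each block its own integer id.
group_id = df['Location'].str.contains('AREA').cumsum()
# group_id is 1, 1, 1, 2, 2 for the five rows above

# Broadcast the first Location of each block (the heading text) to all its rows.
out = df.assign(Area=df.groupby(group_id)['Location'].transform('first'))

# Remove the heading rows, identified by the missing serial number.
out = out.dropna(subset=['Sl.No.']).reset_index(drop=True)

print(out)
```

Because transform returns a result aligned with the original index, no per-group lambda is needed, and "Area" ends up as the last column just as in the output above.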