Home > Enterprise >  Concatenating strings across two rows in pandas dataframes
Concatenating strings across two rows in pandas dataframes

Time:01-06

I have a table like so after performing some data scraping on a pdf:

index colA           colB                     colC colD colE colF colG
-------------------------------------------------------------------
 1    ABCD  veryLongTextThatShouldNotCutOff   12   x    x    x    x
 2    ABCD  veryLongText                      14   x    x    x    x
 4    NaN   ThatShouldNotCutOff               NaN  NaN  NaN  NaN  NaN
 5    ABCD  veryLongTextThatShouldNotCutOff   12   x    x    x    x
 6    ABCD  veryLongTextThatShouldNotCutOff   x    19   x    x    x
 7    ABCD  veryLongText                      21   x    x    x    x
 9    NaN   ThatShouldNotCutOff               NaN  NaN  NaN  NaN  NaN
 10   ABCD  veryLongTextThatShouldNotCutOff   13   x    x    x    x
...

I need to clean the data up so that information in rows with index 2 and 4 are merged together like so, and with the indexes in the right order again.

index colA           colB                     colC colD colE colF colG
-------------------------------------------------------------------
 1    ABCD  veryLongTextThatShouldNotCutOff   12   x    x    x    x
 2    ABCD  veryLongTextThatShouldNotCutOff   14   x    x    x    x
 3    ABCD  veryLongTextThatShouldNotCutOff   12   x    x    x    x
 4    ABCD  veryLongTextThatShouldNotCutOff   x    19   x    x    x
 5    ABCD  veryLongTextThatShouldNotCutOff   21   x    x    x    x
 6    ABCD  veryLongTextThatShouldNotCutOff   13   x    x    x    x
...

Essentially, some rows with colA as NaN needs to be "concatenated" with the previous rows.

How do I do it using pandas?

CodePudding user response:

You could use the non NaN values in colA to set up a group and merge the colB. Then drop the NaN rows:

group = df['colA'].notna().cumsum()
df['colB'] = df.groupby(group)['colB'].transform('sum')

df = df.dropna(subset=['colA'])

output:

   index  colA                             colB  colC colD colE colF colG
0      1  ABCD  veryLongTextThatShouldNotCutOff  12.0    x    x    x    x
1      2  ABCD  veryLongTextThatShouldNotCutOff  14.0    x    x    x    x
3      5  ABCD  veryLongTextThatShouldNotCutOff  12.0    x    x    x    x

  •  Tags:  
  • Related