List in pandas dataframe columns

Time:01-30

I have the following pandas dataframe

    | A |    B   |
    | :-|:------:|
    | 1 | [2,3,4]|
    | 2 | np.nan |
    | 3 | np.nan | 
    | 4 |   10   |

I would like to unpack the list in the first row and place its values, one per row, into that row and the rows that follow it. The result should look like this:

    | A |    B   |
    | :-|:------:|
    | 1 |    2   |
    | 2 |    3   |
    | 3 |    4   | 
    | 4 |   10   |

How can I do this on a very large dataset where this pattern occurs in many rows?

Answer 1:

If the NaN values act as "slack" space into which the list elements can slot, i.e. if each list is followed by exactly as many NaNs as it has extra elements, you can explode column "B", drop the NaN values with dropna, reset the index and assign the result back to "B":

# explode the lists into rows, drop the NaN placeholders, renumber 0..n-1
df['B'] = df['B'].explode().dropna().reset_index(drop=True)

Output:

   A   B
0  1   2
1  2   3
2  3   4
3  4  10
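The one-liner above misaligns values whenever a list is not followed by exactly len(list) - 1 NaNs (or when the column starts with NaN). As a hedged sketch, a check like the following could be run before using it. The helper name lists_fit_nan_slack is hypothetical, and it assumes the list cells are plain Python lists that contain no NaNs themselves.

```python
import pandas as pd


def lists_fit_nan_slack(s: pd.Series) -> bool:
    """Hypothetical helper: True if every list in `s` is followed by exactly
    len(list) - 1 NaNs, so explode().dropna().reset_index(drop=True)
    puts each value on its own original row.

    Assumes list cells are Python lists without NaN elements.
    """
    # A new group starts at every non-NaN cell (a list counts as non-NaN).
    key = s.notna().cumsum()

    # How many values each cell leaves behind after explode().dropna():
    # a list gives len(list) (0 for []), NaN gives 0, any other scalar gives 1.
    def n_values(v):
        if isinstance(v, list):
            return len(v)
        return 0 if pd.isna(v) else 1

    grouped = s.map(n_values).groupby(key)
    # It fits only if every group yields exactly as many values as it has rows.
    return bool((grouped.sum() == grouped.size()).all())
```

For the question's input this returns True; for the input used in the next answer (three NaNs after a three-element list) it returns False, in which case the group-based approach is the safer choice.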

Answer 2:

If the number of consecutive NaNs does not match the length of the list, you can instead form groups that start at each non-NaN element, explode each group, and cut it back to the group's original length.

I used a slightly different example for clarity (I also assigned to a different column):

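df['C'] = (df['B']
   # a new group starts at every non-NaN cell (a list counts as non-NaN)
   .groupby(df['B'].notna().cumsum())
   # explode each group, then cut it back to the group's original length
   .apply(lambda s: s.explode().iloc[:len(s)])
   .values
)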

Output:

   A          B    C
0  1  [2, 3, 4]    2
1  2        NaN    3
2  3        NaN    4
3  4        NaN  NaN
4  5         10   10

Used input:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 6),
                   'B': [[2, 3, 4], np.nan, np.nan, np.nan, 10]
                  })
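Because groupby(...).apply calls a Python lambda once per group, it may become slow on a very large frame. The sketch below applies the same rule (a group starts at each non-NaN value and keeps its original number of rows) using vectorized operations instead. It is an alternative, not the answer's own code; the name fill_from_lists is hypothetical, it assumes list cells are Python lists, and no benchmark is implied.

```python
import numpy as np
import pandas as pd


def fill_from_lists(s: pd.Series) -> np.ndarray:
    """Hypothetical helper: spread each list over the rows that follow it.

    Same grouping rule as the groupby/apply answer, but without a
    per-group Python lambda. Returns an array with len(s) values.
    """
    # Group id per original row: increments at every non-NaN cell.
    key = s.notna().cumsum().to_numpy()

    # One row per list element; NaN and scalar cells stay single rows.
    # After reset_index, the exploded index holds each value's original position.
    exploded = s.reset_index(drop=True).explode()
    ekey = key[exploded.index.to_numpy()]

    # Position of each exploded value inside its group (0, 1, 2, ...).
    pos = pd.Series(ekey).groupby(ekey).cumcount().to_numpy()

    # Number of original rows in each value's group.
    limit = np.bincount(key)[ekey]

    # Keep only as many values per group as the group has rows.
    return exploded.to_numpy()[pos < limit]
```

With the input above, df['C'] = fill_from_lists(df['B']) should produce the same C column (2, 3, 4, NaN, 10) as the groupby version.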
