dataframe combine row values to a list without NaN

Time:01-26

I want to combine the values in each row into a list and add that list as a new column on the dataframe. Some columns contain NaN, though. How can I leave NaN out of the list?

df

A  B  C
2  3  NaN
1  3  1
3  4  6

Expected output

A  B  C    D
2  3  NaN  [2, 3]
1  3  1    [1, 3, 1]
3  4  6    [3, 4, 6]

I know we can use dropna() on a dataframe, but how can we use it when combining row values?

Here is my code. I tried calling dropna() after tolist() and on df, but neither produced the correct output:

df['D'] = df.values.tolist()

CodePudding user response:

Stack (this eliminates NaN), group by the index of the original frame, and then aggregate to a list:

df['D'] = df.stack().groupby(level=0).agg(list)

   A  B    C                D
0  2  3  NaN       [2.0, 3.0]
1  1  3  1.0  [1.0, 3.0, 1.0]
2  3  4  6.0  [3.0, 4.0, 6.0]
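
For anyone who wants a version to copy and run, here is a minimal sketch of the same stack-and-group idea. The sample frame is rebuilt from the question, and the explicit dropna() is my own addition: it assumes you would rather not depend on stack() dropping NaN by default, since I believe that default can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (column C holds a NaN, so it is float64).
df = pd.DataFrame({"A": [2, 1, 3], "B": [3, 3, 4], "C": [np.nan, 1, 6]})

# stack() reshapes to one value per (row, column) pair; the explicit dropna()
# removes NaN even if stack() itself keeps them in the installed version.
stacked = df.stack().dropna()

# Group by the original row label (index level 0) and collect each group into a list.
df["D"] = stacked.groupby(level=0).agg(list)
print(df)
```

One thing to watch for: a row that is entirely NaN has no entries left after stacking, so its D value ends up as NaN rather than an empty list.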

CodePudding user response:

You can filter NaN out using pd.notna:

df['D'] = df.apply(lambda row: [x for x in row if pd.notna(x)], axis=1)

Output:

   A  B    C                D
0  2  3  NaN       [2.0, 3.0]
1  1  3  1.0  [1.0, 3.0, 1.0]
2  3  4  6.0  [3.0, 4.0, 6.0]
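
Since the question asked specifically about dropna(), the same per-row filter can probably also be written with it. This is only a sketch on the sample data from the question, not a benchmarked recommendation:

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"A": [2, 1, 3], "B": [3, 3, 4], "C": [np.nan, 1, 6]})

# With axis=1 each row arrives as a Series; dropna() removes its missing
# entries and tolist() converts what is left to a plain Python list.
df["D"] = df.apply(lambda row: row.dropna().tolist(), axis=1)
print(df)
```

The values still come out as floats, because the NaN forces column C to float64 and each row is upcast to a single dtype.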

CodePudding user response:

I relied on a slightly more verbose method: build a boolean list of lists marking which values are not null, then append only those values from the original list of lists whose flag is True:

include = df.notnull().values.tolist()  # [[True, True, False], [True, True, True], [True, True, True]]
base = df.values.tolist()

col_d = []
for i in range(len(include)):
    kept = []
    for j in range(len(include[i])):
        if include[i][j]:
            kept.append(base[i][j])
    col_d.append(kept)

df['D'] = col_d

print(df)

Output:

   A  B    C                D
0  2  3  NaN       [2.0, 3.0]
1  1  3  1.0  [1.0, 3.0, 1.0]
2  3  4  6.0  [3.0, 4.0, 6.0]
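
The nested loops above can likely be condensed by pairing each row of values with its row of flags. A rough sketch of that idea follows; the names values and mask are mine, and the frame is rebuilt from the question:

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"A": [2, 1, 3], "B": [3, 3, 4], "C": [np.nan, 1, 6]})

values = df.to_numpy()        # 2-D array of the cell values
mask = df.notna().to_numpy()  # True where a cell is not NaN

# Boolean indexing keeps only the entries of each row whose flag is True.
# Wrapping the result in a Series keeps it aligned with the frame's index.
df["D"] = pd.Series(
    [row[keep].tolist() for row, keep in zip(values, mask)],
    index=df.index,
)
print(df)
```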