I want to combine the values in each row into a list and add that list as a new column on the DataFrame. Some columns contain NaN, though. How can I leave NaN out of the lists?
df
A  B  C
2  3  NaN
1  3  1
3  4  6
Expected output
A  B  C    D
2  3  NaN  [2, 3]
1  3  1    [1, 3, 1]
3  4  6    [3, 4, 6]
I know dropna() can be used on a DataFrame, but how can it be applied when combining row values like this?
My code is below. I tried calling dropna() after tolist() and on df, but neither produced the correct output.
df['D'] = df.values.tolist()
CodePudding user response:
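Stack (with the default settings shown here, this eliminates NaN; that default may differ across pandas versions), group by the index of the initial frame, and then aggregate to a list:
df['D'] = df.stack().groupby(level=0).agg(list)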
   A  B    C                D
0  2  3  NaN       [2.0, 3.0]
1  1  3  1.0  [1.0, 3.0, 1.0]
2  3  4  6.0  [3.0, 4.0, 6.0]
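A self-contained sketch of the stack approach, using the sample frame from the question. The explicit dropna() call is an addition here, on the assumption that stack() may not drop NaN by itself in every pandas version; it is harmless where stack() already drops them.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; column C holds one NaN.
df = pd.DataFrame({"A": [2, 1, 3], "B": [3, 3, 4], "C": [np.nan, 1, 6]})

# stack() reshapes the frame into one long Series indexed by (row, column).
# The explicit dropna() is an added safeguard (assumption: not every pandas
# version drops NaN inside stack() itself).
stacked = df.stack().dropna()

# Group by the original row label (index level 0) and collect values into lists.
df["D"] = stacked.groupby(level=0).agg(list)

print(df)
```

Note that a row consisting entirely of NaN would be absent from the grouped result, so its D entry would come back as NaN rather than an empty list.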
CodePudding user response:
You can filter NaN out using pd.notna:
df['D'] = df.apply(lambda row: [x for x in row if pd.notna(x)], axis=1)
Output:
   A  B    C                D
0  2  3  NaN       [2.0, 3.0]
1  1  3  1.0  [1.0, 3.0, 1.0]
2  3  4  6.0  [3.0, 4.0, 6.0]
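Since the question asks how dropna() fits in, here is a hedged variant of the same row-wise idea that calls dropna() on each row and casts back to int, so the lists match the expected output ([2, 3] rather than [2.0, 3.0]). It assumes every non-missing value is a whole number; the int() cast would silently truncate anything else.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; column C holds one NaN.
df = pd.DataFrame({"A": [2, 1, 3], "B": [3, 3, 4], "C": [np.nan, 1, 6]})

# With axis=1 each row arrives as a float Series (the NaN forces the upcast),
# so dropna() removes the missing entries and int() restores whole numbers.
# Assumption: every non-missing value is a whole number.
df["D"] = df.apply(lambda row: [int(x) for x in row.dropna()], axis=1)

print(df)
```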
CodePudding user response:
I used a slightly more verbose method: build a boolean list of lists marking where the values are not null, then append only the values from the original list of lists whose corresponding flag is True:
include = df.notnull().values.tolist()  # [[True, True, False], [True, True, True], [True, True, True]]
base = df.values.tolist()
col_d = []
for i in range(len(include)):
    kept = []  # non-null values of row i
    for j in range(len(include[i])):
        if include[i][j]:
            kept.append(base[i][j])
    col_d.append(kept)
df['D'] = col_d
print(df)
Output:
   A  B    C                D
0  2  3  NaN       [2.0, 3.0]
1  1  3  1.0  [1.0, 3.0, 1.0]
2  3  4  6.0  [3.0, 4.0, 6.0]
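The same mask-and-filter idea can probably be written more compactly with NumPy boolean indexing. This is only a sketch of an alternative, not a replacement for the loop above; the names values and mask are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; column C holds one NaN.
df = pd.DataFrame({"A": [2, 1, 3], "B": [3, 3, 4], "C": [np.nan, 1, 6]})

values = df.to_numpy()        # 2-D float array of the cell values
mask = df.notna().to_numpy()  # 2-D boolean array, True where a value is present

# Boolean indexing keeps only the present values of each row.
df["D"] = [row[keep].tolist() for row, keep in zip(values, mask)]

print(df)
```

Both arrays are taken before column D is added, so the new column never feeds back into the lists.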
