I have something like this:
df =
col1 col2 col3
0 B C A
1 E D G
2 NaN F B
EDIT : I need to convert it into this:
result =
Name location
0 B col1,col2
1 C col1
2 A col1
3 E col2
4 D col2
5 G col2
6 F col3
Essentially getting a "location" telling me which column an "Name" is in. Thank you in advance.
CodePudding user response:
Try melt and dropna:
>>> df.melt(var_name='location').dropna()
location value
0 col1 B
1 col1 E
3 col2 C
4 col2 D
5 col2 F
6 col3 A
7 col3 G
>>>
CodePudding user response:
Try using melt to convert columns to rows. And give the rows a column name.
Then dropna to remove the NaN values in rows.
df = df.melt(var_name="location", value_name="Name").dropna()
CodePudding user response:
You can use pandas.melt and pandas.groupby.agg:
df = df.melt(var_name="location", value_name="Name").dropna()
new_df = df.groupby("Name", as_index=False).agg(",".join)
print(new_df)
Output:
Name location
0 A col3
1 B col1,col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3
CodePudding user response:
Or an alternative with stack():
new = df.stack().reset_index().drop('level_0',axis=1).dropna()
new.columns = ['name','location']
prints:
name location
0 col1 B
1 col2 C
2 col3 A
3 col1 E
4 col2 D
5 col3 G
6 col2 F
EDIT:
To get your updated output you could use a groupby along with join():
new.groupby('location').agg({'name':lambda x: ', '.join(list(x))}).reset_index()
Which gives you:
location name
0 A col3
1 B col1, col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3
