I would like to ask how I can unnest a list of lists and turn it into separate columns of a dataframe. Specifically, I have the following dataframe, where the Route_set column holds a list of lists:
Generation Route_set
0 0 [[20. 19. 47. 56.] [21. 34. 78. 34.]]
The desired output is the following dataframe:
route1 route2
0 20 21
1 19 34
2 47 78
3 56 34
Any ideas how I can do it? Thank you in advance!
CodePudding user response:
You can create a dictionary and fill it using a for loop. It is not the fastest way, but it is pretty easy.
    import pandas as pd
    new_dic = {}
    # Create and fill the dictionary; each key-value pair corresponds to one inner list.
    # Iterate over the inner lists of the first row, not over the rows of the column.
    for i, values in enumerate(df.Route_set.iloc[0], start=1):
        new_dic[f'route{i}'] = values
    # The route columns are longer than the one-row df, so build a new dataframe
    # from the dictionary instead of assigning the columns onto df.
    df = pd.DataFrame(new_dic)
You can probably do better, but this should be OK as a quick fix for your issue!
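The dictionary idea can also be wrapped in a small helper. This is only a minimal sketch: `routes_to_columns` is a hypothetical name, and it assumes the Route_set cell of the first row holds inner lists of equal length, as in the question.

```python
import pandas as pd


def routes_to_columns(route_set):
    """Build a dataframe with one column per inner list of route_set."""
    # Column names route1, route2, ... follow the desired output in the question.
    return pd.DataFrame(
        {f"route{i}": list(values) for i, values in enumerate(route_set, start=1)}
    )


# Sample data reconstructed from the printed dataframe in the question (an assumption).
df = pd.DataFrame(
    {"Generation": 0, "Route_set": [[[20.0, 19.0, 47.0, 56.0], [21.0, 34.0, 78.0, 34.0]]]}
)

# Only the first row's cell is unnested here.
result = routes_to_columns(df["Route_set"].iloc[0])
print(result)
```

Returning a new dataframe avoids the length mismatch you would get by assigning four-element columns onto a one-row dataframe.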
CodePudding user response:
I made a solution that creates a NumPy array with np.array(), transposes it, and converts it back to a list of lists using tolist():
import numpy as np
import pandas as pd
routes = {
    "Generation": 0,
    "Route_set": [[[20, 19, 47, 56], [21, 34, 78, 34]]]
}
array = np.array(routes["Route_set"][0]).T.tolist()
df = pd.DataFrame(data=array, columns=["route1", "route2"])
print(df)
Outputs:
route1 route2
0 20 21
1 19 34
2 47 78
3 56 34
Note: I had to make an assumption about your data because you only provided the print output of your dataframe.
If Route_set were [[20, 19, 47, 56], [21, 34, 78, 34]] (a plain list of lists) rather than a list containing a list of lists, each inner list would become its own row and the dataframe would look like:
Generation Route_set
0 0 [20, 19, 47, 56]
1 0 [21, 34, 78, 34]
That differs from what you provided in the question.
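To make the difference between the two layouts concrete, here is a rough sketch that builds both and counts the rows. The variable names `nested`, `flat` and `wide` are just illustrative, and the last step shows one possible way to reach the same route1/route2 columns if your data turns out to have the two-row layout.

```python
import pandas as pd

# One row whose cell holds both inner lists (the layout assumed in this answer).
nested = pd.DataFrame(
    {"Generation": 0, "Route_set": [[[20, 19, 47, 56], [21, 34, 78, 34]]]}
)

# Two rows, one inner list per row (the alternative layout).
flat = pd.DataFrame(
    {"Generation": 0, "Route_set": [[20, 19, 47, 56], [21, 34, 78, 34]]}
)

print(len(nested))  # number of rows in the nested layout
print(len(flat))    # number of rows in the flat layout

# For the two-row layout: one dataframe row per route, then transpose
# so that each route becomes a column.
wide = pd.DataFrame(flat["Route_set"].tolist(), index=["route1", "route2"]).T
print(wide)
```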
CodePudding user response:
You can try using df.explode and df.apply:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df['route1']=df['Route_set'].apply(lambda x: x[0])
df['route2']=df['Route_set'].apply(lambda x: x[1])
df = df.explode(['route1', 'route2'], ignore_index=True)
df2 = df[df.columns.difference(['Route_set', 'Generation'])]
| | route1 | route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |
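Passing a list of columns to explode requires pandas 1.3 or later. If you have more than two inner lists, or want to keep the Generation column, the same approach can be written with a loop. This is only a sketch, assuming every row holds the same number of inner lists and that the inner lists within a row have equal length; `n_routes`, `route_cols` and `long_df` are names made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    data={"Generation": 0, "Route_set": [[[20.0, 19.0, 47.0, 56.0], [21.0, 34.0, 78.0, 34.0]]]}
)

# Number of inner lists, read from the first row (assumed identical for all rows).
n_routes = len(df["Route_set"].iloc[0])
route_cols = [f"route{i + 1}" for i in range(n_routes)]

# One new column per inner list; i=i binds the current index inside the lambda.
for i, col in enumerate(route_cols):
    df[col] = df["Route_set"].apply(lambda x, i=i: x[i])

# Exploding several columns at once needs pandas >= 1.3.
long_df = df.drop(columns="Route_set").explode(route_cols, ignore_index=True)
print(long_df)
```

The exploded route columns keep the object dtype, so you may want to convert them afterwards, for example with astype(float).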
Or you can just create a new dataframe with the values like this:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df1 = pd.DataFrame.from_dict(dict(zip(['route1', 'route2'], df.Route_set.to_numpy()[0])), orient='index').transpose()
| | route1 | route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |
