How to turn a list of lists into columns of a pandas dataframe?

Time:02-02

How can I unnest a list of lists and turn each inner list into a separate column of a dataframe? Specifically, I have the following dataframe, where the Route_set column holds a list of lists:

   Generation                              Route_set
0           0  [[20. 19. 47. 56.] [21. 34. 78. 34.]]

The desired output is the following dataframe:

   route1  route2
0      20      21
1      19      34
2      47      78
3      56      34

Any ideas how I can do it? Thank you in advance!

Answer:

You can build a dictionary with a for loop and then create a new dataframe from it. It is not the fastest way, but it is easy to follow.

import pandas as pd

df = pd.DataFrame({'Generation': [0],
                   'Route_set': [[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})

new_dic = {}
# Fill the dictionary: each key/value pair holds one inner list (one route)
for i, values in enumerate(df['Route_set'].iloc[0], start=1):
    new_dic[f'route{i}'] = values
# Build a new dataframe from the dictionary; each inner list becomes a column
new_df = pd.DataFrame(new_dic)
print(new_df)

You can probably do better, but this should be fine as a quick fix for your issue!

Answer:

My solution creates a NumPy array with np.array(), transposes it with .T, and converts it back to a list of lists with tolist():

import numpy as np
import pandas as pd

routes = {
    "Generation": 0,
    "Route_set": [[[20, 19, 47, 56], [21, 34, 78, 34]]]
}

array = np.array(routes["Route_set"][0]).T.tolist()

df = pd.DataFrame(data=array, columns=["route1", "route2"])
print(df)

Outputs:

   route1  route2
0      20      21
1      19      34
2      47      78
3      56      34

Note: I had to make an assumption about your data, because you only provided the printed output of your dataframe. If Route_set were "Route_set": [[20, 19, 47, 56], [21, 34, 78, 34]] rather than a list containing a list of lists, the dataframe would look like this:

   Generation         Route_set
0           0  [20, 19, 47, 56]
1           0  [21, 34, 78, 34]

That is not the layout shown in the question.
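If your data does turn out to be in that flat layout, with one route per row, the transpose idea still works: stack the per-row lists into a 2-D array and transpose it. A minimal sketch under that assumption, with the column names generated from the number of rows:

```python
import numpy as np
import pandas as pd

# Assumed flat layout: each row of Route_set holds a single route
routes = {
    "Generation": 0,
    "Route_set": [[20, 19, 47, 56], [21, 34, 78, 34]],
}
df = pd.DataFrame(routes)  # two rows; the scalar Generation is broadcast to both

# Stack the per-row lists into a (2, 4) array, then transpose so routes become columns
array = np.array(df["Route_set"].tolist()).T
result = pd.DataFrame(array, columns=[f"route{i}" for i in range(1, len(df) + 1)])
print(result)
```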

Answer:

You can try combining Series.apply with DataFrame.explode:

import pandas as pd

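df = pd.DataFrame(data={'Generation': 0, 'Route_set': [[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
# Put each inner list into its own column (still a single row holding lists)
df['route1'] = df['Route_set'].apply(lambda x: x[0])
df['route2'] = df['Route_set'].apply(lambda x: x[1])
# Explode both columns together so each element gets its own row (needs pandas >= 1.3)
df = df.explode(['route1', 'route2'], ignore_index=True)
# Keep only the new route columns
df2 = df[df.columns.difference(['Route_set', 'Generation'])]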
|    |   route1 |   route2 |
|---:|---------:|---------:|
|  0 |       20 |       21 |
|  1 |       19 |       34 |
|  2 |       47 |       78 |
|  3 |       56 |       34 |
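Because explode works row by row, the same steps also handle a dataframe with several generations, and keeping Generation in the final selection records which generation each value came from. A sketch with a hypothetical second generation (the Generation value 1 and its routes are made up purely for illustration):

```python
import pandas as pd

# Hypothetical input: the generation-1 row is invented for illustration only
df = pd.DataFrame({
    'Generation': [0, 1],
    'Route_set': [
        [[20., 19., 47., 56.], [21., 34., 78., 34.]],
        [[1., 2., 3., 4.], [5., 6., 7., 8.]],
    ],
})
# Put each inner list into its own column, one row per generation
df['route1'] = df['Route_set'].apply(lambda x: x[0])
df['route2'] = df['Route_set'].apply(lambda x: x[1])
# Explode both route columns in lockstep; Generation is repeated for every element
out = df.explode(['route1', 'route2'], ignore_index=True)[['Generation', 'route1', 'route2']]
print(out)
```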

Or you can just create a new dataframe with the values like this:

import pandas as pd

df = pd.DataFrame(data={'Generation': 0, 'Route_set': [[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
# Map each column name to one inner list, build one row per route, then transpose so routes become columns
df1 = pd.DataFrame.from_dict(dict(zip(['route1', 'route2'], df.Route_set.to_numpy()[0])), orient='index').transpose()
|    |   route1 |   route2 |
|---:|---------:|---------:|
|  0 |       20 |       21 |
|  1 |       19 |       34 |
|  2 |       47 |       78 |
|  3 |       56 |       34 |
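Because both lists have the same length, the from_dict(..., orient='index').transpose() round trip can be skipped: passing the zipped dictionary straight to the DataFrame constructor already makes each list a column. A minimal sketch of that simplification (orient='index' is still handy if the routes can differ in length, since shorter rows are padded with NaN, whereas the plain constructor raises an error for unequal lengths):

```python
import pandas as pd

df = pd.DataFrame(data={'Generation': 0,
                        'Route_set': [[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})

# Map each column name to one inner list; each list becomes a column directly
df1 = pd.DataFrame(dict(zip(['route1', 'route2'], df['Route_set'].iloc[0])))
print(df1)
```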