flatten nested list in pandas containing nan

Time:02-03

I have a table like this:

index | country
---------------
1     | [nan]
2     | [nan, DE]
3     | [nan, [IT, DE]]
4     | [[FR]]
5     | [[AE], nan, [AE,  MT], [MX]]

I need to turn each cell of this column into a flat list of unique values, with the NaNs removed:

index | country
---------------
1     | []
2     | [DE]
3     | [IT, DE]
4     | [FR]
5     | [AE, MT, MX]

As a first step, I tried to flatten the lists with this function:

df.applymap(lambda x: [z for y in x for z in y])

But I get the following error:

TypeError: 'float' object is not iterable

I tried several other functions that I found in another SO question, but they all end up giving me the same error.
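
For context, the error can be reproduced without pandas. The sketch below assumes a cell holding `[nan, 'DE']` (row 2 of the table); `naive_flatten` is a hypothetical name for the lambda in the question. The inner loop `for z in y` tries to iterate over `nan`, which is a plain float:

```python
# Row 2 of the table: a NaN followed by a bare string
cell = [float('nan'), 'DE']

def naive_flatten(x):
    # Same comprehension as the lambda passed to applymap
    return [z for y in x for z in y]

try:
    naive_flatten(cell)
    error_message = None
except TypeError as exc:
    # Raised as soon as the inner loop reaches the float NaN
    error_message = str(exc)

print(error_message)  # 'float' object is not iterable
```

Even without the NaN, this comprehension would split the bare string `'DE'` into `'D'` and `'E'`, because strings are iterable too.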

CodePudding user response:

This generator should work for lists nested to any depth; strings and bytes are treated as single values rather than iterated:

from collections.abc import Iterable
def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el
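
As a quick check of the generator above on a single cell, here is a minimal sketch using the value from row 5 of the question's table. The NaN filter `v == v` is an assumption added for illustration (NaN is the only value that is not equal to itself):

```python
from collections.abc import Iterable

def flatten(l):
    for el in l:
        # Recurse into nested containers, but keep strings/bytes whole
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

nan = float('nan')
cell = [['AE'], nan, ['AE', 'MT'], ['MX']]

flat = list(flatten(cell))                  # NaN and duplicates are still present
without_nan = [v for v in flat if v == v]   # drop NaN
print(without_nan)  # ['AE', 'AE', 'MT', 'MX']
```

Flattening alone does not remove NaN or duplicates, so those two steps still have to follow.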

Recreating your df:

import pandas as pd

nan = float('nan')
df = pd.DataFrame({'country': [[nan],
                               [nan, 'DE'],
                               [nan, ['IT', 'DE']],
                               [['FR']],
                               [['AE'], nan, ['AE', 'MT'], ['MX']]]})

# Flatten each cell, drop NaN, then de-duplicate keeping first-seen order
# (list(set(...)) also removes duplicates but does not guarantee order)
df['country'] = df['country'].apply(
    lambda x: list(dict.fromkeys(i for i in flatten(x) if str(i) != 'nan')))

gives the following output:

        country
0            []
1          [DE]
2      [IT, DE]
3          [FR]
4  [AE, MT, MX]

CodePudding user response:

Use:

In [540]: import itertools

In [553]: df['country'] = df['country'].apply(lambda x: [i for i in x if i == i]).apply(lambda x: list(itertools.chain(*x)))

In [554]: df
Out[554]: 
   index           country
0      1                []
1      2            [D, E]
2      3          [IT, DE]
3      4              [FR]
4      5  [AE, AE, MT, MX]
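
Note that in the output above `DE` in row 2 is split into `D` and `E`, because `itertools.chain` iterates over bare strings, and `AE` appears twice in row 5. A possible adjustment is sketched below, assuming at most one level of nesting as in the question's data; `flatten_one_level` is a hypothetical helper name, and plain lists stand in for the column's cells:

```python
import itertools

nan = float('nan')
rows = [
    [nan],
    [nan, 'DE'],
    [nan, ['IT', 'DE']],
    [['FR']],
    [['AE'], nan, ['AE', 'MT'], ['MX']],
]

def flatten_one_level(cell):
    # Drop NaN: it is the only value that is not equal to itself
    kept = [item for item in cell if item == item]
    # Wrap bare strings so chain() does not split them into characters
    wrapped = [[item] if isinstance(item, str) else item for item in kept]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(itertools.chain.from_iterable(wrapped)))

results = [flatten_one_level(cell) for cell in rows]
print(results)
# [[], ['DE'], ['IT', 'DE'], ['FR'], ['AE', 'MT', 'MX']]
```

On the DataFrame itself, the same helper could be applied with `df['country'].apply(flatten_one_level)`.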