Flatten nested list in pandas containing NaN

I have a table like this:

index | country
---------------
1     | [nan]
2     | [nan, DE]
3     | [nan, [IT, DE]]
4     | [[FR]]
5     | [[AE], nan, [AE, MT], [MX]]

I need to turn this column into a flat list of unique values without NaNs:

index | country
---------------
1     | []
2     | [DE]
3     | [IT, DE]
4     | [FR]
5     | [AE, MT, MX]

As a first step, I tried to flatten the list with this function:

df.applymap(lambda x: [z for y in x for z in y])

But I get the following error:

TypeError: 'float' object is not iterable
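To see where the error comes from, here is the comprehension from the question wrapped in a hypothetical helper `flatten_one_level` and run on cells shaped like the ones in the table. The inner loop `for z in y` needs every element `y` to be iterable. A float NaN is not, and a bare string like `'DE'` is iterable but gets split into characters:

```python
import math


def flatten_one_level(cell):
    # The comprehension from the question: for each element y, iterate its items z.
    return [z for y in cell for z in y]


# A list of lists of strings flattens as intended.
print(flatten_one_level([['IT', 'DE']]))  # ['IT', 'DE']

# A bare string is iterable, so it is split into characters.
print(flatten_one_level(['DE']))  # ['D', 'E']

# A float NaN is not iterable, which raises the error from the question.
try:
    flatten_one_level([math.nan, 'DE'])
except TypeError as exc:
    print(exc)  # 'float' object is not iterable
```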

I tried several other functions from another SO question, but they all give me the same error.

Answer:

This generator should work for arbitrarily nested lists:

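from collections.abc import Iterable

def flatten(l):
    # Recursively yield the leaves of arbitrarily nested iterables.
    for el in l:
        # Strings and bytes are iterable too; treat them as leaves, otherwise
        # 'DE' would be split into characters and the recursion would never end.
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el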
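As a quick check of what `flatten` alone yields on the five cells from the question (plain Python, no pandas; the generator is repeated so the snippet runs on its own), note that NaN leaves and duplicates are still present. A separate de-duplication and NaN-filtering step is therefore still needed:

```python
import math
from collections.abc import Iterable


def flatten(l):
    # Same generator as above: recurse into any non-string iterable.
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el


nan = math.nan
cells = [[nan], [nan, 'DE'], [nan, ['IT', 'DE']], [['FR']],
         [['AE'], nan, ['AE', 'MT'], ['MX']]]

for cell in cells:
    print(list(flatten(cell)))
# [nan]
# [nan, 'DE']
# [nan, 'IT', 'DE']
# ['FR']
# ['AE', nan, 'AE', 'MT', 'MX']
```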

Recreating your DataFrame with all five rows from the question:

import pandas as pd

nan = float('nan')
df = pd.DataFrame({'country': [[nan],
                               [nan, 'DE'],
                               [nan, ['IT', 'DE']],
                               [['FR']],
                               [['AE'], nan, ['AE', 'MT'], ['MX']]]},
                  index=range(1, 6))

# Flatten each cell, drop duplicates (dict.fromkeys keeps first-seen order,
# unlike set) and drop the NaN leaves.
df['country'] = df['country'].apply(lambda x: [i for i in dict.fromkeys(flatten(x)) if pd.notna(i)])

gives the following output:

        country
1            []
2          [DE]
3      [IT, DE]
4          [FR]
5  [AE, MT, MX]
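A few common ways to test a leaf for NaN are `x == x`, `str(x) != 'nan'` and `pd.notna(x)`. A minimal comparison of the first two in plain Python, on hypothetical sample values, shows they are not equivalent. NaN is the only value that compares unequal to itself, while the string test also drops a literal `'nan'` string:

```python
import math

leaves = ['DE', math.nan, float('nan'), 'nan']

# NaN != NaN, so the self-comparison keeps everything except real NaN floats.
print([x for x in leaves if x == x])           # ['DE', 'nan']

# The string test also removes the legitimate string 'nan'.
print([x for x in leaves if str(x) != 'nan'])  # ['DE']
```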

Answer:

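Use the following. It handles the single level of nesting in your data. Bare strings such as 'DE' are wrapped in a list so that chain does not split them into characters, and dict.fromkeys removes duplicates while keeping their order:

In [540]: import itertools

In [553]: df['country'] = df['country'].apply(lambda x: list(dict.fromkeys(itertools.chain.from_iterable([i] if isinstance(i, str) else i for i in x if i == i))))

In [554]: df
Out[554]: 
   index       country
0      1            []
1      2          [DE]
2      3      [IT, DE]
3      4          [FR]
4      5  [AE, MT, MX]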
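An alternative sketch skips the Python-level flatten function and uses pandas' own `Series.explode` instead. It assumes every nested container is a plain Python `list` (tuples or arrays would need the `isinstance` check widened). Each pass removes one nesting level; after that, NaN is dropped and the unique values are regrouped per original row:

```python
import numpy as np
import pandas as pd

nan = np.nan
s = pd.Series([[nan], [nan, 'DE'], [nan, ['IT', 'DE']], [['FR']],
               [['AE'], nan, ['AE', 'MT'], ['MX']]],
              index=range(1, 6), name='country')

# Explode one nesting level per pass until no cell holds a list any more.
flat = s
while flat.map(lambda v: isinstance(v, list)).any():
    flat = flat.explode()

# Drop NaN leaves, then collect the unique values per original row
# (SeriesGroupBy.unique keeps the order of first appearance).
clean = flat.dropna().groupby(level=0).unique().map(list)

# Rows that held only NaN vanish after dropna; put them back as empty lists.
clean = clean.reindex(s.index).map(lambda v: v if isinstance(v, list) else [])
print(clean)
```

Exploding one level at a time keeps the original index on every leaf, which is what allows `groupby(level=0)` to reassemble the rows.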