I have a table like this:
index | country
---------------
1 | [nan]
2 | [nan, DE]
3 | [nan, [IT, DE]]
4 | [[FR]]
5 | [[AE], nan, [AE, MT], [MX]]
I need to turn each entry of this column into a flat list of unique values, without NaNs:
index | country
---------------
1 | []
2 | [DE]
3 | [IT, DE]
4 | [FR]
5 | [AE, MT, MX]
As a first step I tried to flatten the lists with this function:
df.applymap(lambda x: [z for y in x for z in y])
But I get the following error:
TypeError: 'float' object is not iterable
I tried several other functions that I found in this SO question here, but they all end up giving me the same error.
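For what it's worth, the error seems to reproduce without pandas at all: NaN is a plain float, and the inner loop of the comprehension tries to iterate over it. A minimal sketch, assuming the cells hold Python lists and float NaNs (the variable names are only for illustration):

```python
# Hypothetical cell value mirroring row 3 of the table above.
cell = [float("nan"), ["IT", "DE"]]

message = None
try:
    # The inner loop iterates over every element y, including the NaN float.
    flat = [z for y in cell for z in y]
except TypeError as exc:
    message = str(exc)

print(message)  # 'float' object is not iterable
```

So any approach that assumes every element is itself a list will probably fail the same way until the NaNs are skipped or treated as leaves.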
CodePudding user response:
This should work for any nested lists:
from collections.abc import Iterable

def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el
So recreating your df (one row per line of your table):
import pandas as pd
nan = float('nan')
df = pd.DataFrame({'country': [
    [nan],
    [nan, 'DE'],
    [nan, ['IT', 'DE']],
    [['FR']],
    [['AE'], nan, ['AE', 'MT'], ['MX']],
]})
# dict.fromkeys drops duplicates and keeps first-seen order (a set would not guarantee the order)
df['country'] = df['country'].apply(lambda x: [i for i in dict.fromkeys(flatten(x)) if str(i) != 'nan'])
gives the following output:
        country
0            []
1          [DE]
2      [IT, DE]
3          [FR]
4  [AE, MT, MX]
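The same idea can be packaged as one helper that flattens, drops NaNs and de-duplicates in a single pass. This is only a sketch: `unique_flat` is a made-up name, the generator is repeated so the snippet runs on its own, and it assumes the only values to drop are float NaNs.

```python
import math
from collections.abc import Iterable


def flatten(items):
    """Yield the leaves of an arbitrarily nested iterable; strings count as leaves."""
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el


def unique_flat(cell):
    """Hypothetical helper: flatten one cell, skip NaNs, keep first-seen order."""
    seen = {}
    for el in flatten(cell):
        if isinstance(el, float) and math.isnan(el):
            continue  # NaN leaf, drop it
        seen.setdefault(el, None)  # dict keys act as an ordered set
    return list(seen)


nan = float("nan")
cells = [
    [nan],
    [nan, "DE"],
    [nan, ["IT", "DE"]],
    [["FR"]],
    [["AE"], nan, ["AE", "MT"], ["MX"]],
]
result = [unique_flat(cell) for cell in cells]
print(result)
```

With the frame from the question, this would presumably be applied as `df['country'] = df['country'].apply(unique_flat)`.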
CodePudding user response:
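Use the following (it handles one level of nesting, as in your sample data; bare strings are wrapped in a list so they are not split into characters, and dict.fromkeys removes duplicates while keeping order):
In [540]: import itertools
In [553]: df['country'] = df['country'].apply(lambda x: [[i] if isinstance(i, str) else i for i in x if i == i]).apply(lambda x: list(dict.fromkeys(itertools.chain(*x))))
In [554]: df
Out[554]:
   index       country
0      1            []
1      2          [DE]
2      3      [IT, DE]
3      4          [FR]
4      5  [AE, MT, MX]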
