I have a dataframe df with a mix of data types (str, list values, floats):
Property Name 2234.1.1 2234.1.2.1 2234.1.3.1
0 Flash Point (°C) [-5] NaN NaN
1 Flash Point (°C) [Max] NaN NaN
2 Flash Point (°C) [nan] NaN NaN
3 Flash Point (°C) NaN NaN NaN
4 Flash Point (°C) NaN NaN NaN
5 Flash Point (°C) NaN NaN NaN
6 Flash Point (°C) NaN NaN [29]
7 Flash Point (°C) NaN NaN [nan]
8 Flash Point (°C) NaN NaN [nan]
9 Flash Point (°C) NaN [159] NaN
10 Flash Point (°C) NaN [nan] NaN
I want to convert all the list values to regular values.
So desired df:
Property Name 2234.1.1 2234.1.2.1 2234.1.3.1
0 Flash Point (°C) -5 NaN NaN
1 Flash Point (°C) Max NaN NaN
2 Flash Point (°C) NaN NaN NaN
3 Flash Point (°C) NaN NaN NaN
4 Flash Point (°C) NaN NaN NaN
5 Flash Point (°C) NaN NaN NaN
6 Flash Point (°C) NaN NaN 29
7 Flash Point (°C) NaN NaN NaN
8 Flash Point (°C) NaN NaN NaN
9 Flash Point (°C) NaN 159 NaN
10 Flash Point (°C) NaN NaN NaN
I've tried:
for i in df.columns[1:]:
df[i]=[''.join(map(str,l)) for l in df[i]]
But I get the following error:
TypeError: 'float' object is not iterable
I assume I shouldn't try looping over the df like that anyway.
Edit: adding code to reproduce the df:
{'Property Name': {0: 'Flash Point (°C)',
1: 'Flash Point (°C)',
2: 'Flash Point (°C)',
3: 'Flash Point (°C)',
4: 'Flash Point (°C)',
5: 'Flash Point (°C)',
6: 'Flash Point (°C)',
7: 'Flash Point (°C)',
8: 'Flash Point (°C)',
9: 'Flash Point (°C)',
10: 'Flash Point (°C)'},
'2234.1.1': {0: ['-5'],
1: ['Max'],
2: [nan],
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan,
10: nan},
'2234.1.2.1': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: ['159'],
10: [nan]},
'2234.1.3.1': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: ['29'],
7: [nan],
8: [nan],
9: nan,
10: nan}}
CodePudding user response:
The method that deals with it is explode. But just calling explode won't works since columns don't have matching element counts. One way to circumvent that problem is to stack the DataFrame, explode and unstack back:
out = df.stack().explode().unstack()
If the lists are singleton lists (as in your example), then another option is to call explode in loop:
for c in df.columns:
df = df.explode(c)
Another option could be to use str accessor and assign in a dict comprehension:
out = df.assign(**{f'{c}':df[c].str[0] for c in df.columns.drop('Property Name')})
Output:
Property Name 2234.1.1 2234.1.2.1 2234.1.3.1
0 Flash Point (°C) -5 NaN NaN
1 Flash Point (°C) Max NaN NaN
2 Flash Point (°C) NaN NaN NaN
3 Flash Point (°C) NaN NaN NaN
4 Flash Point (°C) NaN NaN NaN
5 Flash Point (°C) NaN NaN NaN
6 Flash Point (°C) NaN NaN 29
7 Flash Point (°C) NaN NaN NaN
8 Flash Point (°C) NaN NaN NaN
9 Flash Point (°C) NaN 159 NaN
10 Flash Point (°C) NaN NaN NaN
