I have a column whose values are lists of strings, for example:
df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})
that looks like:
Food
['']
['potato', 'carrot']
['potato', '']
When I perform df['Count'] = df['Food'].str.len(), I get:
Food Count
[''] 1
['potato', 'carrot'] 2
['potato', ''] 2
However, I want empty strings to be ignored, so that I get:
Food Count
[''] 0
['potato', 'carrot'] 2
['potato', ''] 1
CodePudding user response:
You are trying to get the number of non-empty strings, so filter out the empty ones before counting:
df['Food'].apply(lambda lst: len([e for e in lst if e != '']))
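For completeness, here is a minimal runnable sketch of the same idea that assigns the result back to the frame. The helper name count_non_empty is only an illustrative choice, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})

def count_non_empty(lst):
    # Hypothetical helper: count the elements that are not the empty string.
    return sum(1 for e in lst if e != '')

# Apply the helper to every list in the column.
df['Count'] = df['Food'].apply(count_non_empty)
counts = df['Count'].tolist()
```

Pulling the logic into a named function makes it easy to adjust what counts as "empty" later, for instance if whitespace-only strings should also be skipped.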
CodePudding user response:
You can explode the column, flag the strings whose length is greater than 0 ('' has length 0), and sum those flags per original row:
df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})
df['length'] = df['Food'].explode().str.len().gt(0).groupby(level=0).sum()
Another possible solution is to use a list comprehension (this is probably more efficient):
df['length'] = [len([x for x in lst if x!='']) for lst in df['Food']]
Output:
Food length
0 [] 0
1 [potato, carrot] 2
2 [potato, ] 1
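If the chained explode expression is hard to follow, it may help to split it into steps. This is only a sketch of the same chain with intermediate variables whose names (exploded, non_empty) are my own:

```python
import pandas as pd

df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})

# One row per list element; the original row label is repeated in the index.
exploded = df['Food'].explode()

# True where the string has at least one character, False for ''.
non_empty = exploded.str.len().gt(0)

# Group by the original row label (index level 0) and count the True values.
df['length'] = non_empty.groupby(level=0).sum()
lengths = df['length'].tolist()
```

The assignment works because the grouped result is indexed by the original row labels, so pandas aligns it with df automatically.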
CodePudding user response:
Use set difference to exclude whichever values you don't want to count. Note that a set keeps only distinct values, so repeated strings within one list are counted once:
df['Count'] = df['Food'].apply(lambda x: len(set(x).difference({''})))
Output:
Food Count
0 [] 0
1 [potato, carrot] 2
2 [potato, ] 1
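To see how the set-based count behaves with repeated items, here is a small sketch on a made-up list that contains a duplicate (the sample data is an assumption, not from the question):

```python
import pandas as pd

# Hypothetical sample with a repeated value.
s = pd.Series([['potato', 'potato', '']])

# Set difference: duplicates collapse, so this counts distinct non-empty values.
distinct = s.apply(lambda x: len(set(x).difference({''}))).tolist()

# Plain filter: every non-empty occurrence is counted.
total = s.apply(lambda x: sum(e != '' for e in x)).tolist()
```

Here distinct is [1] while total is [2], so pick the variant that matches what "count" should mean for your data.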
CodePudding user response:
Alternatively, use explode and count the elements that are not equal to '':
df.Food.explode().ne('').groupby(level=0).sum()
Output:
0 0
1 2
2 1
Name: Food, dtype: int64
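One thing to watch for, as far as I can tell: explode turns a completely empty list [] into NaN, and NaN is "not equal" to '', so such a row would likely be counted as 1 rather than 0 with the expression above. The sketch below uses made-up data that includes an empty list and adds a notna() check as one possible guard:

```python
import pandas as pd

# Hypothetical sample: an empty list, a list with only '', and a mixed list.
s = pd.Series([[], [''], ['potato', '']])

exploded = s.explode()  # the empty list becomes a single NaN entry

# Count only entries that are present (not NaN) and not the empty string.
safe = (exploded.notna() & exploded.ne('')).groupby(level=0).sum().tolist()
```

If the column never contains empty lists, the shorter one-liner is fine as it stands.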
