Column with list values: exclude empty strings, which are counted when performing str.len()

I have a column whose values are lists of strings, for example:

Input:

import pandas as pd

df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})

that looks like:

Food                     
['']                       
['potato', 'carrot']      
['potato', '']

When I perform df['Count'] = df['Food'].str.len(), I get:

Food                     Count
['']                       1
['potato', 'carrot']       2  
['potato', '']             2

However, I want to get:

Food                     Count
['']                       0
['potato', 'carrot']       2  
['potato', '']             1
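For reference, here is a minimal reproduction of the behaviour described above. On a list-valued column, Series.str.len() returns the length of each list, and an empty string '' is still a list element, so it is counted.

```python
import pandas as pd

df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})

# .str.len() on list values returns len(list); '' counts as an element
counts = df['Food'].str.len()
print(counts.tolist())  # [1, 2, 2]
```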

Answer:

You are trying to count the non-empty strings in each list:

df['Food'].apply(lambda lst: len([e for e in lst if e != '']))
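A self-contained sketch of the same idea that assigns the result to the Count column. It uses sum() over a generator of booleans instead of building an intermediate list; this is an equivalent variant, not the answerer's exact code.

```python
import pandas as pd

df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})

# Each (e != '') is True/False; summing booleans counts the non-empty elements
df['Count'] = df['Food'].apply(lambda lst: sum(e != '' for e in lst))
print(df['Count'].tolist())  # [0, 2, 1]
```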

Answer:

You can explode the column and count, per original row, the elements whose length is greater than 0 ('' has length 0):

df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})
df['length'] = df['Food'].explode().str.len().gt(0).groupby(level=0).sum()
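The one-liner above can be split into its intermediate steps to show what each call does. The variable names exploded and lengths are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', '']]})

# One row per list element; the original row label (0, 1, 1, 2, 2) is repeated
exploded = df['Food'].explode()
# Length of each element: [0, 6, 6, 6, 0]
lengths = exploded.str.len()
# True for non-empty elements, then count the Trues per original row label
df['length'] = lengths.gt(0).groupby(level=0).sum()
print(df['length'].tolist())  # [0, 2, 1]
```

Because groupby(level=0) groups on the repeated original index, the summed result aligns back onto df row by row when assigned.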

Another possible solution is to use a list comprehension (this is probably more efficient):

df['length'] = [len([x for x in lst if x != '']) for lst in df['Food']]

Output:

               Food  length
0                []      0
1  [potato, carrot]      2
2        [potato, ]      1
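The question's title mentions spaces, while the example data only contains empty strings ''. If your real data might also contain whitespace-only strings such as ' ' (an assumption, not shown in the question), a small change to the list comprehension handles both: str.strip() reduces a whitespace-only string to ''.

```python
import pandas as pd

# Hypothetical data: the last list contains a whitespace-only string ' '
df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', ' ']]})

# strip() turns ' ' into '', so whitespace-only entries are treated as empty
df['length'] = [len([x for x in lst if x.strip() != '']) for lst in df['Food']]
print(df['length'].tolist())  # [0, 2, 1]
```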

Answer:

Use set difference to exclude whichever values you don't want to count:

df['Count'] = df['Food'].apply(lambda x: len(set(x).difference({''})))

Output:

               Food  Count
0                []      0
1  [potato, carrot]      2
2        [potato, ]      1
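One thing to keep in mind with the set approach: converting a list to a set also removes duplicates, so a repeated item is counted only once. The sketch below uses a hypothetical row with a repeated 'potato' to compare it against the list comprehension.

```python
import pandas as pd

# Hypothetical row containing a repeated item
df = pd.DataFrame({'Food': [['potato', 'potato', '']]})

# Set difference: duplicates collapse, so 'potato' is counted once
set_count = df['Food'].apply(lambda x: len(set(x).difference({''})))
# List comprehension: every non-empty element is counted
list_count = df['Food'].apply(lambda x: len([e for e in x if e != '']))
print(set_count.tolist(), list_count.tolist())  # [1] [2]
```

If you want distinct non-empty items, the set version is what you need; if you want every non-empty element, use one of the list-based versions.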

Answer:

Try explode:

df.Food.explode().ne('').groupby(level=0).sum()
Output:
0    0
1    2
2    1
Name: Food, dtype: int64
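A caveat that applies to the explode-based answers, shown with a hypothetical extra row holding a truly empty list []: explode() turns an empty list into NaN. NaN != '' evaluates to True, so the .ne('') version counts that row as 1, whereas the .str.len().gt(0) version yields NaN > 0, i.e. False, and counts it as 0.

```python
import pandas as pd

# Hypothetical extra row (index 3) holding an empty list
df = pd.DataFrame({'Food': [[''], ['potato', 'carrot'], ['potato', ''], []]})

# explode() turns [] into NaN; NaN != '' is True, so row 3 is counted as 1
ne_count = df['Food'].explode().ne('').groupby(level=0).sum()
# NaN has no length (.str.len() gives NaN); NaN > 0 is False, so row 3 is 0
len_count = df['Food'].explode().str.len().gt(0).groupby(level=0).sum()
print(ne_count.tolist(), len_count.tolist())  # [0, 2, 1, 1] [0, 2, 1, 0]
```

If empty lists can occur in your data, prefer the .str.len().gt(0) version or one of the list-based answers.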