Let's stay I have a grocery list with one column titled "Groceries". In each row there is a list of strings, for example.
| Groceries |
|---|
| apples, bananas, oranges |
| apples, bananas, bananas, pears |
| oranges, pears, bananas |
Is there a way to count each string and add a "tally" is a new dataframe or similar thing with the appropriately labeled item? The dataframe would then look like:
| apples | oranges | bananas | pears |
|---|---|---|---|
| 1 | 1 | 1 | 0 |
| 1 | 0 | 2 | 1 |
| 1 | 1 | 0 | 1 |
I can't find a function that will recognize strings and count them in the appropriate row/column with the string name. I am also pretty new to Python and am not sure what would go into creating a function that would do this.
CodePudding user response:
You can split the string on commas, explode to multiple rows, get_dummies to transform to 0/1, and groupby.sum to aggregate:
out = (pd
.get_dummies(df['Groceries'].str.split(',\s*').explode())
.groupby(level=0).sum()
)
Or similar with crosstab:
s = df['Groceries'].str.split(',\s*').explode()
out = pd.crosstab(s.index, s)
output:
apples bananas oranges pears
0 1 1 1 0
1 1 2 0 1
2 0 1 1 1
