I am trying to analyze the results from a survey I administered. Each question had checkboxes of possible responses, and users could select as many or as few as they liked.
Data comes in looking like this
df = pd.DataFrame({
'Result': ['Blue, Yellow, Red',
'Red',
'Green, Blue']
})
| Result |
|---|
| Blue, Yellow, Red |
| Red |
| Green, Blue |
Ultimately, I want to know how many times each choice (color) was selected, but using value_counts() at this stage tells me the number of times each exact choice was made (returns 1,1,1, in this example).
I think I need to make it look like this:
| Result |
|---|
| Blue |
| Yellow |
| Red |
| Red |
| Green |
| Blue |
So I need to split each index on a comma, using str.split(',') but I can't guarantee how many choices will have been selected.
Everything I've found so far deals with splitting one column into multiple columns, but I want to split one column and stack the results into one column.
CodePudding user response:
You can try something like this:
df.Result.str.split(', ', expand = True).stack().reset_index(drop = True).value_counts()
Blue 2
Red 2
Yellow 1
Green 1
dtype: int64
CodePudding user response:
You could also use str.split explode value_counts:
out = df['Result'].str.split(', ').explode().value_counts()
Output:
Blue 2
Red 2
Yellow 1
Green 1
Name: Result, dtype: int64
