I have a pandas DataFrame that looks like this:
| Col1 | Col2 |
|---|---|
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 3 | A |
| 3 | NaN |
For every value of Col1, I want to count how many times each value of Col2 occurs, ignoring the NaN values, and put each count in the column for that value, obtaining something like:
| Col1 | A | B |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 0 |
| 3 | 1 | 0 |
How can I do that in Pandas? I have a lot of values in Col1 and lots of columns like Col2. Thank you very much!
CodePudding user response:
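You can try pd.crosstab, which counts how often each Col2 value occurs for each Col1 value. NaN entries in Col2 are not counted: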
out = pd.crosstab(df.Col1, df.Col2).reset_index()
Out[66]:
Col2  Col1  A  B
0        1  1  2
1        2  2  0
2        3  1  0
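For anyone who wants to reproduce this, here is a minimal self-contained sketch of the crosstab approach. It assumes the missing entry in Col2 is a real NaN (if it is stored as the string "Nan", it would need to be replaced first), and the step that clears the column-axis name is an optional extra, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing entry is assumed to be a real NaN.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3],
    "Col2": ["A", "B", "B", "A", "A", "A", np.nan],
})

# Count each Col2 value per Col1 value; the NaN row does not contribute to any count.
out = pd.crosstab(df["Col1"], df["Col2"]).reset_index()

# Optional: drop the leftover "Col2" label on the column axis
# so the header reads just Col1, A, B.
out.columns.name = None

print(out)
```

With many columns like Col2, the same call can be repeated per column, for example in a loop over the column names.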
CodePudding user response:
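Another option: add a dummy column, group by the original columns, count, and unstack. Note that combinations that never occur come out as NaN rather than 0: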
df[3]=[0]*df.shape[0]
df.groupby(list(df.columns[:-1])).count().unstack()
output:
Col2    A    B
Col1
1     1.0  2.0
2     2.0  NaN
3     1.0  NaN
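If zeros are wanted instead of NaN, a variation on the same groupby idea is to use size() and pass fill_value=0 to unstack(). This is a sketch of that variation rather than the exact code above; it assumes the missing entry in Col2 is a real NaN, which groupby leaves out of the group keys by default.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing entry is assumed to be a real NaN.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3],
    "Col2": ["A", "B", "B", "A", "A", "A", np.nan],
})

# size() counts the rows in each (Col1, Col2) group; rows with a NaN key are skipped.
# unstack() moves Col2 into the columns, and fill_value=0 fills absent combinations.
counts = df.groupby(["Col1", "Col2"]).size().unstack(fill_value=0)

print(counts)
```

No dummy column is needed here, because size() counts rows directly instead of counting non-null values in another column.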
CodePudding user response:
You can just add the two columns and assign the result to a new column.
sum_column = df["col1"] + df["col2"]
df["col3"] = sum_column
CodePudding user response:
Use pandas.pivot_table(). Documentation: https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html
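As a rough illustration of how pivot_table could be applied to the data in the question, here is a minimal sketch. The helper column "n" is hypothetical and exists only to give pivot_table something to aggregate; the missing entry in Col2 is assumed to be a real NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing entry is assumed to be a real NaN.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3],
    "Col2": ["A", "B", "B", "A", "A", "A", np.nan],
})

# "n" is a hypothetical helper column of ones, used only as the value to count.
counts = df.assign(n=1).pivot_table(
    index="Col1",      # one row per Col1 value
    columns="Col2",    # one column per Col2 value
    values="n",
    aggfunc="count",   # number of rows in each (Col1, Col2) cell
    fill_value=0,      # absent combinations become 0 instead of NaN
)

print(counts)
```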
