I have a pandas DataFrame with 10 columns, some of which are dtypes of object. df looks something like this:
date ID Cat Chan Num
2021-04-02 abc Chrome Direct 8
2021-04-02 xyz Safari Search 5
2021-04-02 pqr Firefox Direct 6
2021-04-02 dcf Chrome NA 1
2021-04-03 abc Safari Search 3
2021-04-03 xyz Safari Search 4
2021-04-03 pqr NA Direct 7
2021-04-03 dcf Chrome Direct 5
what I need is to write a loop that first tracks only object dtypes in this df and and then create a new DataFrame for each column. These new dfs should have columns heads for the values in the original df and rows to return count of each value per date. Something like this:
first new df: df_cat:
date Chrome Safari Firefox NA
2021-04-02 2 1 1 0
2021-04-03 1 2 0 1
Second new df: df_chan:
date Direct Search NA
2021-04-02 2 1 1
2021-04-03 2 2 0
Since column "Num" in original is int dtypes, I don't not want a new df for this column.
I tried something like the below but it gives an error:
AttributeError: 'DataFrame' object has no attribute 'item':
mylist = list(df.select_dtypes(include=['object']).columns)
for item in mylist:
category_item = pd.get_dummies(df.item)
category_item.head()
Could someone please help me? I am not that good with Pandas, and it sounds like a big one for me to fix.
CodePudding user response:
You could set_index with "date"; then use get_dummies groupby sum on each of the object dtype columns to get the desired outcome.
df0 = df0.set_index('date').fillna('NA')
out = {col: pd.get_dummies(df0[col]).groupby(level=0).sum() for col in df0.select_dtypes(include=[object]).columns}
Output:
ID:
abc dcf pqr xyz
date
2021-04-02 1 1 1 1
2021-04-03 1 1 1 1
Cat:
Chrome Firefox NA Safari
date
2021-04-02 2 1 0 1
2021-04-03 1 0 1 2
Chan:
Direct NA Search
date
2021-04-02 2 1 1
2021-04-03 2 0 2
I used a dictionary here to map column names to the new DataFrames. You could store them in a list too.
