I have a csv file like the list below (which has thousands of rows);
name,location,time
james,ond,5
conor,aas,2
james,jja,3
elisa,aab,1
mike,sjs,1
elisa,ond,5
elisa,mmm,2
How can I turn this into (basicly make the values in second column (location) as new columns, search in the names for each and make the names don't repeat and put 0 as values on new columns if it is not there. The desired outcome with the code is:
name,ond,aas,jja,aab,sjs,mmm
james,5,0,3,0,0,0
conor,0,2,0,0,0,0
elisa,5,0,0,1,0,2
mike,0,0,0,0,1,0
What I tried so far to do this is to load it twice as two different databases and get a value but it fails all the time giving exactly the same result before.
import pandas as pd
df1 = pd.read_csv("file.csv")
df2 = pd.read_csv("file.csv")
df1['time'] = df2['time'].where(df1[['name','location']].isin(df2).all(axis=1)).fillna('0')
or the below code to extract the values but still does not work:
df1.merge(df2, on=['name','location'], how='left').fillna(0)
CodePudding user response:
You can use pd.crosstab:
pd.crosstab(index=df['name'], columns=df['location'], values=df['time'], aggfunc=lambda x: x).\
fillna(0).reset_index()
Output:
location name aab aas jja mmm ond sjs
0 conor 0.0 2.0 0.0 0.0 0.0 0.0
1 elisa 1.0 0.0 0.0 2.0 5.0 0.0
2 james 0.0 0.0 3.0 0.0 5.0 0.0
3 mike 0.0 0.0 0.0 0.0 0.0 1.0
Here location is the column name, you can get rid of it via df.columns.name = None
CodePudding user response:
Maybe this one is the solution you are looking for:
Given the dataframe df
name location time
0 james ond 5
1 conor aas 2
2 james jja 3
3 elisa aab 1
4 mike sjs 1
5 elisa ond 5
6 elisa mmm 2
import pandas as pd
pivot = pd.pivot_table(df, values='time', index='name', columns='location', aggfunc='sum', fill_value=0)
Output:
location aab aas jja mmm ond sjs
name
conor 0 2 0 0 0 0
elisa 1 0 0 2 5 0
james 0 0 3 0 5 0
mike 0 0 0 0 0 1
Note that if you have different values for the same combination of name and location, you have to select an appropriate aggfunc, to properly display the value you need. Ypu can also pass a dictionary to the aggfunc value, as explained in Discussion on aggfunc
