Using groupby and append values at columns-CodePudding

Consider the following csv file where there is a duplicate name in "Name" column:

ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37

I want to group the rows by name and record I and C columns like this

K1:
     0   I   386  28800
     1   C   5522 6485
K2:
     0   I   1204
     1   C   3257

I have written this code which groups the rows by name column and build a dictionary.

data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}
df = pd.read_csv('test.csv', usecols=['Name', 'I', 'C'])
for name, df_group in df.groupby('Name'):
    my_dict[name] = pd.DataFrame(df_group)
print(my_dict)

But the output is

{'dummy':    Value
C      0
I      1, 'K1':   Name      I     C
0   K1    386  5522
1   K1  28800  6485, 'K2':   Name     I     C
2   K2  1204  3257}

As you can see the I and C are written in columns, so the rows for each key are increased. That is the opposite of what I want. How can I fix that?

CodePudding user response：

I think you need select columns with transpose. I dont use dict comprehension, because in your code are added new DataFrame to existing dict:

data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}

for name, df_group in df.groupby('Name'):
    my_dict[name] = df_group[[ 'I', 'C']].T
print(my_dict['K1'])
      0      1
I   386  28800
C  5522   6485

If new column is necessary:

data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}

for name, df_group in df.groupby('Name'):
    my_dict[name] = df_group[[ 'I', 'C']].T.rename_axis('g').reset_index()
print(my_dict['K1'])
   g     0      1
0  I   386  28800
1  C  5522   6485

CodePudding user response：

This is essentially the same as @jezrael's answer. Only difference is it's a dict comprehension (and removes columns as in the desired outcome, not sure if it was intentional or not)

from io import StringIO
df = pd.read_csv(StringIO("""ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37"""))

out = {**my_dict, **{k: d[['I','C']].T.reset_index().rename(columns={col:'' for col in ('index',0,1)}) for k, d in df.groupby('Name')}}

Output:

{'dummy':    Value
 C      0
 I      1,
 'K1':                  
 0  I   386  28800
 1  C  5522   6485,
 'K2':          2
 0  I  1204
 1  C  3257}