Pandas groupby subtract a reference row from the rest of rows in group same length-CodePudding

Trying to subtract the reference group cq from all the other things in the biorep,assay group using pandas or i guess general python. Compact or eloquent if possible. This is dead simple in R but can't figure it out here

biorep  well          sample    assay   cq
  1      C1            spA      MSGN1   10
  1      C2            spA      MSGN1   15
  1      C3            spA      MSGN1   20
  1      C4            ref      MSGN1   5
  2      C1            spB      MSGN1   12 
  2      C2            spB      MSGN1   15
  2      C3            spB      MSGN1   20
  2      C4            ref      MSGN1   4

I would like to group by biorep and assay. And for each of those groups subtract rep from the other things in the group. So i should end up with something like this. l.

biorep  well          sample    assay   cq  delta
  1      C1            spA      MSGN1   10       5
  1      C2            spA      MSGN1   15       10
  1      C3            spA      MSGN1   20       15
  1      C4            ref      MSGN1   5.        0
  2      C1            spB      MSGN1   12        8
  2      C2            spB      MSGN1   15       11
  2      C3            spB      MSGN1   20       16
  2      C4            ref      MSGN1   4         0

I know this should be dead easy but I am having trouble even with the suggestions from stackoverflow. I have tried using this suggestion which is the top one i find Subtracting group specific value from rows in pandas but to no avail. I get these NaN's. All the other suggestions dont seem relevant or have had trouble googling for this issue. Anyone have an idea here?

def func(grp):
    ref =   grp.loc[grp['sample'] == 'No Spacer Control','cq']
    grp['delta'] =   grp[grp['sample'] == 'No Spacer Control']['cq'] - ref
    return grp

summry.groupby(['biorep','assay']).apply(func)

   biorep     sample       assay        cq         delta
         1    No Spacer Control HPRT    25.237209       0.0
         1      spMSGN1_1       HPRT    25.632444       NaN
         1      spMSGN1_10      HPRT    25.298286       NaN
         1      spMSGN1_11      HPRT    25.130593       NaN
         1      spMSGN1_2       PRT     25.371607       NaN

CodePudding user response：

You can create a dataframe from ref values and merge it with your original dataframe:

df_ref = df.loc[df['sample'] == 'ref', ['biorep', 'assay', 'cq']]

df = df.merge(df_ref, how='left', on=['biorep', 'assay'], suffixes=('', '_ref'))
df['delta'] = df['cq'] - df['cq_ref']

CodePudding user response：

Given the order of the records, you can subtract the last cq from the group, as long as you include plate in your grouping.

df['delta'] = df.groupby(['biorep','assay', 'plate'])['cq'].apply(lambda x: x.sub(x.iloc[-1]))

Output

   plate well  biorep sample  assay  cq  delta
0      1   C1       1    spA  MSGN1  10      5
1      1   C2       1    spA  MSGN1  15     10
2      1   C3       1    spA  MSGN1  20     15
3      1   C4       1    ref  MSGN1   5      0
4      2   C1       1    spB  MSGN1  12      8
5      2   C2       1    spB  MSGN1  15     11
6      2   C3       1    spB  MSGN1  20     16
7      2   C4       1    ref  MSGN1   4      0

CodePudding user response：

Update after your edit:

Try:

compute_delta = lambda x: x.loc[x['sample'].ne('ref'), 'cq'] \
                           .sub(x.loc[x['sample'].eq('ref'), 'cq'].squeeze())

df['delta'] = df.groupby(['biorep', 'assay'], as_index=False) \
                .apply(compute_delta).droplevel(0).reindex(df.index, fill_value=0)
print(df)

# Output
   biorep well sample  assay  cq  delta
0       1   C1    spA  MSGN1  10      5
1       1   C2    spA  MSGN1  15     10
2       1   C3    spA  MSGN1  20     15
3       1   C4    ref  MSGN1   5      0
4       2   C1    spB  MSGN1  12      8
5       2   C2    spB  MSGN1  15     11
6       2   C3    spB  MSGN1  20     16
7       2   C4    ref  MSGN1   4      0