I'm trying to subtract the reference sample's cq from every other row in each (biorep, assay) group, using pandas or plain Python. A compact or elegant solution would be ideal. This is dead simple in R, but I can't figure it out here.
biorep well sample assay cq
1 C1 spA MSGN1 10
1 C2 spA MSGN1 15
1 C3 spA MSGN1 20
1 C4 ref MSGN1 5
2 C1 spB MSGN1 12
2 C2 spB MSGN1 15
2 C3 spB MSGN1 20
2 C4 ref MSGN1 4
I would like to group by biorep and assay, and within each group subtract the ref cq from every row in that group, so I end up with something like this:
biorep well sample assay cq delta
1 C1 spA MSGN1 10 5
1 C2 spA MSGN1 15 10
1 C3 spA MSGN1 20 15
1 C4 ref MSGN1 5 0
2 C1 spB MSGN1 12 8
2 C2 spB MSGN1 15 11
2 C3 spB MSGN1 20 16
2 C4 ref MSGN1 4 0
I know this should be dead easy, but I am having trouble even with the suggestions from Stack Overflow. I tried the top suggestion I found, "Subtracting group specific value from rows in pandas", but to no avail: I get the NaNs shown below. The other suggestions don't seem relevant, or I've had trouble googling for this issue. Anyone have an idea here?
def func(grp):
    ref = grp.loc[grp['sample'] == 'No Spacer Control', 'cq']
    grp['delta'] = grp[grp['sample'] == 'No Spacer Control']['cq'] - ref
    return grp

summry.groupby(['biorep', 'assay']).apply(func)
biorep sample assay cq delta
1 No Spacer Control HPRT 25.237209 0.0
1 spMSGN1_1 HPRT 25.632444 NaN
1 spMSGN1_10 HPRT 25.298286 NaN
1 spMSGN1_11 HPRT 25.130593 NaN
1 spMSGN1_2 HPRT 25.371607 NaN
CodePudding user response:
You can create a dataframe from ref values and merge it with your original dataframe:
df_ref = df.loc[df['sample'] == 'ref', ['biorep', 'assay', 'cq']]
df = df.merge(df_ref, how='left', on=['biorep', 'assay'], suffixes=('', '_ref'))
df['delta'] = df['cq'] - df['cq_ref']
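For anyone who wants to run this end to end, here is a minimal sketch of the merge approach on the sample data from the question. The `validate="many_to_one"` argument is my own addition, not part of the snippet above. It assumes each (biorep, assay) group has exactly one ref row, and makes pandas raise if a group has more than one.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "biorep": [1, 1, 1, 1, 2, 2, 2, 2],
    "well": ["C1", "C2", "C3", "C4"] * 2,
    "sample": ["spA", "spA", "spA", "ref", "spB", "spB", "spB", "ref"],
    "assay": ["MSGN1"] * 8,
    "cq": [10, 15, 20, 5, 12, 15, 20, 4],
})

# One row per (biorep, assay) holding the reference cq.
df_ref = df.loc[df["sample"] == "ref", ["biorep", "assay", "cq"]]

# Assumption: exactly one ref row per group; validate raises otherwise.
merged = df.merge(
    df_ref,
    how="left",
    on=["biorep", "assay"],
    suffixes=("", "_ref"),
    validate="many_to_one",
)
merged["delta"] = merged["cq"] - merged["cq_ref"]
print(merged)
```

If a group has no ref row at all, the left merge leaves `cq_ref` as NaN for that group, so `delta` will be NaN there rather than raising.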
CodePudding user response:
Given the order of the records, you can subtract the last cq from the group, as long as you include plate in your grouping.
df['delta'] = df.groupby(['biorep', 'assay', 'plate'])['cq'].transform(lambda x: x.sub(x.iloc[-1]))
Output
plate well biorep sample assay cq delta
0 1 C1 1 spA MSGN1 10 5
1 1 C2 1 spA MSGN1 15 10
2 1 C3 1 spA MSGN1 20 15
3 1 C4 1 ref MSGN1 5 0
4 2 C1 1 spB MSGN1 12 8
5 2 C2 1 spB MSGN1 15 11
6 2 C3 1 spB MSGN1 20 16
7 2 C4 1 ref MSGN1 4 0
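If you would rather not depend on the ref row being last in each group, a possible variant is to blank out every non-ref cq and broadcast the remaining value across the group. This is only a sketch: it groups by biorep and assay as in the question (add plate if your data needs it), and it assumes one ref row per group. The rows for biorep 2 are deliberately reordered so the ref comes first.

```python
import pandas as pd

# Question data, with the ref row moved to the top of biorep 2
# to show that row order does not matter here.
df = pd.DataFrame({
    "biorep": [1, 1, 1, 1, 2, 2, 2, 2],
    "well": ["C1", "C2", "C3", "C4", "C4", "C1", "C2", "C3"],
    "sample": ["spA", "spA", "spA", "ref", "ref", "spB", "spB", "spB"],
    "assay": ["MSGN1"] * 8,
    "cq": [10, 15, 20, 5, 4, 12, 15, 20],
})

# Keep cq only on ref rows (NaN elsewhere), then spread the first
# non-null value of each (biorep, assay) group to every row in it.
ref_cq = (
    df["cq"]
    .where(df["sample"] == "ref")
    .groupby([df["biorep"], df["assay"]])
    .transform("first")
)
df["delta"] = df["cq"] - ref_cq
print(df)
```

Note that `delta` comes out as float here, because masking with `where` introduces NaN along the way.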
CodePudding user response:
Update after your edit:
Try:
compute_delta = lambda x: x.loc[x['sample'].ne('ref'), 'cq'] \
                           .sub(x.loc[x['sample'].eq('ref'), 'cq'].squeeze())
df['delta'] = df.groupby(['biorep', 'assay'], as_index=False) \
                .apply(compute_delta).droplevel(0).reindex(df.index, fill_value=0)
print(df)
# Output
biorep well sample assay cq delta
0 1 C1 spA MSGN1 10 5
1 1 C2 spA MSGN1 15 10
2 1 C3 spA MSGN1 20 15
3 1 C4 ref MSGN1 5 0
4 2 C1 spB MSGN1 12 8
5 2 C2 spB MSGN1 15 11
6 2 C3 spB MSGN1 20 16
7 2 C4 ref MSGN1 4 0
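As for why the attempt in the question gives NaN: pandas aligns two Series on their index labels before subtracting. The ref selection is a one-row Series, so only the row with the same label gets a value and every other row becomes NaN. The `squeeze()` call above avoids this by turning that one-row Series into a scalar. A small sketch of the difference, using one group from the question (the variable names here are just for illustration):

```python
import pandas as pd

# One (biorep, assay) group from the question.
sample = pd.Series(["spA", "spA", "spA", "ref"])
cq = pd.Series([10, 15, 20, 5])

# One-row Series; it keeps its original index label (3).
ref = cq[sample == "ref"]

# Series minus Series aligns on index labels: only label 3 matches,
# so the other rows have nothing to subtract and become NaN.
aligned = cq - ref
print(aligned)

# Series minus scalar broadcasts the value to every row.
delta = cq - ref.iloc[0]
print(delta)
```

In the question's `func`, the left-hand side was also filtered to the ref rows only, which is why the ref row got 0.0 and nothing else was computed.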
