I have a dataframe called df that looks like this:
| Provider | fid | pid | datetime | score | system |
|---|---|---|---|---|---|
| CHE-229 | 2bfc9a62 | 2f43d557 | 2021-09-26 | -3.0 | SOFA |
| CHE-229 | 78d5d845 | 88c59d92 | 2021-09-12 | -4.0 | SAPS |
I would like to create one new column for each value in system, e.g. a column called SOFA and another called SAPS, each holding the score for the rows that belong to that system.
The output I want is:
| Provider | fid | pid | datetime | SOFA | SAPS |
|---|---|---|---|---|---|
| CHE-229 | 2bfc9a62 | 2f43d557 | 2021-09-26 | -3.0 | |
| CHE-229 | 78d5d845 | 88c59d92 | 2021-09-12 | | -4.0 |
CodePudding user response:
You can get this with an iterative procedure using numpy.where() to choose the value depending on a condition, and then dropping the original columns:
import numpy as np

for sys in df["system"].unique():
    # keep the score on rows of this system, None elsewhere
    df[sys] = np.where(df["system"] == sys, df["score"], None)
df = df.drop(columns=["system", "score"])
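To try the loop on the two sample rows from the question, something like the following should work. It is only a sketch: the helper name `spread_scores` is hypothetical, and it swaps `None` for `np.nan` (an assumption on my part) so that the new columns keep a float dtype.

```python
import numpy as np
import pandas as pd

# The two sample rows from the question.
df = pd.DataFrame({
    "Provider": ["CHE-229", "CHE-229"],
    "fid": ["2bfc9a62", "78d5d845"],
    "pid": ["2f43d557", "88c59d92"],
    "datetime": ["2021-09-26", "2021-09-12"],
    "score": [-3.0, -4.0],
    "system": ["SOFA", "SAPS"],
})


def spread_scores(frame):
    """Hypothetical helper: add one column per distinct value of `system`."""
    out = frame.copy()  # leave the caller's frame untouched
    for name in out["system"].unique():
        # score where the row belongs to this system, NaN elsewhere
        out[name] = np.where(out["system"] == name, out["score"], np.nan)
    return out.drop(columns=["system", "score"])


wide = spread_scores(df)
print(wide)
```

This keeps one output row per input row, so it does not raise when the same key columns appear twice for one system, which is a case where pivot would fail with a duplicate-entries error.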
CodePudding user response:
# pandas 2.0 and later require keyword arguments for pivot
df.pivot(index=list(df.columns[:-2]), columns='system', values='score').fillna('').reset_index()
| system | Provider | fid | pid | datetime | SAPS | SOFA |
|---|---|---|---|---|---|---|
| 0 | CHE-229 | 2bfc9a62 | 2f43d557 | 2021-09-26 | | -3.0 |
| 1 | CHE-229 | 78d5d845 | 88c59d92 | 2021-09-12 | -4.0 | |
Filling with '' turns the new columns into object dtype. If you want them to stay numeric, drop the fillna('') call (missing values are then NaN), or fill the NaNs with 0 instead, i.e. .fillna(0).
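A minimal sketch of the numeric variant on the question's sample rows, assuming pandas 1.1 or later (where `index` may be a list of column names). The names `keys`, `wide` and `filled` are illustrative only.

```python
import pandas as pd

# The two sample rows from the question.
df = pd.DataFrame({
    "Provider": ["CHE-229", "CHE-229"],
    "fid": ["2bfc9a62", "78d5d845"],
    "pid": ["2f43d557", "88c59d92"],
    "datetime": ["2021-09-26", "2021-09-12"],
    "score": [-3.0, -4.0],
    "system": ["SOFA", "SAPS"],
})

# Columns that identify a row; everything except score and system.
keys = ["Provider", "fid", "pid", "datetime"]

# No fillna(''), so SAPS and SOFA stay float and gaps are NaN.
wide = df.pivot(index=keys, columns="system", values="score").reset_index()
wide.columns.name = None  # drop the leftover "system" label on the column axis

# Optional: replace the gaps with 0 while keeping a numeric dtype.
filled = wide.fillna(0)

print(wide.dtypes)
print(filled)
```

Whether NaN or 0 is the better placeholder depends on the data: 0 is itself a valid score in many scoring systems, so NaN may be the safer marker for "not measured".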
