I have this data:
id value reps
1 333 1
1 332 4
1 335 1
4 555 3
4 225 1
444 2 5
I want this output, with each id's values ordered by the reps column from lowest to highest:
id col1 col2 col3 col4
1 333 335 332 nan
4 225 555 nan nan
444 2 nan nan nan
I tried using a pivot table and got this:
dataframe = dataframe.pivot_table(index='id', columns='reps', values='value')
dataframe = dataframe.rename_axis(columns=None).reset_index()
id 1 3 4 5
1 334 nan 332 nan
4 225.5 555.5 nan nan
444 nan nan nan 2
CodePudding user response:
You can first sort the dataframe by reps (and by id as well, if you want), then replace reps with a cumulative count within each id, so each value gets its position (1, 2, 3, ...) among that id's values.
Then you can build the pivot table as you attempted.
df \
.sort_values(['id','reps']) \
.pipe(lambda df_: df_.assign(reps=df_.groupby(['id']).cumcount() + 1)) \
.pivot_table(index='id', columns='reps', values='value') \
.add_prefix('col')
This will not produce the all-NaN col4 from your expected output, because no id has four values, but you can add that column afterwards.
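As a rough end-to-end sketch of this approach, including one possible way to add the missing col4: the sample frame below is rebuilt from the question, while `n_cols` and the helper column `pos` are names assumed here for illustration. `reindex` adds any requested column that does not exist yet and fills it with NaN.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 4, 4, 444],
    "value": [333, 332, 335, 555, 225, 2],
    "reps":  [1, 4, 1, 3, 1, 5],
})

n_cols = 4  # assumed target width; not derived from the data

result = (
    df.sort_values(["id", "reps"])
      # position of each row within its id, counted from 1
      .assign(pos=lambda d: d.groupby("id").cumcount() + 1)
      # each (id, pos) pair is unique, so no aggregation is needed
      .pivot(index="id", columns="pos", values="value")
      .add_prefix("col")
      # request col1..col4; columns that do not exist are created as NaN
      .reindex(columns=[f"col{i}" for i in range(1, n_cols + 1)])
      .rename_axis(columns=None)
      .reset_index()
)
print(result)
```

Because the cumulative count makes every (id, pos) pair unique, plain `pivot` is enough here; `pivot_table` would give the same values, since its default mean is taken over a single value per cell.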
CodePudding user response:
Group by "id", build ("value", "reps") tuples for each group, and sort them by "reps":
df_grouped = df.groupby("id")[["value", "reps"]].apply(lambda x: sorted(list(zip(x.value, x.reps)), key=lambda x: x[1]))
[Out]:
id
1 [(333, 1), (335, 1), (332, 4)]
4 [(225, 1), (555, 3)]
444 [(2, 5)]
Create result dataframe from above sorted tuples using "value" as data (which is at index 0).
df_result = pd.DataFrame(data=[[y[0] for y in x] for x in df_grouped], index=df_grouped.index)
[Out]:
0 1 2
id
1 333 335 332
4 225 555 NaN
444 2 NaN NaN
pandas fills in default column names starting at 0. You can change them if required:
df_result.columns = list(range(1, len(df_result.columns) + 1))
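Put together, the steps above might look like the sketch below. It assumes the question's data is in a frame named `df`, and it uses `col1`, `col2`, ... labels to match the expected output in the question rather than plain integers; that naming is a choice made here, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 4, 4, 444],
    "value": [333, 332, 335, 555, 225, 2],
    "reps":  [1, 4, 1, 3, 1, 5],
})

# One list of (value, reps) tuples per id, sorted by reps.
# Python's sorted() is stable, so ties in reps keep their original order.
df_grouped = df.groupby("id")[["value", "reps"]].apply(
    lambda g: sorted(zip(g["value"], g["reps"]), key=lambda pair: pair[1])
)

# Keep only the value from each tuple; shorter rows are padded with NaN.
df_result = pd.DataFrame(
    data=[[value for value, _ in pairs] for pairs in df_grouped],
    index=df_grouped.index,
)

# Label the columns col1, col2, ... and turn id back into a regular column.
df_result.columns = [f"col{i}" for i in range(1, len(df_result.columns) + 1)]
df_result = df_result.reset_index()
print(df_result)
```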
