How to create new columns from a column value in pandas based on IDs-CodePudding

I want to create new columns based on a given column value and merge the rows with same subject names within a given date, for example for a given pandas dataframe

subject	questionid	answer	datetime
First	q1	fa1	2021-02-27T
First	q2	fa2	2021-02-27T
First	q3	fa3	2021-02-27T
First	q4	fa4	2021-02-28T
Second	q1	sa1	2021-02-27T
Second	q2		2021-02-27T
Second	q3	sa3	2021-02-27T
Second	q4		2021-02-27T

I want my final dataframe to contain columns based on questionid where each row contains data in each date range i.e. within 24 hours of datetime for a single subject. **

So, the final output should look like this,

subject	q1	q2	q3	q4	datetime
First	fa1	fa2	fa3	nan	2021-02-27T
First	nan	nan	nan	fa4	2021-02-28T
Second	sa1	nan	sa3	nan	2021-02-27T

Notice that the first subject has two rows because of different dates the data were collected on.

I was able to create columns based on the questionid in a new dataframe by using

df.pivot(columns='questionid', values='answer')

But the final dataframe looks like this,

subject	questionid	answer	datetime	q1	q2	q3	q4
First	q1	fa1	2021-02-27T	fa1	nan	nan	nan
First	q2	fa2	2021-02-27T	nan	fa2	nan	nan
First	q3	fa3	2021-02-27T	nan	nan	fa3	nan
First	q4	fa4	2021-02-28T	nan	nan	nan	fa4
Second	q1	sa1	2021-02-27T	sa1	nan	nan	nan
Second	q2		2021-02-27T	nan	nan	nan	nan
Second	q3	sa3	2021-02-27T	nan	nan	sa3	nan
Second	q4		2021-02-27T	nan	nan	nan	nan

Is there an easier way without looping through the rows to create the desired output. Again

CodePudding user response：

You're almost there, use both ['subject', 'datetime'] as index in pivot:

(df.pivot(index=['subject', 'datetime'], columns='questionid', values='answer')
   .reset_index()
)

output:

  subject     datetime   q1   q2   q3   q4
0   First  2021-02-27T  fa1  fa2  fa3  NaN
1   First  2021-02-28T  NaN  NaN  NaN  fa4
2  Second  2021-02-27T  sa1  NaN  sa3  NaN

CodePudding user response：

I think that should do it:

pd.pivot_table(index=['subject', 'datetime'], values='answer', columns='questionid')