Home > Back-end >  How do I properly set the Datetime as an index for a Pandas dataframe?
How do I properly set the Datetime as an index for a Pandas dataframe?

Time:01-08

I have a dataframe like below.

|items(index)|event|day|
|---- |------|-----|
|1000|[buy, buy]|[2019-03-26, 2019-04-26]|
|1001|[buy, sell]|[2019-05-26, 2019-06-26]|
|1002|[buy, sell]|[2019-04-21, 2019-04-29]|
|1003|[sell, buy]|[2019-03-21, 2019-04-23]|

The order of values ​​in the list of event columns corresponds to the order of values ​​in the list of day columns; thus, 1000's first buy action occurred 2019-03-26.

The problem is that I want the Datetime to be the index and the current index to be the column, so it looks like the following.

|day(index)|1000|1001|1002|1003|
|----|------|----- |-----|-----|
|2019-03-21|   |   |     |sell |
|2019-03-26|buy|   |     |     |
|2019-04-21|   |   |buy  |     |
|2019-04-23|   |   |     |     |
|2019-04-26|buy|   |     |     |
|2019-04-29|   |   |sell |     |
|2019-05-26|   |buy|     |     |
|2019-06-26|   |sell|    |     |

I don't know how I can effectively convert the above dataframe.

additional. enter image description here

CodePudding user response:

Assuming your df to be:

In [354]: df = pd.DataFrame({'items':[1000, 1001, 1002, 1003], 'event':[['buy', 'buy'], ['buy', 'sell'], ['buy', 'sell'], ['sell', 'buy']], 'day':[['2019-03-26', '2019-04-26'], ['2019-05-26', '2019-06-26'], ['2019-04-21',
     ...: '2019-04-29'], ['2019-03-21', '2019-04-23']]})

In [355]: df
Out[355]: 
   items        event                       day
0   1000   [buy, buy]  [2019-03-26, 2019-04-26]
1   1001  [buy, sell]  [2019-05-26, 2019-06-26]
2   1002  [buy, sell]  [2019-04-21, 2019-04-29]
3   1003  [sell, buy]  [2019-03-21, 2019-04-23]

Use df.apply with df.pivot and df.fillna:

In [357]: x = df.set_index('items').apply(pd.Series.explode).reset_index()
In [360]: x.pivot('day', 'items', 'event').fillna('')
Out[360]: 
items      1000  1001  1002  1003
day                              
2019-03-21                   sell
2019-03-26  buy                  
2019-04-21              buy      
2019-04-23                    buy
2019-04-26  buy                  
2019-04-29             sell      
2019-05-26        buy            
2019-06-26       sell  

CodePudding user response:

IIUC, try this:

import pandas as pd

df = pd.DataFrame({'items':[1000,1001,1002,1003],
                   'event':[['buy', 'buy'],
                           ['buy', 'sell'],
                           ['buy', 'sell'],
                        ['sell', 'buy']],
                  'day':[['2019-03-26', '2019-04-26'],
                          ['2019-05-26', '2019-06-26'],
                          ['2019-04-21',  '2019-04-39'],
                          ['2019-03-21', '2019-04-23']]})

dfe = df.explode(['event', 'day'])
dfe.pivot('day', 'items', 'event').fillna('')

Output:

items      1000  1001  1002  1003
day                              
2019-03-21                   sell
2019-03-26  buy                  
2019-04-21              buy      
2019-04-23                    buy
2019-04-26  buy                  
2019-04-39             sell      
2019-05-26        buy            
2019-06-26       sell            
  •  Tags:  
  • Related