Home > Net >  Pandas dataframe group by 10 min intervals with different actions on other columns
Pandas dataframe group by 10 min intervals with different actions on other columns

Time:02-08

I have a pandas dataframe which includes a timestamp and 71 other columns, something like this:

              timestamp           |close_price|highest_price|volume| ...
              2018-09-29 00:00:20 |1809       |1811         |  ... |
              2018-09-29 00:00:34 |1823       |1832         |
              2018-09-29 00:00:59 |1832       |1863         |
              2018-09-29 00:01:09 |1800       |1802         |
              2018-09-29 00:01:28 |1832       |1845         |
              .
              .
              .

I want to put the data into 10 min intervals and I want to do separate operations on each column, for example I want the 10 min intervals of close_price column to show the last value of the corresponding range in the real table, or for the highest_price column, I want the max value of the corresponding range, or for volume I want the mean of the values in that range. I already tried

dataTable = datefram.resample("10min").agg({'first_price':'first',
                                       'close_price':'last',
                                       'highest_price': 'max',
                                       'volume':'mean', 
                                        #other attributes...
                                        })

but the result seems to be incorrect. Is there any other ways to do what I want to do? I will appreciate any comments or thoughts.

Note that there is no specific pattern in timestamp values. In 1 minute, we can have 0 to 60 rows.

CodePudding user response:

If your data spans multiple days or periods where you don't have any data points, calling resample() can result in lots of additional rows with NaN values. I think your code is actually correct, you just got the wrong impression from seeing all the extra rows.

CodePudding user response:

Your approach is correct. The dataframe.resample("10min").agg() does the calculations for you. You might get more outputs than what you expect and that is because this: resample method continuously adds 10 minutes to the time and do the calculations that you asked. But if there was no data in any of the 10 min intervals, it creates a NULL row. Maybe your data is not continuous and causes this Null rows.

You can simply delete the NULL rows by using dataframe.dropna()

  •  Tags:  
  • Related