How to optimize the mongodb aggregate query-CodePudding

I have a collection which contains around 10.2M records.My aggregate query takes around 6s to send back a record.How do I optimize my aggregate query.? Indexes are defined on the token0address,token1address and timestamp fields.

Below is my query

  {
    $match: {
      $or: [
        {
          token0Address: quoteCurrency,
        },
        {
          token1Address: quoteCurrency,
        },
      ],
      timestamp: {
        $gte: Number(historyDTO.from),
        $lte: Number(historyDTO.to),
      },
    },
  },
  {
    $group: {
      _id: idObj,
      transactionDate: { $first: '$transactionDate' },
      timestamp: { $first: '$timestamp' },
      minimum_price: { $min: '$priceInToken0' },
      maximum_price: { $max: '$priceInToken0' },
      median_price: { $avg: '$priceInToken0' },
      open_price: { $first: '$priceInToken0' },
      close_price: { $last: '$priceInToken0' },
      volume: { $sum: '$priceInToken0' },
    },
  },
  { $sort: { timestamp: -1 } },
  { $skip: 0 },
  { $limit: Number(historyDTO.countback) },

idObj looks like this.It contains other combinations too

if (last === 'H') {
idObj = {
  day: { $dayOfYear: '$transactionDate' },
  hour: {
    $subtract: [
      { $hour: '$transactionDate' },
      { $mod: [{ $hour: '$transactionDate' }, parseInt(exceptLast)] },
    ],
  },
};

}

CodePudding user response：

I will mention some changes that I think could improve performance, however in general I will say this is pretty much optimized and I would personally not spend more time optimizing it unless it's an integral part of many process and each millisecond of runtime is important.

Create compound indexes for both fields token0Address and token1Address, Right now Mongo is using these indexes, however it then needs to fetch and match documents based on timestamp. If you could built both indexes as compound {token0Address: 1, timestamp: 1} then Mongo could finish the $match stage somewhat faster. (depending on the given range obviously and scale of the matched documents).
altering the idObj if possible, you mentioned idObj has different structures, maybe some of them have redundancies that could be eliminated, here we can't really do anything due to the dynamic usage of the $mod operator, Another possibility would be to preprocess the fields required ( for example in this case to save the $hour result so we stop using this operation ).
the last option is more of a trick but if you have known heuristics on your data distribution and queries you might be able to add earlier limits and process less data. this is usually not a realistic option especially with the somewhat low scale of your data.