I have a collection which contains around 10.2M records.My aggregate query takes around 6s to send back a record.How do I optimize my aggregate query.? Indexes are defined on the token0address,token1address and timestamp fields.
Below is my query
{
$match: {
$or: [
{
token0Address: quoteCurrency,
},
{
token1Address: quoteCurrency,
},
],
timestamp: {
$gte: Number(historyDTO.from),
$lte: Number(historyDTO.to),
},
},
},
{
$group: {
_id: idObj,
transactionDate: { $first: '$transactionDate' },
timestamp: { $first: '$timestamp' },
minimum_price: { $min: '$priceInToken0' },
maximum_price: { $max: '$priceInToken0' },
median_price: { $avg: '$priceInToken0' },
open_price: { $first: '$priceInToken0' },
close_price: { $last: '$priceInToken0' },
volume: { $sum: '$priceInToken0' },
},
},
{ $sort: { timestamp: -1 } },
{ $skip: 0 },
{ $limit: Number(historyDTO.countback) },
idObj looks like this.It contains other combinations too
if (last === 'H') {
idObj = {
day: { $dayOfYear: '$transactionDate' },
hour: {
$subtract: [
{ $hour: '$transactionDate' },
{ $mod: [{ $hour: '$transactionDate' }, parseInt(exceptLast)] },
],
},
};
}
CodePudding user response:
I will mention some changes that I think could improve performance, however in general I will say this is pretty much optimized and I would personally not spend more time optimizing it unless it's an integral part of many process and each millisecond of runtime is important.
Create compound indexes for both fields
token0Addressandtoken1Address, Right now Mongo is using these indexes, however it then needs to fetch and match documents based ontimestamp. If you could built both indexes as compound{token0Address: 1, timestamp: 1}then Mongo could finish the$matchstage somewhat faster. (depending on the given range obviously and scale of the matched documents).altering the
idObjif possible, you mentionedidObjhas different structures, maybe some of them have redundancies that could be eliminated, here we can't really do anything due to the dynamic usage of the$modoperator, Another possibility would be to preprocess the fields required ( for example in this case to save the$hourresult so we stop using this operation ).the last option is more of a trick but if you have known heuristics on your data distribution and queries you might be able to add earlier limits and process less data. this is usually not a realistic option especially with the somewhat low scale of your data.
