Elasticsearch Index Sorting and Time-series Indexes

Time:02-04

I have a system in which Elasticsearch (ES) has two indexes (simplified):

  • transactions_2022
  • transactions_2021

I pick the index by year, based on "transactionDate". My query is "give me 500 transactions that satisfy some filters, ordered by transactionDate desc". The query is paginated, but I don't control the client, which uses from and size to do the paging.

I want to search both indexes ordered by transactionDate, using index sorting: https://www.elastic.co/guide/en/elasticsearch/reference/8.0/index-modules-index-sorting.html

POST /transactions_2022,transactions_2021/_search
{
  "from": 0,
  "size": 500,
  "query": ...,
  "sort": {
    "transactionDate": {
      "order": "desc"
    }
  }
}

The question is (I'm asking because I'm designing a new system and trying to justify a version upgrade, and I can't test this easily): if I enable index sorting on both indexes and request 500 transactions, will Elasticsearch be smart enough to avoid a full scan of all the documents that match the filters and just return the first 500 transactions? (Or take 500 from each index and then discard the other 500 in a post-processing phase.)

Also, if I sort by index name, will that serialize the queries? Or will they be executed in parallel, with one result discarded if it is not needed?

Thanks!

CodePudding user response:

Elasticsearch usually runs searches in a map-reduce manner, distributed across shards, and it stores data in very efficient index structures, so it won't scan through every doc even without index sorting. Instead, it gathers the top 500 docs from each shard (more precisely, the top from + size docs, so deeper pages cost more), merges them, and discards the "tail".

What index sorting does is control how docs are stored on disk, so that when it's time to retrieve the docs, sequential reads are more efficient.
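
As a rough illustration, index sorting is declared in the index settings when the index is created; it cannot be added to an existing index afterwards, so an existing index would have to be reindexed. The sketch below assumes that transactionDate is mapped as a date field, which the question does not state, and shows only one of the two yearly indexes.

```
# Assumption: transactionDate is a date field; repeat for transactions_2021
PUT /transactions_2022
{
  "settings": {
    "index": {
      "sort.field": "transactionDate",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "transactionDate": {
        "type": "date"
      }
    }
  }
}
```

The sort order here is chosen to match the "desc" order in the search request, since a shard can only stop early when the search sort matches the index sort.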

You can also set "track_total_hits": false, which tells ES to short-circuit as soon as a shard has found its top 500 docs and not touch the others. By default, ES keeps counting matching documents beyond the top hits (up to 10,000 in 7.x and later) so that it can return the total hit count. However, if your client relies on that count for pagination, this obviously won't work.
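
A minimal sketch of the request from the question with total-hit tracking switched off might look like this. The match_all query is only a placeholder for the filters that the question leaves out, and the response will no longer contain a total hit count.

```
# Assumption: match_all stands in for the real filters, which are not shown in the question
POST /transactions_2022,transactions_2021/_search
{
  "from": 0,
  "size": 500,
  "track_total_hits": false,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "transactionDate": {
        "order": "desc"
      }
    }
  ]
}
```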
