Elasticsearch top_hits aggregation result and doc_count are different

Time:01-04

Query

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "field": "holderInfo.raw",
            "size": 50
         },
         "aggregations": {
            "tops": {
               "top_hits": {
                  "_source": {
                     "includes": ["uid"]
                  }
               }
            }
         }
      }
   }
}

Result

{
   ...
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "498": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "MATSUSHITA ELECTRIC INDUSTRIAL",
               "doc_count": 5,
               "tops": {
                  "hits": {
                     "total": 5,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "someindex",
                           "_id": "03a3",
                           "_score": 1,
                           "_source": {
                              "uid": "03a3"
                           }
                        },
                        {
                           "_index": "someindex",
                           "_id": "08a2",
                           "_score": 1,
                           "_source": {
                              "uid": "08a2"
                           }
                        },
                        {
                           "_index": "someindex",
                           "_id": "84a1",
                           "_score": 1,
                           "_source": {
                              "uid": "84a1"
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

The documents "08a2", "08a3", "03a2", "03a3" and "84a1" all have 'MATSUSHITA ELECTRIC INDUSTRIAL' in the holderInfo.raw field.

So doc_count is 5, as expected, but the top_hits results contain only "03a3", "08a2" and "84a1"; "08a3" and "03a2" are missing.

Query

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "script": {
               "inline": "doc['holderInfo.raw'].value"
            },
            "size": 50
         }
      }
   }
}

Result

{
   ...
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "498": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "MATSUSHITA ELECTRIC INDUSTRIAL",
               "doc_count": 3
            }
         ]
      }
   }
}

In addition, when I aggregate with a script instead of the field, doc_count is 3, so two documents are missing there as well.

I'd like to know why some uids are missing.

I have to use Elasticsearch 2.2. Is this a bug in that old version, or am I doing something wrong?

Thanks!

CodePudding user response:

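By default, the top_hits aggregation returns only the top 3 hits per bucket. That is why hits.total inside "tops" reports 5 while only 3 documents are listed. To get all of them, set the size parameter of top_hits to at least the number of documents you expect per bucket: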

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "field": "holderInfo.raw",
            "size": 50
         },
         "aggregations": {
            "tops": {
               "top_hits": {
                  "size": 5,
                  "_source": {
                     "includes": ["uid"]
                  }
               }
            }
         }
      }
   }
}
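
If you want to spot this kind of truncation programmatically, a small check along the following lines might help. It is only a sketch: the helper name find_truncated_buckets is made up for illustration, the response dict is a trimmed copy of the result shown in the question, and no Elasticsearch client is involved (you would pass in whatever parsed JSON your client returns).

```python
def find_truncated_buckets(response, terms_agg, top_hits_agg):
    """Return (key, doc_count, returned) for every bucket whose
    top_hits list is shorter than the bucket's doc_count.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    truncated = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        returned = len(bucket[top_hits_agg]["hits"]["hits"])
        if returned < bucket["doc_count"]:
            truncated.append((bucket["key"], bucket["doc_count"], returned))
    return truncated


# Trimmed copy of the response from the question (only _id kept per hit).
response = {
    "aggregations": {
        "498": {
            "buckets": [
                {
                    "key": "MATSUSHITA ELECTRIC INDUSTRIAL",
                    "doc_count": 5,
                    "tops": {
                        "hits": {
                            "total": 5,
                            "hits": [
                                {"_id": "03a3"},
                                {"_id": "08a2"},
                                {"_id": "84a1"},
                            ],
                        }
                    },
                }
            ]
        }
    }
}

print(find_truncated_buckets(response, "498", "tops"))
# -> [('MATSUSHITA ELECTRIC INDUSTRIAL', 5, 3)]
```

An empty list means every bucket returned as many hits as its doc_count, which is what you should see once the top_hits size is large enough.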