I am refactoring the code that queries an ES index, and I was wondering whether there is any difference between the two snippets below:
"bool" : {
  "should" : [ {
    "terms" : {
      "myType" : [ 1 ]
    }
  }, {
    "terms" : {
      "myType" : [ 2 ]
    }
  }, {
    "terms" : {
      "myType" : [ 4 ]
    }
  } ]
}
and
"terms" : {
  "myType" : [ 1, 2, 4 ]
}
CodePudding user response:
Please check this post from the Elastic discuss page, which will answer your question. Copying it here for quick reference:
There's a few differences.
- The simplest to see is the verbosity: terms queries just list an array, while term queries require more JSON.
- terms queries do not score matches based on IDF (the rareness) of matched terms; the term query does.
- term queries can only have up to 1024 values due to Boolean's max clause count.
- terms queries can have more terms.
By default, Elasticsearch limits the terms query to a maximum of 65,536 terms. You can change this limit using the index.max_terms_count setting.
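For the refactor in the question, the verbose form can be collapsed mechanically. The following is a minimal sketch in Python, not part of any Elasticsearch client: the helper name merge_should_terms and the constant MAX_TERMS_COUNT are hypothetical, and it assumes the bool query contains only should clauses, each a terms clause on the same field. Under that assumption both forms match the same documents, although scores may differ.

```python
# Default value of index.max_terms_count, as quoted above (assumed unchanged).
MAX_TERMS_COUNT = 65_536


def merge_should_terms(bool_query: dict) -> dict:
    """Collapse a bool/should of terms clauses on one field into one terms query."""
    bool_body = bool_query["bool"]
    if set(bool_body) != {"should"} or not bool_body["should"]:
        raise ValueError("expected a bool query with only a non-empty 'should'")

    field = None
    values = []
    for clause in bool_body["should"]:
        # Each clause must look like {"terms": {<field>: [<values>]}}.
        if list(clause) != ["terms"] or len(clause["terms"]) != 1:
            raise ValueError("expected only single-field terms clauses")
        (clause_field, clause_values), = clause["terms"].items()
        if field is None:
            field = clause_field
        elif clause_field != field:
            raise ValueError("all clauses must target the same field")
        # Keep first-seen order and drop duplicates.
        for value in clause_values:
            if value not in values:
                values.append(value)

    if len(values) > MAX_TERMS_COUNT:
        raise ValueError("too many terms for a single terms query")
    return {"terms": {field: values}}


verbose = {
    "bool": {
        "should": [
            {"terms": {"myType": [1]}},
            {"terms": {"myType": [2]}},
            {"terms": {"myType": [4]}},
        ]
    }
}
compact = merge_should_terms(verbose)
print(compact)
```

The helper refuses anything it does not recognise (extra bool keys, mixed fields) rather than guessing, since those cases are not equivalent to a single terms query.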
Which of them is going to be faster? Is speed also related to the number of terms?
It depends. They execute differently. term queries do more expensive scoring but do so lazily. They may "skip" over docs during execution because other, more selective criteria may advance the stream of matching docs considered.
The terms query doesn't do expensive scoring but is more eager: it creates the equivalent of a single bitset, with a one or zero for every doc, by ORing all the potential matching docs up front. Many terms can share the same bitset, which is what provides the scalability in term numbers.
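The eager bitset idea can be illustrated with a toy model. This is only a sketch of the concept, not how Lucene or Elasticsearch is actually implemented: the postings dictionary, the doc IDs, and the function names are made up for illustration, and a Python integer stands in for the bitset.

```python
def terms_bitset(postings: dict, terms: list) -> int:
    """OR the doc IDs of every term into one integer used as a bitset."""
    bits = 0
    for term in terms:
        for doc_id in postings.get(term, ()):
            bits |= 1 << doc_id  # set the bit for this matching doc
    return bits


def matching_docs(bits: int, num_docs: int) -> list:
    """List the doc IDs whose bit is set."""
    return [doc_id for doc_id in range(num_docs) if (bits >> doc_id) & 1]


# Hypothetical postings: term value -> doc IDs that contain it.
postings = {1: [0, 3], 2: [1], 4: [3, 5]}
bits = terms_bitset(postings, [1, 2, 4])
print(matching_docs(bits, num_docs=6))  # [0, 1, 3, 5]
```

However many terms are listed, the result is still one bitset of the same size, which is the property the answer credits for the scalability in term numbers.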
