Distributivity of 'must' over 'should' in elasticsearch queries

Time:01-30

I am puzzled by the following behavior when querying my index.

Whether you read it as boolean logic or as sets (OR being a union and AND being an intersection), I take for granted that X AND (Y OR Z) = (X AND Y) OR (X AND Z). In the following examples:

  • X: AnneeConstructionLogement < 1960
  • Y: ResultatGlobalAmiante = true
  • Z: TypeDiagnosticAmiante = "DAT"
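The identity itself can be checked independently of Elasticsearch. The sketch below is in Python and uses made-up document ids for X, Y and Z (they are not taken from the real index); it only confirms that the logic holds, so any mismatch in hit counts has to come from how the queries are written or executed.

```python
from itertools import product

# Boolean view: check the distributive law for every truth assignment.
for x, y, z in product([False, True], repeat=3):
    assert (x and (y or z)) == ((x and y) or (x and z))

# Set view: hypothetical document ids matching each condition.
X = {1, 2, 3, 4, 5}   # AnneeConstructionLogement < 1960
Y = {2, 3, 7}         # ResultatGlobalAmiante = true
Z = {3, 5, 8}         # TypeDiagnosticAmiante = "DAT"

left = X & (Y | Z)           # X AND (Y OR Z)
right = (X & Y) | (X & Z)    # (X AND Y) OR (X AND Z)

print(sorted(left), sorted(right))
```

Both sides contain the same ids, as expected.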

X AND (Y OR Z)

{
 "query": {
    "bool": {
      "must": [
        {
          "range": {
            "AnneeConstructionLogement.keyword": {
              "lt": 1960
            }
          }
        },
        {
          "bool": {
            "should": [
                {"term": {
                    "ResultatGlobalAmiante.keyword": true
                }},
                {"term": {
                    "TypeDiagnosticAmiante.keyword": "DAT"
                }}
              ]
          }
        }
      ]
    }
 }
}

gives me 37 hits

(X AND Y) OR (X AND Z)

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "ResultatGlobalAmiante": true
                }
              },
              {
                "range": {
                  "AnneeConstructionLogement.keyword": {
                    "lt": 1960
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "TypeDiagnosticAmiante.keyword": "DAT"
                }
              },
              {
                "range": {
                  "AnneeConstructionLogement.keyword": {
                    "lt": 1960
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

gives me 102 hits, which I find surprising because the two queries are logically equivalent (or, at least, I do not see any difference between them). Even more surprising, the KQL I started from

_index : ace-logement and AnneeConstructionLogement <= "1960" and ResultatGlobalAmiante: true or _index : ace-logement and AnneeConstructionLogement <= "1960" and TypeDiagnosticAmiante: DAT

gives me 134 hits.

Is it valid to map must and should onto AND and OR this way? Is the mismatch a matter of logic or of implementation?
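On the mapping itself: in a bool query every must clause has to match, while should clauses are only required when the bool has no must or filter clause (the default minimum_should_match is then 1, otherwise 0). The queries above nest each should in its own bool, so they do behave as OR. The Python sketch below makes that explicit; the helper names all_of and any_of are hypothetical, and the field names are written without .keyword as an assumption about the mapping.

```python
import json

def all_of(*clauses):
    """AND: every clause has to match."""
    return {"bool": {"must": list(clauses)}}

def any_of(*clauses):
    """OR: at least one clause has to match (stated explicitly)."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}

# The three conditions X, Y and Z from the question.
x = {"range": {"AnneeConstructionLogement": {"lt": 1960}}}
y = {"term": {"ResultatGlobalAmiante": True}}
z = {"term": {"TypeDiagnosticAmiante": "DAT"}}

nested = {"query": all_of(x, any_of(y, z))}                    # X AND (Y OR Z)
distributed = {"query": any_of(all_of(y, x), all_of(z, x))}    # (X AND Y) OR (X AND Z)

print(json.dumps(nested, indent=2))
print(json.dumps(distributed, indent=2))
```

Building both forms from the same x, y and z objects also guarantees that they refer to exactly the same fields.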

User response:

The problem came from the use of .keyword (I am not sure why, but I would be interested to know). Thanks, @ilvar. After removing it, I finally got the same number of hits with both queries:

{
 "query": {
    "bool": {
      "must": [
        {
          "range": {
            "AnneeConstructionLogement": {
              "lt": 1960
            }
          }
        },
        {
          "bool": {
            "should": [
                {"term": {
                    "ResultatGlobalAmiante": true
                }},
                {"term": {
                    "TypeDiagnosticAmiante": "DAT"
                }}
              ]
          }
        }
      ]
    }
 }
}

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "ResultatGlobalAmiante": true
                }
              },
              {
                "range": {
                  "AnneeConstructionLogement": {
                    "lt": 1960
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "TypeDiagnosticAmiante": "DAT"
                }
              },
              {
                "range": {
                  "AnneeConstructionLogement": {
                    "lt": 1960
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
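The thread does not settle why .keyword mattered, but one difference is visible in the original queries: the first one targets ResultatGlobalAmiante.keyword while the second targets ResultatGlobalAmiante, so they were not testing the same conditions. Whether a .keyword subfield exists at all depends on the index mapping, which is not shown here, and a range on a keyword field compares strings rather than numbers. The Python sketch below uses a hypothetical helper, fields_used, to list the fields each original query references and show where they differ.

```python
def fields_used(node, leaf_types=("term", "range")):
    """Recursively collect the field names referenced by term/range clauses."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in leaf_types and isinstance(value, dict):
                found.update(value.keys())
            else:
                found |= fields_used(value, leaf_types)
    elif isinstance(node, list):
        for item in node:
            found |= fields_used(item, leaf_types)
    return found

year = {"range": {"AnneeConstructionLogement.keyword": {"lt": 1960}}}
dat = {"term": {"TypeDiagnosticAmiante.keyword": "DAT"}}

# Original first query: X AND (Y OR Z), with .keyword on every field.
nested = {"query": {"bool": {"must": [
    year,
    {"bool": {"should": [{"term": {"ResultatGlobalAmiante.keyword": True}}, dat]}},
]}}}

# Original second query: (X AND Y) OR (X AND Z), Y without .keyword.
distributed = {"query": {"bool": {"should": [
    {"bool": {"must": [{"term": {"ResultatGlobalAmiante": True}}, year]}},
    {"bool": {"must": [dat, year]}},
]}}}

# Fields that appear in one query but not in the other.
mismatch = sorted(fields_used(nested) ^ fields_used(distributed))
print(mismatch)
```

The same check on the two corrected queries gives an empty list, since both then use identical field names.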
