I have the index and data described here, and I have also set the analyzer to the stop analyzer. That works fine: when I try a simple search like POST https://serverURL/_search?pretty=true
{
"query": {
"query_string": {
"default_field": "title",
"query": "Rebel the without" }
}
}
the server does return
"title": "Rebel Without a Cause"
as a result.
But when I try a fuzzy search
{
"query": {
"fuzzy": {
"title": {
"value": "Rebel the without"
}
}
}
}
the result is empty. What exactly is going on here? Does fuzzy search somehow disable the analyzer?
CodePudding user response:
A fuzzy query returns documents that contain terms similar to the search term, as measured by edit distance.
Since you have not defined an explicit mapping for the "title" field, it uses the standard analyzer, which generates the following tokens:
{
"tokens" : [
{
"token" : "rebel",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "without",
"start_offset" : 6,
"end_offset" : 13,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "a",
"start_offset" : 14,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "cause",
"start_offset" : 16,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
The fuzzy query returns results for search terms that are within a small edit distance of one of the generated tokens, such as wihout, case, or rebe:
GET /myidx/_search
{
"query": {
"fuzzy": {
"title": {
"value": "case"
}
}
}
}
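To make "similar to a token" concrete, here is a minimal Python sketch of the comparison a term-level fuzzy query performs. It is an illustration, not Elasticsearch's implementation: the helper names (edit_distance, auto_fuzziness, fuzzy_term_matches) are made up for this example, the edit budget follows the documented AUTO rule (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits beyond that), and the distance counts a swap of two adjacent characters as one edit.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance allowing insert, delete, substitute, and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete from a
                d[i][j - 1] + 1,         # insert into a
                d[i - 1][j - 1] + cost,  # substitute (or keep)
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed for a term under the AUTO rule (assumed defaults)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_term_matches(value: str, indexed_terms: list[str]) -> list[str]:
    """Return indexed terms within the edit budget of the raw, unanalyzed value."""
    limit = auto_fuzziness(value)
    return [t for t in indexed_terms if edit_distance(value, t) <= limit]


# Tokens the standard analyzer produced for "Rebel Without a Cause".
indexed_terms = ["rebel", "without", "a", "cause"]
for value in ["case", "rebe", "wihout", "Rebel the without"]:
    print(value, "->", fuzzy_term_matches(value, indexed_terms))
```

The first three values each land within the budget of one token, while the full phrase "Rebel the without" is compared as one long string and is far from every token, so it matches nothing.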
Update 1:
Based on the comments below, you can use the match_bool_prefix query:
{
"query": {
"match_bool_prefix": {
"title": {
"query": "Rebel the without"
}
}
}
}
CodePudding user response:
To understand this behavior, it's important to understand how data is processed and stored in Elasticsearch. When you set up the stop analyzer, any text you feed to the system is transformed into a list of tokens, also called terms. At this point the Elasticsearch field "doesn't remember" your original text (technically, it's stored in the _source field, but it's not indexed) and only knows those terms (in your case rebel, without, cause), each coupled with its position in the original text. The terms are then stored in an inverted index for quick lookup.
Now you run the fuzzy query. It's a term-level query, which means it does not analyze its input: the whole string "Rebel the without" is compared as a single term against each indexed term, and none of them is within the allowed edit distance. Instead, you have to use a full-text query, such as match, which analyzes the query text first and then applies fuzziness to each resulting term:
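As a rough illustration of that indexing step, the following Python sketch mimics a stop analyzer and builds a toy inverted index. It is a simplification under stated assumptions: STOP_WORDS is a shortened stand-in for the real English stop-word list, and stop_analyze and build_inverted_index are hypothetical helper names, not Elasticsearch APIs.

```python
import re
from collections import defaultdict

# Assumption: a shortened stand-in for the English stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def stop_analyze(text):
    """Lowercase, split on non-letters, drop stop words, keep original positions."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(pos, word) for pos, word in enumerate(words) if word not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, term in stop_analyze(text):
            index[term].append((doc_id, pos))
    return dict(index)


docs = {1: "Rebel Without a Cause"}
index = build_inverted_index(docs)
print(index)
```

The index only has the keys rebel, without, and cause. Nothing resembling the original sentence, or a string like "Rebel the without", exists as a key to look up.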
POST /fuzz/_search
{
"query": {
"match": {
"title": {
"query": "Reble without",
"fuzziness": "AUTO"
}
}
}
}
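The difference with a full-text query can be sketched in Python as "analyze first, then fuzzy-match each token". This is only a model of the idea, not Elasticsearch code: analyze, max_edits, and fuzzy_match_query are hypothetical names, STOP_WORDS is a shortened stop-word list, and the edit budget assumes the documented AUTO thresholds.

```python
from functools import lru_cache

# Assumption: a shortened stand-in for the English stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def analyze(text):
    """Very rough stop analyzer: lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def edit_distance(a, b):
    """Edit distance where swapping two adjacent characters is one edit."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 or j == 0:
            return i + j
        best = min(d(i - 1, j) + 1, d(i, j - 1) + 1,
                   d(i - 1, j - 1) + (a[i - 1] != b[j - 1]))
        if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
            best = min(best, d(i - 2, j - 2) + 1)
        return best
    return d(len(a), len(b))


def max_edits(term):
    """Edit budget under the AUTO rule (assumed default thresholds)."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def fuzzy_match_query(query, indexed_terms):
    """For each analyzed query token, list indexed terms within its edit budget."""
    return {
        tok: [t for t in indexed_terms if edit_distance(tok, t) <= max_edits(tok)]
        for tok in analyze(query)
    }


# Terms the stop analyzer keeps for "Rebel Without a Cause".
indexed_terms = ["rebel", "without", "cause"]
print(fuzzy_match_query("Reble without", indexed_terms))
print(fuzzy_match_query("Rebel the without", indexed_terms))
```

Because the query text is split into tokens before matching, the misspelled reble is compared against rebel on its own (one adjacent swap), and the stop word "the" is dropped instead of poisoning the comparison. With the default OR operator, a document matches as soon as any of the tokens finds a term.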
