I have an Elasticsearch index with two fields, html and url, and the following mapping:
{
  "mappings": {
    "properties": {
      "html": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "url": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      }
    }
  }
}
What is the best way to retrieve documents by URL? For example, I want the documents whose url field contains google.com. The result might be the two documents with the URLs https://www.google.com and www.google.com/search. I tried different queries, but none of them works reliably.
query = {
    "query": {
        "match_phrase": {
            "url": "google.com"
        }
    }
}
response = elasticsearch.helpers.scan(
    es_client,
    index=my_index,
    doc_type="_doc",
    query=query
)
Answer:
TL;DR
You should use the keyword field, not the text field.
# Query the un-analyzed keyword sub-field instead of the analyzed text field.
query = {
    "query": {
        "match": {
            "url.keyword": "google.com"
        }
    }
}
response = elasticsearch.helpers.scan(
    es_client,
    index=my_index,
    doc_type="_doc",
    query=query
)
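Keep in mind that this performs an exact match on google.com: it returns only documents whose url is exactly that string, so neither https://www.google.com nor www.google.com/search matches. A substring search needs a different query type, such as a wildcard query.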
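To match URLs that merely contain google.com, one option is a wildcard query against url.keyword. The sketch below is a minimal illustration, not a tuned solution. The helper name url_contains_query is made up for this example, and it only builds the request body, which could then be passed as query= to the scan call above.

```python
def url_contains_query(fragment):
    """Build a search body matching url.keyword values that contain `fragment`.

    Hypothetical helper for illustration. The fragment is inserted as-is,
    so any * or ? inside it would also act as a wildcard operator.
    """
    return {
        "query": {
            "wildcard": {
                # * matches any run of characters, including none
                "url.keyword": {"value": f"*{fragment}*"}
            }
        }
    }


query = url_contains_query("google.com")
print(query)
```

Two caveats apply. A pattern with a leading * has to be checked against every term in the field, so it can be slow on a large index. Also, with ignore_above set to 256 in the mapping, URLs longer than 256 characters are not indexed in url.keyword and cannot be found this way.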
To reproduce
Create the index and add data
PUT /so_search_url/
{
  "mappings": {
    "properties": {
      "html": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "url": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      }
    }
  }
}
POST /so_search_url/_doc
{
  "html": "<h1>Plop</h1>",
  "url": "https://www.google.com"
}
POST /so_search_url/_doc
{
  "html": "<h1>Plop</h1>",
  "url": "https://www.google.fr"
}
POST /so_search_url/_doc
{
  "html": "<h1>Plop</h1>",
  "url": "https://www.google.com/search"
}
Search the data for an exact match
GET /so_search_url/_search
{
  "query": {
    "match": {
      "url.keyword": "https://www.google.com"
    }
  }
}
Search the data for a prefix match
GET /so_search_url/_search
{
  "query": {
    "prefix": {
      "url.keyword": {
        "value": "https://www.google.com"
      }
    }
  }
}
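As a rough mental model: a keyword field stores each URL as one whole term, so the exact and prefix searches above behave much like plain string comparisons. The following Python sketch is only an analogy run against the three sample URLs. It does not talk to Elasticsearch, and the function names are invented for this illustration.

```python
# Plain-Python analogy for queries on url.keyword (not real Elasticsearch calls).
# These are the three sample documents indexed above.
urls = [
    "https://www.google.com",
    "https://www.google.fr",
    "https://www.google.com/search",
]


def exact(value):
    # Analogous to match (or term) on url.keyword: the whole string must be equal.
    return [u for u in urls if u == value]


def prefix(value):
    # Analogous to a prefix query on url.keyword.
    return [u for u in urls if u.startswith(value)]


def contains(value):
    # Analogous to a wildcard query with the pattern *value*.
    return [u for u in urls if value in u]


print(exact("https://www.google.com"))   # one URL
print(prefix("https://www.google.com"))  # the .com URL and the /search URL
print(contains("google.com"))            # the same two URLs
print(exact("google.com"))               # empty: no URL equals "google.com"
```

The last line shows why the exact match on google.com from the TL;DR returns nothing for this data set.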
To understand
...two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search.
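This distinction is probably also why match_phrase on the text field did not behave as expected in the question. A text field is split into tokens at index time, and the default standard analyzer does not break on a dot between letters, so www.google.com is indexed as a single token and the token google.com never appears on its own. The sketch below imitates that behaviour with a regular expression. It is a deliberately crude approximation written for this illustration, not the real analyzer, and the names rough_tokens and phrase_matches are hypothetical. The actual tokens can be checked with the _analyze API.

```python
import re

# Crude stand-in for the standard analyzer (an assumption, not the real thing):
# lowercase the text, then keep runs of letters/digits, allowing single dots
# between them so that "www.google.com" stays one token.
TOKEN = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def rough_tokens(text):
    return TOKEN.findall(text.lower())


def phrase_matches(doc, phrase):
    """True if the phrase tokens occur consecutively among the doc tokens."""
    doc_tokens = rough_tokens(doc)
    wanted = rough_tokens(phrase)
    n = len(wanted)
    return n > 0 and any(
        doc_tokens[i:i + n] == wanted for i in range(len(doc_tokens) - n + 1)
    )


print(rough_tokens("https://www.google.com"))   # ['https', 'www.google.com']
print(rough_tokens("www.google.com/search"))    # ['www.google.com', 'search']
print(phrase_matches("https://www.google.com", "google.com"))      # False
print(phrase_matches("https://www.google.com", "www.google.com"))  # True
```

Under this model, searching the text field for google.com finds nothing, while www.google.com matches both sample URLs, which would explain results that seem to work only some of the time.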
