Elastic: How to search for document with specific url?

Time:01-07

I have an Elasticsearch index with two fields, html and url, and the following mapping:

{
  "mappings": {
    "properties": {
      "html": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "url": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      }
    }
  }
}

What is the best way to retrieve documents by URL? For example, I want every document whose url field contains google.com. The result might be the two documents with the URLs https://www.google.com and www.google.com/search. I tried several queries, but none of them works consistently.

query = {
    "query": {
        "match_phrase": {
            "url": "google.com"
        }
    }
}

response = elasticsearch.helpers.scan(
    es_client,
    index=my_index,
    doc_type="_doc",
    query=query,
)

Answer:

TL;DR

You should query the keyword sub-field (url.keyword), not the text field.

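import elasticsearch.helpers

# Exact match against the un-analyzed keyword sub-field.
query = {
    "query": {
        "match": {
            "url.keyword": "google.com"
        }
    }
}

# es_client and my_index are assumed to be defined as in the question.
response = elasticsearch.helpers.scan(
    es_client,
    index=my_index,
    doc_type="_doc",
    query=query,
)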

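But keep in mind that this performs an exact match: it returns only documents whose url is exactly google.com, so it will not find https://www.google.com or www.google.com/search.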
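If the goal is the "contains" behaviour described in the question, one option is a wildcard query on url.keyword. The following is a minimal sketch, not a definitive solution: the helper name url_contains_query is hypothetical, the fragment is assumed to contain no literal * or ? characters (they are not escaped here), and a leading wildcard can be slow on a large index. Note also that with "ignore_above": 256 in the mapping, URLs longer than 256 characters are not indexed in url.keyword and so cannot match.

```python
def url_contains_query(fragment: str) -> dict:
    """Build a search body matching url.keyword values that contain `fragment`.

    Hypothetical helper for illustration; `fragment` is inserted as-is,
    so wildcard characters (* and ?) inside it are NOT escaped.
    """
    return {
        "query": {
            "wildcard": {
                # Leading and trailing * turn the pattern into "contains".
                "url.keyword": {"value": f"*{fragment}*"}
            }
        }
    }


query = url_contains_query("google.com")
```

The resulting query dictionary can be passed to elasticsearch.helpers.scan in the same way as the query above.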

To reproduce

Create the index and add data

PUT /so_search_url/
{
  "mappings": {
    "properties": {
      "html": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "url": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      }
    }
  }
}

POST /so_search_url/_doc
{
  "html": "<h1>Plop</h1>",
  "url": "https://www.google.com"
}

POST /so_search_url/_doc
{
  "html": "<h1>Plop</h1>",
  "url": "https://www.google.fr"
}

POST /so_search_url/_doc
{
  "html": "<h1>Plop</h1>",
  "url": "https://www.google.com/search"
}

Search the data for exact match

GET /so_search_url/_search
{
  "query": {
    "match": {
      "url.keyword": "https://www.google.com"
    }
  }
}

Search the data for prefix match

GET /so_search_url/_search
{
  "query": {
    "prefix": {
      "url.keyword": {
        "value": "https://www.google.com"
      }
    }
  }
}
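To make the difference between the two searches concrete, here is a plain-Python model of what they should return for the three sample documents. It is only a sketch of the matching semantics, not Elasticsearch itself, and it assumes url.keyword stores each URL unchanged (keyword fields are not analyzed). The function names exact_hits and prefix_hits are made up for this illustration.

```python
# The three URLs indexed in the reproduction steps above.
SAMPLE_URLS = [
    "https://www.google.com",
    "https://www.google.fr",
    "https://www.google.com/search",
]


def exact_hits(urls, value):
    """Model of the exact-match search: the whole string must be equal."""
    return [u for u in urls if u == value]


def prefix_hits(urls, value):
    """Model of the prefix search: the string must start with `value`."""
    return [u for u in urls if u.startswith(value)]


exact = exact_hits(SAMPLE_URLS, "https://www.google.com")
prefix = prefix_hits(SAMPLE_URLS, "https://www.google.com")

print(exact)   # ['https://www.google.com']
print(prefix)  # ['https://www.google.com', 'https://www.google.com/search']
```

Neither search finds a URL such as www.google.com/search (without the scheme), because the stored value neither equals nor starts with https://www.google.com.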

To understand

...two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search.

[doc]
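One way to see why the text field behaves differently is to ask Elasticsearch how it tokenizes a URL, using the _analyze API. The sketch below only builds the request body; the helper name analyze_request is hypothetical, and the commented-out client call assumes a 7.x-style elasticsearch-py client like the one used in the question. As far as I know, the standard analyzer keeps www.google.com as a single token, which would explain why a match_phrase for google.com on the text field does not find it, but check the output on your own cluster.

```python
def analyze_request(text: str, analyzer: str = "standard") -> dict:
    """Build a request body for the _analyze API (hypothetical helper)."""
    return {"analyzer": analyzer, "text": text}


body = analyze_request("https://www.google.com")

# With a connected client (assumed to exist as es_client), the tokens
# produced for the text field could be inspected with something like:
#   result = es_client.indices.analyze(body=body)
#   print([t["token"] for t in result["tokens"]])
```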
