Using the Ingest Attachment Plugin within an Elasticsearch index template

Time:01-17

I am trying to update my current Elasticsearch schema, which is on version 1.3.2, to the latest version. For one of the indexes, the current schema looks like this:

curl -XPOST localhost:9200/_template/<INDEXNAME> -d '{
    "template" : "*-<INDEXNAME_TYPE>",
    "index.mapping.attachment.indexed_chars": -1,
    "mappings" : {
        "post" : {
            "properties" : {
                "sub" : { "type" : "string" },
                "sender" : { "type" : "string" },
                "dt" : { "type" : "date", "format" : "EEE, d MMM yyyy HH:mm:ss Z" },
                "body" : { "type" : "string"},
                "attachments" : {
                    "type" : "attachment",
                    "path" : "full",
                    "fields" : {
                        "attachments" : {
                            "type" : "string",
                            "term_vector" : "with_positions_offsets",
                            "store" : true
                        },
                        "name" : {"store" : "yes"},
                        "title" : {"store" : "yes"},
                        "date" : {"store" : "yes"},
                        "content_type" : {"store" : "yes"},
                        "content_length" : {"store" : "yes"}
                    }
                }
            }
        }
    }
}'

My old Elasticsearch version has the "mapper-attachments" plugin installed. I am aware that this plugin has been replaced by the Ingest Attachment Processor, and I understand the examples on the plugin's website, where a pipeline is created first and documents are then indexed through it:

PUT _ingest/pipeline/attachment
  {
    "description" : "Extract attachment information from arrays",
    "processors" : [
      {
        "foreach": {
          "field": "attachments",
          "processor": {
            "attachment": {
              "target_field": "_ingest._value.attachment",
              "field": "_ingest._value.data",
              "indexed_chars" : -1
            }
          }
        }
      }
    ]
  }

  PUT my-index-000001/_doc/my_id?pipeline=attachment
  {
    "sub" : "This is a test post",
    "sender" : "sender@example.com",
    "dt" : "Sat, 15 Jan 2022 08:50:00 +1000",
    "body" : "Test Body",
    "fromaddr": "from@example.com",
    "toaddr": "to@example.com",
    "attachments" : [
      {
        "filename" : "ipsum.txt",
        "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
      },
      {
        "filename" : "test.txt",
        "data" : "VGhpcyBpcyBhIHRlc3QK"
      }
    ]
  } 
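One way to check what the "foreach" + "attachment" combination produces, before creating any template, is the ingest _simulate API. A minimal sketch, assuming the pipeline above is stored under the id "attachment" and the cluster answers on http://localhost:9200 without authentication; ES_URL, simulate_body and simulate_pipeline are hypothetical names:

```shell
#!/bin/sh
# Hypothetical: base URL of the cluster (assumed plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a _simulate request body with one document carrying the two
# base64-encoded sample attachments from the question.
simulate_body() {
  cat <<'EOF'
{
  "docs": [
    {
      "_source": {
        "attachments": [
          { "filename": "ipsum.txt", "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=" },
          { "filename": "test.txt", "data": "VGhpcyBpcyBhIHRlc3QK" }
        ]
      }
    }
  ]
}
EOF
}

# Run the stored "attachment" pipeline on the sample document without
# indexing anything; the response shows what the processor extracted.
simulate_pipeline() {
  simulate_body | curl -s -X POST "$ES_URL/_ingest/pipeline/attachment/_simulate?pretty" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Running simulate_pipeline should return each array element with an "attachment" object (content, content_type, language, content_length) next to its "filename" and "data", which is the shape the new template has to map.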

How do I make use of this new attachment processor to create the index template I had before?

Note: in my index, each "post" has one or more attachments.
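Because a post can carry several attachments, each file has to be base64-encoded into its own "data" value before indexing. The sketch below builds such a document from local files and sends it through the pipeline. It assumes a POSIX shell with base64, sed and curl, an unauthenticated cluster on http://localhost:9200, and dates already in the "EEE, d MMM yyyy HH:mm:ss Z" form; json_escape, build_post_doc and index_post are hypothetical helpers, and the escaping only handles backslashes and double quotes.

```shell
#!/bin/sh
# Hypothetical: base URL of the cluster (assumed plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Escape backslashes and double quotes so a value fits in a JSON string.
# Sketch only: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a "post" document; every FILE becomes one element of the
# attachments array as {"filename": <basename>, "data": <base64>}.
# Usage: build_post_doc SUBJECT SENDER DATE BODY FILE [FILE...]
build_post_doc() {
  sub=$1 sender=$2 dt=$3 body=$4
  shift 4
  printf '{"sub":"%s","sender":"%s","dt":"%s","body":"%s","attachments":[' \
    "$(json_escape "$sub")" "$(json_escape "$sender")" \
    "$(json_escape "$dt")" "$(json_escape "$body")"
  sep=''
  for f in "$@"; do
    # Some base64 implementations wrap long output, so strip newlines.
    data=$(base64 < "$f" | tr -d '\n')
    printf '%s{"filename":"%s","data":"%s"}' \
      "$sep" "$(json_escape "$(basename "$f")")" "$data"
    sep=','
  done
  printf ']}\n'
}

# Index one post through the "attachment" pipeline.
# Usage: index_post INDEX ID SUBJECT SENDER DATE BODY FILE [FILE...]
index_post() {
  index=$1 id=$2
  shift 2
  build_post_doc "$@" | curl -s -X PUT "$ES_URL/$index/_doc/$id?pipeline=attachment" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

For example, index_post posts-mail 1 "This is a test post" "sender@example.com" "Sat, 15 Jan 2022 08:50:00 +1000" "Test Body" ipsum.txt test.txt, where "posts-mail" is a made-up index name.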

Answer:

Unlike in the previous version, I cannot use the "attachment" data type any more. Following the example on the elastic.co website and my own question, the answer turns out to be in the question itself:

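  • 1st: Create the pipeline as in the question.
  • 2nd: Create the index template [see below]. Because the "attachment" type is gone, "attachments" is mapped as an ordinary object, and the fields extracted by the processor go into its "attachment" sub-object (the "target_field" of the pipeline). The "indexed_chars" option is now set on the processor in the pipeline instead of in the template.
  • 3rd: Insert the data as shown in the question, passing pipeline=attachment in the request. The processor then parses each attachment into the "attachments.attachment" fields defined in the schema below.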
For a current Elasticsearch version the template also needs "index_patterns" instead of "template", no "post" mapping type, "text"/"keyword" instead of "string", "filename" and "data" mapped to match the indexed documents ("data" as "binary", since a large base64 string cannot be indexed as a keyword), and the Content-Type header that Elasticsearch 6.0 and later require:

curl -XPUT 'localhost:9200/_template/<INDEXNAME>' -H 'Content-Type: application/json' -d '{
    "index_patterns" : ["*-<INDEXNAME_TYPE>"],
    "mappings" : {
        "properties" : {
            "sub" : { "type" : "text" },
            "sender" : { "type" : "text" },
            "dt" : { "type" : "date", "format" : "EEE, d MMM yyyy HH:mm:ss Z" },
            "body" : { "type" : "text" },
            "attachments" : {
                "properties" : {
                    "filename" : { "type" : "keyword" },
                    "data" : { "type" : "binary" },
                    "attachment" : {
                        "properties" : {
                            "content" : {
                                "type" : "text",
                                "store" : true,
                                "term_vector" : "with_positions_offsets"
                            },
                            "content_length" : { "type" : "long" },
                            "content_type" : { "type" : "keyword" },
                            "language" : { "type" : "keyword" },
                            "date" : { "type" : "date" }
                        }
                    }
                }
            }
        }
    }
}'
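Once posts are indexed, the extracted text lives in attachments.attachment.content. The sketch below searches that field and asks for highlighted fragments with the "fvh" highlighter, which depends on the "with_positions_offsets" term vectors in the mapping; it also excludes attachments.data from the returned _source so the base64 payload is not sent back. It assumes an unauthenticated cluster on http://localhost:9200, and ES_URL, json_escape, search_attachments_body and search_attachments are hypothetical names.

```shell
#!/bin/sh
# Hypothetical: base URL of the cluster (assumed plain HTTP, no auth).
ES_URL="${ES_URL:-http://localhost:9200}"

# Escape backslashes and double quotes so a value fits in a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a search body: full-text match on the extracted attachment text,
# fast-vector highlighting on the same field, base64 data left out of _source.
search_attachments_body() {
  q=$(json_escape "$1")
  printf '{"_source":{"excludes":["attachments.data"]},"query":{"match":{"attachments.attachment.content":"%s"}},"highlight":{"fields":{"attachments.attachment.content":{"type":"fvh"}}}}\n' "$q"
}

# Usage: search_attachments INDEX_OR_PATTERN "query text"
search_attachments() {
  search_attachments_body "$2" | curl -s -X POST "$ES_URL/$1/_search?pretty" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

With the sample documents from the question, search_attachments 'posts-*' test (the index pattern is a placeholder) should return the post with a highlighted fragment from test.txt. Since "attachments" is a plain object rather than "nested", the match applies to the post as a whole, not to one specific attachment.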