I am trying to upgrade my current Elasticsearch schema, which is on 1.3.2, to the latest version. For one of the indexes, the current schema looks like this:
curl -XPOST localhost:9200/_template/<INDEXNAME> -d '{
"template" : "*-<INDEXNAME_TYPE>",
"index.mapping.attachment.indexed_chars": -1,
"mappings" : {
"post" : {
"properties" : {
"sub" : { "type" : "string" },
"sender" : { "type" : "string" },
"dt" : { "type" : "date", "format" : "EEE, d MMM yyyy HH:mm:ss Z" },
"body" : { "type" : "string"},
"attachments" : {
"type" : "attachment",
"path" : "full",
"fields" : {
"attachments" : {
"type" : "string",
"term_vector" : "with_positions_offsets",
"store" : true
},
"name" : {"store" : "yes"},
"title" : {"store" : "yes"},
"date" : {"store" : "yes"},
"content_type" : {"store" : "yes"},
"content_length" : {"store" : "yes"}
}
}
}
}
}
}'
My old version of Elasticsearch has the "mapper-attachment" plugin installed. I am aware that the "mapper-attachment" plugin has been replaced by the "Ingest Attachment Processor". Following the examples on the plugin's website, I understand that I have to create a pipeline and then index documents through it:
PUT _ingest/pipeline/attachment
{
"description" : "Extract attachment information from arrays",
"processors" : [
{
"foreach": {
"field": "attachments",
"processor": {
"attachment": {
"target_field": "_ingest._value.attachment",
"field": "_ingest._value.data",
"indexed_chars" : -1
}
}
}
}
]
}
PUT my-index-000001/_doc/my_id?pipeline=attachment
{
"sub" : "This is a test post",
"sender" : "[email protected]",
"dt" : "Sat, 15 Jan 2022 08:50:00 AEST",
"body" : "Test Body",
"fromaddr": "[email protected]",
"toaddr": "[email protected]",
"attachments" : [
{
"filename" : "ipsum.txt",
"data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
},
{
"filename" : "test.txt",
"data" : "VGhpcyBpcyBhIHRlc3QK"
}
]
}
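For reference, the "data" values in the request above are Base64-encoded file contents, which is what the attachment processor expects in its source field. A minimal shell sketch for inspecting them follows; the helper name decode_data is hypothetical, and it assumes a base64 tool that accepts -d (GNU coreutils; some older macOS versions use -D instead).

```shell
# Decode a Base64 "data" value to see the text the attachment processor will extract.
# decode_data is a hypothetical helper name used only for this illustration.
decode_data() {
  printf '%s' "$1" | base64 -d
}

decode_data 'VGhpcyBpcyBhIHRlc3QK'              # prints: This is a test
decode_data 'dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo='  # prints two lines: "this is" and "just some text"
```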
How do I make use of this new attachment processor to create the index template I had before?
Note: With my index and schema, each "post" has one or more attachments.
CodePudding user response:
The answer is that, unlike in the previous version, I can no longer use the attachment data type. Following the example from the elastic.co website and my own question, the answer is in the question itself:
- 1st: Create the pipeline as shown in the question.
- 2nd: Create the schema (see below).
- 3rd: Insert the data as shown in the question. When inserting the data into the index, pass pipeline=attachment as the name of the pipeline, and the plugin will parse each given attachment into the schema below.
curl -XPOST localhost:9200/_template/<INDEXNAME> -H 'Content-Type: application/json' -d '{
"index_patterns" : ["*-<INDEXNAME_TYPE>"],
"mappings" : {
"properties" : {
"sub" : { "type" : "text" },
"sender" : { "type" : "text" },
"dt" : { "type" : "date", "format" : "EEE, d MMM yyyy HH:mm:ss Z" },
"body" : { "type" : "text" },
"attachments" : {
"properties" : {
"attachment" : {
"properties" : {
"content" : {
"type" : "text",
"store": true,
"term_vector": "with_positions_offsets"
},
"content_length" : { "type" : "long" },
"content_type" : { "type" : "keyword" },
"language" : { "type" : "keyword"},
"date" : { "type" : "date", "format" : "EEE, d MMM yyyy HH:mm:ss Z" }
}
},
"content" : { "type": "keyword" },
"name" : { "type" : "keyword" }
}
}
}
}
}'
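The third step can be sketched in shell as follows. This is only an illustration under stated assumptions: ES_URL, INDEX, and the helper names encode_file, build_post, and index_post are hypothetical, the "sub" and "body" values are the sample values from the question, and the file name is assumed to contain no characters that need JSON escaping. The index name should match the template's pattern so that the mapping applies.

```shell
# Sketch: index one post with a single file attachment through the "attachment" pipeline.
# ES_URL, INDEX and all function names here are assumptions made for illustration.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-my-index-000001}"   # use a name that matches the template pattern

# Base64-encode a file onto a single line (tr removes any line wrapping base64 adds).
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Build the JSON body; "filename" and "data" are the fields used in the question's example,
# and "data" is the field the pipeline reads via _ingest._value.data.
build_post() {
  printf '{"sub":"This is a test post","body":"Test Body","attachments":[{"filename":"%s","data":"%s"}]}' \
    "$(basename "$1")" "$(encode_file "$1")"
}

# Send the document; ?pipeline=attachment runs the ingest pipeline before indexing.
index_post() {
  build_post "$1" | curl -s -XPOST "$ES_URL/$INDEX/_doc?pipeline=attachment" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

With a running cluster, index_post ./ipsum.txt would send the file through the pipeline; build_post ./ipsum.txt alone prints the request body so it can be checked before anything is sent.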
