Let my-index-0 be an ES index with an alias of my-index.
It has the following mapping:
{
  "my-index-0": {
    "aliases": {
      "my-index": {}
    },
    "mappings": {
      "doc": {
        "properties": {
          "foo": {
            "properties": {
              "fizz": {
                "type": "keyword"
              },
              "baz": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
Let's say I want to remove the baz field from foo. I'm using the following steps:
- Create a new index my-index-1 with the updated mapping (foo.baz removed) using PUT /my-index-1
{
  "mappings": {
    "doc": {
      "properties": {
        "foo": {
          "properties": {
            "fizz": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
- Reindex data from my-index-0 to my-index-1 using POST /_reindex
{
  "source": {
    "index": "my-index-0"
  },
  "dest": {
    "index": "my-index-1"
  }
}
- Move the my-index alias to the my-index-1 index using POST /_aliases
{
  "actions": [
    {"remove": {"index": "my-index-0", "alias": "my-index"}},
    {"add": {"index": "my-index-1", "alias": "my-index"}}
  ]
}
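For reference, the three steps above can be sketched in Python as plain data. This is only an illustration: migration_requests is a hypothetical helper, not part of any ES client, and it only builds the (method, path, body) tuples without sending anything. Building the bodies as dicts and serializing them with json.dumps guarantees syntactically valid JSON.

```python
import json


def migration_requests(old_index, new_index, alias, new_mapping):
    """Return the three requests from the steps above as (method, path, body) tuples."""
    return [
        # Step 1: create the new index with the updated mapping.
        ("PUT", f"/{new_index}", {"mappings": new_mapping}),
        # Step 2: copy the documents from the old index to the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
        # Step 3: move the alias in a single _aliases call.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }),
    ]


# Mapping for my-index-1, keyed by the "doc" type as in the question.
new_mapping = {
    "doc": {"properties": {"foo": {"properties": {"fizz": {"type": "keyword"}}}}}
}

requests_to_send = migration_requests("my-index-0", "my-index-1", "my-index", new_mapping)
for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The printed bodies can be pasted into any HTTP client or console that talks to the cluster.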
Expected result
Data in the new index does not have the foo.baz property.
Actual result
When my-index-1 is created, its mapping does not contain the foo.baz field. After reindexing, however, my-index-1's mapping has changed to match the old index's mapping.
Note: _source can be used for simple field removal
If one wants to remove a field, for example bar from the mapping below,
{
  "mappings": {
    "foo": {
      "type": "text"
    },
    "bar": {
      "type": "text"
    }
  }
}
it is sufficient to pass a _source param that omits the bar field in the request to the reindex API:
{
  "source": {
    "index": "my-index-0",
    "_source": ["foo"]
  },
  "dest": {
    "index": "my-index-1"
  }
}
How to achieve the same with a nested structure?
CodePudding user response:
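When you use reindex, ES copies every document's full _source from the source index to the destination index. With dynamic mapping enabled (the default), any field in those documents that is missing from the destination mapping is added to it, which is why foo.baz comes back. If you want to make sure the mapping of your index cannot be modified this way, add this line to your mapping: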
"dynamic" : "strict"
If you now reindex the data, you will get a "strict_dynamic_mapping_exception" error, because "mapping set to strict, dynamic introduction of [baz] within [foo] is not allowed". So you need to remove this field during the reindex, like this:
POST _reindex
{
  "source": {
    "index": "my-index-0"
  },
  "dest": {
    "index": "my-index-1"
  },
  "script": {
    "source": "if (ctx._source.foo != null) { ctx._source.foo.remove(\"baz\") }"
  }
}
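A document's _source is a nested map, so the baz field lives inside the foo object rather than under a top-level key literally named "foo.baz". The sketch below mirrors that in plain Python (it is not Painless, and remove_path is a hypothetical helper introduced only for illustration). It assumes the path goes through objects only, not arrays of objects.

```python
import copy


def remove_path(document, dotted_path):
    """Return a copy of `document` without the field at `dotted_path` (e.g. "foo.baz")."""
    result = copy.deepcopy(document)
    *parents, leaf = dotted_path.split(".")
    node = result
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            # The parent object is missing, so there is nothing to remove.
            return result
    node.pop(leaf, None)
    return result


doc = {"foo": {"fizz": "a", "baz": "b"}}

# Removing a literal top-level key "foo.baz" changes nothing,
# because no such key exists at the top level.
flat_attempt = dict(doc)
flat_attempt.pop("foo.baz", None)

# Walking down to the parent object first removes the nested field.
cleaned = remove_path(doc, "foo.baz")
print(cleaned)  # {'foo': {'fizz': 'a'}}
```

The same idea applies inside a reindex script: navigate to the parent object, then remove the key from it.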
Note: adding "dynamic" : "strict" is optional; it only prevents the index mapping from being modified. Editing the reindex query alone is enough to keep foo.baz from being copied.
CodePudding user response:
I think I've found the generic solution I was looking for.
In the _source attribute, one can explicitly list every nested field. The _source value for the scenario in the example should therefore be ["foo.fizz"]; note the absence of "foo.baz", which shouldn't be copied.
{
  "source": {
    "index": "my-index-0",
    "_source": ["foo.fizz"]
  },
  "dest": {
    "index": "my-index-1"
  }
}
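To make the effect of such a field list concrete, here is a small Python sketch that mimics it on a single document. It is only an approximation for illustration: filter_source is a hypothetical helper that handles exact dotted paths through objects, and it does not reproduce wildcards, excludes, or arrays the way ES source filtering does.

```python
def filter_source(document, includes):
    """Keep only the fields named by exact dotted paths in `includes`."""
    result = {}
    for path in includes:
        keys = path.split(".")
        node = document
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                # The path does not exist in this document; skip it.
                break
            node = node[key]
        else:
            # Every key resolved: rebuild the same nesting in the result.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = node
    return result


doc = {"foo": {"fizz": "a", "baz": "b"}}
filtered = filter_source(doc, ["foo.fizz"])
print(filtered)  # {'foo': {'fizz': 'a'}}
```

Because baz never reaches the destination index, dynamic mapping has no reason to add it to the new mapping.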
Essentially, the problem of generating the "_source" attribute for the generic case can be reduced to finding the intersection of the sets of all property paths in the old and new mappings.
Python solution
The function below recursively iterates through the properties and yields all property paths.
from typing import Any, Iterator


def get_property_path(properties: dict[str, Any], name: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf field in a "properties" mapping."""
    for property_name, property_value in properties.items():
        new_name = f"{name}.{property_name}" if name else property_name
        if nested_properties := property_value.get("properties"):
            # Object field: descend into its sub-properties.
            yield from get_property_path(nested_properties, new_name)
        else:
            # Leaf field: emit its full dotted path.
            yield new_name
For example:
>>> properties = {
...     "a": {
...         "properties": {
...             "b": {
...                 "properties": {
...                     "c": {"type": "text"},
...                 },
...             },
...         },
...     },
...     "e": {
...         "properties": {
...             "f": {"type": "text"},
...         },
...     },
... }
>>> list(get_property_path(properties))
['a.b.c', 'e.f']
It can later be used to calculate the set of fields that should be copied (fields that are in both the old and the new mapping):
_source = list(
    set(get_property_path(old_mapping["properties"]))
    & set(get_property_path(new_mapping["properties"]))
)
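Putting it together for the index from the question, a self-contained sketch might look like the following. It assumes old_mapping and new_mapping are the contents of the "doc" type (the level that holds "properties"), and it uses sorted instead of list so the field order is deterministic; reindex_body is just an illustrative variable name.

```python
from typing import Any, Iterator


def get_property_path(properties: dict[str, Any], name: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf field in a "properties" mapping."""
    for property_name, property_value in properties.items():
        new_name = f"{name}.{property_name}" if name else property_name
        if nested_properties := property_value.get("properties"):
            yield from get_property_path(nested_properties, new_name)
        else:
            yield new_name


# Mappings of my-index-0 and my-index-1, taken from under the "doc" type.
old_mapping = {
    "properties": {
        "foo": {
            "properties": {
                "fizz": {"type": "keyword"},
                "baz": {"type": "keyword"},
            }
        }
    }
}
new_mapping = {
    "properties": {"foo": {"properties": {"fizz": {"type": "keyword"}}}}
}

# Fields present in both mappings are the ones that should be copied.
source_fields = sorted(
    set(get_property_path(old_mapping["properties"]))
    & set(get_property_path(new_mapping["properties"]))
)

reindex_body = {
    "source": {"index": "my-index-0", "_source": source_fields},
    "dest": {"index": "my-index-1"},
}
print(reindex_body)
```

The resulting reindex_body can be sent as the body of POST /_reindex.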
I won't accept my answer tho, as there might be a simpler solution that is based on the ES API.
