I currently have a recursive function that removes ALL keys that match a pattern. Here is the background:
Example JSON:
{
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{
            "make": "ford",
            "color": "white"
        }, {
            "make": "bmw",
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
Here's the function:
def remove_all_keys_matching_value(d, keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [remove_all_keys_matching_value(v, keys_to_remove) for v in d]
    return {k: remove_all_keys_matching_value(v, keys_to_remove) for k, v in d.items() if k not in keys_to_remove}
If I run the function with keys_to_remove = ('make', 'name'), I get the following result:
{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{}, {
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
I want to adjust this code to be more targeted, so that instead of removing every instance of a key it only removes the key when it appears under a given parent key (that is, it takes the key's path into account).
For example, if I pass in a tuple containing (('owned_cars', 'make'), 'name'), it should return:
{
    "results": [{
        "age": "100",
        "owned_cars": [{
            "color": "white"
        }, {
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
I know I need to keep track of the root key somehow but am unsure how to fold this in. I would appreciate any help in solving this. I always struggle when the recursion gets this complex and would love to see how someone more experienced would approach it so I can improve.
While I am interested in the solution to this problem, I'm more interested in learning how to approach a problem like this. I understand what's happening at a high level in the recursive method but struggle when I need to start stepping through it. I don't know how to make the leap to adjusting the code to identify the root path.
CodePudding user response:
division of complexity
We could start with a remove function that takes any t and any number of paths -
def remove(t, *paths):
    for p in paths:
        t = remove1(t, p)
    return t
As you can see, it has a simple operation calling remove1(t, p) for all p in the provided paths. The final t is returned. This separates the complexity of removing a single path and removing many paths. We offload the majority of the work to remove1.
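The loop in remove is a left fold over the paths, so it can also be sketched with functools.reduce. This is only an illustration: remove_fold is a hypothetical name, and remove1 is repeated here (same logic as the version given in this answer) so the block runs on its own.

```python
from functools import reduce

def remove1(t, path):
    # Same logic as the remove1 in this answer, repeated so the block is self-contained.
    if not path:
        return t
    if isinstance(t, list):
        return [remove1(e, path) for e in t]
    if isinstance(t, dict):
        if len(path) == 1:
            return {k: remove1(v, path) for k, v in t.items() if k != path[0]}
        return {
            k: remove1(v, path[1:]) if k == path[0] else remove1(v, path)
            for k, v in t.items()
        }
    return t

def remove_fold(t, *paths):
    # Hypothetical name: threads t through remove1 once per path, left to right.
    return reduce(remove1, paths, t)

sample = {"a": {"x": 1, "y": 2}, "b": {"x": 3}}
print(remove_fold(sample, ("a", "x"), ("y",)))
# prints {'a': {}, 'b': {'x': 3}}
```

Either form keeps the "many paths" concern in one small function, and neither mutates the input, because remove1 always builds new lists and dictionaries.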
remove1
Your original code is pretty close. This remove1 takes any t and a single path.
- If the path is empty, return t unmodified.
- (inductive) The path has at least one element. If t is a list, apply remove1(e, path) for all e of the list t.
- (inductive) The path has at least one element and t is not a list. If t is a dictionary:
    - If the path has only one element, create a new dictionary with k assigned to the result of the sub-problem remove1(v, path) for all k, v of the dictionary t, excluding any k matching the path's element, path[0].
    - (inductive) The path has at least two elements. Create a new dictionary with k assigned to the result of the sub-problem remove1(v, path[1:]) if k matches the first element of the path; otherwise assign k to the result of the sub-problem remove1(v, path), for all k, v of the dictionary t.
- (inductive) t is not a list and not a dictionary. Return t unmodified.
def remove1(t, path):
    if not path:
        return t
    elif isinstance(t, list):
        return list(remove1(e, path) for e in t)
    elif isinstance(t, dict):
        if len(path) == 1:
            return {k: remove1(v, path) for (k, v) in t.items() if not k == path[0]}
        else:
            return {k: remove1(v, path[1:]) if k == path[0] else remove1(v, path) for (k, v) in t.items()}
    else:
        return t
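Since the question asks how to step through recursion like this, one option is to print every call with indentation proportional to its depth. The sketch below is an assumption-laden illustration, not part of the original answer: remove1_traced and its depth parameter are hypothetical, and the branches are intended to mirror remove1 exactly.

```python
def remove1_traced(t, path, depth=0):
    # Hypothetical tracing variant of remove1: same branches, plus one print per call.
    print("  " * depth + f"remove1({type(t).__name__}, {path})")
    if not path:
        return t
    if isinstance(t, list):
        return [remove1_traced(e, path, depth + 1) for e in t]
    if isinstance(t, dict):
        if len(path) == 1:
            # Last path element: drop matching keys, keep searching deeper.
            return {k: remove1_traced(v, path, depth + 1)
                    for k, v in t.items() if k != path[0]}
        # Longer path: consume the first element only under a matching key.
        return {k: remove1_traced(v, path[1:] if k == path[0] else path, depth + 1)
                for k, v in t.items()}
    return t

small = {
    "owned_cars": [{"make": "ford", "color": "white"}],
    "wished_cars": [{"make": "honda"}],
}
result = remove1_traced(small, ("owned_cars", "make"))
print(result)
```

Reading the trace, the path shrinks to ('make',) only on calls below "owned_cars", which is how the function keeps track of the root key without any extra state.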
modification to the input data
I added another layer to your data so we can see precisely how remove is working -
data = {
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{
            "additional_layer": {  # <-- additional layer
                "make": "foo",
                "color": "green"
            }
        }, {
            "make": "ford",
            "color": "white"
        }, {
            "make": "bmw",
            "color": "red"
        }],
        "wished_cars": [{
            "make": "honda"
        }, {
            "make": "toyota",
            "style": "sleek"
        }, {
            "style": "fat"
        }]
    }]
}
demo
Let's see remove work now -
import json
data = { ... }
new_data = remove(data, ("owned_cars", "make"), ("style",))
print(json.dumps(new_data, indent=2))
This says remove all "make" keys that are any descendant of "owned_cars" keys and remove all "style" keys -
{
  "results": [
    {
      "name": "john doe",
      "age": "100",
      "owned_cars": [
        {
          "additional_layer": {
            # <-- make removed
            "color": "green"
          }
        },
        {
          # <-- make removed
          "color": "white"
        },
        {
          # <-- make removed
          "color": "red"
        }
      ],
      "wished_cars": [
        {
          "make": "honda" # <-- make not removed
        },
        {
          "make": "toyota" # <-- make not removed
          # <-- style removed
        },
        {} # <-- style removed
      ]
    }
  ]
}
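The question's call style mixes tuple paths with bare keys, as in (('owned_cars', 'make'), 'name'). A bare string passed to remove1 would be treated as a sequence of characters, so a thin wrapper can normalize it into a one-element path first. This is a minimal sketch: remove_mixed is a hypothetical name, and remove1 is repeated with the same logic so the block runs on its own.

```python
def remove1(t, path):
    # Same logic as the remove1 in this answer, repeated so the block is self-contained.
    if not path:
        return t
    if isinstance(t, list):
        return [remove1(e, path) for e in t]
    if isinstance(t, dict):
        if len(path) == 1:
            return {k: remove1(v, path) for k, v in t.items() if k != path[0]}
        return {
            k: remove1(v, path[1:]) if k == path[0] else remove1(v, path)
            for k, v in t.items()
        }
    return t

def remove_mixed(t, keys_to_remove):
    # Hypothetical wrapper: each entry is either a bare key ("name")
    # or a tuple path (("owned_cars", "make")).
    for entry in keys_to_remove:
        path = (entry,) if isinstance(entry, str) else tuple(entry)
        t = remove1(t, path)
    return t

# The example data from the question.
question_data = {
    "results": [{
        "name": "john doe",
        "age": "100",
        "owned_cars": [{"make": "ford", "color": "white"},
                       {"make": "bmw", "color": "red"}],
        "wished_cars": [{"make": "honda"},
                        {"make": "toyota", "style": "sleek"},
                        {"style": "fat"}],
    }]
}

cleaned = remove_mixed(question_data, (("owned_cars", "make"), "name"))
print(cleaned)
```

With the question's input, this should yield the result the question asks for: "name" removed everywhere, "make" removed only under "owned_cars", and "wished_cars" left untouched.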
