I have a web scraping application that loops through search_ids, and from time to time it finds a duplicate search with a different key field, called tree_id. I'm struggling to figure out how to find the correct match using a recursive function. In most situations there are two or three tree_ids in the JSON, and the code needs to pick the one that matches the search, which is in a different format (for example, the search 'A01A' versus the treeid 'C0.A.01.A').
Below is some example code, with my comments, that highlights the issue:
# Original JSON from the web scraping application, for a single example
json = {'status': 'multiple', 'searchResult': None, 'spellingResult': None, 'relatedTree': {'paths': [{'treeid': 'C0.A.01', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'}, {'treeid': 'C0.A.01.A', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'}]}, 'tableResult': None, 'synResult': None}
trees = json['relatedTree']['paths'] # adding .replace(".","") here fails, because a list has no replace() method
tree_id0 = json['relatedTree']['paths'][0]['treeid'].replace(".","") # removes all periods from the treeid at index 0
print(tree_id0)
tree_id1 = json['relatedTree']['paths'][1]['treeid'].replace(".","") # removes all periods from the treeid at index 1
print(tree_id1)
search = 'A01A' # example would be to search 'A01A' and then also 'A01' and have it pick the correct substring
search1 = 'A01'
if tree_id0.find(search) != -1: # correct while using 'A01' and works with 'A01A'.
    print("Found!")
else:
    print("Not found!")
if tree_id1.find(search) != -1: # incorrect while using 'A01' but works with 'A01A'. I need it to find the exact string and nothing to the right of the last letter of search
    print("Found!")
else:
    print("Not found!")
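find() reports a match anywhere inside the string, so 'A01' is also found in 'C0A01A'. A minimal sketch of a stricter check, assuming the period-stripped ids always end with the part being searched for: str.endswith() only succeeds when nothing follows the last character of the search. The helper name matches is hypothetical.

```python
# Period-stripped ids, as produced by tree_id0 / tree_id1 above.
tree_id0 = 'C0A01'
tree_id1 = 'C0A01A'

def matches(tree_id: str, search: str) -> bool:
    # Hypothetical helper: unlike find(), endswith() rejects ids that
    # have extra characters after the last letter of the search.
    return tree_id.endswith(search)

for search in ('A01A', 'A01'):
    for tree_id in (tree_id0, tree_id1):
        print(search, tree_id, "Found!" if matches(tree_id, search) else "Not found!")
```

One caveat: endswith() still accepts a longer id that merely ends in the search (a hypothetical 'C0XA01' would match 'A01'), so an exact comparison after dropping the leading segment is safer when the id format allows it.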
# My attempt at a recursive function to solve the problem, but I get "string indices must be integers", and in its current form I'm not sure whether I'm going about the problem the wrong way.
def search_multi(trees: list, search: str) -> dict:
    for tree in trees:
        if tree['treeid'].replace(".","") == search:
            print(tree['treeid'].replace(".",""))
            return tree
        if tree['treeid'].replace(".",""):
            response = search_multi(tree['treeid'].replace(".",""), search)
            if response:
                return response
searched_multis = search_multi(trees, search)
print(searched_multis)
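The "string indices must be integers" error comes from the recursive call: it passes a string such as 'C0A01' as trees, so the loop iterates over single characters and 'C'['treeid'] raises TypeError. The equality test also never succeeds, because 'C0A01' still carries the 'C0' prefix. Since the paths list is flat, a plain loop is enough. A minimal sketch, assuming the first dot-separated segment ('C0') is never part of the search; the helper normalize is hypothetical.

```python
from typing import Optional

data = {'status': 'multiple', 'searchResult': None, 'spellingResult': None,
        'relatedTree': {'paths': [
            {'treeid': 'C0.A.01',
             'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
            {'treeid': 'C0.A.01.A',
             'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'},
        ]},
        'tableResult': None, 'synResult': None}

def normalize(treeid: str) -> str:
    # Hypothetical helper: 'C0.A.01.A' -> 'A01A' (drop the first segment,
    # then join the remaining segments without periods).
    return ''.join(treeid.split('.')[1:])

def search_multi(trees: list, search: str) -> Optional[dict]:
    # Exact comparison, so 'A01' cannot match 'A01A'; no recursion needed.
    for tree in trees:
        if normalize(tree['treeid']) == search:
            return tree
    return None  # no tree_id matched this search

trees = data['relatedTree']['paths']
print(search_multi(trees, 'A01A'))
print(search_multi(trees, 'A01'))
```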
My desired result: if the search is 'A01A', it should pick the tree_id C0.A.01.A, and if the search is 'A01', it should pick the tree_id C0.A.01 out of the JSON.
The if/else statements show how it should work, but they don't give the correct result for 'A01', because find() also matches 'C0A01A', which has an extra letter after the search.
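Because the application loops through many search_ids, another option is to index each response once and then look every search up with an exact dictionary key. A minimal sketch, assuming normalized ids (first segment dropped, periods removed) are unique within one response; build_index and the search id 'B02' are hypothetical.

```python
paths = [
    {'treeid': 'C0.A.01',
     'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
    {'treeid': 'C0.A.01.A',
     'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'},
]

def build_index(paths: list) -> dict:
    # Hypothetical helper: map 'A01' -> entry for 'C0.A.01', and so on.
    # If two entries normalized to the same key, the later one would win.
    return {''.join(p['treeid'].split('.')[1:]): p for p in paths}

index = build_index(paths)
for search_id in ['A01A', 'A01', 'B02']:
    # dict.get returns None instead of raising KeyError for a missing key.
    match = index.get(search_id)
    print(search_id, match['treeid'] if match else 'no match')
```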
Answer:
Here's one way. This generator yields each dict whose "treeid", with its first segment and periods removed, exactly equals the search:
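def get_id(d, search):
    # Walk nested dicts and lists, yielding every dict whose 'treeid'
    # matches: 'C0.A.01.A' -> ['C0', 'A', '01', 'A'] -> 'A01A'.
    if isinstance(d, dict):
        for k, v in d.items():
            if k == 'treeid' and ''.join(v.split('.')[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for i in d:
            yield from get_id(i, search)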
out = next(get_id(json, 'A01'))
Output:
{'treeid': 'C0.A.01',
'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'}
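Because get_id is a generator, next() raises StopIteration when a search has no match. A small usage sketch, reusing the get_id generator from this answer: pass a default to next(), or wrap the generator in list() to see every match. The search 'XYZ' is a hypothetical non-matching value.

```python
def get_id(d, search):
    # Same generator as in the answer: yields each dict whose 'treeid',
    # minus its first segment and periods, equals search.
    if isinstance(d, dict):
        for k, v in d.items():
            if k == 'treeid' and ''.join(v.split('.')[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for i in d:
            yield from get_id(i, search)

data = {'status': 'multiple', 'searchResult': None, 'spellingResult': None,
        'relatedTree': {'paths': [
            {'treeid': 'C0.A.01',
             'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
            {'treeid': 'C0.A.01.A',
             'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'},
        ]},
        'tableResult': None, 'synResult': None}

print(list(get_id(data, 'A01A')))      # every matching dict
print(next(get_id(data, 'XYZ'), None))  # None instead of StopIteration
```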
