How to filter a nested dict by the key of the nested element?

Time:02-08

I have a nested dictionary of source words, target words, and their frequency counts. It looks like this: src_tgt_dict = {"each":{"chaque":3}, "in-front-of":{"devant":4}, "next-to":{"à-côté-de":5}, "for":{"pour":7}, "cauliflower":{"chou-fleur":4}, "on":{"sur":2, "panda-et":2}}

I am trying to filter the dictionary so that only the entries whose target word is a preposition (including multi-word prepositions) remain. To that end, I've written the following:

tgt_preps = set(["devant", "pour", "sur", "à"]) #set of initial target prepositions

src_tgt_dict = {"each":{"chaque":3}, "in-front-of":{"devant":4}, "next-to":{"à-côté-de":5}, "for":{"pour":7}, "cauliflower":{"chou-fleur":4}, "on":{"sur":2, "panda-et":2}}

new_tgt_preps = [] #list of new target prepositions

for src, d in src_tgt_dict.items(): #loop into the dictionary
    for tgt, count in d.items(): #loop into the nested dictionary
        check_prep = []
        if "-" in tgt: #check to see if hyphen occurs in the target word (this is to capture multi-word prepositions that are not in the original preposition set)
            check_prep.append(tgt[0:(tgt.index("-"))]) #if there's a hyphen, append the preceding word to the check_prep list
            for t in check_prep: 
                if t in tgt_preps: # check to see if the token preceding the hyphen is a preposition
                    new_tgt_preps.append(tgt) #if yes, append the multi-word preposition to the list of new target prepositions

tgt_preps.update(new_tgt_preps) # update the set of prepositions to include the multi-word prepositions

temp_2_src_tgt_dict = {} # create new dict for filtering
for src, d in src_tgt_dict.items(): # loop into the dictionary
    for tgt, count in d.items(): # loop into the nested dictionary
        if tgt in tgt_preps: # if the target is in the set of target prepositions
            temp_2_src_tgt_dict[tgt] = count # add to the new dict with the tgt as the key and the count as the value

When I print the new dict, I get the following:

{'devant': 4, 'pour': 7, 'sur': 2, 'à-côté-de': 5}

And it totally makes sense why I get that, because that's what I told the machine to do. But that's not my intention!

What I want is:

{"in-front-of":{"devant":4}, "for":{"pour":7}, "on":{"sur":2}, "next-to":{"à-côté-de":5}}

I've tried to instantiate the nested dictionary by writing:

temp_2_src_tgt_dict[tgt][src] = count

but that raises a KeyError.

I've also tried:

new_tgt_dict = {}
for i in src_tgt_dict.items():  
    for j in tgt_preps:
        if j in list(i[1].keys())[0][:len(j)]:
            new_tgt_dict.update({i[0]: i[1]})

But that outputs {'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2, 'panda-et': 2}}. The format is correct, but the 'panda-et' entry should not be included, because 'panda-et' is not in tgt_preps even after tgt_preps is updated with new_tgt_preps.

Can anyone provide any suggestions or advice? Thank you in advance for your help.

CodePudding user response:

Maybe something like this:

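from collections import defaultdict

# Assumes src_tgt_dict and tgt_preps are defined as in the question.
new_tgt_dict = defaultdict(dict)  # a missing source key gets an empty inner dict on first access
for src, d in src_tgt_dict.items():
    for tgt, count in d.items():
        # For a hyphenated target, test only the token before the first hyphen;
        # a target without a hyphen is tested as a whole.
        first_token = tgt.split("-", 1)[0]
        if first_token in tgt_preps:
            new_tgt_dict[src][tgt] = count

print(dict(new_tgt_dict))

Output:

{'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2}}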
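
The same filtering can also be written without defaultdict, as a small self-contained sketch. The helper names is_preposition and filter_by_nested_key are made up for illustration and are not part of the question. The sketch keeps the question's rule that a hyphenated target counts as a preposition when the token before its first hyphen is in the preposition set, and it drops any source word whose inner dict ends up empty.

```python
def is_preposition(tgt, preps):
    """Hypothetical helper: True if tgt itself, or the token before its
    first hyphen, is in the preposition set."""
    return tgt in preps or tgt.split("-", 1)[0] in preps


def filter_by_nested_key(src_tgt, preps):
    """Hypothetical helper: keep only the inner entries whose key passes
    is_preposition, and skip source words left with no entries."""
    filtered = {}
    for src, targets in src_tgt.items():
        kept = {tgt: count for tgt, count in targets.items()
                if is_preposition(tgt, preps)}
        if kept:  # an empty inner dict means nothing matched for this source
            filtered[src] = kept
    return filtered


# Data copied from the question.
tgt_preps = {"devant", "pour", "sur", "à"}
src_tgt_dict = {"each": {"chaque": 3}, "in-front-of": {"devant": 4},
                "next-to": {"à-côté-de": 5}, "for": {"pour": 7},
                "cauliflower": {"chou-fleur": 4},
                "on": {"sur": 2, "panda-et": 2}}

result = filter_by_nested_key(src_tgt_dict, tgt_preps)
print(result)
```

Building each inner dict in full before assigning it also avoids the KeyError from the question: temp_2_src_tgt_dict[tgt][src] = count fails because the first lookup, temp_2_src_tgt_dict[tgt], has to return an existing inner dict, and a plain dict has none for a key it has not seen yet.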