How do I define a function to extract values from nested dictionary for each row in python

Time:01-20

I have a column named 'urls' in a dataframe 'df'. Each row of that column holds a nested dictionary that maps a URL to whether it is malicious or not. I'd like to extract only the inner value (True or False) for each row.

0    {'url example 1': {'malicious': False}}
1    {'url example 2': {'malicious': False}}  

I'd like to define a function and pass it to 'apply' to get the result for each row.

Here's the sample function that I have defined.

def urlconcern(url):
    try:
        r = s.lookup_urls([url]) 
        return r.values()
    except:
        pass

I then run it with 'apply':

df['urls'].apply(urlconcern)

Strangely, this only gives the result below, still wrapped in round brackets:

0    ({'malicious': False})
1    ({'malicious': False})

The desired answer would be

False
False

Is there a way to do this?

CodePudding user response:

Given a pandas Series s (I'm assuming it's a pandas Series)

s = pd.Series([{'url example 1': {'malicious': False}},
               {'url example 2': {'malicious': False}}])

you can use a generator expression inside next() to pull the first value out of the nested dict in each row, falling back to None if there is none.

out = s.apply(lambda url: next((v for d in url.values() for k,v in d.items()), None))

Output:

0    False
1    False
dtype: bool

However, I'm not convinced this is what you're looking for since you're losing the url info here.
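Since the question asks for a defined function to pass to apply, the same lookup can also be written as a named function. This is only a minimal sketch: it assumes each row holds a single {url: {'malicious': bool}} entry, and the name malicious_flag is made up for illustration.

```python
import pandas as pd

s = pd.Series([{'url example 1': {'malicious': False}},
               {'url example 2': {'malicious': True}}])

def malicious_flag(row):
    """Return the 'malicious' flag from a row shaped like {url: {'malicious': bool}}."""
    # dict.values() is a view, not the value itself, so take its first element
    inner = next(iter(row.values()), None)
    if isinstance(inner, dict):
        return inner.get('malicious')
    return None  # empty row or unexpected shape

out = s.apply(malicious_flag)
print(out)
```

Returning None for rows that do not match the assumed shape keeps apply from raising on odd input, at the cost of hiding those rows unless you check for missing values afterwards.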

CodePudding user response:

Is this a pandas dataframe? Did you instantiate it? You may want to look at how this dictionary is constructed because it should be more like

>>> df = {'url':['url example 1', 'url example 2', 'url example 3'], 'malicious': [False, False, True]}
>>> df = pd.DataFrame(df)
>>> df
             url  malicious
0  url example 1      False
1  url example 2      False
2  url example 3       True

Then do

>>> df[df['malicious'] == False]
             url  malicious
0  url example 1      False
1  url example 2      False

I know this doesn't answer your question exactly, but it's a standard way of working with DataFrames and should help your workflow later down the line.
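If the data already arrives as a column of nested dictionaries, one possible way to get to that flat layout is to unpack each row into a (url, malicious) pair first. A rough sketch, assuming every inner dict has a 'malicious' key; the names nested, flat and safe are just for illustration.

```python
import pandas as pd

nested = pd.Series([{'url example 1': {'malicious': False}},
                    {'url example 2': {'malicious': False}},
                    {'url example 3': {'malicious': True}}])

# One (url, flag) pair per entry; a row with several URLs yields several pairs
rows = [(url, info['malicious']) for d in nested for url, info in d.items()]
flat = pd.DataFrame(rows, columns=['url', 'malicious'])

# Keep only the rows that are not flagged as malicious
safe = flat[~flat['malicious']]
print(safe)
```

This keeps the URL next to its flag, so nothing is lost when filtering.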
