Extract dictionary value from column in data frame with Vaex

I applied the following command to my dataframe:

df['date_article'] = df.pagePath.str.extract_regex(pattern='(?P<digit>/\d{4}/\d{2}/\d{2}/)')

This created the column 'date_article':

pagePath	date_article
'/empresas/2021/10/22/tiendas-no-participan-buen'	{'digit': '/2021/10/22/'}
'/finanzas-personales/2021/10/22/pueden-cobrar-c	{'digit': '/2021/10/22/'}

Now I want to keep only the date string in 'date_article', without the surrounding dictionary.

Expected output

pagePath	date_article
'/empresas/2021/10/22/tiendas-no-participan-buen'	'/2021/10/22/'
/finanzas-personales/2021/10/22/pueden-cobrar-c	'/2021/10/22/'

I have tried many things, but nothing seems to work.

Thank you in advance for your help.

CodePudding user response：

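How about the following? The lambda has to be applied to the 'date_article' column itself, because 'digit' is a key inside each value and not a column of the dataframe. This is the pandas-style form; I have not checked it against Vaex:

df['date_article'] = df['date_article'].apply(lambda x: x['digit'])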

CodePudding user response：

It appears that extract_regex returns a struct series:

Extract substrings defined by a regular expression using Apache Arrow (Google RE2 library).

Parameters
pattern (str) – A regular expression which needs to contain named capture groups, e.g. ‘letter’ and ‘digit’ for the regular expression ‘(?P<letter>[ab])(?P<digit>\d)’.

Returns
an expression containing a struct with field names corresponding to capture group identifiers.

So you will need to extract the field you want out of the struct. I'm not a Vaex expert but maybe something like:

struct_series = df.pagePath.str.extract_regex(pattern=r'(?P<digit>/\d{4}/\d{2}/\d{2}/)')
df['date_article'] = struct_series.struct.get('digit')
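
To make the idea of "one field per named capture group" concrete, here is a minimal sketch using only Python's standard `re` module. It is not Vaex or Arrow code: the helper name `extract_field` is hypothetical, and it only illustrates that a named group yields a mapping like {'digit': ...} from which a single value is then picked by name.

```python
import re

# Same pattern as in the question, written as a raw string.
PATTERN = re.compile(r'(?P<digit>/\d{4}/\d{2}/\d{2}/)')


def extract_field(path, field='digit'):
    """Return the text captured by the named group, or None if nothing matches.

    Hypothetical helper for illustration only; not part of Vaex.
    """
    match = PATTERN.search(path)
    if match is None:
        return None
    # groupdict() maps each group name to its captured text,
    # which is the plain-Python analogue of the struct described above.
    return match.groupdict()[field]


paths = [
    '/empresas/2021/10/22/tiendas-no-participan-buen',
    '/finanzas-personales/2021/10/22/pueden-cobrar-c',
]
dates = [extract_field(p) for p in paths]
print(dates)
```

In Vaex the same selection by field name would be done on the whole column at once, which is what the struct accessor in the answer above is meant to do.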

CodePudding user response：

Use:

import pandas as pd
df = pd.DataFrame({'date_article':[{'digit': '/2021/10/22/'}]})
df['date_article'] = df['date_article'].apply(lambda x: x['digit'])

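This applies a lambda that returns the value stored under the 'digit' key for each entry of the column and assigns the result back to the same column. Alternatively, why not extract the string directly with an unnamed group, so that no dictionary is created in the first place:

df = pd.DataFrame({'date_article':['sdfsdf/2021/10/22/']})
df['date_article'] = df['date_article'].str.extract(r'(/\d{4}/\d{2}/\d{2}/)', expand=False)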