How to indicate the dtype when using map on a series?

Time:01-25

I am using map on a Pandas Series to apply a function that extracts the date found in each string, or returns an empty string if the string contains no date.

import pandas as pd
from dateparser.search import search_dates

text_series = pd.Series(data={'label 1':'some text',
                              'label 2':'something happened on 2012-12-31',
                              'label 3':'2013-12-31'})

def last_date(text):
    # search_dates returns a list of (matched text, datetime) tuples, or None
    found = search_dates(text)
    return found[-1][1] if found else ""

new_series = text_series.map(last_date)

The code works as expected, and I end up with a new Series of datetime values representing the dates in the strings.

label 1          NaT
label 2   2012-12-31
label 3   2013-12-31
dtype: datetime64[ns]

My issue is that I get a warning: some of the values returned by the function are (empty) strings, so map has to infer the datetime64 dtype from data that contains strings. Apparently that behaviour is deprecated, and the dtype should be indicated explicitly.

FutureWarning: Inferring datetime64[ns] from data containing strings is deprecated and will be removed in a future version. To retain the old behavior explicitly pass Series(data, dtype={value.dtype})
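The warning is triggered by the mix of values the function returns: datetime objects for texts that contain a date and "" for texts that don't. A small illustration, using hand-written values as stand-ins for the search_dates results (the sample values are assumptions), is to ask `pd.api.types.infer_dtype` what it sees in such a list. This shows the mix pandas has to deal with; it is not necessarily the exact code path map uses internally.

```python
from datetime import datetime

import pandas as pd

# Stand-ins for the function's return values: "" when no date is found
with_empty_strings = ["", datetime(2012, 12, 31), datetime(2013, 12, 31)]
# The same results with a missing-value sentinel instead of ""
with_missing = [None, datetime(2012, 12, 31), datetime(2013, 12, 31)]

# Strings mixed with datetimes: pandas has to guess the type
print(pd.api.types.infer_dtype(with_empty_strings))  # mixed
# None is skipped as missing (skipna=True by default), leaving only datetimes
print(pd.api.types.infer_dtype(with_missing))        # datetime
```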

How can I avoid this warning and keep this code working once the old behaviour is removed?
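One way to keep the map approach, sketched under the assumption that the "" fallback is the only problem, is to return `pd.NaT` when no date is found. Every result is then either a datetime or a missing value, so no string inference is involved. To keep the sketch runnable without dateparser, `find_last_date` below is a hypothetical stand-in built on `re` and `datetime.strptime`; with dateparser, the question's function would simply end in `else pd.NaT` instead of `else ""`.

```python
import re
from datetime import datetime

import pandas as pd

text_series = pd.Series(data={'label 1': 'some text',
                              'label 2': 'something happened on 2012-12-31',
                              'label 3': '2013-12-31'})

def find_last_date(text):
    """Hypothetical stand-in for search_dates: last YYYY-MM-DD date in text, or pd.NaT."""
    matches = re.findall(r'\d{4}-\d{2}-\d{2}', text)
    if not matches:
        # A missing-value sentinel instead of "" keeps strings out of the results
        return pd.NaT
    return datetime.strptime(matches[-1], '%Y-%m-%d')

new_series = text_series.map(find_last_date)
print(new_series)
print(new_series.dtype)
```

The exact datetime64 resolution shown by `dtype` may depend on the pandas version, which is why the sketch prints it rather than hard-coding it.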

Answer:

I took a different approach, using a regular expression:

import pandas as pd
import re

text_series = pd.Series(data={'label 1':'some text',
                              'label 2':'something happened on 2012-12-31',
                              'label 3':'2013-12-31'})

def make_dt(text):
    # Find the first YYYY-MM-DD substring; return NaT (missing) if there is none
    x = re.search(r'(\d{4}-\d{2}-\d{2})', text)
    if x:
        return pd.to_datetime(x.group(1))
    return pd.NaT

new_series = text_series.apply(make_dt)

If the dates don't always have that many digits (for example 2012-1-5), use a looser pattern: r'(\d+-\d+-\d+)'

Output:
label 1          NaT
label 2   2012-12-31
label 3   2013-12-31
dtype: datetime64[ns]
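A vectorized variant of the same regex idea, offered as an alternative rather than as the answerer's code, uses `Series.str.extract` to pull out the match and `pd.to_datetime` with an explicit `format` to parse it. Texts without a match become NaN after extraction and NaT after parsing. It assumes zero-padded YYYY-MM-DD dates, like the answer's first pattern.

```python
import pandas as pd

text_series = pd.Series(data={'label 1': 'some text',
                              'label 2': 'something happened on 2012-12-31',
                              'label 3': '2013-12-31'})

# expand=False returns a Series (not a DataFrame) for a single capture group;
# texts without a match get NaN
date_strings = text_series.str.extract(r'(\d{4}-\d{2}-\d{2})', expand=False)

# An explicit format makes parsing deterministic; NaN becomes NaT
new_series = pd.to_datetime(date_strings, format='%Y-%m-%d')
print(new_series)
```

Note that `str.extract` keeps the first match in each string, as `re.search` does in the answer, whereas the question's `[-1]` picked the last date found.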