Handle NaN values (zero value) in datetime.strptime in a converter used in pd.read_fwf

Time:02-02

I have a fixed-width source file, in which every field has a fixed length. It looks like this:

00;12345678;03122019
01;12345678;00000000

My code so far is as follows:

import pandas as pd
from datetime import datetime

col_lengths = {'Column1': range(0, 2), 
               'Column2': range(3, 11), 
               'Datecolumn': range(12, 20),
              }
col_lengths = {k: set(v) for k, v in col_lengths.items()}

pd.read_fwf(r'D:\test.txt', colspecs=[(min(x), max(x) + 1) for x in col_lengths.values()], header=None, names=col_lengths.keys(), 
           converters={'Column1':lambda x : str(x),
                       'Column2':lambda x : str(x),
                       'Datecolumn': lambda s: datetime.strptime(s, '%d%m%Y'),
                      }
           )

The problem is that the second line of my file raises an error:

ValueError: time data '00000000' does not match format '%d%m%Y'

What is a good way to handle this? I have no experience with it so far. The only alternative I can think of is to read the column as a string and convert it later in the pandas DataFrame. If there is an elegant way to handle it directly in the converter, I would prefer that. Can it be handled on the fly here, and would that be a good idea?

CodePudding user response:

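You could use pandas.to_datetime as the converter. With errors='coerce', values that do not match the format (such as '00000000') become NaT instead of raising: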

#converters={...
             'Datecolumn': lambda s: pd.to_datetime(s, format='%d%m%Y', errors='coerce'),
#            }

output:

  Column1   Column2 Datecolumn
0      00  12345678 2019-12-03
1      01  12345678        NaT
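
For reference, here is a minimal self-contained sketch of the converter approach above. It assumes the two sample lines from the question, fed in through io.StringIO instead of the real file; the names SAMPLE and parse_date are made up for illustration, and the column positions are written out by hand rather than derived from col_lengths.

```python
import io

import pandas as pd

# Inline copy of the two sample lines from the question (stands in for the real file).
SAMPLE = "00;12345678;03122019\n01;12345678;00000000\n"


def parse_date(s):
    """Parse a DDMMYYYY string; values that do not match (e.g. '00000000') become NaT."""
    return pd.to_datetime(s, format='%d%m%Y', errors='coerce')


df = pd.read_fwf(
    io.StringIO(SAMPLE),
    colspecs=[(0, 2), (3, 11), (12, 20)],  # half-open [start, end) positions
    header=None,
    names=['Column1', 'Column2', 'Datecolumn'],
    converters={'Column1': str, 'Column2': str, 'Datecolumn': parse_date},
)
print(df)
```

The converter is called once per cell, so this is convenient for small files but does more work per row than a single column-wise conversion.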

alternative:

Read the column as str and convert it afterwards:

df['Datecolumn'] = pd.to_datetime(df['Datecolumn'], format='%d%m%Y', errors='coerce')
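
A runnable sketch of this second approach, under the same assumptions as the question's sample data (two lines, read from an in-memory string named SAMPLE, which is a made-up name). Passing dtype=str keeps every column as text, so leading zeros such as '00' survive, and the date column is converted in one call afterwards.

```python
import io

import pandas as pd

# Inline copy of the two sample lines from the question (stands in for the real file).
SAMPLE = "00;12345678;03122019\n01;12345678;00000000\n"

# Read everything as strings first; no per-cell converters are needed.
df = pd.read_fwf(
    io.StringIO(SAMPLE),
    colspecs=[(0, 2), (3, 11), (12, 20)],  # half-open [start, end) positions
    header=None,
    names=['Column1', 'Column2', 'Datecolumn'],
    dtype=str,
)

# Convert the whole column at once; unparseable values such as '00000000' become NaT.
df['Datecolumn'] = pd.to_datetime(df['Datecolumn'], format='%d%m%Y', errors='coerce')
print(df.dtypes)
```

Converting the column in one call gives it a datetime64 dtype directly, and any row that ended up as NaT can be inspected afterwards with df[df['Datecolumn'].isna()].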