I have dataframe which has two date formats:
Date Sales
Mar-81 1020.2
Jun-81 889.2
Sep-81 795.0
Dec-81 1003.9
Mar-82 1057.7
this format continues until Dec-00. As the year transition from 2000 to 2001, the format changes as below:
Date Sales
1-Mar 1014.2
1-Jun 952.6
1-Sep 792.4
1-Dec 922.3
2-Mar 1055.9
This is supposed to be March-2001, June-2001, etc.
I am trying to filter this data based on this format and convert the corresponding rows to datetime format:
if df[df['Date'].str.endswith('\d ')]: #endswith digits
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')
else:
df['Date'] = df['Date'].apply(lambda x: x.zfill(6))
df['Date'] = pd.to_datetime(df['Date'], format='%y-%b')
I get the below error when I run this: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Not sure what I am doing wrong here.
CodePudding user response:
One way using pandas.to_datetime's errors:
s = df["Date"]
s2 = pd.to_datetime(s, format='%b-%y', errors="coerce")
s2.fillna(pd.to_datetime(s.str.zfill(6), format='%y-%b', errors="coerce"))
Output:
0 1981-03-01
1 1981-06-01
2 1981-09-01
3 1981-12-01
4 1982-03-01
5 2001-03-01
6 2001-06-01
7 2001-09-01
8 2001-12-01
9 2002-03-01
Name: Date, dtype: datetime64[ns]
Your method is not working since df[df['Date'].str.endswith('\d ')] (i.e. a subset of dataframe) cannot be directly converted to either True or False.
CodePudding user response:
The if statement is on a data frame which has multiple boolean values. That is what the error means.
Switch the order. Make a function that works on a single value and apply it to the dates.
def to_date(x):
if re.search('\d $', x) is not None:
return pd.to_datetime(x, format='%b-%y')
else:
return pd.to_datetime(x.zfill(6), format='%y-%b')
df['Date'] = df['Date'].apply(to_date)
