I currently have a dataframe of customers, contracts, and contract dates like this ex
Cust Contract Start End
A 123 10/1/2021 11/3/2021
B 987 7/4/2022 8/12/2022
For each row, I want to generate a variable that tells me if it was active during a set range ex: 10/1/2021-12/31/2021.
When I import from the excel file, the 'Start' and 'End' Columns come in as datetime64[ns]
Code I have tried so far is this:
df.loc[df['Start'].dt.strftime('%Y-%m-%d')<='2021-10-31' & df['End'].dt.strftime('%Y-%m-%d')<='2021-10-1', 'Active Flag'] = 'Yes'
When I run this I get the following error
Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]
I'm not really sure if I am even on the correct track for solving this, or if there is an easier way. Any help would be appreciated as Python's date time operations are very odd to me.
CodePudding user response:
Here is one approach:
# Convert to `datetime` first
df.Start = pd.to_datetime(df.Start, format="%m/%d/%Y")
df.End = pd.to_datetime(df.End, format="%m/%d/%Y")
target = pd.date_range('2021-10-1', '2021-12-31')
df['is_active'] = df.Start.isin(target) | df.End.isin(target)
Output:
Out[17]:
Cust Contract Start End is_active
0 A 123 2021-10-01 2021-11-03 True
1 B 987 2022-07-04 2022-08-12 False
CodePudding user response:
We can use simple date comparisons to solve this.
There could be two scenarios for these contracts :
- Where the contract is ONLY active within the range
start_range = pd.to_datetime('10/1/2021')
end_range = pd.to_datetime('31/12/2021')
# --- Start Date --- --- End Date ---
df['Active_Only_in_Range'] = (df.Start >= start_range) & (df.End <= end_range)
df['Active_Only_in_Range_Flag'] = np.where(df.Active_Only_in_Range,'Yes','No')
- Where the contract was PARTIALLY active within the range (hence it started in the range or ended in the range - hence it has been partially active in the provided range)
start_range = pd.to_datetime('10/1/2021')
end_range = pd.to_datetime('31/12/2021')
# ----------- Start Date ------------ ----------- End Date ------------
df['Partial_Active'] = ((df.Start >= start_range) & (df.Start <= end_range)) | ((df.End >= start_range) & (df.End <= end_range))
df['Partial_Active_Flag'] = np.where(df.Partial_Active,'Yes','No')
Thus, depending upon your problem you can find which one suits you.
