Home > Enterprise >  Creating a column in a pandas dataframe conditional on dates
Creating a column in a pandas dataframe conditional on dates

Time:02-05

I currently have a dataframe of customers, contracts, and contract dates like this ex

Cust  Contract  Start         End                  
   A   123      10/1/2021     11/3/2021  
   B   987      7/4/2022      8/12/2022

For each row, I want to generate a variable that tells me if it was active during a set range ex: 10/1/2021-12/31/2021.

When I import from the excel file, the 'Start' and 'End' Columns come in as datetime64[ns]

Code I have tried so far is this:

df.loc[df['Start'].dt.strftime('%Y-%m-%d')<='2021-10-31' & df['End'].dt.strftime('%Y-%m-%d')<='2021-10-1', 'Active Flag'] = 'Yes'

When I run this I get the following error

Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]

I'm not really sure if I am even on the correct track for solving this, or if there is an easier way. Any help would be appreciated as Python's date time operations are very odd to me.

CodePudding user response:

Here is one approach:

# Convert to `datetime` first
df.Start = pd.to_datetime(df.Start, format="%m/%d/%Y") 
df.End = pd.to_datetime(df.End, format="%m/%d/%Y")

target = pd.date_range('2021-10-1', '2021-12-31')
df['is_active'] = df.Start.isin(target) | df.End.isin(target)

Output:

Out[17]: 
  Cust  Contract      Start        End  is_active
0    A       123 2021-10-01 2021-11-03       True
1    B       987 2022-07-04 2022-08-12      False

CodePudding user response:

We can use simple date comparisons to solve this.

There could be two scenarios for these contracts :

  1. Where the contract is ONLY active within the range
start_range = pd.to_datetime('10/1/2021') 
end_range = pd.to_datetime('31/12/2021')
#                                 --- Start Date ---         --- End Date --- 
df['Active_Only_in_Range'] = (df.Start >= start_range) & (df.End <= end_range) 
df['Active_Only_in_Range_Flag'] = np.where(df.Active_Only_in_Range,'Yes','No')
  1. Where the contract was PARTIALLY active within the range (hence it started in the range or ended in the range - hence it has been partially active in the provided range)
start_range = pd.to_datetime('10/1/2021')
end_range = pd.to_datetime('31/12/2021')
#                       ----------- Start Date ------------                     ----------- End Date ------------
df['Partial_Active'] = ((df.Start >= start_range) & (df.Start <= end_range)) | ((df.End >= start_range) & (df.End <= end_range))
df['Partial_Active_Flag'] = np.where(df.Partial_Active,'Yes','No')

Thus, depending upon your problem you can find which one suits you.

  •  Tags:  
  • Related