Home > Back-end >  pandas find which half year a date belongs to
pandas find which half year a date belongs to

Time:01-18

I have the following pandas data frame with a date column. How can I add a column specifying which half year the date belongs to?

enter image description here

CodePudding user response:

Convert dates to datetimes and then use numpy.where with compare for less or equal:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['half year'] = np.where(df['date'].dt.month.le(6), 'H1', 'H2')
print (df)
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1

Solution without numpy with change mask for greater like 6, add 1 and convert to strings:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['half year'] = 'H'   df['date'].dt.month.gt(6).add(1).astype(str)
print (df)
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1

CodePudding user response:

Try:

df['half year'] = 'H'   pd.to_datetime(df['date']).dt.month.floordiv(6).add(1).astype(str)
print(df)

# Output
         date half year
0  09-09-1993        H2
1  18-03-1993        H1

CodePudding user response:

This solution uses .apply method.

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'date': ['09-09-1993', '11-09-1993', '23-01-1994', '18-03-1993']})
>>>
>>> df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
>>> df
        date
0 1993-09-09
1 1993-09-11
2 1994-01-23
3 1993-03-18
>>>
>>> df['half year'] = df.date.dt.month.apply(lambda x: "H1" if x in range(0, 7) else "H2")
>>> df
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1

CodePudding user response:

Without numpy, a rather general solution that could easily be modified to get semesters as 'first' or 'second':

  1. build a pandas.DataFrame (not in the provided code)

(Code starts from here)*

  1. convert the 'date' column to datetime
  2. extract month as a pandas.Series from df['date'] using pandas.Series.dt.month
  3. integer divide by 6 to get the semesters as a pandas.Series of int (counting from 0)
  4. get the semester strings as a pandas.Series using pandas.Series.map with the semesters ints as indices on a semester names list
  5. bundling (steps 2. to 4.) with pandas.Categorical to get categorical column (lower space, faster processing), this step is optional
>>> df['date'] = pd.to_datetime(df['date'])
>>> df['half year'] = pd.Categorical((df['date'].dt.month // 6).map(lambda h:['H1','H2'][h]))
>>> df
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1
>>> df.dtypes
date         datetime64[ns]
half year          category
dtype: object


  •  Tags:  
  • Related