Home > OS >  Add columns for missing dates
Add columns for missing dates

Time:01-26

I have a dataframe where the column labels are dates for a specific period like this:

30.okt  2.nov   3.nov   4.nov
-------------------------------
123   |  12  |   23   |  43   |

As you can see, the weekends are not included in the dataframe. I need to add columns for the missing days of the weekend and replicate the data from the previous column. My desired output should look like this:

30.okt   31.okt   1.nov   2.nov   3.nov  4.nov
--------------------------------------------------
123    |  123   |  123  |  12   |   23   |  43   |

I have tried several approaches with pandas, but cannot seem to get it right. Does anyone have any tips to how to solve this easily? Thank you for any help.

CodePudding user response:

Setup the base dataframe with missing weekend.

import pandas as pd


data = {'10-29-2020': {'a':10},
        '11-01-2020': {'a':16},
        '11-02-2020': {'a':18}}

df = pd.DataFrame(data)

print(df)

df
Out[2]: 
   10-29-2020  11-01-2020  11-02-2020
a          10          16          18

Transpose to make the dates on the index to later use reindex.

df = df.T

df.index = pd.to_datetime(df.index)
date_range = pd.date_range(df.index[0], df.index[-1], freq='1d')
date_range

Out[5]: 
DatetimeIndex(['2020-10-29', '2020-10-30', '2020-10-31', '2020-11-01',
               '2020-11-02'],
              dtype='datetime64[ns]', freq='D')

Reindex the index with date_range and you will get NaN values on the missing dates, so use ffill() (forward fill to fill them up).

df = df.reindex(date_range)

df = df.ffill()

df
Out[7]: 
               a
2020-10-29  10.0
2020-10-30  10.0
2020-10-31  10.0
2020-11-01  16.0
2020-11-02  18.0

And transpose.

df.T
Out[8]: 
   2020-10-29  2020-10-30  2020-10-31  2020-11-01  2020-11-02
a        10.0        10.0        10.0        16.0        18.0

CodePudding user response:

Try:

import pandas as pd
import locale

df = pd.DataFrame({'30.okt': [123], '2.nov': [12], '3.nov': [23], '4.nov': [43]})

# The pain...
locale.setlocale(locale.LC_TIME, 'de_DE.UTF-8')  # Linux
locale.setlocale(locale.LC_TIME, 'German')  # Windows

dti = pd.to_datetime(df.columns, format='%d.%b')
dti = pd.date_range(dti.min(), dti.max(), freq='D')
df = df.reindex(columns=dti.strftime('%-d.%b').str.lower()).ffill(axis=1)

Output:

>>> df
   30.okt  31.okt  1.nov  2.nov  3.nov  4.nov
0   123.0   123.0  123.0   12.0   23.0   43.0

CodePudding user response:

You could do something like this. Its a lot to do because the date given isn't in a format that Datetime likes. This reads the data into a Series. Converts the given date. Re does the columns/indices. Fills the missing values. Converts the date back to the desired format.

import pandas as pd

switcher1 = {
    'jan': 1,
    'feb': 2,
    'mar': 3,
    'apr': 4,
    'mai': 5,
    'jun': 6,
    'jul': 7,
    'aug': 8,
    'sep': 9,
    'okt': 10,
    'nov': 11,
    'dez': 12
}

switcher2 = {
    1: 'jan',
    2: 'feb',
    3: 'mar',
    4: 'apr',
    5: 'mar',
    6: 'jun',
    7: 'jul',
    8: 'aug',
    9: 'sep',
    10: 'okt',
    11: 'nov',
    12: 'dez'
}

def default():
    return "Incorrect day"

def datetime(x):
    date,month = x.split('.')
    month = switcher1.get(month,default())
    return str(month) '-' str(date) '-2022'

def datetime_rev(x):
    x = x.strftime('%m-%d-%Y')
    month,date,year = x.split('-')
    month = int(month)
    month = switcher2.get(month,default())
    return str(date) '.' str(month)

s = pd.Series({'30.okt': 123,
               '2.nov': 12,
               '3.nov': 13,
               '4.nov': 43})
s.index = s.index.to_series().apply(datetime)
idx = pd.date_range(s.index[0],s.index[-1])
s.index = pd.DatetimeIndex(s.index)
s = s.reindex(idx).ffill()
s.index = s.index.to_series().apply(datetime_rev)
print(s)
  •  Tags:  
  • Related