How can I check whether a column value is NaN for rows whose datetime is the first of the month?

Time:01-11

That was the clearest way I could have asked the question, and I apologize. I have data like this, where only the first of each month has a data point:

city   time        value
London 2000-01-01  5
London 2000-01-02  nan
London 2000-01-03  nan
..
London 2000-01-31  nan
London 2000-02-01  nan
London 2000-02-02  nan
London 2000-02-01  nan
...
London 2000-02-31  nan
London 2000-03-01  3
London 2000-01-01  nan
..

I basically want to express the following statement in pandas:

If value is NaN for a timestamp with day = 1, replace that first-of-the-month value with -1. I am struggling with the Python notation for selecting a subset of rows using a condition as a mask.

So, starting from the data above, I want my data to look like this:

city   time        value
London 2000-01-01  5
London 2000-01-02  nan
London 2000-01-03  nan
..
London 2000-01-31  nan
London 2000-02-01  -1
London 2000-02-02  nan
London 2000-02-01  nan
...
London 2000-02-31  nan
London 2000-03-01  3
London 2000-01-01  nan
..

The data continues like this for thousands of rows.

Edit:

Below is what I am starting to attempt. I saw online that I can build a condition and then use df.loc[condition] to select a subset of the data, so something like:

mask = (df.time.dt.day==1)

I believe this selects the rows where day = 1, but I am not sure how to proceed.

Answer:

Use numpy.where with pd.to_datetime, Series.eq and Series.isna:

In [503]: import numpy as np
     ...: import pandas as pd

# Convert 'time' column into pandas datetime
In [499]: df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')

In [504]: df['value'] = np.where(df['time'].dt.day.eq(1) & df['value'].isna(), -1, df['value'])

In [505]: df
Out[505]: 
     city       time  value
0  London 2000-01-01    5.0
1  London 2000-01-02    NaN
2  London 2000-01-03    NaN
3  London 2000-01-31    NaN
4  London 2000-02-01   -1.0
5  London 2000-02-02    NaN
6  London 2000-02-01   -1.0
7  London 2000-03-01    3.0
8  London 2000-01-01   -1.0
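
To reproduce the output above from scratch, here is a minimal self-contained sketch of the numpy.where approach. The sample frame is rebuilt by hand from the rows shown in the output, and the names is_first and is_missing are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the output above; NaN marks a missing value.
df = pd.DataFrame({
    "city": ["London"] * 9,
    "time": ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-31",
             "2000-02-01", "2000-02-02", "2000-02-01", "2000-03-01",
             "2000-01-01"],
    "value": [5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 3, np.nan],
})

# Parse the strings so the .dt accessor becomes available.
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d")

# Two boolean Series: "is the first of a month" and "has no value".
is_first = df["time"].dt.day.eq(1)      # illustrative name
is_missing = df["value"].isna()         # illustrative name

# Where both conditions hold, write -1; otherwise keep the existing value.
df["value"] = np.where(is_first & is_missing, -1, df["value"])

print(df)
```

Splitting the condition into two named Series is optional, but it makes it easy to inspect each mask on its own when the result is not what you expect.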

Or use df.loc:

In [499]: df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
In [510]: df.loc[df['time'].dt.day.eq(1) & df['value'].isna(), 'value'] = -1
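
If it helps, the df.loc variant can be wrapped in a small helper. This is only a sketch: the function name fill_first_of_month, its fill parameter, and the five-row sample frame are made up for illustration, and it assumes the columns are called time and value as in the question.

```python
import numpy as np
import pandas as pd


def fill_first_of_month(frame, fill=-1):
    """Return a copy in which NaN values on day 1 of a month become `fill`.

    Hypothetical helper; assumes a datetime column 'time' and a numeric
    column 'value'.
    """
    out = frame.copy()  # leave the caller's frame untouched
    mask = out["time"].dt.day.eq(1) & out["value"].isna()
    out.loc[mask, "value"] = fill  # assign only on the masked rows
    return out


# Small illustrative sample (not the full data from the question).
sample = pd.DataFrame({
    "time": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-01",
                            "2000-02-02", "2000-03-01"]),
    "value": [5, np.nan, np.nan, np.nan, 3],
})

result = fill_first_of_month(sample)
print(result)
```

Working on a copy is a design choice here; assigning through df.loc directly, as in the line above, modifies the frame in place and is equally valid.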