I have a data set like this:
import pandas as pd

dfdict = {
    'year': [2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022],
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}
df = pd.DataFrame(dfdict)
I also have a dictionary whose keys are years (as strings) and whose values are the lower and upper limits I want to apply to each year:
limitdict = {
    '2021': [2, 4],
    '2022': [7, 8]
}
How can I select the rows of df whose value is, for that row's year, either smaller than the lower limit or larger than the upper limit given in limitdict? The result should look like:
   year  value
0  2021      1
4  2021      5
5  2022      6
8  2022      9
9  2022     10
CodePudding user response:
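I suggest splitting the dataframe by year and then using between to identify the values that fall inside the range given in limitdict. The ~ operator negates that mask, so only the rows outside the range are kept: df_year[~df_year.value.between(limitdict[str(year)][0], limitdict[str(year)][1])]. Since between is inclusive on both ends by default, values equal to a limit count as inside the range and are dropped. The year is converted with str because the dictionary keys are strings.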
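list_of_dataframes = []
for year in df.year.unique():
    # rows belonging to the current year
    df_year = df[df.year == year]
    # limits for this year; the dictionary keys are strings
    low, high = limitdict[str(year)]
    # keep only the rows whose value lies outside [low, high]
    list_of_dataframes.append(df_year[~df_year.value.between(low, high)])
output_df = pd.concat(list_of_dataframes)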
This returns:
   year  value
0  2021      1
4  2021      5
5  2022      6
8  2022      9
9  2022     10
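For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-by-year idea. It iterates over groupby instead of filtering on unique years, which is a small variation on the loop above, not a different technique. The function name rows_outside_limits is hypothetical and only used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2021] * 5 + [2022] * 5,
    "value": list(range(1, 11)),
})
limitdict = {"2021": [2, 4], "2022": [7, 8]}


def rows_outside_limits(frame, limits):
    """Hypothetical helper: keep rows whose value lies outside
    the inclusive [low, high] range defined for their year."""
    parts = []
    # Iterating over a groupby yields (key, sub-frame) pairs, one per year.
    for year, group in frame.groupby("year"):
        # The dictionary keys are strings, so convert the year first.
        low, high = limits[str(year)]
        # between() is inclusive by default; ~ negates the mask.
        parts.append(group[~group["value"].between(low, high)])
    return pd.concat(parts)


result = rows_outside_limits(df, limitdict)
print(result)
```

Note that this raises a KeyError if df contains a year that is missing from limitdict, just like the loop above.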
CodePudding user response:
Another possible solution:
# astype is needed because your dictionary keys are strings
year = df['year'].astype('str')
df[(
    df['value'].lt([limitdict[x][0] for x in year]) |
    df['value'].gt([limitdict[x][1] for x in year])
)]
Or:
year = df['year'].astype('str')
z1, z2 = zip(*[limitdict[x] for x in year])
df[(df['value'].lt(z1) | df['value'].gt(z2))]
Output:
   year  value
0  2021      1
4  2021      5
5  2022      6
8  2022      9
9  2022     10
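The per-row lookups above can also be done with Series.map instead of list comprehensions. This is only a sketch of that variation, assuming the limits are first put into a small lookup table; the names limits, low, high and outside are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2021] * 5 + [2022] * 5,
    "value": list(range(1, 11)),
})
limitdict = {"2021": [2, 4], "2022": [7, 8]}

# Lookup table: one row per year, columns for the two limits.
limits = pd.DataFrame.from_dict(limitdict, orient="index", columns=["low", "high"])
# Convert the string keys to integers so they match df['year'].
limits.index = limits.index.astype(int)

# Map every row's year to its lower and upper limit.
low = df["year"].map(limits["low"])
high = df["year"].map(limits["high"])

# Keep rows strictly below the lower limit or strictly above the upper limit.
outside = df[df["value"].lt(low) | df["value"].gt(high)]
print(outside)
```

One difference in behavior: a year that is missing from limitdict maps to NaN here, and comparisons against NaN are False, so such rows are silently left out rather than raising a KeyError as the list comprehensions would.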
