above25percentile=df.loc[df["order_amount"]>np.percentile(df["order_amount"],25)]
below75percentile=df.loc[df["order_amount"]<np.percentile(df["order_amount"],75)]
interquartile=above25percentile & below75percentile
print(interquartile.mean())
I can't seem to get the mean here. Any thoughts?
CodePudding user response:
You attempt to compute interquartile as a boolean mask with the & operator, but its operands are DataFrames holding the filtered rows, not masks. The two frames are likely to be similar in size, but & operates element-wise on the aligned values; it will not give you the intersection of their rows. Even if they were boolean masks, your subsequent usage would take the mean of a bunch of zeros and ones, which comes out to about 0.5 (the fraction of the data that falls within the IQR, as a matter of fact).
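To make the difference concrete, here is a minimal sketch on a made-up frame (the data is an assumption for illustration, not taken from the question). It builds a real boolean mask and compares the mean of the mask itself with the mean of the values it selects:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"order_amount": range(10)})

low = np.percentile(df["order_amount"], 25)   # 2.25
high = np.percentile(df["order_amount"], 75)  # 6.75

# Boolean Series (masks), not filtered frames.
mask = (df["order_amount"] > low) & (df["order_amount"] < high)

fraction_inside = mask.mean()                      # share of True values in the mask
mean_inside = df.loc[mask, "order_amount"].mean()  # mean of the selected amounts
print(fraction_inside, mean_inside)
```

Here 4 of the 10 rows fall inside the range, so the mean of the mask is 0.4, while the mean of the selected amounts is 4.5.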
First, compute interquartile as a proper mask. pandas has its own quantile method, which, like np.percentile and siblings, accepts multiple quantiles simultaneously (as fractions between 0 and 1 rather than percentages). You can combine that with between, which includes both endpoints by default, to get your mask more efficiently:
interquartile = df['order_amount'].between(*df['order_amount'].quantile([0.25, 0.75]))
You can apply the mask to the column and take the mean like this:
df.loc[interquartile, 'order_amount'].mean()
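One detail worth checking is that between is inclusive on both ends by default, whereas the comparisons in the question are strict. The following sketch uses made-up data (an assumption, chosen so that the quartiles coincide with actual values) to show how the two choices can differ; the string form of inclusive needs pandas 1.3 or later:

```python
import pandas as pd

# Hypothetical sample data, chosen so the quartiles land exactly on data points.
df = pd.DataFrame({"order_amount": [10, 20, 30, 50, 100]})

# quantile returns a Series; unpacking it yields the two bounds.
q25, q75 = df["order_amount"].quantile([0.25, 0.75])  # 20.0 and 50.0

inclusive_mask = df["order_amount"].between(q25, q75)                    # keeps 20, 30, 50
strict_mask = df["order_amount"].between(q25, q75, inclusive="neither")  # keeps 30 only

mean_inclusive = df.loc[inclusive_mask, "order_amount"].mean()
mean_strict = df.loc[strict_mask, "order_amount"].mean()
print(mean_inclusive, mean_strict)
```

When the quartiles fall between data points, as they often do, both variants select the same rows.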
CodePudding user response:
Try:
above25percentile = df["order_amount"]>np.percentile(df['order_amount'],25)
below75percentile = df['order_amount']<np.percentile(df['order_amount'],75)
print(df.loc[above25percentile & below75percentile, 'order_amount'].mean())
Or you can use between (the string form of inclusive requires pandas 1.3 or later):
df.loc[df['order_amount'].between(np.percentile(df['order_amount'], 25),
                                  np.percentile(df['order_amount'], 75),
                                  inclusive='neither'), 'order_amount'].mean()
Suppose the following dataframe:
df = pd.DataFrame({'order_amount': range(0, 10)})
print(df)
# Output
   order_amount
0             0  # Excluded
1             1  # "
2             2  # "
3             3
4             4  # mean <- (3 + 4 + 5 + 6) / 4 = 4.5
5             5
6             6
7             7  # Excluded
8             8  # "
9             9  # "
Output:
>>> df.loc[df['order_amount'].between(np.percentile(df['order_amount'], 25),
...                                   np.percentile(df['order_amount'], 75),
...                                   inclusive='neither'), 'order_amount'].mean()
4.5
