Inconsistency between pandas.DataFrame.plot(kind='box') and pandas.DataFrame.boxplot()

Time:01-18

I ran into the following problem when trying to make a box plot of one column of a pandas.DataFrame grouped by another column. Here is the code:

import numpy as np
import pandas as pd
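# 60 random values, labelled cyclically as one/two/three
df = pd.DataFrame(np.random.rand(60))
df.columns = ['Values']
names = ('one','two','three') * (df.shape[0] // 3)
df['Names'] = names

# Attempt 1: the generic plot interface
df.plot(x='Names', y='Values', kind='box')
# Attempt 2: the dedicated boxplot method, grouped by Names
df.boxplot(column='Values', by='Names')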

I expected the two plots to be the same, but I get:

[image: the two resulting plots]

Is this expected behavior and, if so, how should the call for the first plot be changed to match the second one?

Answer:

.boxplot() and .plot(kind='box')/.plot.box() are separate implementations. The problem with .plot(kind='box')/.plot.box() is that, although it accepts a by argument, older pandas releases do not implement it and silently ignore it (see this issue for example; it was never documented properly), so you cannot reproduce the grouped result you get from .boxplot(). As far as I can tell from the pandas documentation for DataFrame.plot.box, by is only honored from version 1.4.0 onward.

Tl;dr: .plot(kind='box')/.plot.box() is poorly implemented; use .boxplot() instead.
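If you still want to go through .plot(kind='box'), one workaround that does not rely on by at all is to reshape the data first so that each group becomes its own column. The following is only a minimal sketch of that idea, not the only way to do it. It rebuilds the DataFrame from the question; the names wide, axes_grouped and ax_wide are placeholders chosen for illustration, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display); drop this line in a notebook
import numpy as np
import pandas as pd

# Same shape of data as in the question: 60 values labelled one/two/three in turn.
df = pd.DataFrame({"Values": np.random.rand(60)})
df["Names"] = ["one", "two", "three"] * (len(df) // 3)

# Option 1: the dedicated method, which does the grouping itself.
axes_grouped = df.boxplot(column="Values", by="Names")

# Option 2: reshape to wide form (one column per name), then use the generic plot call.
# Each row keeps its original index, so every column is NaN outside its own group.
wide = df.pivot(columns="Names", values="Values")
ax_wide = wide.plot(kind="box")
```

Both calls should draw one box per name, in the same sorted order (one, three, two), because pivot sorts the new column labels. The remaining differences are cosmetic, such as the automatic titles that .boxplot() adds.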
