Pandas boxplot compare all data from column with filtered version of same column

Time:01-06

This seems like it should be easy, but I cannot get it working.

import pandas as pd
import matplotlib.pyplot as plt

data = {'Name':['Tom', 'nick', 'krish', 'jack', 'Tom', 'nick', 'krish', 'jack'],
        'Age':[31, 46, 21, 37, 31, 46, 21, 37],
        'Times':[20, 21, 19, 18, 19, 20, 20, 19]}

df = pd.DataFrame(data)
df

# basic boxplot for 'Times'
df['Times'].plot(kind='box')

# Filtered version
filt = df['Name'] == 'Tom'
df.loc[filt, 'Times'].plot(kind='box')

# comparing two columns is easy but I want to compare the same column with different row filters.
df[['Times', 'Age']].plot(kind='box')

So how do I compare these two versions of the same column side by side? Many thanks.

CodePudding user response:

You simply pass a list to plt.boxplot():

box = plt.boxplot([df['Times'], df[df['Name'] == 'Tom']['Times']],
                  labels=['all','Toms'])

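
For readers who would rather stay with the pandas plotting call used in the question, here is a minimal sketch of the same comparison. It assumes the two versions are placed side by side as columns of a helper frame; the name compare and the column labels "all" and "Toms" are illustrative only. As far as I can tell, pandas draws each box from the non-missing values of its column, so the unequal lengths should not matter.

```python
import pandas as pd

data = {
    "Name": ["Tom", "nick", "krish", "jack", "Tom", "nick", "krish", "jack"],
    "Age": [31, 46, 21, 37, 31, 46, 21, 37],
    "Times": [20, 21, 19, 18, 19, 20, 20, 19],
}
df = pd.DataFrame(data)

# "compare", "all" and "Toms" are illustrative names, not part of the data.
# The filtered Series keeps its original index (rows 0 and 4), so the other
# rows of the "Toms" column become NaN when the two Series are aligned.
compare = pd.DataFrame({
    "all": df["Times"],
    "Toms": df.loc[df["Name"] == "Tom", "Times"],
})

# One box per column. In a script, display it with matplotlib.pyplot.show().
ax = compare.plot(kind="box")
```

The trade-off is an extra frame padded with NaN, in exchange for keeping the df[...].plot(kind='box') style from the question.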

CodePudding user response:

I compared Tom, Others, and All

data = {'Name':['Tom', 'nick', 'krish', 'jack', 'Tom', 'nick', 'krish', 'jack'],
        'Age':[31, 46, 21, 37, 31, 46, 21, 37],
        'Times':[20, 21, 19, 18, 19, 20, 20, 19]}

df = pd.DataFrame(data)

print(df)
df.boxplot(column='Times', by='Age')


grouped=df.groupby(['Name','Times']).any().unstack().reset_index().transpose()
df2=pd.DataFrame(grouped)
new_header = df2.iloc[0]
df2 = df2[1:]
df2.columns = new_header
df2.reset_index(inplace=True)
others = [x for x in df2.columns if x not in ('Tom', 'Times')]
all_cols = [x for x in df2.columns if x != 'Times']  # renamed to avoid shadowing the built-in all()
df2['Others'] = df2[others].any(axis=1)
df2['All'] = df2[all_cols].any(axis=1)
print(df2.columns)
print(df2)
df2.boxplot(column='Times',by=['Others'])
df2.boxplot(column='Times',by=['Tom'])
df2.boxplot(column='Times',by=['All'])
plt.show()
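
The same Tom / Others / All comparison can probably be reached more directly with a boolean mask. The following is only a sketch under that assumption, not the answerer's method; the names is_tom and groups are hypothetical. Tick labels are set on the axes instead of through the labels keyword, since newer Matplotlib releases rename that keyword to tick_labels.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "Name": ["Tom", "nick", "krish", "jack", "Tom", "nick", "krish", "jack"],
    "Age": [31, 46, 21, 37, 31, 46, 21, 37],
    "Times": [20, 21, 19, 18, 19, 20, 20, 19],
}
df = pd.DataFrame(data)

# Boolean mask selecting Tom's rows; ~is_tom selects everyone else.
is_tom = df["Name"] == "Tom"

# Hypothetical helper dict: box label -> values of "Times" for that subset.
groups = {
    "Tom": df.loc[is_tom, "Times"],
    "Others": df.loc[~is_tom, "Times"],
    "All": df["Times"],
}

fig, ax = plt.subplots()
# Convert each Series to a plain array so its index plays no role.
ax.boxplot([s.to_numpy() for s in groups.values()])
# Boxes are drawn at positions 1..N by default; label them in dict order.
ax.set_xticks(list(range(1, len(groups) + 1)))
ax.set_xticklabels(list(groups))
# plt.show()  # uncomment to display the figure
```

This puts all three boxes on one set of axes, so they share a y-scale and can be compared directly.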

CodePudding user response:

An approach similar to the accepted answer, with no need to hardcode the names:

import pandas as pd
import matplotlib.pyplot as plt


data = {'Name':['Tom', 'nick', 'krish', 'jack', 'Tom', 'nick', 'krish', 'jack'],
        'Age':[31, 46, 21, 37, 31, 46, 21, 37],
        'Times':[20, 21, 19, 18, 19, 20, 20, 19]}

df = pd.DataFrame(data)

df_list = [df["Times"]]
labels_list = ["all"]
# if you don't want the "all" box, set both to empty lists instead
#df_list = []
#labels_list = []

grouped_df = df.groupby("Name")

for name in grouped_df.groups.keys():
    labels_list.append(name)
    df_list.append(grouped_df.get_group(name)["Times"])

plt.boxplot(df_list, labels=labels_list)
plt.show()
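
To sanity-check what the boxes show, the quartiles can also be tabulated numerically. This is a small sketch that assumes describe() is an acceptable stand-in for the statistics a box plot draws; the names per_name, overall and summary are illustrative.

```python
import pandas as pd

data = {
    "Name": ["Tom", "nick", "krish", "jack", "Tom", "nick", "krish", "jack"],
    "Age": [31, 46, 21, 37, 31, 46, 21, 37],
    "Times": [20, 21, 19, 18, 19, 20, 20, 19],
}
df = pd.DataFrame(data)

# Summary statistics of "Times" for each name (one row per name).
per_name = df.groupby("Name")["Times"].describe()

# The same statistics for the unfiltered column, as a one-row frame labelled "all".
overall = df["Times"].describe().rename("all").to_frame().T

# Stack them so "all" sits next to the per-name rows, as in the plot.
summary = pd.concat([overall, per_name])
print(summary[["count", "25%", "50%", "75%"]])
```

The 50% column is the median, which the box plot draws as the line inside each box.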

