Distribution Graph

Time:01-16

I would like to show the distribution of Income by location and by whether the user left or not. Which graph should I use for this task? More generally, how can I show the distribution of a numeric column according to two categorical columns?

CodePudding user response:

You can use a histogram of Income with one panel per value of Left and one color per Location.


Another solution, suggested by @JohanC in the comments, is a violin plot: put the locations on the x axis and the income on the y axis, and use hue to distinguish users who left from those who didn't. With split = True, each violin is split into two halves, one per hue value:

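import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to have the columns 'Location', 'Income' and 'Left'
fig, ax = plt.subplots()
sns.violinplot(ax = ax, data = df, x = 'Location', y = 'Income', hue = 'Left', split = True)
plt.show()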
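To check numerically what the violins summarize, the same two-column grouping can be done with pandas. This is only a sketch: the tiny DataFrame below is invented so that the numbers are easy to verify by hand.

```python
import pandas as pd

# Hypothetical toy data: two locations, two values of Left, two incomes each.
df = pd.DataFrame({
    'Location': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Left':     [0, 0, 1, 1, 0, 0, 1, 1],
    'Income':   [30, 50, 20, 40, 60, 80, 55, 65],
})

# One row per (Location, Left) pair, with a few summary statistics of Income.
summary = df.groupby(['Location', 'Left'])['Income'].agg(['count', 'median', 'min', 'max'])
print(summary)
```

Each row of summary corresponds to one half of a violin in the split plot.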



If you are not allowed to use seaborn, you can achieve a result similar to the first example using only matplotlib, by looping over the locations:

import matplotlib.pyplot as plt

# Two panels side by side with shared axes: Left == 0 and Left == 1
fig, ax = plt.subplots(1, 2, sharex = 'all', sharey = 'all', figsize = (8, 4))

# Draw one semi-transparent histogram per location in each panel
for location in df['Location'].unique():
    ax[0].hist(x = df[(df['Location'] == location) & (df['Left'] == 0)]['Income'], label = location, alpha = 0.7, edgecolor = 'black')
    ax[1].hist(x = df[(df['Location'] == location) & (df['Left'] == 1)]['Income'], label = location, alpha = 0.7, edgecolor = 'black')

ax[0].set_title('Left = 0')
ax[1].set_title('Left = 1')
ax[0].set_xlabel('Income')
ax[1].set_xlabel('Income')
ax[0].set_ylabel('Count')

# Put the legend outside the right-hand panel
ax[1].legend(title = 'Location', loc = 'upper left', bbox_to_anchor = (1.05, 1))

plt.tight_layout()
plt.show()
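One caveat with the loop above: each hist call picks its own bin edges, so bars from different locations may not line up. A possible variant, sketched here on made-up data, computes one set of bin edges with np.histogram_bin_edges and reuses it for every histogram. The choice of 20 bins and the synthetic DataFrame are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with the three columns used in the question.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    'Location': rng.choice(['A', 'B', 'C'], size=n),
    'Left': rng.integers(0, 2, size=n),
    'Income': rng.normal(50_000, 12_000, size=n),
})

# Bin edges computed once from all incomes, shared by every histogram.
bins = np.histogram_bin_edges(df['Income'], bins=20)

fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(8, 4))
for left_value, ax in zip([0, 1], axes):
    panel = df[df['Left'] == left_value]
    for location, group in panel.groupby('Location'):
        ax.hist(group['Income'], bins=bins, label=location,
                alpha=0.7, edgecolor='black')
    ax.set_title(f'Left = {left_value}')
    ax.set_xlabel('Income')

axes[0].set_ylabel('Count')
axes[1].legend(title='Location', loc='upper left', bbox_to_anchor=(1.05, 1))
fig.tight_layout()
```

Call plt.show() to display the figure. With shared bins, the bar heights of different locations can be compared directly within each income range.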

