I would like to show the distribution of Income by location and by whether the user left or not. Which graph should I use for this task? More generally, how can I show the distribution of a numeric column grouped by two categorical columns?

CodePudding user response:
Another solution, suggested by @JohanC in the comments, is a violin plot: put the locations on the x axis and the income on the y axis, and use hue to distinguish users who left from those who didn't. With split = True, each violin is split into two halves, one per hue value:
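import matplotlib.pyplot as plt
import seaborn as sns
# df is assumed to be a DataFrame with the columns 'Location', 'Income' and 'Left'
fig, ax = plt.subplots()
sns.violinplot(ax = ax, data = df, x = 'Location', y = 'Income', hue = 'Left', split = True)
plt.show()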
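To try the violin plot without the original data, here is a minimal sketch on synthetic data. The column names `Location`, `Income` and `Left` follow the snippet above; the city names, the income values and the random seed are made up purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real data (all values are invented).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Location": rng.choice(["City A", "City B", "City C"], size=n),
    "Left": rng.integers(0, 2, size=n),  # 0 = stayed, 1 = left
})
# Shift the income of users who left a little so the two halves differ.
df["Income"] = rng.normal(50_000, 10_000, size=n) + 5_000 * df["Left"]

# One violin per location, split into one half per value of 'Left'.
fig, ax = plt.subplots()
sns.violinplot(ax=ax, data=df, x="Location", y="Income", hue="Left", split=True)
# Call plt.show() to display the figure in an interactive session.
```

Note that split = True only makes sense when the hue column has exactly two levels, as `Left` does here.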
If you are not allowed to use seaborn, you can get a result similar to the first example using only matplotlib, by looping over the locations:
fig, ax = plt.subplots(1, 2, sharex = 'all', sharey = 'all', figsize = (8, 4))
# Draw one histogram per location in each panel: left panel for Left == 0, right panel for Left == 1
for location in df['Location'].unique():
    ax[0].hist(x = df[(df['Location'] == location) & (df['Left'] == 0)]['Income'], label = location, alpha = 0.7, edgecolor = 'black')
    ax[1].hist(x = df[(df['Location'] == location) & (df['Left'] == 1)]['Income'], label = location, alpha = 0.7, edgecolor = 'black')
ax[0].set_title('Left = 0')
ax[1].set_title('Left = 1')
ax[0].set_xlabel('Income')
ax[1].set_xlabel('Income')
ax[0].set_ylabel('Count')
ax[1].legend(title = 'Location', loc = 'upper left', bbox_to_anchor = (1.05, 1))
plt.tight_layout()
plt.show()
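One thing to watch with the loop above: each call to hist picks its own bin edges, so bars from different locations may not line up. A possible variation, sketched here on made-up data (the city names, values and seed are assumptions, not from the question), computes a single set of bin edges with `np.histogram_bin_edges` and reuses it for every histogram.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (all values are invented).
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Location": rng.choice(["City A", "City B", "City C"], size=n),
    "Left": rng.integers(0, 2, size=n),  # 0 = stayed, 1 = left
    "Income": rng.normal(50_000, 10_000, size=n),
})

# One set of bin edges shared by every histogram, so bars are comparable.
bins = np.histogram_bin_edges(df["Income"], bins=20)

fig, axes = plt.subplots(1, 2, sharex="all", sharey="all", figsize=(8, 4))
for col, left_value in enumerate([0, 1]):
    subset = df[df["Left"] == left_value]
    # One semi-transparent histogram per location in this panel.
    for location, group in subset.groupby("Location"):
        axes[col].hist(group["Income"], bins=bins, label=location,
                       alpha=0.7, edgecolor="black")
    axes[col].set_title(f"Left = {left_value}")
    axes[col].set_xlabel("Income")
axes[0].set_ylabel("Count")
axes[1].legend(title="Location", loc="upper left", bbox_to_anchor=(1.05, 1))
fig.tight_layout()
# Call plt.show() to display the figure in an interactive session.
```

Iterating with groupby also keeps the locations in the same sorted order in both panels, so the colors match across them.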



