Better way to plot Gender count using Python

I am making a graph to plot the gender count for time series data that looks like the following. Each row represents one hour of data for the respective patient.

HR	SBP	DBP	Sepsis	Gender	P_ID
92	120	80	0	0	0
98	115	85	0	0	0
93	125	75	1	1	1
95	130	90	1	1	1
102	120	80	0	0	2
109	115	75	0	0	2
94	135	100	0	0	2
97	100	70	1	1	3
85	120	80	1	1	3
88	115	75	1	1	3
93	125	85	1	1	3
78	130	90	1	0	4
115	140	110	1	0	4
102	120	80	0	1	5
98	140	110	0	1	5

This is my code:

# sort_index() puts 0 (Male) before 1 (Female), so the tick labels set below line up
gender = df_n['Gender'].value_counts().sort_index()
plt.figure(figsize=(7, 6))
ax = gender.plot(kind='bar', rot=0, color="c")
ax.set_title("Bar Graph of Gender", y = 1)
ax.set_xlabel('Gender')
ax.set_ylabel('Number of People')
ax.set_xticklabels(('Male', 'Female'))

for rect in ax.patches:
    y_value = rect.get_height()
    x_value = rect.get_x() + rect.get_width() / 2
    space = 1
    label = format(y_value)
    ax.annotate(label, (x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va='bottom')    
plt.show()

What happens now is that the code counts the total number of rows for each value (0: Male, 1: Female) and plots that. But I want to plot the number of male and female patients, not the number of 0s and 1s, because the same patient has multiple rows of data (as per P_ID). In other words: how many patients are male and how many are female?

Can someone help me out? I guess maybe sns.countplot can be used. But I don't know how.

Thanks for helping me out >.<

__________ Update ________________

How can I also group the genders by sepsis (1) or no sepsis (0)?

CodePudding user response：

I have a working example for selecting the unique IDs. It looks ugly, so there is probably a better way, but it works:

import pandas as pd
# example of data:
data = {'gender': [0, 0, 1, 1, 1, 1, 0, 0], 'id': [1, 1, 2, 2, 3, 3, 4, 4]}
df = pd.DataFrame(data)
# get all unique ids:
ids = set(df.id)
# go over all ids and take the first gender value for each:
g = [list(df[df['id'] == i]['gender'])[0] for i in ids]

# count genders; the lazy way, using pandas, since the rest of the code also assumes a DataFrame for plotting:
gender_counts = pd.DataFrame(g).value_counts()
# from here you can use your plot function.

# Or use Counter
from collections import Counter
gender_counts = Counter(g)
# A Counter has no .plot() method, so you need another way to plot the counts.
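
A shorter route to the same per-patient counts, as a rough sketch rather than the one right way: drop_duplicates keeps the first row for each id, so value_counts then counts patients instead of rows. It assumes each id has a single gender, and the 0 → Male, 1 → Female mapping is taken from the question.

```python
import pandas as pd

# Same toy data as above: two hourly rows per patient
data = {'gender': [0, 0, 1, 1, 1, 1, 0, 0], 'id': [1, 1, 2, 2, 3, 3, 4, 4]}
df = pd.DataFrame(data)

# Keep one row per patient (the first one), then count genders
per_patient = df.drop_duplicates(subset='id')
gender_counts = per_patient['gender'].value_counts().sort_index()

# Name the categories explicitly so the labels cannot drift from the counts
gender_counts = gender_counts.rename(index={0: 'Male', 1: 'Female'})
print(gender_counts)
```

The result is a pandas Series, so gender_counts.plot(kind='bar', rot=0) should work with the plotting code from the question.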

CodePudding user response：

You can group by 'P_ID' and take the first row of each group (assuming each 'P_ID' has only one gender and only one sepsis value). Then you can call sns.countplot on that dataframe, using gender for x and sepsis for hue (or vice versa). You can rename the values in the columns so that their names show up in the legend and in the tick labels.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO

data_str = '''
HR|SBP|DBP|Sepsis|Gender|P_ID
92|120|80|0|0|0
98|115|85|0|0|0
93|125|75|1|1|1
95|130|90|1|1|1
102|120|80|0|0|2
109|115|75|0|0|2
94|135|100|0|0|2
97|100|70|1|1|3
85|120|80|1|1|3
88|115|75|1|1|3
93|125|85|1|1|3
78|130|90|1|0|4
115|140|110|1|0|4
102|120|80|0|1|5
98|140|110|0|1|5
'''
# read the sample data into a dataframe
df = pd.read_csv(StringIO(data_str), delimiter='|')
# one row per patient: the first Sepsis and Gender value of each P_ID
df_per_PID = df.groupby('P_ID')[['Sepsis', 'Gender']].first()
df_per_PID = df_per_PID.replace({'Gender': {0: 'Male', 1: 'Female'}, 'Sepsis': {0: 'No sepsis', 1: 'Sepsis'}})
ax = sns.countplot(data=df_per_PID, x='Gender', hue='Sepsis', palette='rocket')
ax.legend(title='')  # remove title, as it is clear from the legend items
ax.set_xlabel('')
for bars in ax.containers:
    ax.bar_label(bars)
# ax.margins(y=0.1)  # make some extra space for the labels
ax.locator_params(axis='y', integer=True)
sns.despine()
plt.show()
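
The approach above relies on each 'P_ID' having a single gender and a single sepsis value. A minimal sketch of how that assumption could be checked, and how the per-patient counts could be read as a table before plotting, is given here. The three columns are copied from the sample data in the question; the variable names (n_distinct, inconsistent, per_pid, counts) are only illustrative.

```python
import pandas as pd

# Sepsis, Gender and P_ID columns copied from the question's sample data
df = pd.DataFrame({
    'P_ID':   [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5],
    'Sepsis': [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    'Gender': [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1],
})

# Number of distinct values per patient; anything above 1 breaks the assumption
n_distinct = df.groupby('P_ID')[['Sepsis', 'Gender']].nunique()
inconsistent = n_distinct[(n_distinct > 1).any(axis=1)]
print(inconsistent)  # patients with conflicting values, if any

# One row per patient, with readable labels
per_pid = df.groupby('P_ID')[['Sepsis', 'Gender']].first()
per_pid['Gender'] = per_pid['Gender'].map({0: 'Male', 1: 'Female'})
per_pid['Sepsis'] = per_pid['Sepsis'].map({0: 'No sepsis', 1: 'Sepsis'})

# Patients per gender and sepsis status: the numbers the countplot draws
counts = pd.crosstab(per_pid['Gender'], per_pid['Sepsis'])
print(counts)
```

If inconsistent is not empty, taking the first row per patient silently picks one of the conflicting values, so it may be worth deciding on a rule (for example, "sepsis at any hour") before plotting.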
