I am making a graph to plot Gender count for the time series data that look like following data. Each row represent hourly data of each respective patient.
| HR | SBP | DBP | Sepsis | Gender | P_ID |
|---|---|---|---|---|---|
| 92 | 120 | 80 | 0 | 0 | 0 |
| 98 | 115 | 85 | 0 | 0 | 0 |
| 93 | 125 | 75 | 1 | 1 | 1 |
| 95 | 130 | 90 | 1 | 1 | 1 |
| 102 | 120 | 80 | 0 | 0 | 2 |
| 109 | 115 | 75 | 0 | 0 | 2 |
| 94 | 135 | 100 | 0 | 0 | 2 |
| 97 | 100 | 70 | 1 | 1 | 3 |
| 85 | 120 | 80 | 1 | 1 | 3 |
| 88 | 115 | 75 | 1 | 1 | 3 |
| 93 | 125 | 85 | 1 | 1 | 3 |
| 78 | 130 | 90 | 1 | 0 | 4 |
| 115 | 140 | 110 | 1 | 0 | 4 |
| 102 | 120 | 80 | 0 | 1 | 5 |
| 98 | 140 | 110 | 0 | 1 | 5 |
This is my code:
gender = df_n['Gender'].value_counts()
plt.figure(figsize=(7, 6))
ax = gender.plot(kind='bar', rot=0, color="c")
ax.set_title("Bar Graph of Gender", y = 1)
ax.set_xlabel('Gender')
ax.set_ylabel('Number of People')
ax.set_xticklabels(('Male', 'Female'))
for rect in ax.patches:
y_value = rect.get_height()
x_value = rect.get_x() rect.get_width() / 2
space = 1
label = format(y_value)
ax.annotate(label, (x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va='bottom')
plt.show()
Now what is happening is the code is calculating total number of instances (0: Male, 1: Female) and plotting it. But I want to plot the total males and females, not the total number of 0s and 1s, as the Same patient is having multiple rows of data (as per P_ID). Like how many patients are male and how many are female?
Can someone help me out? I guess maybe sns.countplot can be used. But I don't know how.
Thanks for helping me out >.<
__________ Udpate ________________
How I can group those Genders that are sepsis (1) or no sepsis (0)?
CodePudding user response:
I have a working example for selecting the unique IDS, it looks ugly so there is probably a better way, but it works...
import pandas as pd
# example of data:
data = {'gender': [0, 0, 1, 1, 1, 1, 0, 0], 'id': [1, 1, 2, 2, 3, 3, 4, 4]}
df = pd.DataFrame(data)
# get all unique ids:
ids = set(df.id)
# Go over all id, get first element of gender:
g = [list(df[df['id'] == i]['gender'])[0] for i in ids]
# count genders, laze way using pandas since the rest of the code also assumes a dataframe for plotting:
gender_counts = pd.DataFrame(g).value_counts()
# from here you can use your plot function.
# Or Counter
from collections import Counter
gender_counts = Counter(g)
# You have to create another method for plotting the gender.
CodePudding user response:
You can group by 'P_ID' and take the first row for each of them (supposing a 'P_ID' has only one gender and only one sepsis). Then you can call sns.countplot on that dataframe, using gender for x and sepsis for hue (or vice versa). You can rename the values in the columns to show their names in the legend and in the tick labels.
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
data_str = '''
HR|SBP|DBP|Sepsis|Gender|P_ID
92|120|80|0|0|0
98|115|85|0|0|0
93|125|75|1|1|1
95|130|90|1|1|1
102|120|80|0|0|2
109|115|75|0|0|2
94|135|100|0|0|2
97|100|70|1|1|3
85|120|80|1|1|3
88|115|75|1|1|3
93|125|85|1|1|3
78|130|90|1|0|4
115|140|110|1|0|4
102|120|80|0|1|5
98|140|110|0|1|5
'''
df_per_PID = df.groupby('P_ID')[['Sepsis', 'Gender']].first()
df_per_PID = df_per_PID.replace({'Gender': {0: 'Male', 1: 'Female'}, 'Sepsis': {0: 'No sepsis', 1: 'Sepsis'}})
ax = sns.countplot(data=df_per_PID, x='Gender', hue='Sepsis', palette='rocket')
ax.legend(title='') # remove title, as it is clear from the legend items
ax.set_xlabel('')
for bars in ax.containers:
ax.bar_label(bars)
# ax.margins(y=0.1) # make some extra space for the labels
ax.locator_params(axis='y', integer=True)
sns.despine()
plt.show()


