Home > Back-end >  Better way to plot Gender count using Python
Better way to plot Gender count using Python

Time:01-13

I am making a graph to plot Gender count for the time series data that look like following data. Each row represent hourly data of each respective patient.

HR SBP DBP Sepsis Gender P_ID
92 120 80 0 0 0
98 115 85 0 0 0
93 125 75 1 1 1
95 130 90 1 1 1
102 120 80 0 0 2
109 115 75 0 0 2
94 135 100 0 0 2
97 100 70 1 1 3
85 120 80 1 1 3
88 115 75 1 1 3
93 125 85 1 1 3
78 130 90 1 0 4
115 140 110 1 0 4
102 120 80 0 1 5
98 140 110 0 1 5

This is my code:

gender = df_n['Gender'].value_counts()
plt.figure(figsize=(7, 6))
ax = gender.plot(kind='bar', rot=0, color="c")
ax.set_title("Bar Graph of Gender", y = 1)
ax.set_xlabel('Gender')
ax.set_ylabel('Number of People')
ax.set_xticklabels(('Male', 'Female'))

for rect in ax.patches:
    y_value = rect.get_height()
    x_value = rect.get_x()   rect.get_width() / 2
    space = 1
    label = format(y_value)
    ax.annotate(label, (x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va='bottom')    
plt.show()

Graph

Now what is happening is the code is calculating total number of instances (0: Male, 1: Female) and plotting it. But I want to plot the total males and females, not the total number of 0s and 1s, as the Same patient is having multiple rows of data (as per P_ID). Like how many patients are male and how many are female?

Can someone help me out? I guess maybe sns.countplot can be used. But I don't know how.

Thanks for helping me out >.<

__________ Udpate ________________

How I can group those Genders that are sepsis (1) or no sepsis (0)?

CodePudding user response:

I have a working example for selecting the unique IDS, it looks ugly so there is probably a better way, but it works...

import pandas as pd
# example of data:
data = {'gender': [0, 0, 1, 1, 1, 1, 0, 0], 'id': [1, 1, 2, 2, 3, 3, 4, 4]}
df = pd.DataFrame(data)
# get all unique ids:
ids = set(df.id)
# Go over all id, get first element of gender:
g = [list(df[df['id'] == i]['gender'])[0] for i in ids]

# count genders, laze way using pandas since the rest of the code also assumes a dataframe for plotting:
gender_counts = pd.DataFrame(g).value_counts()
# from here you can use your plot function.

# Or Counter
from collections import Counter
gender_counts = Counter(g)
# You have to create another method for plotting the gender.

CodePudding user response:

You can group by 'P_ID' and take the first row for each of them (supposing a 'P_ID' has only one gender and only one sepsis). Then you can call sns.countplot on that dataframe, using gender for x and sepsis for hue (or vice versa). You can rename the values in the columns to show their names in the legend and in the tick labels.

import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO

data_str = '''
HR|SBP|DBP|Sepsis|Gender|P_ID
92|120|80|0|0|0
98|115|85|0|0|0
93|125|75|1|1|1
95|130|90|1|1|1
102|120|80|0|0|2
109|115|75|0|0|2
94|135|100|0|0|2
97|100|70|1|1|3
85|120|80|1|1|3
88|115|75|1|1|3
93|125|85|1|1|3
78|130|90|1|0|4
115|140|110|1|0|4
102|120|80|0|1|5
98|140|110|0|1|5
'''
df_per_PID = df.groupby('P_ID')[['Sepsis', 'Gender']].first()
df_per_PID = df_per_PID.replace({'Gender': {0: 'Male', 1: 'Female'}, 'Sepsis': {0: 'No sepsis', 1: 'Sepsis'}})
ax = sns.countplot(data=df_per_PID, x='Gender', hue='Sepsis', palette='rocket')
ax.legend(title='')  # remove title, as it is clear from the legend items
ax.set_xlabel('')
for bars in ax.containers:
    ax.bar_label(bars)
# ax.margins(y=0.1)  # make some extra space for the labels
ax.locator_params(axis='y', integer=True)
sns.despine()
plt.show()

sns.countplot from aggregated dataframe

  •  Tags:  
  • Related