I want to plot a pie chart, but only three of the ids are readable because the slices for the others are too small. Is there a way to combine all the small ones into a single slice named "remaining"? And can this be done automatically? I could set a fixed limit such as 100 or 1000, but I would prefer not to pick the cutoff by hand. In my real dataframe I am using df.value_counts().
Dataframe
   id  count
0   1   4521
1   2   1247
2   3    962
3   4     12
4   5      6
5   6      5
6   7      4
Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.pie and plt.show below
d = {'id': [1, 2, 3, 4, 5, 6, 7],
'count': [4521, 1247, 962, 12, 6, 5, 4],
}
df = pd.DataFrame(data=d)
print(df)
colors = sns.color_palette('GnBu_r')
plt.pie(df['count'],
labels = df['id'], colors = colors)
plt.show()
Answer:
You can combine rows in your data with a condition: if the 'percentage' is less than a threshold, then sum up those rows:
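threshold = 0.1
df['percentage'] = df['count']/df['count'].sum()
# rows whose share of the total is below the threshold
small = df['percentage'] < threshold
# one row holding the summed count and share of all small rows
remaining = pd.DataFrame({'id': ['remaining'],
                          'count': [df.loc[small, 'count'].sum()],
                          'percentage': [df.loc[small, 'percentage'].sum()]})
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df = pd.concat([df[~small], remaining], ignore_index = True)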
So you get:
          id  count  percentage
0          1   4521    0.669084
1          2   1247    0.184549
2          3    962    0.142371
3  remaining     27    0.003996
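Since the question mentions value_counts(), the same idea can be sketched for a Series of counts indexed by id, which is the shape value_counts() returns. The helper name lump_small and its parameters are hypothetical, chosen only for this illustration:

```python
import pandas as pd


def lump_small(counts, threshold=0.1, label="remaining"):
    """Collapse entries whose share of the total is below `threshold`
    into a single entry named `label` (hypothetical helper)."""
    share = counts / counts.sum()
    kept = counts[share >= threshold]
    small_total = counts[share < threshold].sum()
    if small_total > 0:
        # append the lumped entry after the entries that were kept
        kept = pd.concat([kept, pd.Series({label: small_total})])
    return kept


# same numbers as in the question, shaped like a value_counts() result
counts = pd.Series([4521, 1247, 962, 12, 6, 5, 4], index=[1, 2, 3, 4, 5, 6, 7])
lumped = lump_small(counts, threshold=0.1)
print(lumped)
```

The result can be passed to plt.pie(lumped, labels=lumped.index) in the same way as the dataframe columns.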
Complete code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
threshold = 0.1
d = {'id': [1, 2, 3, 4, 5, 6, 7],
'count': [4521, 1247, 962, 12, 6, 5, 4]}
df = pd.DataFrame(data = d)
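df['percentage'] = df['count']/df['count'].sum()
# rows whose share of the total is below the threshold
small = df['percentage'] < threshold
# one row holding the summed count and share of all small rows
remaining = pd.DataFrame({'id': ['remaining'],
                          'count': [df.loc[small, 'count'].sum()],
                          'percentage': [df.loc[small, 'percentage'].sum()]})
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df = pd.concat([df[~small], remaining], ignore_index = True)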
colors = sns.color_palette('GnBu_r')
plt.pie(df['count'],
labels = df['id'], colors = colors)
plt.show()
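On the "automatically" part of the question: a relative threshold already adapts to the size of the data, but one possible alternative is to keep the largest slices until they cover a chosen share of the total and lump everything after that. This is only a sketch of one heuristic, not a built-in pandas or matplotlib feature; the name lump_by_coverage and the 0.99 default are assumptions made for this example:

```python
import pandas as pd


def lump_by_coverage(counts, coverage=0.99, label="remaining"):
    """Keep the largest entries until they cover `coverage` of the total,
    then sum the rest into one entry named `label` (hypothetical helper)."""
    ordered = counts.sort_values(ascending=False)
    cum_share = ordered.cumsum() / ordered.sum()
    # share already covered before each entry is added
    covered_before = cum_share.shift(fill_value=0.0)
    keep_mask = covered_before < coverage
    kept = ordered[keep_mask]
    rest = ordered[~keep_mask].sum()
    if rest > 0:
        kept = pd.concat([kept, pd.Series({label: rest})])
    return kept


# same numbers as in the question, shaped like a value_counts() result
counts = pd.Series([4521, 1247, 962, 12, 6, 5, 4], index=[1, 2, 3, 4, 5, 6, 7])
auto_lumped = lump_by_coverage(counts, coverage=0.99)
print(auto_lumped)
```

With the data from the question, the first three ids already cover more than 99% of the total, so the other four end up in "remaining". The coverage value is still a choice, but it does not depend on the absolute size of the counts the way a limit of 100 or 1000 does.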


