When I use hist() from pandas, it produces a histogram for every feature in the dataset. How can I extract/select/reference only one of the histograms returned by hist()?
For example, let's say I have the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'X' : np.random.rand(100), 'Y': np.random.rand(100)})
df
df.hist()
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x00000150DAC658C8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x00000150DB29AD48>]],
dtype=object)
I have tried indexing the array of matplotlib axes returned by hist() with [] (i.e. df.hist()[0]), but that does not extract a single plot; it returns both of them.
CodePudding user response:
I believe you can pass a column name to hist() to select one of the histograms (column_name would be 'X' or 'Y' for the DataFrame in the question):
df.hist(column = column_name)
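A minimal sketch of this approach, assuming the same two-column DataFrame as in the question. As far as I know, df.hist() still returns an array of axes even when only one column is plotted, so the single Axes is pulled out with flatten(); the title text is only an illustration.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: two numeric columns.
df = pd.DataFrame({"X": np.random.rand(100), "Y": np.random.rand(100)})

# Draw only the histogram for column "X".
axs = df.hist(column="X")

# The return value is an array of Axes; with one column it holds one Axes.
ax = axs.flatten()[0]
ax.set_title("Histogram of X only")  # illustrative title
```

From here ax can be customised like any other matplotlib Axes (labels, limits, and so on).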
CodePudding user response:
I want to know how to extract:
df['X'].hist()
I want to know how to reference:
The ndarray of subplots returned is two-dimensional here, so:
axs = df.hist()
axs[0, 0].set_xlim(0, 2) # modify leftmost subplot
Or call .flatten() first and use a single index:
axs.flatten()[0].set_xlim(0, 2)
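To make the indexing concrete, here is a small sketch assuming the DataFrame from the question. It shows why df.hist()[0] gives back both plots (it selects the first row of a 1 x 2 array) and one possible way to look an Axes up by column name. The by_name dictionary is my own helper and relies on pandas setting each subplot title to its column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": np.random.rand(100), "Y": np.random.rand(100)})

axs = df.hist()
print(axs.shape)   # (1, 2): one row, two columns of subplots

row = axs[0]       # the whole first row, i.e. both Axes
ax_x = axs[0, 0]   # a single Axes: the histogram of "X"

# Hypothetical helper: map subplot titles (the column names) to their Axes.
by_name = {ax.get_title(): ax for ax in axs.flatten() if ax.get_title()}
by_name["Y"].set_xlim(0, 1)  # modify only the "Y" histogram
```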
CodePudding user response:
Try this:
h = plt.hist(df)
h
(array([[15., 13., 11., 9., 2., 15., 10., 4., 9., 12.],
[ 6., 14., 10., 9., 9., 13., 11., 7., 8., 13.]]),
array([0.01212735, 0.11088648, 0.20964561, 0.30840474, 0.40716387,
0.50592301, 0.60468214, 0.70344127, 0.8022004 , 0.90095953,
0.99971866]),
<a list of 2 BarContainer objects>)
counts, bins, ax = h
counts
array([[15., 13., 11., 9., 2., 15., 10., 4., 9., 12.],
[ 6., 14., 10., 9., 9., 13., 11., 7., 8., 13.]])
bins
array([0.01212735, 0.11088648, 0.20964561, 0.30840474, 0.40716387,
0.50592301, 0.60468214, 0.70344127, 0.8022004 , 0.90095953,
0.99971866])
ax
<a list of 2 BarContainer objects>
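Slice counts and bins as needed: counts[0] holds the bin counts for X and counts[1] those for Y, and both share the bin edges in bins. Note that the third element is a list of BarContainer objects, not an Axes, despite the variable name ax.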
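As a rough sketch of that slicing, assuming the DataFrame from the question: the counts for one column are taken from the result of plt.hist() and redrawn on their own as a bar chart. Passing df.to_numpy() instead of df, and the names containers and x_counts, are my own choices here.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": np.random.rand(100), "Y": np.random.rand(100)})

# Each column of the 2-D array is treated as a separate dataset.
counts, bins, containers = plt.hist(df.to_numpy())

# Row 0 of counts belongs to the first column ("X").
x_counts = counts[0]

# Redraw only the "X" histogram from its counts and the shared bin edges.
fig, ax = plt.subplots()
ax.bar(bins[:-1], x_counts, width=np.diff(bins), align="edge")
ax.set_title("X")
```

If only the numbers are needed and not the plot, np.histogram(df["X"]) returns the counts and bin edges for a single column without drawing anything.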

