noob question
I can't figure out how, or whether, the output of a pandas DataFrame .info() call can be sorted like a regular DataFrame.
example:
import pandas as pd
temp = pd.DataFrame(data={"x":[1, 2, 3, None, 4], "y":[5, 6, 7, None, None]})
temp.info(show_counts=True).sort_values(by="Non-Null Count")
results in:
AttributeError: 'NoneType' object has no attribute 'sort_values'
(Context: I have many columns with varying numbers of missing values, and I want to sort the columns by that count.)
CodePudding user response:
Internally, pandas has a DataFrameInfo class that you can use to get at the info() data programmatically. You can turn this into a DataFrame, which can then be sorted. Because it is an internal class, it may change between pandas versions.
import pandas as pd
from pandas.io.formats.info import DataFrameInfo
temp = pd.DataFrame(data={"x":[1, 2, 3], "y":[4, 5, 6]})
info = DataFrameInfo(data=temp)
infodf = pd.DataFrame(
    {'Column': info.ids,
     'Non-Null Count': info.non_null_counts,
     'Dtype': info.dtypes})
print(infodf)
Output:
   Column  Non-Null Count  Dtype
x       x               3  int64
y       y               3  int64
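If you would rather not depend on an internal class, a similar table can probably be built from public methods alone. The following is only a sketch, assuming the frame from the question; the names summary and summary_sorted and the extra "Null Count" column are illustrative and not part of pandas.

```python
import pandas as pd

# Same frame as in the question: "x" has 4 non-null values, "y" has 3.
temp = pd.DataFrame(data={"x": [1, 2, 3, None, 4], "y": [5, 6, 7, None, None]})

# Build an info()-like table using only public DataFrame methods.
summary = pd.DataFrame({
    "Non-Null Count": temp.count(),    # non-null values per column
    "Null Count": temp.isna().sum(),   # missing values per column
    "Dtype": temp.dtypes,              # dtype of each column
})
summary.index.name = "Column"

# This is an ordinary DataFrame, so sort_values works as usual.
summary_sorted = summary.sort_values(by="Non-Null Count")
print(summary_sorted)
```

With this data, y (3 non-null values) sorts ahead of x (4 non-null values).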
CodePudding user response:
Sort the columns by non-null count before calling info():
df[df.notna().sum().sort_values().index].info()
Demo
import numpy as np
import pandas as pd
data = np.random.default_rng(2022).choice([np.nan, 1], (100, 26), p=(.3, .7))
df = pd.DataFrame(data, columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
>>> df[df.notna().sum().sort_values().index].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 26 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   B       61 non-null     float64
 1   U       63 non-null     float64
 2   H       63 non-null     float64
 3   Z       64 non-null     float64
 4   T       64 non-null     float64
 5   O       64 non-null     float64
 6   L       65 non-null     float64
 7   S       66 non-null     float64
 8   N       66 non-null     float64
 9   Y       66 non-null     float64
 10  K       66 non-null     float64
 11  P       67 non-null     float64
 12  A       67 non-null     float64
 13  I       67 non-null     float64
 14  D       67 non-null     float64
 15  W       68 non-null     float64
 16  M       68 non-null     float64
 17  R       69 non-null     float64
 18  J       70 non-null     float64
 19  F       71 non-null     float64
 20  G       72 non-null     float64
 21  Q       73 non-null     float64
 22  V       73 non-null     float64
 23  C       74 non-null     float64
 24  X       74 non-null     float64
 25  E       79 non-null     float64
dtypes: float64(26)
memory usage: 20.4 KB
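The same idea can be wrapped in a small helper if you need it more than once, for example to list the most complete columns first. This is a sketch only: info_sorted_by_non_null is a made-up name, not a pandas function, and it assumes pandas 1.2 or later for the show_counts argument. It uses the buf parameter of info() to capture the report as a string, since info() itself returns None.

```python
import io
import pandas as pd

def info_sorted_by_non_null(df, ascending=True):
    """Return the df.info() text with columns ordered by non-null count."""
    # count() gives the number of non-null values per column.
    order = df.count().sort_values(ascending=ascending).index
    buffer = io.StringIO()
    # info() writes into buf instead of stdout; it still returns None.
    df[order].info(buf=buffer, show_counts=True)
    return buffer.getvalue()

# Frame from the question: "x" has 4 non-null values, "y" has 3.
temp = pd.DataFrame(data={"x": [1, 2, 3, None, 4], "y": [5, 6, 7, None, None]})
report = info_sorted_by_non_null(temp, ascending=False)
print(report)
```

With ascending=False, x is listed before y in the report.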
