My pandas DataFrame contains several columns, and some of them have missing values that show up as a ? sign. I want to use a for loop to print how many ? there are in each column of the data. I'm doing something like this:
colnames = ['col_1','col_2','col_3']
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data.i.value_counts()["?"]} times')
The error I get is:
AttributeError: 'DataFrame' object has no attribute 'i'
So I think the problem is with this part, data.i.value_counts(). I tried data[i].value_counts() but that didn't work either.
Answer:
To count values, avoid value_counts, because selecting ["?"] from its result fails with a KeyError if the value does not exist in the column. It is simpler to compare the values with "?" and count the True values with sum:
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].eq("?").sum()} times')
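A minimal runnable sketch of this approach, assuming a small made-up DataFrame (the column contents below are hypothetical). Note that col_3 has no ? at all and still gets a count of 0 instead of an error:

```python
import pandas as pd

# Hypothetical sample data: '?' marks a missing value
data = pd.DataFrame({
    "col_1": ["a", "?", "b", "?"],
    "col_2": ["?", "x", "y", "z"],
    "col_3": ["p", "q", "r", "s"],  # contains no '?'
})

colnames = ["col_1", "col_2", "col_3"]
counts = {}
for i in colnames:
    # eq("?") yields a boolean Series; sum() counts the True values (0 if none)
    counts[i] = int(data[i].eq("?").sum())
    print(f"In the {i} feature, the value - ? - occurs {counts[i]} times")
```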
Answer:
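Assuming the dataframe is data, if the OP wants to use .value_counts(), use bracket indexing instead of attribute access. data.i looks for a column literally named i, whereas data[i] uses the column name stored in the variable i: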
colnames = ['col_1','col_2','col_3']
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].value_counts()["?"]} times')
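The sketch below, using a hypothetical two-column DataFrame, contrasts attribute access with bracket access and shows why indexing the value_counts() result can still fail. Series.get with a default of 0 is one way to keep value_counts() while avoiding the KeyError; that variant is an addition, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    "col_1": ["a", "?", "b", "?"],
    "col_3": ["p", "q", "r", "s"],  # contains no '?'
})

i = "col_1"
# data.i would look for a column literally named "i" (AttributeError here);
# data[i] uses the value stored in the variable i
q_count = data[i].value_counts()["?"]
print(q_count)

# value_counts()["?"] raises KeyError when "?" never occurs in the column,
# whereas Series.get returns the supplied default instead
missing = data["col_3"].value_counts().get("?", 0)
print(missing)
```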
Or, to check all columns of the dataframe data, use
for i in data.columns:
    print(f'In the {i} feature, the value - ? - occurs {data[i].value_counts()["?"]} times')
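When every column is of interest, the comparison can also be done on the whole frame at once instead of column by column. A sketch, assuming hypothetical sample data; DataFrame.eq compares every cell and sum() then totals each boolean column, so columns without ? simply report 0:

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    "col_1": ["a", "?", "b", "?"],
    "col_2": ["?", "x", "y", "z"],
    "col_3": ["p", "q", "r", "s"],
})

# One comparison over the whole frame, then a per-column sum of the True values
counts = data.eq("?").sum()
for col, n in counts.items():
    print(f"In the {col} feature, the value - ? - occurs {n} times")
```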
If, on the other hand, one wants to avoid the KeyError (see the note below), one can use .isin with .sum() as follows
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].isin(["?"]).sum()} times')
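A runnable sketch of the .isin variant, assuming hypothetical sample data, showing that a column without ? yields 0 rather than raising:

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    "col_1": ["a", "?", "b", "?"],
    "col_3": ["p", "q", "r", "s"],  # contains no '?'
})

counts = {}
for i in ["col_1", "col_3"]:
    # isin(["?"]) marks each cell equal to "?"; sum() counts the marks,
    # so a column without "?" gives 0 instead of a KeyError
    counts[i] = int(data[i].isin(["?"]).sum())
    print(f"In the {i} feature, the value - ? - occurs {counts[i]} times")
```

Because isin takes a list, the same call can count several missing-value markers at once if the data uses more than one.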
Note:
- If a specific column doesn't contain ?, one will get KeyError: '?', so it might be more convenient to select only the columns that contain ?, instead of applying this to all the dataframe columns.
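One way to follow this note is to build a mask of the columns that contain at least one ? and loop only over those. A sketch under the assumption of hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    "col_1": ["a", "?", "b", "?"],
    "col_2": ["?", "x", "y", "z"],
    "col_3": ["p", "q", "r", "s"],
})

# Boolean Series indexed by column name: True where the column has any "?"
has_q = data.eq("?").any()
cols_with_q = has_q[has_q].index.tolist()

for i in cols_with_q:
    # Safe: every selected column contains "?", so value_counts()["?"] exists
    print(f"In the {i} feature, the value - ? - occurs {data[i].value_counts()['?']} times")
```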
