I have a dataset:
id1 id2 id3 id4
a1 a2 NaN NaN
a1 a2 a5 NaN
b1 b2 b3 b4
b1 NaN NaN NaN
c1 c2 NaN NaN
I want to calculate the maximum number of non-NaN columns for each id1. So the desired result is:
id1 depth
a1 3
b1 4
c1 2
How can I do that?
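One way to read "maximum number of non-NaN columns" is to count the non-NaN cells in each row and then take the maximum per id1. Here is a minimal pandas sketch of that reading. It assumes the NaN cells are real missing values (np.nan) and not the string 'NaN', and it counts the id1 cell itself, which is how a1 reaches 3:

```python
import numpy as np
import pandas as pd

# The question's table; NaN cells are assumed to be real missing values
df = pd.DataFrame({
    'id1': ['a1', 'a1', 'b1', 'b1', 'c1'],
    'id2': ['a2', 'a2', 'b2', np.nan, 'c2'],
    'id3': [np.nan, 'a5', 'b3', np.nan, np.nan],
    'id4': [np.nan, np.nan, 'b4', np.nan, np.nan],
})

# Number of non-NaN cells in each row (id1 itself is always counted)
row_depth = df.notna().sum(axis=1)

# Maximum row depth within each id1 group, returned as an id1/depth table
result = row_depth.groupby(df['id1']).max().reset_index(name='depth')
print(result)
```

Because every row in this table is filled from left to right, this per-row reading gives the same numbers as counting, per id1, the columns that have at least one value. The two readings diverge when different rows of the same id1 fill different columns.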
Answer:
I created some mock data and stored it in df. To compute the depth of each row, i.e. the number of values that appear before the first NaN, we do the following:
```python
import pandas as pd
import numpy as np

# create a sample dataframe
columns = ['A', 'B', 'C', 'D']
data = {
    'A': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'B': ['Male', 'Female', 'Female', np.nan, 'Male'],
    'C': [27, 24, 29, np.nan, 25],
    'D': ['Accounting', np.nan, 'Accounting', np.nan, 'Sales'],
}
df = pd.DataFrame(data)
nrow = len(df)

# for each row, count the values that appear before the first NaN
print('A', '\t', 'depth')
for i in range(nrow):
    count = 0
    for col in columns:
        # pd.isna works for strings and numbers alike
        if pd.isna(df.loc[i, col]):
            break  # stop counting at the first NaN
        count += 1
    print(df.loc[i, 'A'], '\t', count)
```
- For each row:
  - Start a counter at 0.
  - For each element in that row:
    - If it is NaN, break out of the loop (stop counting); otherwise add 1 to the counter.
Output:
A depth
E1 4
E2 3
E3 4
E4 1
E5 4
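The row-by-row loop can also be written without explicit Python loops. This is a minimal sketch that reuses the mock data from this answer. It relies on a cumulative product along each row: the product stays at 1 while values are present and drops to 0 at the first NaN, which mirrors the loop's break.

```python
import numpy as np
import pandas as pd

# Same mock data as in the loop version
df = pd.DataFrame({
    'A': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'B': ['Male', 'Female', 'Female', np.nan, 'Male'],
    'C': [27, 24, 29, np.nan, 25],
    'D': ['Accounting', np.nan, 'Accounting', np.nan, 'Sales'],
})

# 1 where a value is present, 0 where it is NaN
present = df.notna().astype(int)

# Running product along each row: 1 until the first NaN, 0 afterwards,
# so the row sum is the number of values before the first NaN
depth = present.cumprod(axis=1).sum(axis=1)

print(pd.DataFrame({'A': df['A'], 'depth': depth}))
```

To turn this row-wise depth into the per-id1 answer the question asks for, apply the same expression to the question's table and then take depth.groupby(df['id1']).max().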
Answer:
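Is this the result you want? Group by id1 and, inside each group, drop the columns that are entirely NaN and count the ones that remain: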
```python
# within each id1 group, drop the all-NaN columns and count the rest
df.groupby('id1').apply(lambda x: len(x.dropna(axis=1, how='all').columns))
```
Output:
id1
a1    3
b1    4
c1    2
dtype: int64
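The apply version counts the id1 column itself because it is part of each group x, and that is why a1 gives 3. pandas 2.2 deprecated passing the grouping columns to apply. If the grouping column is excluded, every count would drop by one. Here is a sketch of the same count without apply. It assumes df holds the question's table with real np.nan values. Grouping by the key values as a plain array keeps the id1 column inside the mask, so it is still counted.

```python
import numpy as np
import pandas as pd

# The question's table, assuming real missing values
df = pd.DataFrame({
    'id1': ['a1', 'a1', 'b1', 'b1', 'c1'],
    'id2': ['a2', 'a2', 'b2', np.nan, 'c2'],
    'id3': [np.nan, 'a5', 'b3', np.nan, np.nan],
    'id4': [np.nan, np.nan, 'b4', np.nan, np.nan],
})

# True where a cell holds a value
mask = df.notna()

# Per id1 group: True for every column that has at least one value.
# Grouping by a NumPy array (not the column name) leaves id1 in mask.
has_value = mask.groupby(df['id1'].to_numpy()).any()

# Count those columns and label the result like the apply version
depth = has_value.sum(axis=1).rename_axis('id1').rename('depth')
print(depth)
```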
