I want to keep only the rows in which four of the five columns have the same value. I also want to remove the last four columns.
I have the following Dataframe:
>> df
t0 t1 t2 t3 t4
0 16 0 30 30 30
1 7 1 2 0 30
2 5 30 30 30 30
3 1 30 30 30 30
4 18 30 30 30 30
I want to keep only rows 2, 3, and 4. The output should look as follows:
>> df
t0
2 5
3 1
4 18
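To make the answers below easy to reproduce, the sample frame can be rebuilt like this. The values are copied from the printout above; only the construction code is new.

```python
import pandas as pd

# Sample data copied from the printout in the question
df = pd.DataFrame(
    {
        "t0": [16, 7, 5, 1, 18],
        "t1": [0, 1, 30, 30, 30],
        "t2": [30, 2, 30, 30, 30],
        "t3": [30, 0, 30, 30, 30],
        "t4": [30, 30, 30, 30, 30],
    }
)
print(df)
```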
User response:
You can also try this:
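from collections import Counter

def find_equal(list_values):
    # Count how often each value occurs in the row
    c = Counter(list_values)
    # True only if some value occurs exactly four times
    for item in c:
        if c[item] == 4:
            return True
    return False

df['keep'] = df.apply(lambda x: find_equal([x.t0, x.t1, x.t2, x.t3, x.t4]), axis=1)
df = df[df['keep']]
# Recent pandas no longer accepts axis positionally in drop; name the column instead
df = df.drop(columns='keep')
print(df)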
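The loop in find_equal can be shortened by checking the Counter's values directly. The sketch below does that and also keeps only t0, as the question asks. It rebuilds the sample frame so it runs on its own, and it assumes "four same values" means exactly four (a row with all five equal would not match).

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "t0": [16, 7, 5, 1, 18],
        "t1": [0, 1, 30, 30, 30],
        "t2": [30, 2, 30, 30, 30],
        "t3": [30, 0, 30, 30, 30],
        "t4": [30, 30, 30, 30, 30],
    }
)

# True where some value occurs exactly four times in the row
mask = df.apply(lambda row: 4 in Counter(row).values(), axis=1)

# Keep the matching rows and only column t0
result = df.loc[mask, ["t0"]]
print(result)
```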
User response:
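You can use nunique to count the distinct values in each row. Among 5 columns, if exactly 4 are identical, there are 2 unique values. The converse does not always hold: a row such as 1 1 1 2 2 also has 2 unique values, so this test is only exact if such splits cannot occur in your data.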
You can use this information to slice the rows:
df2 = df[df.nunique(axis=1).eq(2)]
output:
t0 t1 t2 t3 t4
2 5 30 30 30 30
3 1 30 30 30 30
4 18 30 30 30 30
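If a 3/2 split can occur, a stricter test is to look at the size of the largest group of equal values in each row. The sketch below uses the question's rows plus one hypothetical row (index 5, values 1 1 1 2 2) to show the difference; it assumes "four" means exactly four.

```python
import pandas as pd

# Question's sample rows plus one hypothetical 3/2 row at index 5
df = pd.DataFrame(
    [
        [16, 0, 30, 30, 30],
        [7, 1, 2, 0, 30],
        [5, 30, 30, 30, 30],
        [1, 30, 30, 30, 30],
        [18, 30, 30, 30, 30],
        [1, 1, 1, 2, 2],
    ],
    columns=["t0", "t1", "t2", "t3", "t4"],
)

# nunique == 2 also matches the hypothetical 3/2 row
loose = df[df.nunique(axis=1).eq(2)]

# Largest count of any single value per row; == 4 means exactly four are equal
strict = df[df.apply(lambda row: row.value_counts().max(), axis=1).eq(4)]

print(loose.index.tolist())
print(strict.index.tolist())
```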
To subset only the first column:
df.loc[df.nunique(axis=1).eq(2), ['t0']]
output:
t0
2 5
3 1
4 18
If the four identical values must be in columns t1 to t4, use:
cols = ['t1', 't2', 't3', 't4']
df2 = df[df[cols].nunique(axis=1).eq(1)]
output:
t0 t1 t2 t3 t4
2 5 30 30 30 30
3 1 30 30 30 30
4 18 30 30 30 30
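Putting the t1 to t4 check together with the second part of the question (dropping the last four columns) could look like this. It is a self-contained sketch on the question's sample data; drop(columns=...) is used instead of the loc column selection shown earlier, which gives the same result here.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "t0": [16, 7, 5, 1, 18],
        "t1": [0, 1, 30, 30, 30],
        "t2": [30, 2, 30, 30, 30],
        "t3": [30, 0, 30, 30, 30],
        "t4": [30, 30, 30, 30, 30],
    }
)

cols = ["t1", "t2", "t3", "t4"]
# Exactly one distinct value across t1..t4 means those four cells are identical
mask = df[cols].nunique(axis=1).eq(1)

# Keep the matching rows, then drop the last four columns
result = df.loc[mask].drop(columns=cols)
print(result)
```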
