Keep rows that have four identical values in pandas

I want to keep the rows in which four of the five columns hold the same value. I also want to remove the last four columns.
I have the following Dataframe:

>> df
    t0  t1  t2  t3  t4
0   16  0   30  30  30
1   7   1   2   0   30
2   5   30  30  30  30
3   1   30  30  30  30
4   18  30  30  30  30

I want to keep only rows 2, 3, and 4. The output should look as follows:

CodePudding user response：

You can also try this:

from collections import Counter

def find_equal(list_values):
    # True if any value occurs exactly four times in the row
    c = Counter(list_values)
    for item in c:
        if c[item] == 4:
            return True
    return False


df['keep'] = df.apply(lambda x: find_equal([x.t0, x.t1, x.t2, x.t3, x.t4]), axis=1)

df = df[df.keep]
# drop the helper column; the positional axis argument no longer works in pandas 2.0+
df = df.drop(columns='keep')

print(df)
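
If you would rather not add and drop a helper column, the same idea can be written with a boolean mask. This is only a minimal sketch: it rebuilds the sample frame from the question, and the helper name has_four_equal is made up for illustration.

```python
from collections import Counter

import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame(
    {
        "t0": [16, 7, 5, 1, 18],
        "t1": [0, 1, 30, 30, 30],
        "t2": [30, 2, 30, 30, 30],
        "t3": [30, 0, 30, 30, 30],
        "t4": [30, 30, 30, 30, 30],
    }
)


def has_four_equal(values):
    """Return True if some value occurs exactly four times (hypothetical helper)."""
    return 4 in Counter(values).values()


# Build a boolean mask row by row and index with it directly,
# so no temporary column has to be created and dropped again.
mask = df.apply(lambda row: has_four_equal(row.tolist()), axis=1)
kept = df[mask]
print(kept)
```

Note that apply with axis=1 runs a Python function per row, so on a large frame this will likely be slower than a vectorized test.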

CodePudding user response：

You can use nunique with axis=1 to count the unique values in each row. If 4 of the 5 columns are identical, the row has exactly 2 unique values.

You can use this information to slice the rows:

df2 = df[df.nunique(axis=1).eq(2)]

output:

   t0  t1  t2  t3  t4
2   5  30  30  30  30
3   1  30  30  30  30
4  18  30  30  30  30
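
One caveat: a row split 3 + 2 (say 1, 1, 1, 2, 2) also has 2 unique values, so nunique alone would keep it. If that can happen in your data, a stricter test is to count the most frequent value per row. A minimal sketch, where the extra row at index 5 is invented purely to show the difference:

```python
import pandas as pd

# Sample frame from the question plus one invented 3 + 2 row (index 5).
df = pd.DataFrame(
    {
        "t0": [16, 7, 5, 1, 18, 1],
        "t1": [0, 1, 30, 30, 30, 1],
        "t2": [30, 2, 30, 30, 30, 1],
        "t3": [30, 0, 30, 30, 30, 2],
        "t4": [30, 30, 30, 30, 30, 2],
    }
)

# The nunique test also accepts the 3 + 2 row.
loose = df.nunique(axis=1).eq(2)

# Number of occurrences of the most frequent value in each row.
top_count = df.apply(lambda row: row.value_counts().max(), axis=1)
strict = top_count.eq(4)

print(df[loose].index.tolist())
print(df[strict].index.tolist())
```

The strict mask requires one value to appear exactly four times, which is what the question asks for.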

To subset only the first column:

df.loc[df.nunique(axis=1).eq(2), ['t0']]

output:

   t0
2   5
3   1
4  18

If you want to check if the four values are identical only in columns t1->t4, use:

cols = ['t1', 't2', 't3', 't4']
df2 = df[df[cols].nunique(axis=1).eq(1)]

output:

   t0  t1  t2  t3  t4
2   5  30  30  30  30
3   1  30  30  30  30
4  18  30  30  30  30
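
As an alternative for the t1 to t4 case, you could compare every column with the first one instead of counting unique values, and select only t0 in the same step. A minimal sketch, assuming the sample frame from the question:

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame(
    {
        "t0": [16, 7, 5, 1, 18],
        "t1": [0, 1, 30, 30, 30],
        "t2": [30, 2, 30, 30, 30],
        "t3": [30, 0, 30, 30, 30],
        "t4": [30, 30, 30, 30, 30],
    }
)

cols = ["t1", "t2", "t3", "t4"]

# A row qualifies when every column in `cols` equals the first of them.
# axis=0 aligns the comparison Series with the row index.
mask = df[cols].eq(df[cols[0]], axis=0).all(axis=1)

# Keep the matching rows and only the t0 column.
result = df.loc[mask, ["t0"]]
print(result)
```

This stays fully vectorized, which tends to matter once the frame has many rows.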