Better way to do computation over pandas

Time:02-02

Below is my pandas snippet, and it works. Given a df, I wish to know whether there exists any row that satisfies c1 > 10 with c2 and c3 both True. I wish to know if there is a better way to do the same.

import pandas as pd
inp = [{'c1':10, 'c2':True, 'c3': False}, {'c1':9, 'c2':True, 'c3': True}, {'c1':11, 'c2':True, 'c3': True}]
df = pd.DataFrame(inp)

def check(df):
    for index, row in df.iterrows():
        if ((row['c1']>10) & (row['c2']==True)& (row['c3']==True)):
            return True
        else:
            continue

t = check(df)

CodePudding user response:

When using pandas, you rarely need to iterate over rows and apply an operation to each row separately. In many cases, applying the same operation to the whole dataframe or column gives the same or a similar result, with faster and more readable code. In your case:

(df['c1'] > 10) & df['c2'] & df['c3']

# will lead to a Series:
# 0    False
# 1    False
# 2     True
# dtype: bool

The resulting Series shows for which rows the condition holds (note that I am calling the operation on the whole df rather than on a single row). If you just need to know whether any row satisfies the condition, you can call any:

((df['c1'] > 10) & df['c2'] & df['c3']).any()
# True

So your whole check function would be:

def check(df):
    return ((df['c1'] > 10) & df['c2'] & df['c3']).any()
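
As a rough check that the vectorized version behaves like the loop, here is a minimal sketch. The names check_loop, check_vectorized, df_hit and df_miss are made up for this illustration, and the loop variant adds an explicit `return False`, which the original function does not have (it falls through and returns None when nothing matches).

```python
import pandas as pd

def check_loop(df):
    # Row-by-row version, with an explicit False when no row matches
    # (an addition for this sketch, not part of the original function).
    for _, row in df.iterrows():
        if row['c1'] > 10 and row['c2'] and row['c3']:
            return True
    return False

def check_vectorized(df):
    # Build one boolean Series for the whole frame, then reduce it with any().
    # bool() converts the NumPy boolean into a plain Python bool.
    return bool(((df['c1'] > 10) & df['c2'] & df['c3']).any())

# Same data as in the question: only the last row satisfies the condition.
df_hit = pd.DataFrame([
    {'c1': 10, 'c2': True, 'c3': False},
    {'c1': 9, 'c2': True, 'c3': True},
    {'c1': 11, 'c2': True, 'c3': True},
])
# First two rows only: no row satisfies the condition.
df_miss = df_hit.iloc[:2]

print(check_loop(df_hit), check_vectorized(df_hit))
print(check_loop(df_miss), check_vectorized(df_miss))
```

Both functions should agree on each frame; the vectorized one always returns a bool, including when no row matches.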

CodePudding user response:

It is not clear what you want to change or improve about your solution, but you can achieve the same without a separate function or loops:

df[(df['c1'] > 10) & (df['c2']) & (df['c3'])].index.size > 0

CodePudding user response:

The condition in question is (df.c1 > 10) & df.c2 & df.c3. You can either check whether any row in the dataframe df satisfies this condition:

>>> print(((df.c1 > 10) & df.c2 & df.c3).any())

True

or you can check the length of the dataframe obtained by filtering the original dataframe with this condition (which is df[(condition)]):

>>> print(len(df[((df.c1 > 10) & df.c2 & df.c3)]) > 0)

True
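
To compare the two approaches side by side, here is a small sketch that stores the condition in a variable (mask, a name chosen for this example) and evaluates it several ways. Reducing the mask with any() works on the boolean Series directly, whereas df[mask] first builds a filtered dataframe, so any() is likely the lighter option when only a yes/no answer is needed.

```python
import pandas as pd

df = pd.DataFrame([
    {'c1': 10, 'c2': True, 'c3': False},
    {'c1': 9, 'c2': True, 'c3': True},
    {'c1': 11, 'c2': True, 'c3': True},
])

# Boolean Series: one True/False per row.
mask = (df['c1'] > 10) & df['c2'] & df['c3']

any_match = bool(mask.any())      # reduce the mask directly
non_empty = len(df[mask]) > 0     # filter first, then count rows
not_empty = not df[mask].empty    # filter first, then test emptiness
match_count = int(mask.sum())     # True counts as 1, so this is the number of matching rows

print(any_match, non_empty, not_empty, match_count)
```

All three boolean checks should give the same answer; mask.sum() is handy if the number of matching rows is also of interest.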
