Pandas dataframe rows incorrectly dropping when one condition is met-CodePudding

Background: I have a pandas df which I am trying to remove rows from if the following conditions are all met.

% Ownership (float64) == 100
Ownership Audit Note (object) contains either 'Ignore' or 'ignore'
Entity ID % (float64) == Account # % (object)

Issue - it looks like my code is dropping rows if any conditions is met, rather than all of them. Is there something wrong with my code?

My current code: this is the line of code I wrote -

df = df[(~(df['Ownership Audit Note'].str.contains('ignore|Ignore')))
& (~(df['% Ownership'] == 100))
& (~(df['Entity ID %'] == 'Account # %'))]

Extract from df

df = pd.DataFrame(
    {
        "% Ownership": {0: 100.0, 1: 50.0, 2: 100.0},
        "Ownership Audit Note": {
            0: "[ignore] 100% Ownership ",
            1: "[Ignore] 50% Ownership ",
            2: "[ignored] Recurring 67% as of 2022-01-07 ",
        },
        "Entity ID %": {
            0: 100.0,
            1: 50.0,
            2: 67.0,
        },
        "Account # %": {
            0: 100.0,
            1: 50.0,
            2: 65.0,
        },
    }
)

How my code is currently behaving: using my current code at the db extract as an example.

Index position 0 - the first row should drop (and is) as all conditions are met.
Index position 1 - this row should not drop as % Ownership is not == 100. This row is still dropping.
Index position 2 - this row should not drop as Entity ID % does not == Account # %. This row is still dropping.

CodePudding user response：

You should invert the final result of the and, not each invert each test (or then you'd need to use or). Also there was a mistake, you were comparing df['Entity ID %'] to a fixed string:

df[~(df['Ownership Audit Note'].str.contains('ignore|Ignore')
     & df['% Ownership'].eq(100)
     & df['Entity ID %'].eq(df['Account # %'])
     )]

output:

   % Ownership                       Ownership Audit Note  Entity ID %  Account # %
1         50.0                    [Ignore] 50% Ownership          50.0         50.0
2        100.0  [ignored] Recurring 67% as of 2022-01-07          67.0         65.0

CodePudding user response：

df = df[(df['Ownership Audit Note'].str.contains('ignore', case=False))
&(df['% Ownership'] == 100.0)
&(df['Entity ID %'] == df['Account # %'])]

CodePudding user response：

As you said correctly, the problem is that you are keeping only rows that match every condition. This is because you use the NOT operator (~) on each condition and not on the sum of them all. This will first find the rows in which all conditions are met, and then take the opposite rows:

df = df[(~(df['Ownership Audit Note'].str.contains('ignore|Ignore'))
& (df['% Ownership'] == 100)
& (df['Entity ID %'] == 'Account # %'))]