Home > OS >  Pandas dataframe rows incorrectly dropping when one condition is met
Pandas dataframe rows incorrectly dropping when one condition is met

Time:01-23

Background: I have a pandas df which I am trying to remove rows from if the following conditions are all met.

  • % Ownership (float64) == 100
  • Ownership Audit Note (object) contains either 'Ignore' or 'ignore'
  • Entity ID % (float64) == Account # % (object)

Issue - it looks like my code is dropping rows if any conditions is met, rather than all of them. Is there something wrong with my code?

My current code: this is the line of code I wrote -

df = df[(~(df['Ownership Audit Note'].str.contains('ignore|Ignore')))
& (~(df['% Ownership'] == 100))
& (~(df['Entity ID %'] == 'Account # %'))]

Extract from df

df = pd.DataFrame(
    {
        "% Ownership": {0: 100.0, 1: 50.0, 2: 100.0},
        "Ownership Audit Note": {
            0: "[ignore] 100% Ownership ",
            1: "[Ignore] 50% Ownership ",
            2: "[ignored] Recurring 67% as of 2022-01-07 ",
        },
        "Entity ID %": {
            0: 100.0,
            1: 50.0,
            2: 67.0,
        },
        "Account # %": {
            0: 100.0,
            1: 50.0,
            2: 65.0,
        },
    }
)

How my code is currently behaving: using my current code at the db extract as an example.

  • Index position 0 - the first row should drop (and is) as all conditions are met.
  • Index position 1 - this row should not drop as % Ownership is not == 100. This row is still dropping.
  • Index position 2 - this row should not drop as Entity ID % does not == Account # %. This row is still dropping.

CodePudding user response:

You should invert the final result of the and, not each invert each test (or then you'd need to use or). Also there was a mistake, you were comparing df['Entity ID %'] to a fixed string:

df[~(df['Ownership Audit Note'].str.contains('ignore|Ignore')
     & df['% Ownership'].eq(100)
     & df['Entity ID %'].eq(df['Account # %'])
     )]

output:

   % Ownership                       Ownership Audit Note  Entity ID %  Account # %
1         50.0                    [Ignore] 50% Ownership          50.0         50.0
2        100.0  [ignored] Recurring 67% as of 2022-01-07          67.0         65.0

CodePudding user response:

df = df[(df['Ownership Audit Note'].str.contains('ignore', case=False))
&(df['% Ownership'] == 100.0)
&(df['Entity ID %'] == df['Account # %'])]

CodePudding user response:

As you said correctly, the problem is that you are keeping only rows that match every condition. This is because you use the NOT operator (~) on each condition and not on the sum of them all. This will first find the rows in which all conditions are met, and then take the opposite rows:

df = df[(~(df['Ownership Audit Note'].str.contains('ignore|Ignore'))
& (df['% Ownership'] == 100)
& (df['Entity ID %'] == 'Account # %'))]
  •  Tags:  
  • Related