Iterate through a DataFrame, and if a value is equal to a specified value then check the row above f-CodePudding

I have a DataFrame that looks like this:

     col1    Duplicates   Empty     
0    PID       unique     Empty
1    OBR       unique     notEmpty
2    PID     not unique   notEmpty      
3    OBR     not unique   notEmpty
4    PID     not unique   Empty
5    OBR     not unique   notEmpty
6    PID       unique     Empty
7    OBR     not unique   notEmpty

I need to go through the DataFrame and look at each row starting with OBR in col1 of the DataFrame. If the col1 value is OBR then I need to check if Duplicates is "unique" or not. If the Row starting with OBR is "unique" in the duplicates column append "Unique" to the end of the row.

If the row starting with OBR is "not unique" in the Duplicates column then I need to check the PID row just above to check if it is "Empty" or is "notEmpty" in the Empty column. If the row with OBR is "not unique" and the PID row just above is "notEmpty" then append "Sent with" to the end of the row starting with OBR. If the OBR row is "not unique" and the PID right above is "Empty" then append "Sent Without" to the end of the OBR row.

Ex: row number 5 OBR Duplicates is "not unique" so I would check the PID just above looking in the Empty column, and row 4 in the Empty column is "Empty" so I would like to append "Sent Without" to the end of row 5 in newCol.

Here is what the expected outout would be:

     col1    Duplicates   Empty      newCol
0    PID       unique     Empty      
1    OBR       unique     notEmpty   Unique
2    PID     not unique   notEmpty     
3    OBR     not unique   notEmpty   Sent with
4    PID     not unique   Empty      
5    OBR     not unique   notEmpty   Sent Without
6    PID       unique     Empty      
7    OBR     not unique   notEmpty   Sent without

This is some of the code I have tried already, but I can not seem to get anything to work as I would like.

#x=0

for y,x in df.iterrows():

    aboveRow = x-1

    if x.Duplicates == "not unique" && x.col1 == 'OBR':

        if aboveRow.Empty == "empty":

            df["newCol"] = "Sent without"

        elif aboveRow.Empty == "notEmpty":
            
            df["newCol"] = "Sent with"
    else:
        
        df["newCol"] = "Unique"

Here is the DF as a dict:

data2 = {
    'col1' : ['PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR'],
    'Duplicates' : ['unique','unique','not unique','not unique',
                    'not unique','not unique','unique','not unique'],
    'Empty' : ['Empty','notEmpty','notEmpty','notEmpty','Empty',
               'notEmpty','Empty', 'notEmpty']
}

CodePudding user response：

Note that your input and the input used for the expected outcome don't match. I used the below DataFrame (which matches the first columns of your expected outcome):

df = pd.DataFrame({'col1': ['PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR'],
                   'Duplicates': ['unique', 'unique', 'not unique', 'not unique', 
                                  'not unique', 'not unique', 'unique', 'not unique'],
                   'Empty': ['Empty', 'notEmpty', 'notEmpty', 'notEmpty', 'Empty', 
                             'notEmpty', 'Empty', 'notEmpty']})

You can use shift to get the previous row's values in the current row and use np.select to assign values:

df['new_col'] = df['Empty'].shift()
df['new_col'] = np.select([df['col1'].eq('OBR') & df['Duplicates'].eq('unique'), 
                           df['col1'].eq('OBR') & ~df['Duplicates'].eq('unique') & df['new_col'].eq('notEmpty'),
                           df['col1'].eq('OBR') & ~df['Duplicates'].eq('unique') & df['new_col'].eq('Empty')
                          ], ['Unique', 'Sent With', 'Sent Without'], '')

Output:

  col1  Duplicates     Empty       new_col
0  PID      unique     Empty              
1  OBR      unique  notEmpty        Unique
2  PID  not unique  notEmpty              
3  OBR  not unique  notEmpty     Sent With
4  PID  not unique     Empty              
5  OBR  not unique  notEmpty  Sent Without
6  PID      unique     Empty              
7  OBR  not unique  notEmpty  Sent Without

For data2, the outcome is:

  col1  Duplicates     Empty       new_col
0  PID      unique     Empty              
1  OBR      unique  notEmpty        Unique
2  PID  not unique     Empty              
3  OBR  not unique  notEmpty  Sent Without
4  PID  not unique  notEmpty              
5  OBR      unique     Empty        Unique
6  PID      unique     Empty              
7  OBR  not unique  notEmpty  Sent Without