I have a DataFrame that looks like this:
col1 Duplicates Empty
0 PID unique Empty
1 OBR unique notEmpty
2 PID not unique notEmpty
3 OBR not unique notEmpty
4 PID not unique Empty
5 OBR not unique notEmpty
6 PID unique Empty
7 OBR not unique notEmpty
I need to go through the DataFrame and look at each row starting with OBR in col1 of the DataFrame. If the col1 value is OBR then I need to check if Duplicates is "unique" or not. If the Row starting with OBR is "unique" in the duplicates column append "Unique" to the end of the row.
If the row starting with OBR is "not unique" in the Duplicates column then I need to check the PID row just above to check if it is "Empty" or is "notEmpty" in the Empty column. If the row with OBR is "not unique" and the PID row just above is "notEmpty" then append "Sent with" to the end of the row starting with OBR. If the OBR row is "not unique" and the PID right above is "Empty" then append "Sent Without" to the end of the OBR row.
Ex: row number 5 OBR Duplicates is "not unique" so I would check the PID just above looking in the Empty column, and row 4 in the Empty column is "Empty" so I would like to append "Sent Without" to the end of row 5 in newCol.
Here is what the expected outout would be:
col1 Duplicates Empty newCol
0 PID unique Empty
1 OBR unique notEmpty Unique
2 PID not unique notEmpty
3 OBR not unique notEmpty Sent with
4 PID not unique Empty
5 OBR not unique notEmpty Sent Without
6 PID unique Empty
7 OBR not unique notEmpty Sent without
This is some of the code I have tried already, but I can not seem to get anything to work as I would like.
#x=0
for y,x in df.iterrows():
aboveRow = x-1
if x.Duplicates == "not unique" && x.col1 == 'OBR':
if aboveRow.Empty == "empty":
df["newCol"] = "Sent without"
elif aboveRow.Empty == "notEmpty":
df["newCol"] = "Sent with"
else:
df["newCol"] = "Unique"
Here is the DF as a dict:
data2 = {
'col1' : ['PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR'],
'Duplicates' : ['unique','unique','not unique','not unique',
'not unique','not unique','unique','not unique'],
'Empty' : ['Empty','notEmpty','notEmpty','notEmpty','Empty',
'notEmpty','Empty', 'notEmpty']
}
CodePudding user response:
Note that your input and the input used for the expected outcome don't match. I used the below DataFrame (which matches the first columns of your expected outcome):
df = pd.DataFrame({'col1': ['PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR', 'PID', 'OBR'],
'Duplicates': ['unique', 'unique', 'not unique', 'not unique',
'not unique', 'not unique', 'unique', 'not unique'],
'Empty': ['Empty', 'notEmpty', 'notEmpty', 'notEmpty', 'Empty',
'notEmpty', 'Empty', 'notEmpty']})
You can use shift to get the previous row's values in the current row and use np.select to assign values:
df['new_col'] = df['Empty'].shift()
df['new_col'] = np.select([df['col1'].eq('OBR') & df['Duplicates'].eq('unique'),
df['col1'].eq('OBR') & ~df['Duplicates'].eq('unique') & df['new_col'].eq('notEmpty'),
df['col1'].eq('OBR') & ~df['Duplicates'].eq('unique') & df['new_col'].eq('Empty')
], ['Unique', 'Sent With', 'Sent Without'], '')
Output:
col1 Duplicates Empty new_col
0 PID unique Empty
1 OBR unique notEmpty Unique
2 PID not unique notEmpty
3 OBR not unique notEmpty Sent With
4 PID not unique Empty
5 OBR not unique notEmpty Sent Without
6 PID unique Empty
7 OBR not unique notEmpty Sent Without
For data2, the outcome is:
col1 Duplicates Empty new_col
0 PID unique Empty
1 OBR unique notEmpty Unique
2 PID not unique Empty
3 OBR not unique notEmpty Sent Without
4 PID not unique notEmpty
5 OBR unique Empty Unique
6 PID unique Empty
7 OBR not unique notEmpty Sent Without
