I would like to create a function to go through each unique value in a column and check if another column contains a value and then create a column that shows the result.
For example: for each unique ID in df, check if the stats is A then create a result column:
df:
| ID | Status |
|---|---|
| 1 | A |
| 1 | B |
| 2 | B |
| 2 | C |
new_df:
| ID | Status | Result A? |
|---|---|---|
| 1 | A | YES |
| 1 | B | NO |
I started off with
def function(df):
id = list(df['ID'].unique())
status = ['A']
for i in id:
If anyone can share some thoughts I would much appreciate.
CodePudding user response:
I am not sure if I understand the rules correctly. Should I always take the first occurrence of the ID? Then the second row in your expected output is wrong.
You can use numpy.where
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C'}})
new_df = df.drop_duplicates(subset=["ID"]).copy()
new_df["Result A?"] = np.where(new_df.Status == "A", "YES", "NO")
to get this:
ID Status Result A?
0 1 A YES
2 2 B NO
CodePudding user response:
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5:3}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C', 4: 'A', 5:'B'}})
| index | ID | Status |
|---|---|---|
| 0 | 1 | A |
| 1 | 1 | B |
| 2 | 2 | B |
| 3 | 2 | C |
| 4 | 2 | A |
| 5 | 3 | B |
df['Result A?'] = df['Status'].apply(lambda x: x=='A')
df['Result A?'] = df['ID'].map(df.groupby('ID')['Result A?'].any())
df['Result A?'] = df['Result A?'].map({True:'YES', False:'NO'})
| index | ID | Status | Result A? |
|---|---|---|---|
| 0 | 1 | A | YES |
| 1 | 1 | B | YES |
| 2 | 2 | B | YES |
| 3 | 2 | C | YES |
| 4 | 2 | A | YES |
| 5 | 3 | B | NO |
