I have a dataframe that contains columns like this - ['labels', 'labels.1', 'labels.2'] this continues to some number labels.n
One of these columns contains a value I need. I want to grab that value and add it to df['final_label']. If the value is not in the labels columns, the new column should be blank.
The dataframe looks like this:
| Age | City | labels | labels.1 | labels.2 | labels.n |
| ----- | ------ | ------ | -------- | -------- | -------- |
| 27 | city A | value1 | other | other | other |
| 45 | city B | other | other | value2 | |
| 34 | city A | value2 | other | other | other |
| 57 | city D | other | value2 | | |
| 30 | city E | other | other | other | |
I need the final dataframe to look like this:
| Age | City | labels |
| ----- | ------ | ------ |
| 27 | city A | value1 |
| 45 | city B | value2 |
| 34 | city A | value2 |
| 57 | city D | value2 |
| 30 | city E | |
I realize a nested np.where() will have an unknown depth so not sure how to implement that in a simple way.
Here is what I have been trying to do:
labels_cols = [col for col in df.columns if "labels" in col]
values_list = ['value1', 'value2']
### I want to basically do this, but the depth is unknown
df['final_labels'] = np.where(df['labels'].isin(values_list), df['labels'],
np.where(df['labels.1'].isin(values_list), df['labels.1'],
np.where(df['labels.2'].isin(values_list), df['labels.2'],
np.where(df['labels.n'].isin(values_list), df['labels.n'],''))))
### Tried some things like this, but can't get the syntax right
df['final_label'] = df.loc[df.index(values_list), df.columns.str.contains('label')==True]
CodePudding user response:
You can use fillna with another series, wich uses that series to fill the null values. Assuming all columns used to fill the labels column contain "labels.":
for c in df.columns:
if "labels." in c:
df["labels"] = df["labels"].fillna(df[c])
del df[c]
Then df["labels"] contains the result.
CodePudding user response:
Use filter to keep labels columns then replace unwanted values by nan before forward fill values until the last column:
pattern = r'^(?!value1|value2)$'
df['final_label'] = df.filter(like='labels').replace(pattern, np.nan, regex=True) \
.ffill(axis=1).iloc[:, -1].fillna('')
print(df)
# Output
Age City labels labels.1 labels.2 labels.n final_label
0 27 city A value1 other other other other
1 45 city B other other value2 value2
2 34 city A value2 other other other other
3 57 city D other value2 value2
4 30 city E other other other other
