Home > Back-end >  How to assign a value to a cell in dataframe A based on a value in dataframe B, conditional on value
How to assign a value to a cell in dataframe A based on a value in dataframe B, conditional on value

Time:01-27

I'm really amateur-level with both python and pandas, but I'm trying to solve an issue for work that's stumping me.

I have two dataframes, let's call them dfA and dfB:

dfA:

     project_id    Category     Initiative

     10              
     20
     30
     40

dfB:

     project_id     field_id     value  
     10             100          lorem
     10             200          lorem1
     10             300          lorem2

     20             200          ipsum
     20             300          ipsum1
     20             500          ipsum2

Let's say I know "Category" from dfA correlates to field_id "100" from dfB, and "Initiative" correlates to field_id "200".

I need to look through dfB and for a given project_id/field_id combination, take the corresponding value in the "value" column and place it in the correct cell in dfA.

The result would look like this:

dfA:

     project_id    Category     Initiative

     10            lorem        lorem1
     20                         ipsum
     30
     40

Bonus difficulty: not every project in dfA exists in dfB, and not every field_id is used in every project_id.

I hope I've explained this well enough; I feel like there must be a relatively simple way to handle this that I'm missing.

CodePudding user response:

You could do something like this although it's not very elegant, there must be a better way. I had to use try/except because of the cases where the project Id is not available in the dfB. I put NaN values for the missing ones but you can easily put empty strings.

def get_value(row):
    try:
        res = dfB[(dfB['field_id'] == 100) & (dfB['project_id'] == row['project_id'])]['value'].iloc[0]
    except:
        res = np.nan
    row['Categorie'] = res
    try:
        res = dfB[(dfB['field_id'] == 200) & (dfB['project_id'] == row['project_id'])]['value'].iloc[0]
    except:
        res = np.nan
    row['Initiative'] = res
    return row

dfA = dfA.apply(get_value, axis=1)

EDIT: as mentioned in comment, this is not very flexible as some values are hardcoded but you can easily change that with something like the below. This way, if the field_id change or you need to add/remove a column, just update the dictionary.

columns_field = {"Category": 100, "Initiative": 200}

def get_value(row):
    for key, value in columns_fields.items():
        try:
            res = dfB[(dfB['field_id'] == value) & (dfB['project_id'] == row['project_id'])]['value'].iloc[0]
        except:
            res = np.nan
        row[key] = res
    return row

dfA = dfA.apply(get_value, axis=1)
  •  Tags:  
  • Related