How to create new column in dataframe based on condition from other dataframe?-CodePudding

df1 = pd.DataFrame({"DEPTH":[0.5, 1, 1.5, 2, 2.5],
                     "POROSITY":[10, 22, 15, 30, 20],
                     "WELL":"well 1"})

df2 = pd.DataFrame({"Well":"well 1",
                    "Marker":["Fm 1","Fm 2"],
                    "Depth":[0.7, 1.7]})

Hello everyone. I have two dataframes and I would like to create a new column on df1, for example: df1["FORMATIONS"], with information from df2["Marker"] values based on depth limits from df2["Depth"] and df1["DEPTH"].

So, for example, if df2["Depth"] = 1.7, then all samples in df1 with df1["DEPTH"] > 1.7 should be labelled as "Fm 2" in this new column df1["FORMATIONS"].

And the final dataframe df1 should look like this:

DEPTH   POROSITY   WELL   FORMATIONS
0.5     10        well 1     nan
1       22        well 1     Fm 1
1.5     15        well 1     Fm 1
2       30        well 1     Fm 2
2.5     20        well 1     Fm 2

Anyone could help me?

CodePudding user response：

What you're doing here is transforming continuous data into categorical data. There are many ways to do this with pandas, but one of the better known ways is using

CodePudding user response：

Use pandas.merge_asof:

NB. the columns used for the merge need to be sorted first

pd.merge_asof(df1,
              df2[['Marker', 'Depth']].rename(columns={'Marker': 'Formations'}),
              left_on='DEPTH', right_on='Depth')

output:

   DEPTH  POROSITY    WELL Formations  Depth
0    0.5        10  well 1        NaN    NaN
1    1.0        22  well 1       Fm 1    0.7
2    1.5        15  well 1       Fm 1    0.7
3    2.0        30  well 1       Fm 2    1.7
4    2.5        20  well 1       Fm 2    1.7