I have these two dfs. In the first dataframes, some rows are missing "merchant_type" codes. Though from the dataframe one can see that the "merchant_type_desc" with the missing merchant_type codes can also be deduced from the df
df1
| merchant_type | merchant_type_desc |
|---|---|
| 8641 | Associations - Civic, Social, and Frater |
| 8641 | Associations - Civic, Social, and Frater |
| 8220 | Colleges, Universities, Professional Sch |
| 7922 | Theatrical Producers (except Motion Pict |
| 5072 | Hardware Equipment and Supplies |
| 5251 | Hardware Stores |
| 5200 | Home Supply Warehouse |
| 2741 | Miscellaneous Publishing and Printing |
| 5971 | Art Dealers and Galleries |
| 2741 | Miscellaneous Publishing and Printing |
| Miscellaneous Publishing and Printing | |
| Grocery Stores, Supermarkets | |
| Grocery Stores, Supermarkets | |
| 5411 | Grocery Stores, Supermarkets |
| 3715 | Fairfield Inn |
| 3596 | NEW MCC CODE |
| 3771 | NEW MCC CODE |
| 7400 | NEW MCC CODE |
| 0 | Default |
df2
| merchant_type | merchant_type_desc |
|---|---|
| 8641 | Associations - Civic, Social, and Frater |
| 8220 | Colleges, Universities, Professional Sch |
| 7922 | Theatrical Producers (except Motion Pict |
| 5072 | Hardware Equipment and Supplies |
| 5251 | Hardware Stores |
| 5200 | Home Supply Warehouse |
| 2741 | Miscellaneous Publishing and Printing |
| 5971 | Art Dealers and Galleries |
| 5411 | Grocery Stores, Supermarkets |
| 3715 | Fairfield Inn |
| 3596 | NEW MCC CODE |
| 3771 | NEW MCC CODE |
| 7400 | NEW MCC CODE |
| 0 | Default |
The second dataframe has the distinct merchant_type codes and their corresponding descriptions. Note the "NEW MCC CODE" merchant type description.
Merging the two dfs on merchant_type_desc column would loose the different/distinct merchant_types of the "NEW MCC CODE"
To lookup the missing merchant_type codes in df1 In excel I would have used an IF Function to first check the value of the "Merchant_type_description" if it equals "NEW MCC CODE" I would retain the present code, otherwise I would do vlookup on df2 and return the code from it. How do I achive this in Pandas dataframe.
Expected Output:
| merchant_type | merchant_type_desc |
|---|---|
| 8641 | Associations - Civic, Social, and Frater |
| 8641 | Associations - Civic, Social, and Frater |
| 8220 | Colleges, Universities, Professional Sch |
| 7922 | Theatrical Producers (except Motion Pict |
| 5072 | Hardware Equipment and Supplies |
| 5251 | Hardware Stores |
| 5200 | Home Supply Warehouse |
| 2741 | Miscellaneous Publishing and Printing |
| 5971 | Art Dealers and Galleries |
| 2741 | Miscellaneous Publishing and Printing |
| 2741 | Miscellaneous Publishing and Printing |
| 5411 | Grocery Stores, Supermarkets |
| 5411 | Grocery Stores, Supermarkets |
| 5411 | Grocery Stores, Supermarkets |
| 3715 | Fairfield Inn |
| 3596 | NEW MCC CODE |
| 3771 | NEW MCC CODE |
| 7400 | NEW MCC CODE |
| 0 | Default |
CodePudding user response:
you can use
df3 = df2.loc[df2.merchant_type_desc == 'NEW MCC CODE']
to get just the rows with merchant type NEW MCC CODE
then gather all rows
df = pd.concat([df1, df2, df3])
remove the blanks, this will depend on exactly what the blanks are, in this case I've assumed they are empty strings
df = df.loc[~df.merchant_type == '']
then remove duplicate rows
df = df.drop_duplicates()
CodePudding user response:
You can use this if you prefer a one line solution:
df1.loc[df1.merchant_type.isna(), 'merchant_type']=df1.merge(df2[df2.merchant_type_desc!="NEW MCC CODE"], how='left', on='merchant_type_desc').loc[df1.merchant_type.isna(), 'merchant_type_y']
