Hi I'm new to python and pandas dataframes, I've been stuck on the following for a while:
I have two different dataframes: df1 has ~130 rows, df2 has ~380 rows
df1['Symbol_1']has values like 'DASHUSDT' (for example) containing the keyword 'DASH'df2['Symbol_2']has values like 'DASHBTC' (for example) from df2 containing the keyword 'DASH'
I would like to search df2 by keyword from df1, and create a new dataframe (df3 like below).
- Note that all rows from df1 will always have a keyword matched from df2 except 'BTC/USDT' pair.
- To make things more clear: Symbol_1 (for example: DASHUSDT) always ends with 'USDT', and 'DASH' is the keyword. Symbol_2 (for example: DASHBTC) always ends with 'BTC', and 'DASH' is the keyword.
- I am creating a watchlist for USDT pairs, and I would like to check the BTC pairs from the same watchlist easily. BTC is the king, and all altcoins have a BTC pair, DASH/BTC, ETH/BTC, LTC/BTC etc. But BTC will not have a BTC pair like this BTC/BTC, so for the 'BTC/USDT' row, there will not have any matched rows from 2nd dataframe, so we can keep it empty or replace the 'Symbol_2' column with '0' and '24H_Change_2' column with '0'
[df1]
Symbol_1 Price 24H_Change_1 Volume
BTCUSDT 42135.15 -1.565 200.125
DASHUSDT 139.87 10.0390 1.7400
ADAUSDT 1.23060 1.0700 1.1800
C98USDT 2.5328 0.4520 1.1900
1000SHIBUSDT 0.029233 6.589 2.4564
[df2]
Symbol_2 24H_Change_2
1000SHIBBTC 4.412
SNMBTC 5.1235
ADABTC 1.0700
XVGBTC 15.1240
C98BTC 0.4520
SALTBTC 7.149
EOSBTC 5.551
DASHBTC 11.258
Output dataframe I want: (1st option)
Symbol_1 Price 24H_Change_1 Volume Symbol_2 24H_Change_2
BTCUSDT 42135.15 -1.565 200.125
DASHUSDT 139.87 10.0390 1.7400 DASHBTC 11.258
ADAUSDT 1.23060 1.0700 1.1800 ADABTC 1.0700
C98USDT 2.5328 0.4520 1.1900 C98BTC 0.4520
1000SHIBUSDT 0.029233 6.589 2.4564 1000SHIBBTC 4.412
Output dataframe I want: (2nd option)
If 1st option is not possible or too much of a trouble or two complicated to accomplish, I am fine with 2nd option, I can separate 'BTC/USDT' pair from 1st dataframe out as a single dataframe for reference and then add a new dataframe (3rd dataframe like below) for altcoins.
Symbol Price 24H_Change Volume
BTCUSDT 42135.15 -1.565 200.125
Symbol_1 Price 24H_Change_1 Volume Symbol_2 24H_Change_2
DASHUSDT 139.87 10.0390 1.7400 DASHBTC 11.258
ADAUSDT 1.23060 1.0700 1.1800 ADABTC 1.0700
C98USDT 2.5328 0.4520 1.1900 C98BTC 0.4520
1000SHIBUSDT 0.029233 6.589 2.4564 1000SHIBBTC 4.412
CodePudding user response:
Solution:
- first use string-replace operations to get your 'From' currency/token column
- then merge df1, df2 with a pandas merge.
You're making this too hard, you don't need to keep a watchlist, just index both dataframes. (There will be some index labels in df2 missing in df1, merge handles that automatically, see below.)
Given you know df1['Symbol_1'] always ends in '...USDT' and df2['Symbol_2'] always ends in '...BTC', just create a new index or 'From' column with a simple string-replacement to eliminate the destination currency/token, hence: 'BTC', 'DASH', 'ADA'.
df1 = df1.set_index(df1['Symbol_1'].str.replace('(.*)USDT', r'\1', regex=True))
Note we use a regex capture group, so the r'\1' in the replace-expression matches the (.*) in the pattern.
Ditto df2.
Then you can simply create df3 with df1.merge(df2, ...)
See the documentation on merge(..., how='left'/'right'/'outer'/'inner') to figure out which type of merge you want; I think you need 'outer'. (If you get stuck on the merge syntax, edit your question to update it)
You will get NaN rows given that df1 has ~130 rows but df2 has ~380. See the .doc on fillna() for that.
CodePudding user response:
Here are the solution in full details:
# add the keyword as index value to df1
df1 = df1.set_index(df1['Symbol_1'].str.replace('(.*)USDT', r'\1', regex=True))
# add the exact keyword as index value to df2
df2 = df2.set_index(df1['Symbol_2'].str.replace('(.*)BTC', r'\1', regex=True))
now both dataframes share the same index value
# set default index name for df1 to 'Key'
df1.index.name = 'Key'
# set default index name for df2 to 'Key'
df2.index.name = 'Key'
# merge df2 to df1 (search both dataframes' column name = 'Key' that share the same value, add the columns from the returned rows to df1)
df3 = df1.merge(df2, on='Key')
# drop the default index column and re-index with number
df_reindexed=df3.reset_index(drop=True)
# finally, rename the column names if you want, it's easy to do that.
Thanks again to smci for the solution.
Then you can simply create df3 with df1.merge(df2, ...) See the documentation on merge(..., how='left'/'right'/'outer'/'inner') to figure out which type of merge you want; I think you need 'outer'. (If you get stuck on the merge syntax, edit your question to update it) You will get NaN rows given that df1 has ~130 rows but df2 has ~380. See the .doc on fillna() for that.
smci, df1 has 130 rows, df2 has 380 rows, I only need 130 rows from df2. so there won't be any NaN rows at all.
