Filter dataframe based on corresponding rows in another one

Time: 01-29

I would like to create df3, where the url comes from df1 and the traffic value comes from the corresponding rows in df2.

Current code:

import pandas as pd 


data = [['http://url1.com'], ['http://url3.com']]
data_2 = [[{'url':'http://url1.com', 'traffic':100}], [{'url':'http://url2.com', 'traffic':200}], [{'url':'http://url3.com', 'traffic':300}]] 

df1 = pd.DataFrame(data=data, columns=['url'])
df2 = pd.DataFrame(data=data_2, columns=['url', 'traffic'])  # raises the ValueError shown below


df3 = pd.merge(left=df1, right=df2, on='url')

Expected output:

               url  traffic
0  http://url1.com      100
1  http://url3.com      300

Current output:

ValueError: 2 columns passed, passed data had 1 columns
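The error comes from the shape of data_2: each row is a one-element list holding a dict, so pandas builds a single column and cannot map it onto the two names in columns=['url', 'traffic']. A minimal sketch of that diagnosis, using only data_2 from the question (the exact wording of the error message may differ between pandas versions):

```python
import pandas as pd

data_2 = [[{'url': 'http://url1.com', 'traffic': 100}],
          [{'url': 'http://url2.com', 'traffic': 200}],
          [{'url': 'http://url3.com', 'traffic': 300}]]

# Reproduce the failure: 2 column names, but each row has only 1 value.
try:
    pd.DataFrame(data=data_2, columns=['url', 'traffic'])
except ValueError as exc:
    print('ValueError:', exc)

# Without column names, pandas builds one column whose cells are dicts.
raw = pd.DataFrame(data_2)
print(raw.shape)  # (3, 1)

# Unwrapping the dict from each row lets pandas use the dict keys as columns.
df2 = pd.DataFrame([row[0] for row in data_2])
print(df2.columns.tolist())  # ['url', 'traffic']
```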

Answer:

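Two changes are needed. First, each row of data_2 is a one-element list wrapping a dict, so build df2 from the dicts themselves; pandas then takes the column names from the dict keys. Second, regarding https and http: if df1 uses https while df2 uses http, the merge finds no matching urls, so normalize the scheme and make sure you assign the result back to df1 (replace returns a new DataFrame rather than modifying df1 in place):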

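import pandas as pd

# df1 uses https, df2 uses http
data = [['https://url1.com'], ['https://url3.com']]
data_2 = [[{'url':'http://url1.com', 'traffic':100}], [{'url':'http://url2.com', 'traffic':200}], [{'url':'http://url3.com', 'traffic':300}]]
df1 = pd.DataFrame(data=data, columns=['url'])
# Each row of data_2 wraps one dict; unwrap it so the dict keys become the columns
df2 = pd.DataFrame([row[0] for row in data_2])

# Normalize the scheme so the urls in df1 match those in df2 (assign the result back)
df1 = df1.replace(to_replace='https', value='http', regex=True)
# Inner merge: keep only the urls present in both frames
df3 = pd.merge(left=df1, right=df2, on='url')
print(df3)

Output:

               url  traffic
0  http://url1.com      100
1  http://url3.com      300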
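Since the goal is to filter df2 by the rows in df1, the same result can also be obtained without a merge by keeping only the df2 rows whose url appears in df1, using Series.isin. This is a sketch of an alternative, not part of the answer above; it also swaps in a narrower regex, ^https://, that rewrites only the scheme at the start of each url instead of every occurrence of "https".

```python
import pandas as pd

df1 = pd.DataFrame({'url': ['https://url1.com', 'https://url3.com']})
df2 = pd.DataFrame([
    {'url': 'http://url1.com', 'traffic': 100},
    {'url': 'http://url2.com', 'traffic': 200},
    {'url': 'http://url3.com', 'traffic': 300},
])

# Rewrite only a leading "https://" so "https" elsewhere in a url is untouched.
df1['url'] = df1['url'].str.replace(r'^https://', 'http://', regex=True)

# Keep the df2 rows whose url appears in df1, then renumber the index 0, 1, ...
df3 = df2[df2['url'].isin(df1['url'])].reset_index(drop=True)
print(df3)
```

Unlike merge, isin keeps df2's row order and never duplicates a row when df1 lists the same url twice; merge remains the better fit when df3 also needs columns from df1.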