I would like to create df3, where the url comes from df1 and the traffic value comes from the corresponding rows in df2.
Current code:
import pandas as pd
data = [['http://url1.com'], ['http://url3.com']]
data_2 = [[{'url':'http://url1.com', 'traffic':100}], [{'url':'http://url2.com', 'traffic':200}], [{'url':'http://url3.com', 'traffic':300}]]
df1 = pd.DataFrame(data=data, columns=['url'])
df2 = pd.DataFrame(data=data_2, columns=['url', 'traffic'])
df3 = pd.merge(left=df1, right=df2, on='url')
Expected output:
               url  traffic
0  http://url1.com      100
1  http://url3.com      300
Current output:
ValueError: 2 columns passed, passed data had 1 columns
CodePudding user response:
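The ValueError occurs because each row of data_2 is a one-element list wrapping a dict, so pandas sees one column where two were requested; unpack the dicts when building df2. Regarding https vs. http: replace returns a new DataFrame rather than modifying df1 in place, so you need to assign the result back to df1 before merging: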
import pandas as pd
data = [['https://url1.com'], ['https://url3.com']]
data_2 = [[{'url':'http://url1.com', 'traffic':100}], [{'url':'http://url2.com', 'traffic':200}], [{'url':'http://url3.com', 'traffic':300}]]
df1 = pd.DataFrame(data=data, columns=['url'])
df2 = pd.DataFrame([row[0] for row in data_2])
df1 = df1.replace(to_replace=r'^https://', value='http://', regex=True)
df3 = pd.merge(left=df1, right=df2, on='url')
print(df3)
               url  traffic
0  http://url1.com      100
1  http://url3.com      300
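
If you also want to notice URLs in df1 that have no match in df2, here is a minimal sketch of one possible variation, not the only way to do it. It assumes the same sample data as above; the names `checked` and `unmatched` are made up for illustration. It uses a left merge with `indicator=True`, so unmatched rows are flagged instead of being silently dropped.

```python
import pandas as pd

data = [['https://url1.com'], ['https://url3.com']]
data_2 = [[{'url': 'http://url1.com', 'traffic': 100}],
          [{'url': 'http://url2.com', 'traffic': 200}],
          [{'url': 'http://url3.com', 'traffic': 300}]]

df1 = pd.DataFrame(data, columns=['url'])
# Each row of data_2 is a one-element list holding a dict, so unpack the dict.
df2 = pd.DataFrame([row[0] for row in data_2])

# Rewrite only a leading "https://" so the keys line up with df2.
df1['url'] = df1['url'].str.replace(r'^https://', 'http://', regex=True)

# A left merge keeps every df1 row; the _merge column records whether it matched.
checked = df1.merge(df2, on='url', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'url'].tolist()  # hypothetical helper name

df3 = checked.drop(columns='_merge')
print(df3)
print('unmatched urls:', unmatched)
```

With the sample data every url in df1 finds a match, so `unmatched` is empty. If a url were missing from df2, its traffic value would be NaN and the url would appear in `unmatched`.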
