Editing this to reflect additional work:
Situation
I have two pandas dataframes of Twitter search tweets API data, and both share a common key column, author_id.
I'm using the join method.
Code is:
dfTW08 = dfTW07.join(dfTW04uf, on='author_id', how='left', lsuffix='', rsuffix='4')
Results
When I run that, everything comes out as expected, except that all the values from the other dataframe (dfTW04uf) come in as NaN, including the values for the other dataframe's author_id column.
Assessment
I'm not getting any error messages, but I have to think it's something about the datatypes. The other dataframe is a mix of int64, object, bool, and datetime datatypes, so it seems odd that they'd all be unrecognized.
Any suggestions on how to troubleshoot this would be greatly appreciated.
CodePudding user response:
Couldn't figure out the NaN issue using join, but was able to merge the dataframes with this:
callingdf.merge(otherdf, on='author_id', how='left', indicator=True)
Then did sort_values and drop_duplicates to get the final list I wanted.
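For anyone following along, here is a minimal sketch of that merge, sort, and de-duplicate sequence. The toy data, the followers column, and the choice of sort columns are all assumptions for illustration; the actual columns I sorted on aren't shown above.

```python
import pandas as pd

# Hypothetical toy data standing in for the two tweet dataframes.
callingdf = pd.DataFrame({"author_id": [10, 20, 30], "text": ["a", "b", "c"]})
otherdf = pd.DataFrame({"author_id": [10, 10, 20], "followers": [5, 7, 9]})

# Left merge on the shared column. indicator=True adds a "_merge" column
# that says whether each row matched ("both") or not ("left_only").
merged = callingdf.merge(otherdf, on="author_id", how="left", indicator=True)

# Keep one row per author: sort so the preferred row comes first
# (here, assumed to be the highest followers value), then drop the rest.
final = (
    merged.sort_values(["author_id", "followers"], ascending=[True, False])
    .drop_duplicates(subset="author_id", keep="first")
    .reset_index(drop=True)
)

print(final)
```

The _merge column is a quick way to see which rows found no match in the other dataframe before deciding what to drop.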
CodePudding user response:
You can use merge instead of join, since merge has everything join does but with more "power" (anything you can do with join, you can do with merge).
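I am assuming the NaN is coming up because of how join treats on: it matches the calling dataframe's author_id column against the index of the other dataframe, not against its author_id column. If dfTW04uf still has its default integer index, almost nothing matches, so a left join fills every column from dfTW04uf (including the suffixed author_id4) with NaN. When you left merge with on='author_id', the two author_id columns are compared directly, so the matching rows get their values, and only the rows with no match stay NaN.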
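A small sketch that reproduces the symptom and contrasts the two calls. The dataframes and the followers column are made-up stand-ins, and it assumes the other dataframe has a default integer index, which may not be true of dfTW04uf.

```python
import pandas as pd

# Hypothetical stand-ins for dfTW07 (left) and dfTW04uf (right).
left = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
right = pd.DataFrame({"author_id": [101, 102], "followers": [5, 9]})

# join compares left["author_id"] with right's index (0, 1),
# so no row lines up and every right-hand column is NaN.
broken = left.join(right, on="author_id", how="left", lsuffix="", rsuffix="4")

# Moving the key into the index of the other dataframe lets join find matches.
fixed = left.join(right.set_index("author_id"), on="author_id", how="left")

# merge compares the two author_id columns directly.
merged = left.merge(right, on="author_id", how="left")

print(broken)
print(fixed)
print(merged)
```

If the real keys still fail to match after this, comparing left["author_id"].dtype with right["author_id"].dtype is a reasonable next check, since an int64 key does not match the same IDs stored as strings.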
