I have a "empty" data frame looking as follows:
6807 6809 5341
126293 nan nan nan
126294 nan nan nan
126295 nan nan nan
The column names give me an name_id whereas the index values give me a file_id. Now I want to search for the file_id and the name_id in separate pandas data frames named pro, cont, and neutral which look like this:
file_id name_id
0 126293 7244
1 126293 4978
2 126293 5112
3 126293 6864
If I find the file_idand name_idin the prodataframe I want to fill the empty data frame cell above with 1, when found in cont then -1 when in neutral, then the value entered into the matrix should be 0. Giving me a result like this, e.g.:
6807 6809 5341
126293 1 -1 0
126294 0 -1 0
126295 1 -1 1
Does someone know how to get this done?
CodePudding user response:
Here is one approach, use the intersection of file_id and name_id found in the pro, neutral, and cont DataFrame as an index to set the values you want either 1, 0, or -1. I use the Python set class to perform the intersection. However, it doesn't index well into the DataFrame because it results in a tuple.
pro_idx = set(zip(df.index, df.columns)).intersection(zip(pro['file_id'], pro['name_id']))
neut_idx = set(zip(df.index, df.columns)).intersection(zip(neutral['file_id'], neutral['name_id']))
cont_idx = set(zip(df.index, df.columns)).intersection(zip(cont['file_id'], cont['name_id']))
if any(pro_idx):
for f,n in pro_idx:
df.loc[f,n] = 1
if any(neut_idx):
for f,n in neut_idx:
df.loc[f,n] = 0
if any(cont_idx):
for f,n in cont_idx:
df.loc[f,n] = -1
CodePudding user response:
You can stack your 'empty' df (let's call it df) and merge against a combination of pro, con and neu. Then you can re-arrange it back into a 2d shape
Put the votes together into one dataframe:
votes = pd.concat([pro.assign(v=1), con.assign(v=-1), neu.assign(v=0)])
votes['name_id'] = votes['name_id'].astype(str) # you may or may not have to do this depending on what type your actual df is, as I have no way of knowing. It should match the type from columns in the empty df
votes now look like this (made up numbers by me):
file_id name_id v
0 126293 6807 1
1 126293 4978 1
2 126293 5112 1
3 126293 6864 1
0 126295 6809 -1
0 126294 5341 0
Now we merge it to a stacked df on name_id and file_id:
df1 = (df.unstack()
.reset_index()
.merge(votes, left_on = ['level_0','level_1'],
right_on = [ 'name_id','file_id'], how='left')[['level_0', 'level_1', 'v']]
)
df1 looks like
level_0 level_1 v
0 6807 126293 1.0
1 6807 126294 NaN
2 6807 126295 NaN
3 6809 126293 NaN
4 6809 126294 NaN
5 6809 126295 -1.0
6 5341 126293 NaN
7 5341 126294 0.0
8 5341 126295 NaN
Now unstack it back
df1.set_index(['level_1','level_0']).unstack()
output:
v
level_0 5341 6807 6809
level_1
126293 NaN 1.0 NaN
126294 0.0 NaN NaN
126295 NaN NaN -1.0
You get NaNs where you had no votes in either pro con or neu. The votes in those dfs that are for file_id/name_id not originally present in df are ignored
