How Filter dataframe with dictionary in pandas?-CodePudding

My sample DataFrame is:

file	Voucher_stand
90.txt	SCA
90.txt	SCB
60.txt	WFA
60.txt	WFO
90.txt	SCA
50.txt	SCA
80.txt	SCA
100.txt	SCA

I want to define a dictionary based on which I can select some columns, but each file has its own code in the Voucher_stand column, how can I filter this DataFrame by dictionary based on Voucher_stand and file instead of using the list every time?

For example, define Voucher_Stand and files in a dictionary and then filter my DataFrame based on this dictionary:

code = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}

CodePudding user response：

Try this:

pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True)

You can get it in a dict form by calling .to_dict():

pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True).to_dict()

CodePudding user response：

You could iterate over the dictionary using items and filter the desired rows using equal comparison for the key and isin for the list of Voucher_stand codes. Then, get the row's index as list and save it into a variable. Use iloc to select only those rows that match the criteria.

import pandas as pd

df = pd.read_csv('sample.csv', sep='\s ')
print(df)

CODE = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}

filtered_rows = []
for k,v in CODE.items():
    filtered_rows  = df[(df['file'] == k) & df['Voucher_stand'].isin(v)].index.to_list()

result = df.iloc[filtered_rows]
print(result)

Output from result

     file Voucher_stand
0  90.txt           SCA
1  90.txt           SCB
4  90.txt           SCA
2  60.txt           WFA
3  60.txt           WFO