My sample DataFrame is:
| file | Voucher_stand |
|---|---|
| 90.txt | SCA |
| 90.txt | SCB |
| 60.txt | WFA |
| 60.txt | WFO |
| 90.txt | SCA |
| 50.txt | SCA |
| 80.txt | SCA |
| 100.txt | SCA |
I want to define a dictionary based on which I can select some columns, but each file has its own code in the Voucher_stand column, how can I filter this DataFrame by dictionary based on Voucher_stand and file instead of using the list every time?
For example, define Voucher_Stand and files in a dictionary and then filter my DataFrame based on this dictionary:
code = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}
CodePudding user response:
Try this:
pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True)
You can get it in a dict form by calling .to_dict():
pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True).to_dict()
CodePudding user response:
You could iterate over the dictionary using items and filter the desired rows using equal comparison for the key and isin for the list of Voucher_stand codes. Then, get the row's index as list and save it into a variable. Use iloc to select only those rows that match the criteria.
import pandas as pd
df = pd.read_csv('sample.csv', sep='\s ')
print(df)
CODE = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}
filtered_rows = []
for k,v in CODE.items():
filtered_rows = df[(df['file'] == k) & df['Voucher_stand'].isin(v)].index.to_list()
result = df.iloc[filtered_rows]
print(result)
Output from result
file Voucher_stand
0 90.txt SCA
1 90.txt SCB
4 90.txt SCA
2 60.txt WFA
3 60.txt WFO
