Home > Software engineering >  Output CSV row only if 2 columns in 2 csv files match with Python-Pandas
Output CSV row only if 2 columns in 2 csv files match with Python-Pandas

Time:01-06

Newbie with Python. I have 2 csv files that both have student last name and student first name (Re-enrolledonline.csv and currentschoolroster.csv - the number of columns and column headings are different but both files contain this information). I then need to create a CSV with a list of students that are NOT found in Re-enrolledonline.csv but ARE in currentschoolroster.csv. I can match one column without an issue with the code below but I can't seem to find a good resource on how to only write to the new file if BOTH first name and last name columns are matched.

Thank you!


import pandas as pd



f1 = pd.read_csv('/users/Desktop/CompareFiles/currentschoolroster.csv')
f2 = pd.read_csv('/users/Desktop/CompareFiles/Re-enrolledonline.csv')

notmatched = f1[~f1.Last_Name.isin(f2.StudentLastName)]

notmatched.to_csv('/users/Desktop/CompareFiles/notmatched.csv')

CodePudding user response:

Pandas uses & for and, | for or in indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing

That should work fine:

notmatched = f1[~f1.Last_Name.isin(f2.StudentLastName) \
                & (~f1.First_Name.isin(f2.StudentFirstName))]

PS. Beware, if the CSVs were big, isin() would be very slow.

  •  Tags:  
  • Related