I am trying to create a .csv containing records that are different between old and new csv files. I have successfully accomplished this with a single such pair using
old_df = 'file1_old.csv'
new_df = 'file1_new.csv'
df1 = pd.read_csv(old_df)
df2 = pd.read_csv(new_df)
df1['flag'] = 'old'
df2['flag'] = 'new'
df = pd.concat([df1, df2])
dups_dropped = df.drop_duplicates(df.columns.difference(['flag']) keep=False)
dups_dropped.to_csv('difference.csv', index=False)
I am struggling to wrap my mind around how to scale this with a loop (?) to output a csv for each new pairing if the new v. old file names input are of the same convention, for instance:
file1_new, file1_new
file2_new, file2_old
file3_new, file3_old
so that the output is
file1_difference.csv
file2_difference.csv
file3_difference.csv
Thoughts? Much appreciated
CodePudding user response:
Using a simple for loop with f-strings to help format the filenames should work:
for i in range(1,11): # replace 11 with the number of files you have 1
old_df = f'file{i}_old.csv'
new_df = f'file{i}_new.csv'
df1 = pd.read_csv(old_df)
df2 = pd.read_csv(new_df)
df1['flag'] = 'old'
df2['flag'] = 'new'
df = pd.concat([df1, df2])
dups_dropped = df.drop_duplicates(df.columns.difference(['flag']) keep=False)
dups_dropped.to_csv(f'difference{i}.csv', index=False)
