Home > Net >  Loop through old and new versions of files
Loop through old and new versions of files

Time:01-28

I am trying to create a .csv containing records that are different between old and new csv files. I have successfully accomplished this with a single such pair using

old_df = 'file1_old.csv'
new_df = 'file1_new.csv'

df1 = pd.read_csv(old_df)
df2 = pd.read_csv(new_df)

df1['flag'] = 'old'
df2['flag'] = 'new'

df = pd.concat([df1, df2])

dups_dropped = df.drop_duplicates(df.columns.difference(['flag']) keep=False)
dups_dropped.to_csv('difference.csv', index=False)

I am struggling to wrap my mind around how to scale this with a loop (?) to output a csv for each new pairing if the new v. old file names input are of the same convention, for instance:

file1_new, file1_new
file2_new, file2_old 
file3_new, file3_old

so that the output is

file1_difference.csv
file2_difference.csv
file3_difference.csv

Thoughts? Much appreciated

CodePudding user response:

Using a simple for loop with f-strings to help format the filenames should work:

for i in range(1,11): # replace 11 with the number of files you have   1
    old_df = f'file{i}_old.csv'
    new_df = f'file{i}_new.csv'

    df1 = pd.read_csv(old_df)
    df2 = pd.read_csv(new_df)

    df1['flag'] = 'old'
    df2['flag'] = 'new'

    df = pd.concat([df1, df2])

    dups_dropped = df.drop_duplicates(df.columns.difference(['flag']) keep=False)
    dups_dropped.to_csv(f'difference{i}.csv', index=False)
  •  Tags:  
  • Related