Home > OS >  how to compare two csv file in python and flag the difference?
how to compare two csv file in python and flag the difference?

Time:01-08

i am new to python. Kindly help me. Here I have two set of csv-files. i need to compare and output the difference like changed data/deleted data/added data. here's my example

file 1:
Sn  Name  Subject   Marks  
1   Ram      Maths     85
2   sita    Engilsh    66
3   vishnu  science    50
4   balaji  social     60

file 2:
Sn  Name    Subject   Marks
1   Ram     computer  85   #subject name have changed
2   sita    Engilsh   66
3   vishnu  science   90   #marks have changed
4   balaji  social    60
5   kishor  chem      99   #added new line

Output - i need to get like this :

Changed Items: 
1   Ram      computer  85
3   vishnu    science  90
Added item:
5   kishor    chem   99
Deleted item:
.................

I imported csv and done the comparasion via for loop with redlines. I am not getting the desire output. its confusing me a lot when flagging the added & deleted items between file 1 & file2 (csv files). pl suggest the effective code folks.

CodePudding user response:

The idea here is to flatten your dataframe with melt to compare each value:

# Load your csv files
df1 = pd.read_csv('file1.csv', ...)
df2 = pd.read_csv('file2.csv', ...)

# Select columns (not mandatory, it depends on your 'Sn' column)
cols = ['Name', 'Subject', 'Marks']

# Flat your dataframes
out1 = df1[cols].melt('Name', var_name='Item', value_name='Old')
out2 = df2[cols].melt('Name', var_name='Item', value_name='New')
out = pd.merge(out1, out2, on=['Name', 'Item'], how='outer')

# Flag the state of each item
condlist = [out['Old'] != out['New'],
            out['Old'].isna(),
            out['New'].isna()]

out['State'] = np.select(condlist, choicelist=['changed', 'added', 'deleted'], 
                         default='unchanged')

Output:

>>> out
     Name     Item      Old       New      State
0     Ram  Subject    Maths  computer    changed
1    sita  Subject  Engilsh   Engilsh  unchanged
2  vishnu  Subject  science   science  unchanged
3  balaji  Subject   social    social  unchanged
4     Ram    Marks       85        85  unchanged
5    sita    Marks       66        66  unchanged
6  vishnu    Marks       50        90    changed
7  balaji    Marks       60        60  unchanged
8  kishor  Subject      NaN      chem    changed
9  kishor    Marks      NaN        99    changed

CodePudding user response:

count, flag = 0, 1
for i, j in zip(df1.values, df2.values):
    if sum(i == j) != 4:
        if flag:
            print("Changed Items:")
            flag = 0
        print(j)
    count  = 1

if count != len(df2):
    print("Newly added:")
    print(*df2.iloc[count:, :].values)

enter image description here

  •  Tags:  
  • Related