I need to compare the first column of File1 with the first column of File2. If they match, compare the second columns of the two files. If the second columns do not match, print the line from File1 and save the output in another file.
files1.txt
80002288 b17
97380002001 b18
97380002220 b17
97380002233 b18
80002333 b17
16501111 b04
16505044 b04
16505042 b04
97316505030 b05
16505043 b04
16505048 b04
Files2.txt
97366630003 a01
97380002288 b17
97380002001 b17
97380002220 b17
97380002233 b17
97380002333 b17
97316501111 b04
97316505044 b04
97316505042 b04
97316505030 b04
97316505043 b04
Desired Output
97380002001 b17
97316505030 b04
Answer:
Approach 1: without any external library
The code below uses only the Python standard library. Like the desired output, it writes the matching lines from files2.txt:
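# Write every files2.txt line whose key is in files1.txt with a different second column
with open('files3.txt', 'w') as files3:
    with open('files1.txt') as files1:
        for line_a in files1:
            words_a = line_a.split()
            if len(words_a) < 2:
                continue  # skip blank or malformed lines
            # files2.txt is re-read for every line of files1.txt (simple, but O(n*m))
            with open('files2.txt') as files2:
                for line_b in files2:
                    words_b = line_b.split()
                    if len(words_b) < 2:
                        continue
                    if words_a[0] == words_b[0] and words_a[1] != words_b[1]:
                        diff_words = ' '.join(words_b)
                        files3.write(diff_words + '\n')
                        print(diff_words)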
Output of the above code:
97380002001 b17
97380002233 b17
97316505030 b04
Approach 2: using the pandas library
You can also do this with the pandas library. First install it:
pip install pandas
Then run the Python code below to create the output file:
import pandas as pd
# you can replace files1.txt and files2.txt with the complete path if files aren't in the same folder
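df1 = pd.read_csv("files1.txt", sep=r'\s+', names=['c1', 'c2'])
df2 = pd.read_csv("files2.txt", sep=r'\s+', names=['c1', 'c2'])
# inner join on the first column; the overlapping c2 becomes c2_x (file 1) and c2_y (file 2)
df3 = pd.merge(df1, df2, on='c1')
# keep only the rows whose second columns differ
df3 = df3[df3["c2_x"] != df3["c2_y"]]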
# save the second-column values from file 2 (matches the desired output)
print(df3[['c1', 'c2_y']].to_string(index=False, header=False))
df3[['c1', 'c2_y']].to_csv("files3.txt", sep=' ', index=False, header=False)
# or save the second-column values from file 1 (the File1 lines the question asks for)
# print(df3[['c1', 'c2_x']].to_string(index=False, header=False))
# df3[['c1', 'c2_x']].to_csv("files3.txt", sep=' ', index=False, header=False)
# or save the values from both files
# print(df3.to_string(index=False, header=False))
# df3.to_csv("files3.txt", sep=' ', index=False, header=False)
Output of the above code:
97380002001 b17
97380002233 b17
97316505030 b04
Answer:
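Either of these is probably what you want, but your posted expected output doesn't match either interpretation of your requirements: it shows the second-column values from file2 rather than the lines from file1, and it omits 97380002233, whose second column also differs (b18 vs b17). Using any awk in any shell on every Unix box: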
To print the lines from file1:
$ awk 'NR==FNR{a[$1]=$2; next} ($1 in a) && (a[$1] != $2)' file2 file1
97380002001 b18
97380002233 b18
97316505030 b05
To print the lines from file2, just swap the input file names:
$ awk 'NR==FNR{a[$1]=$2; next} ($1 in a) && (a[$1] != $2)' file1 file2
97380002001 b17
97380002233 b17
97316505030 b04
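For comparison, here is a minimal Python sketch of the same lookup-table idea the awk one-liner uses: load the first file into a dict keyed on column 1, then stream the second file and keep the lines whose key is present with a different column-2 value. Unlike Approach 1, it reads each file once instead of re-reading files2.txt for every line of files1.txt. The function names (load_lookup, mismatched_lines, write_mismatches) are hypothetical, and it assumes whitespace-separated columns. If a key repeats in the lookup file, the last value wins, as in the awk script.

```python
from typing import Dict, Iterable, Iterator


def load_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Map first column -> second column, like awk's a[$1]=$2.

    A repeated key keeps its last value, as in the awk script.
    """
    lookup: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # assumption: skip blank or malformed lines
            lookup[fields[0]] = fields[1]
    return lookup


def mismatched_lines(lookup_lines: Iterable[str],
                     scan_lines: Iterable[str]) -> Iterator[str]:
    """Yield scan_lines whose key is in lookup_lines with a different value.

    Mirrors: awk 'NR==FNR{a[$1]=$2; next} ($1 in a) && (a[$1] != $2)' lookup scan
    Difference: lines with fewer than two fields are skipped here.
    """
    lookup = load_lookup(lookup_lines)
    for line in scan_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in lookup and lookup[fields[0]] != fields[1]:
            yield line.rstrip("\n")


def write_mismatches(lookup_path: str, scan_path: str, out_path: str) -> None:
    """Print the mismatching lines of scan_path and save them to out_path."""
    with open(lookup_path) as lookup_file, open(scan_path) as scan_file:
        with open(out_path, "w") as out_file:
            for line in mismatched_lines(lookup_file, scan_file):
                print(line)
                out_file.write(line + "\n")
```

Assuming the files sit in the current directory, write_mismatches("files2.txt", "files1.txt", "files3.txt") prints and saves the files1.txt lines; swapping the first two arguments gives the files2.txt lines instead, just like swapping the awk arguments.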
