Find strings that are common or matched in a column of multiple txt files

Time:01-13

I have 336 txt files, and each txt file has 4 columns. I need help finding the strings that are common or matched in column 2 (Gene) across all the txt files and extracting that information into a new txt file.

For example: how many times is “kdpDE beta” present? If it is present, print ‘1’ in the next column of the output txt file; if it is absent, print ‘0’.

Thank you for your help.

File_1.txt
Name     Gene              Family                Class 
KB2908   kdpE beta         aminoglycoside        lactamase
KB2908   ugd               peptide               transferase

File_2.txt
Name    Gene              Family                  Class 
KB2909  kdpE beta         aminoglycoside          lactamase
KB2909  ugd               peptide                 transferase
KB2909  PmrF              macrolide               phosphotransferase

CodePudding user response:

You can use grep with wc to count how often a string occurs in a file, and wrap that in a loop to do it for every file in a directory. The following loops through the directory, counts the number of times <search term> appears in each file, and appends the file name and the count to a file called output.txt. Note that grep matches the string anywhere on a line, not only in column 2.

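for FILE in *.txt; do
  # Skip the results file so it is not searched when the loop is rerun
  [ "$FILE" = "output.txt" ] && continue
  echo "$FILE" >> output.txt
  # -o prints each match on its own line, so wc -l counts occurrences; -i ignores case
  grep -o -i '<search term>' "$FILE" | wc -l >> output.txt
  echo >> output.txt
done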
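
The loop above reports a count per file. The question also asks for a 1/0 presence flag based on column 2 only, so here is a minimal sketch of one way to do that with awk. It assumes the columns are separated by a tab or by two or more spaces (as the sample files appear to be), so that a gene name containing a single space, such as "kdpE beta", stays in one field. The function name gene_presence and the output file name presence.txt are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: for each file, print "<file><TAB>1" if the gene
# occurs in column 2 (Gene), otherwise "<file><TAB>0".
# Assumption: fields are separated by a tab or by two or more spaces.
gene_presence() {
  gene=$1
  shift
  for file in "$@"; do
    awk -F'\t|  +' -v gene="$gene" -v name="$file" '
      NR > 1 && $2 == gene { found = 1 }   # skip the header row; exact match on column 2
      END { printf "%s\t%d\n", name, found }
    ' "$file"
  done
}

# Example call (file names follow the samples in the question):
#   gene_presence "kdpE beta" File_*.txt > presence.txt
```

Because the comparison is an exact, case-sensitive match on the whole field, "kdpE beta" will not match "kdpE" or "KDPE beta". If the real files use a different delimiter, the -F pattern would need to be adjusted.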