I have 336 txt files, and each txt file has 4 columns. I need help finding the strings that are common (matched) in column 2 (Gene) across all the txt files and extracting that information into a new txt file.
For example: how many times is “kdpDE beta” present? If it is present, print ‘1’ in the next column of the output txt file; if “kdpDE beta” is absent, print ‘0’.
Thank you for your help.
File_1.txt
Name Gene Family Class
KB2908 kdpE beta aminoglycoside lactamase
KB2908 ugd peptide transferase
File_2.txt
Name Gene Family Class
KB2909 kdpE beta aminoglycoside lactamase
KB2909 ugd peptide transferase
KB2909 PmrF macrolide phosphotransferase
CodePudding user response:
You can use grep with wc to count how often a certain string occurs in a file: grep -o prints each match on its own line and wc -l counts those lines. You can wrap this in a loop to do it for every file in a directory. The following loops through the directory, counts the number of times <search term> appears in each file (ignoring case, because of -i), and appends the result to a file called output.txt.
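for FILE in *.txt; do
    # Skip the results file so it is not searched while it is being written
    [ "$FILE" = output.txt ] && continue
    echo "$FILE" >> output.txt
    # -o prints one match per line, -i ignores case; wc -l counts the matches
    grep -o -i '<search term>' "$FILE" | wc -l >> output.txt
    echo >> output.txt
done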
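The question asks for a 1/0 presence flag per file rather than a count. A minimal sketch of that follows. It assumes the columns are tab-separated, that Gene is column 2, and that the first line of each file is a header. None of this is confirmed by the sample shown, so adjust the separator if your files differ. The function name gene_presence and the output file name are made up for illustration.

```shell
# Sketch: write "<file><TAB><1 or 0>" for one gene across the given files.
# Assumptions: fields are tab-separated, Gene is column 2, line 1 is a header.
# gene_presence is a hypothetical helper name, not part of any tool.
gene_presence() {
    gene=$1; out=$2; shift 2
    : > "$out"                      # start with an empty output file
    for f in "$@"; do
        [ -f "$f" ] || continue     # skip anything that is not a regular file
        # Print 1 if any data row has the gene in column 2, otherwise 0
        flag=$(awk -F '\t' -v g="$gene" \
            'NR > 1 && $2 == g { found = 1; exit } END { print found + 0 }' "$f")
        printf '%s\t%s\n' "$f" "$flag" >> "$out"
    done
}
```

It could be called as gene_presence 'kdpE beta' presence.tsv File_*.txt, which writes one line per input file. The comparison is an exact, case-sensitive match on the whole Gene field, unlike the substring match that grep -i performs.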
