I have a file with 4 columns and more than 1 million rows, with scores between 0 and 100. I can filter the file to keep rows where at least one column has a value of 20 or more, using awk and OR operators as in the code below. (In reality my file has 50 columns, so the code I use goes all the way to $50 >= 20.)
awk '{if (($1 >= 20) || ($2 >=20) || ($3 >= 20) || ($4 >= 20)) print $0 }' file
But I would now like to keep only rows where at least TWO columns have a value of 20 or more. I cannot think of a combination of AND operators that would satisfy this criterion. Could someone please recommend a way to achieve this? Perhaps there is a solution that does not use awk?
Thanks!
CodePudding user response:
"Perhaps there is a solution that does not use awk?"
Perl to the rescue!
$ cat input.txt
1 2 3 4
30 5 6 60
7 8 9 10
11 100 12 120 13
$ perl -ane 'print if (grep { $_ >= 20 } @F) >= 2' input.txt
30 5 6 60
11 100 12 120 13
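The -a switch splits each line into fields on whitespace, as awk does, and stores them in @F. grep then filters those fields down to the ones that are greater than or equal to 20; compared against a number, it yields the count of matching fields. If there are at least two such fields, the entire line is printed.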
CodePudding user response:
With your shown samples, please try the following awk code. Fair warning: I tested it with small samples only, not on a huge file.
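awk '
{
  count=0                  # reset the counter for each line
  for(i=1;i<=NF;i++){
    if($i>=20){ count++ }  # count the fields that are at least 20
    if(count==2){          # second hit found: print the line and move on
      print
      next
    }
  }
}' Input_file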
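If the threshold or the required number of columns might change, the same counting idea can be parameterised. This is only a sketch under some assumptions: filter_rows, min, and need are made-up names (not awk built-ins), and sample.txt is a hypothetical file that mirrors the sample data from the perl answer.

```shell
# Hypothetical sample file, mirroring the data shown in the perl answer.
printf '%s\n' '1 2 3 4' '30 5 6 60' '7 8 9 10' '11 100 12 120 13' > sample.txt

# filter_rows MIN NEED FILE: print the lines of FILE in which at least NEED
# fields are >= MIN. The function and variable names are illustrative only.
filter_rows() {
  awk -v min="$1" -v need="$2" '
  {
    hits = 0                           # start counting afresh on every line
    for (i = 1; i <= NF; i++) {
      if ($i >= min && ++hits >= need) {
        print                          # enough qualifying fields: print the line
        next                           # and skip the remaining fields
      }
    }
  }' "$3"
}

filter_rows 20 2 sample.txt
```

Because the loop runs up to NF, it should work unchanged for the 50-column file, and it stops scanning a line as soon as the required number of fields has been seen.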
