I would like to implement a regular expression in bash that allows me to verify a series of characteristics on a dataset. A sample is attached below:
id, date of birth, grade, explusion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
The id is required to have only 3 digits, the year of birth must be before 2000, the grade must be at least 5.60 with a second decimal place other than 0, and there must be at least one expulsion or serious misdemeanor.
The result of executing the regular expression should be:
582, 1999-05-12, 8.51, 0, 1
I have tried to implement the following regular expression and it does not give me any result.
grep -E "^\d{0,3},[0-2][0-9][0-9][0-9].*,[1-5].[0-5][1-9],[1-9],[1-9]$"
Any idea?
CodePudding user response:
If it is mandatory to use grep, would you please try:
grep -E '^[0-9]{1,3},1[0-9]{3}(-[0-9]{2}){2},(5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9]),([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*)$' input_file
Result:
582,1999-05-12,8.51,0,1
[0-9]{1,3} matches if id has 1-3 digits. (I have interpreted "only 3 digits" like that. If it means something different, tweak the regex accordingly.)
1[0-9]{3}(-[0-9]{2}){2} matches if the birth year is before 2000 (exclusive).
(5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9]) matches if grade is greater than 5.60 with the second decimal place being other than 0.
([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*) matches if either or both of explusion and serious misdemeanor have a non-zero value.
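To check the pattern against the sample rows, here is a minimal sketch that builds it from one sub-pattern per field. It is written with + as the "one or more" quantifier. The temporary file and the variable names (id, birth, grade, flags, matches) are only illustrative and are not part of the original command.

```shell
#!/usr/bin/env bash
# Hypothetical input: the four data rows from the question, without the header.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
EOF

# One sub-pattern per field, so each rule can be read and changed on its own.
id='[0-9]{1,3}'                    # 1 to 3 digits
birth='1[0-9]{3}(-[0-9]{2}){2}'    # year starts with 1, i.e. before 2000
grade='(5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9])'
flags='([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*)'   # at least one non-zero flag

# \$ inside double quotes yields a literal $, the end-of-line anchor.
matches=$(grep -E "^${id},${birth},${grade},${flags}\$" "$input_file")
printf '%s\n' "$matches"

rm -f "$input_file"
```

With the sample rows this prints only the line for id 582. Changing id to '[0-9]{3}' would enforce exactly three digits, if that is what the requirement means.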
CodePudding user response:
Regular expressions do not understand numeric values, and they certainly do not understand boolean logic. All they know is text. You'll need to use an actual programming language like Awk or Perl to do this.
Here's an example:
$ perl -l -a -F, -E'say if length($F[0])>3 || $F[2] < 5.60' foo.txt
123,2005-01-01,5.36,1,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
This call to perl splits apart the fields on commas, and then prints the line if the length of the first column is over 3, or the value of the third column is less than 5.60.
This is just a starting point, but this is the direction to go.
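Taking that direction further, here is a rough Awk sketch that applies all four rules from the question as numeric and logical tests. It is one possible reading of the requirements, not a definitive solution: it assumes "only 3 digits" means exactly three, that the year is the first four characters of the date, and that the input file (a temporary file here) may contain the header line. The variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical input: the sample from the question, header included.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
id, date of birth, grade, explusion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
EOF

selected=$(awk -F, '
{
    # id: exactly three characters, all digits (this also rejects the header)
    id_ok    = (length($1) == 3 && $1 ~ /^[0-9]+$/)
    # birth year: first four characters of the date, compared as a number
    year_ok  = (substr($2, 1, 4) + 0 < 2000)
    # grade: at least 5.60, and the second decimal digit is not 0
    grade_ok = ($3 + 0 >= 5.60 && $3 ~ /\.[0-9][1-9]$/)
    # at least one expulsion or serious misdemeanor
    flag_ok  = ($4 + 0 > 0 || $5 + 0 > 0)
    if (id_ok && year_ok && grade_ok && flag_ok) print
}' "$input_file")
printf '%s\n' "$selected"

rm -f "$input_file"
```

On the sample data this prints only the row for id 582. Because each rule is a separate expression, a threshold such as the minimum grade can be changed without touching a regular expression.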
