I want to find a regex that matches uppercase letters, lowercase letters and spaces in between.
Below is a sample of the data I want to collect:
id,name,continent
1,Louise,Latin America
2,Sasha,Asia
3,Mike,North America
Inside a while loop I check whether each record matches the regex. However, records with a space in the last field (such as North America or Latin America) are not picked up. Here is my code:
while read line; do
if [["$line"=~^.*,.*,[a-zA-Z ]*
I've also tried [a-zA-Z\n]* but it does not work either.
Any idea?
CodePudding user response:
You can use
rx='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
while read -r line; do
if [[ "$line" =~ $rx ]]; then
    : # Do something with the valid line
fi
done < file
Details:
^ - string start
[0-9]* - zero or more digits (looks like your ID column can only contain digits)
, - a comma
[^,]* - any zero or more chars other than , (.* is too generic and matches any text, thus it will report valid if the line contains more than three columns)
, - a comma
[[:alpha:][:space:]]* - zero or more letters or spaces
$ - end of string.
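To illustrate why .* is too generic, here is a minimal sketch that runs both patterns against a line with four columns. The check helper, the loose pattern and the sample line are made up for illustration and are not part of the original code:

```bash
#!/bin/bash
# Hypothetical helper: prints "Valid" or "Invalid" for a line tested against a regex.
check() {
  local line=$1 rx=$2
  if [[ "$line" =~ $rx ]]; then
    echo "Valid"
  else
    echo "Invalid"
  fi
}

# Pattern from the answer: [^,]* cannot cross a comma.
strict='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
# Assumed variant using .* for the first two fields, as in the question.
loose='^.*,.*,[[:alpha:][:space:]]*$'

extra='3,Mike,Extra,North America'   # four columns instead of three

check "$extra" "$strict"   # prints: Invalid
check "$extra" "$loose"    # prints: Valid
```

With the loose pattern, the first .* can swallow "3,Mike", so the extra column goes unnoticed.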
See the online demo:
#!/bin/bash
s='id,name,continent
1,Louise,Latin America
2,Sasha,Asia
3,Mike,North America'
rx='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
while read -r line; do
if [[ "$line" =~ $rx ]]; then
echo "$line: Valid"
else
echo "$line: Invalid"
fi
done <<< "$s"
Output:
id,name,continent: Invalid
1,Louise,Latin America: Valid
2,Sasha,Asia: Valid
3,Mike,North America: Valid
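A note on the original attempt: assuming the pattern was typed inline inside [[ ... ]], the unquoted space in [a-zA-Z ] is treated by the shell as a word separator, which breaks the condition. Storing the pattern in a variable, as above, avoids this. The sketch below shows that the question's own bracket expression works once it lives in a variable (is_valid is a hypothetical helper name):

```bash
#!/bin/bash
# Single quotes keep the literal space inside [a-zA-Z ] intact.
rx='^[0-9]*,[^,]*,[a-zA-Z ]*$'

# Hypothetical helper: succeeds when the line matches the pattern.
is_valid() {
  [[ "$1" =~ $rx ]]   # $rx must stay unquoted so it is treated as a regex
}

is_valid '3,Mike,North America' && echo "matched"
is_valid '3,Mike,North_America' || echo "rejected"
```

Note that \n does not stand for a space, so [a-zA-Z\n]* would not help here.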
