I'm trying to extract locations from France. Here is a sample:
1#Tunisia#TS#TS#34#9#TS;4#Virsac, Aquitaine, France#FR#FR97#45.0333#-0.45#-1477568;4#Gironde, Aquitaine, France#FR#FR97#44.584#-0.089244#-1429418
It's basically a city, its region and its country. Hence, I did this:
^[2-5]#(.*?)#FR#
The result is:
Gironde, Aquitaine, France
This correctly extracts the city/region/country, but it extracts only one of them. Is it possible to extract multiple entries? The expected result would be:
Virsac, Aquitaine, France
Gironde, Aquitaine, France
Thanks in advance,
CodePudding user response:
Building on your current pattern, you need to replace the ^ anchor with a word boundary (so that the 2, 3, 4, or 5 is matched as a standalone number, wherever it occurs in the line) and replace .*? with [^#]* so that the captured group cannot run across a # delimiter into a later field.
That is, you can use
\b[2-5]#([^#]*)#FR#
Details:
\b - a word boundary
[2-5] - a digit from 2 to 5
# - a # char
([^#]*) - Group 1: zero or more chars other than #
#FR# - a #FR# string.
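The question does not say which tool or language runs the regex, so the following is only a minimal sketch, assuming Python's standard re module. The names sample, FRENCH_LOCATION and extract_french_locations are made up for illustration. It applies the pattern above to the sample line from the question and collects every Group 1 value:

```python
import re

# Sample line from the question: records are separated by ';', fields by '#'.
sample = (
    "1#Tunisia#TS#TS#34#9#TS;"
    "4#Virsac, Aquitaine, France#FR#FR97#45.0333#-0.45#-1477568;"
    "4#Gironde, Aquitaine, France#FR#FR97#44.584#-0.089244#-1429418"
)

# Pattern from the answer: a standalone digit 2-5, '#', the name (Group 1), then '#FR#'.
FRENCH_LOCATION = re.compile(r"\b[2-5]#([^#]*)#FR#")


def extract_french_locations(text):
    """Return every Group 1 capture in order of appearance (hypothetical helper)."""
    # With exactly one capturing group, findall returns a list of that group's values.
    return FRENCH_LOCATION.findall(text)


for name in extract_french_locations(sample):
    print(name)
# Prints:
# Virsac, Aquitaine, France
# Gironde, Aquitaine, France
```

In other regex flavors the equivalent is usually a "global" or "find all" mode rather than a single match call.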
