So, I have a string that I want to match with a pattern. The string varies slightly and can be either string1 or string2:
string1 = """
Rak penyimpanan berbentuk high chest dengan gaya American Country. Cocok digunakan untuk menyimpan
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas. Kualitas ekspor akan menjamin kepuasan
Anda. Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm
"""
string2 = """
Rak penyimpanan berbentuk high chest dengan gaya American Country. Cocok digunakan untuk menyimpan
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas. Kualitas ekspor akan menjamin kepuasan
Anda. Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm
"""
What I want to achieve is to capture the numbers in groups and get (51, 23, 47-89). What I have done so far is create a pattern like this:
pattern = (\bP|Panjang\b).+(\d)+.+(\bL|Kedalaman\b).+(\d)+.+(\bT|Tinggi\b).+(\d)+.[cm]
I have tried it at https://regexr.com/, but each group only captures the last digit, such as (1,3,9). What am I missing? I already put + after the \d in every group.
CodePudding user response:
Regex
"(?:P|Panjang)\s(?P<P>\d+)\scm\s(?:L|Kedalaman)\s(?P<L>\d+)\scm\s(?:T|Tinggi)\s(?P<T>\d+)\scm"g
About Regex:
- See Regex 101
- captures three named groups: P, L and T
- each group should hold the matched digits.
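As a minimal sketch of how this named-group pattern could be used from Python (the helper name `dimensions` and the shortened sample strings are assumptions made for illustration; the `g` flag belongs to the regex tester and has no direct equivalent here, so `search` simply returns the first match):

```python
import re

# Named groups P, L and T hold the three measurements.
pattern = re.compile(
    r"(?:P|Panjang)\s(?P<P>\d+)\scm\s"
    r"(?:L|Kedalaman)\s(?P<L>\d+)\scm\s"
    r"(?:T|Tinggi)\s(?P<T>\d+)\scm"
)

# Shortened versions of string1 and string2 from the question.
long_form = "Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"
short_form = "Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm"


def dimensions(text):
    """Return a dict of the named groups, or None when nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None


print(dimensions(long_form))   # {'P': '76', 'L': '40', 'T': '120'}
print(dimensions(short_form))  # {'P': '76', 'L': '40', 'T': '120'}
```

The values come back as strings; wrap them in `int()` if numbers are needed.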
CodePudding user response:
You can:
- change the `.+` to be more specific, like `\scm\s` or `\s`
- You can just match `cm` instead of using a character class `[cm]`, which matches a single `c` or `m` rather than the literal text `cm`
- If you only want the digits, you can omit the capture groups around the names
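The quantifier behaviour behind these points can be checked with a small Python sketch. It assumes a repeated group such as `(\d)+` and a greedy `.+` in front of the digits; the one-line `sample` string and the variable names are made up for illustration:

```python
import re

sample = "Panjang 76 cm"

# (\d)+ repeats the group itself, so only the last repetition is kept.
repeated_group = re.search(r"(\d)+", sample).group(1)

# A greedy .+ runs to the end of the line and backtracks only as far as
# needed for \d+ to match once, leaving a single digit for the group.
greedy_prefix = re.search(r"Panjang.+(\d+)", sample).group(1)

# \s consumes just the separator, so (\d+) sees the whole number.
specific = re.search(r"Panjang\s(\d+)\scm", sample).group(1)

print(repeated_group, greedy_prefix, specific)  # 6 6 76
```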
For example
\bP(?:anjang)?\s(\d+)\scm\s(?:L|Kedalaman)\s(\d+)\scm\sT(?:inggi)?\s(\d+)\scm\b
Explanation
- `\b` A word boundary to prevent a partial word match
- `P(?:anjang)?\s` Match `P` and optionally `anjang`
- `(\d+)\scm\s` Capture 1+ digits in group 1, and match `cm`
- `(?:L|Kedalaman)\s` Match `L` or `Kedalaman`
- `(\d+)\scm\s` Capture 1+ digits in group 2 and match `cm`
- `T(?:inggi)?\s` Match `T` and optionally `inggi`
- `(\d+)\scm` Capture 1+ digits in group 3 and match `cm`
- `\b` A word boundary
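A short Python sketch of applying this pattern, assuming the measurements sit on a single line as in the question. The function name `extract_sizes` and the shortened sample strings are hypothetical:

```python
import re

# The pattern from this answer, split across lines for readability.
pattern = re.compile(
    r"\bP(?:anjang)?\s(\d+)\scm\s"
    r"(?:L|Kedalaman)\s(\d+)\scm\s"
    r"T(?:inggi)?\s(\d+)\scm\b"
)

# Shortened versions of string1 and string2 from the question.
string1 = "Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"
string2 = "Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm"


def extract_sizes(text):
    """Return the three captured numbers as a tuple of ints, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())


print(extract_sizes(string1))  # (76, 40, 120)
print(extract_sizes(string2))  # (76, 40, 120)
```

Because only the digits are captured, `match.groups()` yields exactly three values and nothing needs to be filtered out afterwards.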
