My goal is to split the string into groups. The problem is that the current regex fails to capture some of the parts correctly.
The regex:
^(?: {2,})?(?P<TANGGAL>[0-9]{2}/[0-9]{2}){0,1}(?: {2,})?(?P<KETERANGAN1>[\w-/:]+(?: [\w-/:]+)*){0,1}(?: {2,})?(?P<KETERANGAN2>[\w-/:]+(?: [\w-/:]+)*){0,1}(?: {2})?(?P<SALDO>[\d,.]+){0,1}
The string:
01/07 SALDO AWAL 1,000.00
The problem: The regex captures:
1 from the string 1,000.00 as Group KETERANGAN2 instead of Group SALDO.
,000.00 as Group SALDO instead of capturing the whole 1,000.00.
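The behaviour described above can be reproduced with a short Python sketch. Several things here are assumptions rather than facts from the question: the character classes are followed by `+`, the hyphen is moved to the end of each class (Python's `re` rejects `[\w-/:]` as a bad range), the sample line uses two spaces before every column, and `capture_original` is a hypothetical helper name.

```python
import re

# Assumed form of the pattern from the question: "+" quantifiers on the
# character classes and "-" moved to the end of each class for Python's re.
ORIGINAL = re.compile(
    r"^(?: {2,})?(?P<TANGGAL>[0-9]{2}/[0-9]{2}){0,1}"
    r"(?: {2,})?(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*){0,1}"
    r"(?: {2,})?(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*){0,1}"
    r"(?: {2})?(?P<SALDO>[\d,.]+){0,1}"
)


def capture_original(line):
    """Return the named groups captured by the pattern from the question."""
    match = ORIGINAL.match(line)
    return match.groupdict() if match else None


# Assumed sample line: two spaces before each column.
groups = capture_original("  01/07  SALDO AWAL  1,000.00")
print(groups["KETERANGAN2"])  # 1
print(groups["SALDO"])        # ,000.00
```

Because every delimiter and every column is optional on its own, KETERANGAN2 takes the leading 1 and SALDO picks up the rest without needing a delimiter in front of it.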
Answer:
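In your pattern the delimiters and the columns are optional independently of each other, so KETERANGAN2 is free to match 1, and SALDO then matches ,000.00 without any delimiter before it. You can make the capturing groups obligatory and move them into optional non-capturing groups that also match the column delimiters: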
^(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))?(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))?(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))?(?: {2,}(?P<SALDO>[\d,.]+))?$
Note the added $ end-of-string anchor: it is necessary to make sure the whole line is matched. Without it, the match could still stop after KETERANGAN2 has consumed 1.
Details:
^ - start of string
(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))? - an optional non-capturing group matching two or more spaces and then capturing into Group "TANGGAL" two digits, /, two digits
(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))? - an optional non-capturing group matching two or more spaces and then capturing into Group "KETERANGAN1" one or more word, /, : or - chars and then zero or more sequences of a space and then one or more word, /, : or - chars
(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))? - an optional non-capturing group matching two or more spaces and then capturing into Group "KETERANGAN2" one or more word, /, : or - chars and then zero or more sequences of a space and then one or more word, /, : or - chars
(?: {2,}(?P<SALDO>[\d,.]+))? - an optional non-capturing group matching two or more spaces and then capturing into Group "SALDO" one or more digits, , or . chars
$ - end of string.
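As a minimal Python sketch of the approach, the pattern can be compiled once and applied per line. The sample line is an assumption: it uses two spaces before every column, including the first, since each column now requires its delimiter. `parse_line` is a hypothetical helper name.

```python
import re

# Each column sits inside an optional non-capturing group together with
# its " {2,}" delimiter; "$" forces the whole line to be consumed.
PATTERN = re.compile(
    r"^(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))?"
    r"(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<SALDO>[\d,.]+))?$"
)


def parse_line(line):
    """Return a dict of the named columns, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


# Assumed sample line: two spaces before each column.
row = parse_line("  01/07  SALDO AWAL  1,000.00")
print(row["KETERANGAN2"])  # None
print(row["SALDO"])        # 1,000.00
```

A column that is absent comes back as `None`, so the caller may want to substitute an empty string before further processing.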
