I ACE | AA33cc55BB44 | | | I
I | AAAAAA-BB2CC-4424-1-22 | 11.113 | 10.09.2022 | bCa0111.XAC I
I | | | | I
I BAC | Aa315c5cab44 | | | I
I | 5564aa-BB2CC-44gd-1-22 | 21.334 | 10.09.2022 | Aba0221.XAC I
I | | | | I
I CAC | aacccc54BB44 | | | I
I | AAAAAA-BB2CC-aaaa-1-22 | 61.222 | 10.09.2022 | bCa0232.XAC I
I | | | | I
I DAC | ii2ii2ii2664 | | | I
I | BBBBBB-BB2CC-4424-1-22 | 81.888 | 10.09.2022 | Aba0243.XAC I
I have used this pattern:
\| (.*) \| \d{2}\.\d{3} \| \d{1,2}\.\d{1,2}\.\d{4} \| (.*) \I
Attributes that I want to grab:
Group I:
AA33cc55BB44
AAAAAA-BB2CC-4424-1-22
bCa0111.XAC
Group II:
Aa315c5cab44
5564aa-BB2CC-44gd-1-22
Aba0221.XAC
Group III:
aacccc54BB44
AAAAAA-BB2CC-aaaa-1-22
bCa0232.XAC
Group IV:
ii2ii2ii2664
BBBBBB-BB2CC-4424-1-22
Aba0243.XAC
Can anyone help me extract only these attributes from this text?
CodePudding user response:
You can use
(?m)^[^|\n]*\|[ \t]*([^\s|]+).*\n[^|\n]*\|[ \t]*(\S+)\s*(?:\|[^|\n]*){2}\|[ \t]*(\S+)
See the regex demo. Details:
(?m) - RegexOptions.Multiline option on
^ - start of a line
[^|\n]* - zero or more chars other than a newline and |
\| - a | char
[ \t]* - zero or more spaces or TABs (you may use [\p{Zs}\t]* here to match any Unicode horizontal whitespace)
([^\s|]+) - Group 1: one or more chars other than whitespace and |
.* - the rest of the line
\n - a newline char
[^|\n]*\|[ \t]* - zero or more chars other than a newline and |, then a | char and zero or more spaces or TABs
(\S+) - Group 2: one or more non-whitespace chars
\s* - zero or more whitespaces
(?:\|[^|\n]*){2} - two sequences of | and then zero or more chars other than | and a newline
\| - a | char
[ \t]* - zero or more spaces or TABs
(\S+) - Group 3: one or more non-whitespace chars.
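To line the pattern up with its description, here is a sketch of the same expression written in verbose mode (RegexOptions.IgnorePatternWhitespace), where each piece carries its own comment. The class name VerbosePatternDemo and the helper First are made up for illustration, and literal spaces are written as \x20 so they cannot be confused with layout whitespace. It assumes the one-or-more quantifier (+) on each of the three capturing groups.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbosePatternDemo
{
    // Verbose mode ignores unescaped whitespace in the pattern and treats "#" as a comment start.
    private const string Pattern = @"
        ^ [^|\n]* \|             # start of line, leading cell up to the first pipe
        [\x20\t]* ([^\s|]+)      # Group 1: first value on the first line
        .* \n                    # rest of the first line, then the line break
        [^|\n]* \| [\x20\t]*     # leading cell of the second line
        (\S+) \s*                # Group 2: first value on the second line
        (?: \| [^|\n]* ){2}      # skip the next two cells
        \| [\x20\t]* (\S+)       # Group 3: value in the last cell
    ";

    // Hypothetical helper: returns the first match in the given text.
    public static Match First(string text) =>
        Regex.Match(text, Pattern,
            RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        // First record from the question.
        string text =
            "I ACE | AA33cc55BB44 | | | I\n" +
            "I | AAAAAA-BB2CC-4424-1-22 | 11.113 | 10.09.2022 | bCa0111.XAC I";

        Match m = First(text);
        Console.WriteLine(m.Groups[1].Value);
        Console.WriteLine(m.Groups[2].Value);
        Console.WriteLine(m.Groups[3].Value);
    }
}
```

For the first record this should print AA33cc55BB44, AAAAAA-BB2CC-4424-1-22 and bCa0111.XAC on separate lines.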
In C#:
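// Requires: using System; using System.Text.RegularExpressions;
var pattern = @"^[^|\n]*\|[ \t]*([^\s|]+).*\n[^|\n]*\|[ \t]*(\S+)\s*(?:\|[^|\n]*){2}\|[ \t]*(\S+)";
var matches = Regex.Matches(text, pattern, RegexOptions.Multiline);
foreach (Match m in matches)
{
    Console.WriteLine("--- New match ---");
    Console.WriteLine(m.Groups[1].Value); // value from the first line of the record
    Console.WriteLine(m.Groups[2].Value); // first value on the second line
    Console.WriteLine(m.Groups[3].Value); // last value on the second line
}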
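For a quick check against the sample data, the snippet can be wrapped in a small self-contained program. This is only a sketch: the class name TableExtractor and the helper Extract are hypothetical, and the input is assumed to use "\n" line breaks with the layout shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableExtractor
{
    // Same pattern as above, with the "+" quantifiers on the three groups.
    private const string Pattern =
        @"^[^|\n]*\|[ \t]*([^\s|]+).*\n[^|\n]*\|[ \t]*(\S+)\s*(?:\|[^|\n]*){2}\|[ \t]*(\S+)";

    // Hypothetical helper: returns one three-element array per record.
    public static List<string[]> Extract(string text)
    {
        var rows = new List<string[]>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.Multiline))
        {
            rows.Add(new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });
        }
        return rows;
    }

    public static void Main()
    {
        // First two records from the question, including the empty separator row.
        string text = string.Join("\n",
            "I ACE | AA33cc55BB44 | | | I",
            "I | AAAAAA-BB2CC-4424-1-22 | 11.113 | 10.09.2022 | bCa0111.XAC I",
            "I | | | | I",
            "I BAC | Aa315c5cab44 | | | I",
            "I | 5564aa-BB2CC-44gd-1-22 | 21.334 | 10.09.2022 | Aba0221.XAC I");

        foreach (string[] row in Extract(text))
        {
            Console.WriteLine(string.Join(", ", row));
        }
    }
}
```

The empty separator row is skipped because Group 1 needs at least one character that is neither whitespace nor |, so each record yields exactly one match.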
