I need a regex that replaces the segments of a URI that would cause high cardinality.
Basically, if a segment of the URI contains any character outside a-zA-Z (other than /), replace that segment with a *.
Example:
$ replace("/first/12ab34/B1a234/12B3a/1234/second/A789B-89d", r'(?i)[a-z]*\d+(?i)[a-z]*', "*")
results in: "/first/**/**/**/*/second/*-*"
That's close, but I need "/first/*/*/*/*/second/*"
Multiple replaces are acceptable. Any regex masters out there willing to help? This is for vrl (vector.dev) written in Rust. VRL does not support look-around of any kind.
CodePudding user response:
For the example data, you might use
(?i)[a-z]*\d[\da-z]*(?:-[\da-z]+)*
(?i) Inline modifier for case-insensitive matching
[a-z]* Match optional chars a-z
\d Match a single digit
[\da-z]* Match optional digits or chars a-z
(?:-[\da-z]+)* Optionally repeat a - followed by 1+ times either a digit or a-z
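To sanity-check the pattern against the example from the question, here is a minimal sketch using Python's `re` module rather than VRL. It assumes the pattern behaves the same in the Rust regex engine that VRL uses, which should hold because it uses no look-around or backreferences. The name `mask_uri` is made up for illustration, and the trailing group is written with its `+` quantifier.

```python
import re

# Letters (optional), one digit, then any mix of digits/letters,
# optionally followed by "-xyz" groups. Case-insensitive.
SEGMENT = re.compile(r"(?i)[a-z]*\d[\da-z]*(?:-[\da-z]+)*")


def mask_uri(uri: str) -> str:
    """Replace every digit-bearing run in the URI with a single '*'."""
    return SEGMENT.sub("*", uri)


print(mask_uri("/first/12ab34/B1a234/12B3a/1234/second/A789B-89d"))
# prints: /first/*/*/*/*/second/*
```

Segments made only of letters, such as `first` and `second`, are left alone because the pattern requires at least one digit.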
CodePudding user response:
Use
[^/\d]*\d[^/]*
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
[^/\d]*   any character except '/' and digits (0-9), 0 or more times (matching as many as possible)
--------------------------------------------------------------------------------
\d        a digit (0-9)
--------------------------------------------------------------------------------
[^/]*     any character except '/', 0 or more times (matching as many as possible)
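
A quick way to try this pattern is the sketch below, again in Python's `re` rather than VRL, on the assumption that negated character classes behave the same in both engines. Note that the pattern only fires on segments containing a digit. The second pattern is not from the answer: it is a hypothetical variant for the stricter wording of the question (any character outside a-zA-Z), and both function names are invented for illustration.

```python
import re

# Pattern from the answer: a whole segment that contains at least one digit.
DIGIT_SEGMENT = re.compile(r"[^/\d]*\d[^/]*")

# Hypothetical variant (an assumption, not part of the answer): a whole
# segment that contains at least one character outside a-zA-Z.
NON_ALPHA_SEGMENT = re.compile(r"[^/]*[^/a-zA-Z][^/]*")


def mask_digit_segments(uri: str) -> str:
    """Replace each segment containing a digit with '*'."""
    return DIGIT_SEGMENT.sub("*", uri)


def mask_non_alpha_segments(uri: str) -> str:
    """Replace each segment containing any non-letter character with '*'."""
    return NON_ALPHA_SEGMENT.sub("*", uri)


uri = "/first/12ab34/B1a234/12B3a/1234/second/A789B-89d"
print(mask_digit_segments(uri))      # prints: /first/*/*/*/*/second/*
print(mask_non_alpha_segments(uri))  # prints: /first/*/*/*/*/second/*

# The two differ on segments with non-letter, non-digit characters.
print(mask_digit_segments("/users/foo_bar/x"))      # prints: /users/foo_bar/x
print(mask_non_alpha_segments("/users/foo_bar/x"))  # prints: /users/*/x
```

Neither pattern can cross a `/`, so each replacement stays within a single segment and one `replace` call is enough.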
