I am trying to parse Kubernetes Pod names for logging.
My pod names always look like this:
<deployment>-<replicaset>-<uid>
<job>-<uid> <-- if created by a job
Here are some samples
events-worker-7c9b7bdc55-f7sgc
notification-585f6b94b8-t4jjc
report-generator-749ccf648d-gd9j7
static-content-8445d7f556-wbxvp
init-database-fm44h <-- if created by a job
What I am trying to get is the <deployment/job> part. For the samples above, this would be:
events-worker
notification
report-generator
static-content
init-database
I started with something like this
(?<role_name>.*)(?:-[a-z0-9]{8,10})(?:-[a-z0-9]+)
and ended with this
(?:(?<role_name>[a-z0-9]+(?:-[a-z0-9]+)*))-(?<=-)[a-z0-9]+-(?:(?<=-)[a-z0-9]+)
However, I am unable to match both cases (a name with a replicaset part and a name without one).
The pattern either does not match init-database-fm44h at all, or it captures only init instead of init-database.
Any help would be greatly appreciated.
Answer:
You can use
\b(?<!-)(?<role_name>[a-z0-9]+(?:-[a-z0-9]+)*?)(?:-([a-f0-9]{10}))?-([a-z0-9]+)\b(?!-)
Details:
\b(?<!-) - a word boundary not immediately preceded by -
(?<role_name>[a-z0-9]+(?:-[a-z0-9]+)*?) - Group "role_name" with ID 1: one or more letters or digits, then zero or more sequences of - and one or more letters/digits, as few times as possible
(?:-([a-f0-9]{10}))? - an optional non-capturing group matching a - and then ten hex chars captured into Group 2
- - a hyphen
([a-z0-9]+) - Group 3: one or more letters or digits
\b(?!-) - a word boundary not immediately followed by -
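
To check the pattern against the samples from the question, here is a minimal sketch in Python. The thread does not say which regex engine is in use, so Python is an assumption; its re module writes named groups as (?P<name>...) rather than (?<name>...), which is the only change made to the pattern. The names POD_NAME and role_name are made up for this example.

```python
import re

# Same pattern as in the answer, except that the named group uses
# Python's (?P<name>...) syntax. The engine choice is an assumption.
POD_NAME = re.compile(
    r"\b(?<!-)(?P<role_name>[a-z0-9]+(?:-[a-z0-9]+)*?)"  # deployment/job name, lazy
    r"(?:-([a-f0-9]{10}))?"                              # optional 10-hex-char replicaset part
    r"-([a-z0-9]+)\b(?!-)"                               # trailing uid
)


def role_name(pod_name):
    """Return the deployment/job part of a pod name, or None if nothing matches."""
    match = POD_NAME.search(pod_name)
    return match.group("role_name") if match else None


samples = [
    "events-worker-7c9b7bdc55-f7sgc",
    "notification-585f6b94b8-t4jjc",
    "report-generator-749ccf648d-gd9j7",
    "static-content-8445d7f556-wbxvp",
    "init-database-fm44h",
]

for name in samples:
    print(name, "->", role_name(name))
```

For the five sample names this yields events-worker, notification, report-generator, static-content and init-database. The lazy quantifier *? is what makes the job case work: the name group grows one hyphenated segment at a time until the rest of the pattern can match up to the end of the pod name. Note that the replicaset part is only recognized when it is exactly ten characters from 0-9 and a-f, as in the samples.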
