Question
I'm trying to match PowerShell hash comments (# ...) but not inline comments (<# ... #>) with the same regex. How can I achieve that?
Goal
Match
I'd like to match PowerShell comments that use the hash comment syntax, where everything after # is commented out. I use /#(.*$)/gm for it.
Test cases, with the expected regex match written inside brackets [..]:
Write-Host "Hello world" [# comment here]
[# A line with only comment]
Comment without whitespace[#before]
Comment with whitespace [#after ]
Do not match
However, I'd like to make an exception for the inline comment syntax. Inline comments in PowerShell look like lorem <# inline comment #> ipsum.
So the following should not match:
Write-Host "Hello world" <# inline comment here #>
<# A line with only inline comment #>
Comment without whitespace<#no whitespace#>around
Inline comment <# in middle #> of line
Comment with whitespace #comment with >
Comment with whitespace #comment with <
Comment with whitespace #comment with <# test #>
What I tried
I tried using [^<>] in something like #[^<>](.*[^<>]$), but it did not work for all the cases given above.
My progress on regex101 until I got stuck.
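For illustration, here is a minimal sketch of where the two patterns above fall short. The helper name firstMatch and the choice of sample lines are assumptions made for this example; the patterns themselves are the ones quoted in the question. Neither pattern looks at the character before #, so both start matching inside <# ... #>.

```typescript
// Patterns quoted in the question (built from strings; they contain no backslashes).
const naive = new RegExp("#(.*$)", "m");
const attempt = new RegExp("#[^<>](.*[^<>]$)", "m");

// Hypothetical helper: returns the first match of `pattern` in `line`, or null.
function firstMatch(pattern: RegExp, line: string): string | null {
  const m = line.match(pattern);
  return m ? m[0] : null;
}

// A hash comment: both patterns find it, as intended.
console.log(firstMatch(naive, 'Write-Host "Hello world" # comment here'));   // "# comment here"
console.log(firstMatch(attempt, 'Write-Host "Hello world" # comment here')); // "# comment here"

// Inline comments: both patterns match from the "#" of "<#", which is unwanted.
console.log(firstMatch(naive, "Inline comment <# in middle #> of line"));
// "# in middle #> of line"
console.log(firstMatch(attempt, "Comment without whitespace<#no whitespace#>around"));
// "#no whitespace#>around"
```

The attempt only constrains the characters after #, so .* is still free to run across the closing #>.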
Why
I'm parsing PowerShell scripts in a JavaScript/TypeScript runtime so that I can inline them and run them from batch (cmd) for a community-driven open-source project. I know there will be exceptions to this (like strings with hashes inside), but I'm trading robustness for simple regex parsing.
Thank you!
Answer
I suggest checking for < before the # char and converting all negated character classes into negative lookarounds to avoid crossing line boundaries:
#(?<!<#)(?![<>])(.*)$(?<![<>])
// Or, to also check for #> after <# use
#(?<!<#(?=.*#>))(?![<>])(.*)$(?<![<>])
See the regex demo. Remove the (?<![<>]) negative lookbehind if you do not want the match to fail when the line ends with < or >.
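As a quick check, the first pattern can be run against the sample lines from the question. This is only a sketch: the function name findHashComments is made up for the example, the g and m flags are taken from the question's own /gm usage, and the trailing space in the "#after " sample is dropped for readability.

```typescript
// First pattern from this answer, with the g and m flags used in the question.
const hashComment = new RegExp("#(?<!<#)(?![<>])(.*)$(?<![<>])", "gm");

// Sample lines from the question, joined into one multi-line script.
const script = [
  'Write-Host "Hello world" # comment here',
  "# A line with only comment",
  "Comment without whitespace#before",
  "Comment with whitespace #after",
  'Write-Host "Hello world" <# inline comment here #>',
  "<# A line with only inline comment #>",
  "Comment without whitespace<#no whitespace#>around",
  "Inline comment <# in middle #> of line",
  "Comment with whitespace #comment with >",
  "Comment with whitespace #comment with <",
  "Comment with whitespace #comment with <# test #>",
].join("\n");

// Hypothetical helper: returns every hash comment found in `source`.
function findHashComments(source: string): string[] {
  return source.match(hashComment) ?? [];
}

console.log(findHashComments(script));
// [ "# comment here", "# A line with only comment", "#before", "#after" ]
```

Only the first four lines produce a match; every line from the "do not match" list is skipped.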
Details:
# - a # char
(?<!<#) - no <# allowed immediately to the left of the current location (note this check is only triggered after #, so the regex engine checks only the positions after #, not every position in the string). The (?<!<#(?=.*#>)) lookbehind with a nested lookahead makes sure the # matched is not the second char of a <#...#> substring
(?![<>]) - immediately on the right, there must be no < or >
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible
$ - end of string (or end of a line when the m flag is used, as in the question)
(?<![<>]) - at the end of the string (or line), there must be no < or > char.
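The difference between the two variants shows up when a <# is never closed on the same line. The sketch below illustrates that and also prints Group 1; the names strictPattern, pairedPattern and commentBody, and the sample inputs, are assumptions made for this example.

```typescript
// Variant 1: any "#" preceded by "<" is rejected.
const strictPattern = new RegExp("#(?<!<#)(?![<>])(.*)$(?<![<>])", "m");
// Variant 2: "#" after "<" is rejected only if "#>" follows later on the line.
const pairedPattern = new RegExp("#(?<!<#(?=.*#>))(?![<>])(.*)$(?<![<>])", "m");

// Hypothetical helper: returns Group 1 (the text after "#"), or null without a match.
function commentBody(pattern: RegExp, line: string): string | null {
  const m = pattern.exec(line);
  return m?.[1] ?? null;
}

// A "<#" with no closing "#>" on the same line.
const unclosed = "Write-Host 1 <# not closed";
console.log(commentBody(strictPattern, unclosed)); // null
console.log(commentBody(pairedPattern, unclosed)); // " not closed"

// A properly closed inline comment is skipped by both variants.
const closed = "Write-Host 1 <# closed #> 2";
console.log(commentBody(strictPattern, closed)); // null
console.log(commentBody(pairedPattern, closed)); // null
```

Which behavior is preferable depends on whether an unclosed <# should count as the start of a hash comment in your scripts.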
