I have a bunch of markdown files in which I want to search for Ruby's double colon :: outside of code formatting (e.g. where I forgot to apply proper markdown). For example:
`foo::bar`
hello `foo::bar` test
` example::with::whitespace `
```
Proper::Formatted
```
```
Module::WithIndendation
```
```
Some::Nested::Modules
```
```ruby
CodeBlock::WithSyntax
```
# Some::Class
## Another::Class Heading
some text
The regex should only match Some::Class and Another::Class, because they lack the surrounding backticks and are not inside a multiline fenced code block.
I have this regex, but it also matches the multiline block:
[\s]+[^`]+(::)[^`]+[\s]?
Any idea how to exclude this?
EDIT:
It would be great if the regex worked in Ruby, JS and on the command line with grep.
CodePudding user response:
For the original input, you may use this regex in Ruby to match a :: string that is not preceded by a backtick and not preceded by a backtick followed by a whitespace:
Regex:
(?<!`\s)(?<!`)\b\w+::\w+
RegEx Breakup:
- (?<!`\s): Negative lookbehind to assert that a backtick followed by a whitespace is not at the preceding position
- (?<!`): Negative lookbehind to assert that a backtick is not at the preceding position
- \b: Match a word boundary
- \w+: Match 1+ word characters
- ::: Match a ::
- \w+: Match 1+ word characters
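As a minimal sketch of how this pattern might be applied in Ruby, the snippet below runs it through `String#scan`. The constant name `UNFORMATTED` and the sample text are illustrative assumptions, and the quantifiers are written out as `\w+`.

```ruby
# Illustrative constant name; the pattern is the Ruby regex from this answer,
# with the quantifiers written out as \w+.
UNFORMATTED = /(?<!`\s)(?<!`)\b\w+::\w+/

# Assumed sample text: one inline code span and two unformatted headings.
text = [
  'hello `foo::bar` test',
  '# Some::Class',
  '## Another::Class Heading'
].join("\n")

# scan returns every non-overlapping match as an array of strings.
p text.scan(UNFORMATTED)
# prints ["Some::Class", "Another::Class"]
```

Because the pattern has no capture groups, `scan` returns the matched strings themselves rather than nested arrays.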
You can use this regex in Javascript:
(?<!`\w*\s*|::)\b\w+(?:::\w+)+
For gnu-grep, consider this command:
grep -ZzoP '`\w*\s*\b\w+::\w+(*SKIP)(*F)|\b\w+::\w+' file |
xargs -0 printf '%s\n'
Some::Class
Another::Class
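Ruby's regex engine rejects unbounded quantifiers such as `\w*\s*` inside a lookbehind, so the JavaScript pattern cannot be used there unchanged. One possible alternative, which is a different technique from the regexes above, is to strip fenced blocks and inline code spans first and then scan what is left. The following is only a sketch: the helper name `unformatted_constants` and the sample text are hypothetical, and it assumes every fence is closed and inline spans stay on one line.

```ruby
# Hypothetical helper (not from the answer): strip code first, then scan.
def unformatted_constants(markdown)
  # Drop fenced blocks: an opening fence line through the next closing fence line.
  without_fences = markdown.gsub(/^[ \t]*`{3}.*?^[ \t]*`{3}[ \t]*$/m, '')
  # Drop inline code spans that open and close on the same line.
  without_inline = without_fences.gsub(/`[^`\n]*`/, '')
  # Collect runs such as Foo::Bar or Foo::Bar::Baz.
  without_inline.scan(/\b\w+(?:::\w+)+/)
end

fence = '`' * 3 # built at runtime so this example nests cleanly in markdown
sample = [
  '`foo::bar`',
  'hello `foo::bar` test',
  '` example::with::whitespace `',
  fence, 'Some::Nested::Modules', fence,
  "#{fence}ruby", 'CodeBlock::WithSyntax', fence,
  '# Some::Class',
  '## Another::Class Heading'
].join("\n")

p unformatted_constants(sample)
# prints ["Some::Class", "Another::Class"]
```

Working in two passes keeps each pattern simple and allows indented fences and fences with a language tag, at the cost of losing the original match offsets.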
