- I want to look for various types of matches on the word "car", but not if it's preceded by Jane, Jane's, Janes, or Jane(s).
The following two regexes partially work for exclusion and inclusion, but I can't get the other variants to work:
- (?<!\bJane) car
- Jane car
For example:
- the car is red -> Match
- here is Jane car is red -> None
- here is Janes car is red -> None
- here is Jane's car is red -> None
I also want to find the cases where Jane is in the phrase:
- the car is red -> None
- here is Jane car is red -> Match
- here is Janes car is red -> Match
- here is Jane's car is red -> Match
And where car is not preceded by Jane(s):
- here Jane(s) car is red -> None
And of course the opposite:
- here is Jane(s) car is red -> Match
Edit
If I have a document with "red car\n and Janes car", this should be a match, because there is a reference to "car" without the word Jane/Janes/Jane's/etc. in front of it.
For additional clarity: I will be running re.findall for all occurrences of "car" that don't have the word Jane in front of them.
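A sketch, not taken from the answer: the lookbehind (?<!\bJane) car from the question only rules out the bare "Jane", and folding the variants into it as (?<!\bJane(?:'?s|\(s\))?) makes Python's re module raise an error, because it only accepts fixed-width lookbehinds. Assuming Jane, Janes, Jane's and Jane(s) are the only spellings to exclude, one fixed-width lookbehind per variant can be chained instead (the pattern name is hypothetical):

```python
import re

# Hypothetical name. Python's re only allows fixed-width lookbehinds, so each
# Jane variant (assumed list: Jane, Janes, Jane's, Jane(s)) gets its own one.
CAR_NOT_AFTER_JANE = re.compile(
    r"(?<!\bJane )"        # not preceded by "Jane "
    r"(?<!\bJanes )"       # not preceded by "Janes "
    r"(?<!\bJane's )"      # not preceded by "Jane's "
    r"(?<!\bJane\(s\) )"   # not preceded by "Jane(s) "
    r"\bcar\b"             # the word car itself
)

examples = [
    "the car is red",
    "here is Jane car is red",
    "here is Janes car is red",
    "here is Jane's car is red",
    "here is Jane(s) car is red",
]

for text in examples:
    result = "Match" if CAR_NOT_AFTER_JANE.search(text) else "None"
    print(f"{text} -> {result}")

# findall returns one entry per bare "car" in the document
print(CAR_NOT_AFTER_JANE.findall("red car\n and Janes car"))  # ['car']
```

Each lookbehind checks only the exact text immediately before car, so it assumes a single space between the Jane variant and car; the patterns in the answer make the same assumption.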
Answer:
If you want to match lines where none of the forms of Jane occur before car, you can exclude them with a negative lookahead and then still match car:
^(?!.*\bJane(?:'?s|\(s\))? car\b).*\bcar\b.*
- ^ : Start of string
- (?! : Negative lookahead
  - .*\bJane(?:'?s|\(s\))? : Match Jane, Janes, Jane's or Jane(s)
  - " car\b" : Match a space and the word car
- ) : Close the lookahead
- .*\bcar\b.* : Match the whole line with the word car between word boundaries
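A quick way to check the lookahead pattern against the examples from the question, as a sketch using Python's re (the variable names are illustrative only):

```python
import re

# The answer's negative-lookahead pattern: succeed only if the line contains
# "car" and no Jane variant directly followed by " car".
EXCLUDE_JANE = re.compile(r"^(?!.*\bJane(?:'?s|\(s\))? car\b).*\bcar\b.*")

examples = [
    "the car is red",
    "here is Jane car is red",
    "here is Janes car is red",
    "here is Jane's car is red",
    "here is Jane(s) car is red",
    "red car and Janes car",
]

for text in examples:
    result = "Match" if EXCLUDE_JANE.match(text) else "None"
    print(f"{text} -> {result}")
```

Only the first line prints Match. The last one prints None even though it contains a bare car: the lookahead rejects the whole line as soon as any Jane variant followed by car appears in it, which is why the capture-group approach with re.findall in this answer is the better fit for counting individual occurrences.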
If one of the forms of Jane followed by car should be there, you can match it with:
^.*\bJane(?:'?s|\(s\))? car\b.*
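The inclusion pattern can be checked the same way; a minimal sketch with Python's re, with an illustrative variable name:

```python
import re

# The answer's inclusion pattern: match a line where a Jane variant
# is directly followed by " car".
INCLUDE_JANE = re.compile(r"^.*\bJane(?:'?s|\(s\))? car\b.*")

examples = [
    "the car is red",
    "here is Jane car is red",
    "here is Janes car is red",
    "here is Jane's car is red",
    "here is Jane(s) car is red",
]

for text in examples:
    result = "Match" if INCLUDE_JANE.match(text) else "None"
    print(f"{text} -> {result}")
```

Here the first line prints None and every Jane variant prints Match, mirroring the second list of expected results in the question.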
To match all occurrences of car except the ones that have a form of Jane in front of them, you can match what you don't want to keep and capture what you do want to keep.
Then in Python you can use re.findall, which returns the capture group value for each match (an empty string when the Jane alternative matched), and filter the empty entries out of the result.
\bJane(?:'?s|\(s\))? car\b|\b(car)\b
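A minimal sketch of the findall step, using the pattern above on the multi-line example from the question's edit; the helper name cars_without_jane is hypothetical:

```python
import re

# The answer's pattern: the left alternative consumes any Jane variant plus
# " car" (no group), the right alternative captures a bare "car" in group 1.
PATTERN = re.compile(r"\bJane(?:'?s|\(s\))? car\b|\b(car)\b")

def cars_without_jane(text):
    """Hypothetical helper: return every "car" not preceded by a Jane variant."""
    # With one group, findall returns group 1 for every match; it is "" when
    # the left (Jane) alternative matched, so those entries are dropped.
    return [m for m in PATTERN.findall(text) if m]

doc = "red car\n and Janes car"
print(PATTERN.findall(doc))     # ['car', '']
print(cars_without_jane(doc))   # ['car']
```

Because the Jane alternative is tried first at each position and consumes " car", the car after a Jane variant can never be picked up by the capturing branch, so len(cars_without_jane(doc)) gives the number of bare occurrences.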
