I tried this code:
string Input = TextBox1.Text;
string[] splitX = Regex.Split(Input, @"(?<=[|if|and|but|so|when|])");
The regular expression @"(?<=[.?!])" is often used to split a text into sentences. But I need to use words as a delimiter to split the text.
CodePudding user response:
It looks like you're using a character set where you need a group with alternation. Square brackets [] define a character set, which matches any single one of the enclosed characters. For example, in the other regex you mentioned, [.?!] matches either ., ?, or ! (the period needs no escaping there, because inside a character set . is a literal period rather than "any character"). Your regex is therefore trying to match a single character out of |, i, f, a, n, d, and so on. Duplicate characters in a set (the two ns and the repeated |s) are harmless and simply ignored, but the point is that this is the wrong regex construct to use.
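To make the difference concrete, here is a small sketch that compares a character set with a group on a shortened word list. The class name and the shortened patterns are made up for illustration only; the ^ and $ anchors are added so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

class CharSetVersusGroup
{
    static void Main()
    {
        // A character set matches exactly ONE of the characters |, i, f, a, n, d.
        Console.WriteLine(Regex.IsMatch("f", @"^[|if|and]$"));    // True
        Console.WriteLine(Regex.IsMatch("if", @"^[|if|and]$"));   // False: "if" is two characters

        // A group with alternation matches one of the whole words.
        Console.WriteLine(Regex.IsMatch("if", @"^(?:if|and)$"));  // True
        Console.WriteLine(Regex.IsMatch("f", @"^(?:if|and)$"));   // False
    }
}
```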
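The solution is simple: replace the square brackets with parentheses. This turns that section of the regex into a group, which matches the contained pattern and can hold several alternatives separated by |. Put the | only between alternatives, so remove the first and last one; an empty alternative matches the empty string, which would make the lookbehind succeed at every position. It is also worth making the group non-capturing with ?:, because Regex.Split adds the text of capturing groups to the resulting array. The corrected regex would be:
(?<=(?:if|and|but|so|when))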
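As a rough sketch of how a lookbehind with a word group behaves in Regex.Split, the example below splits a made-up sentence. The class name, method name, and sample input are hypothetical. The \b word boundaries are an addition not present in the answer; they are assumed here so that "so" does not match inside words such as "also". The group is written as non-capturing, since Regex.Split includes captured text in its result.

```csharp
using System;
using System.Text.RegularExpressions;

class SplitAfterWords
{
    // Splits right after each whole delimiter word; the word stays at the
    // end of the piece it closes, because the lookbehind consumes nothing.
    public static string[] SplitAfter(string input)
    {
        // \b ... \b : assumption, restricts matches to whole words.
        // (?: ... ) : non-capturing, so Regex.Split adds no extra entries.
        return Regex.Split(input, @"(?<=\b(?:if|and|but|so|when)\b)");
    }

    static void Main()
    {
        string input = "Call me when you arrive and bring the keys";

        foreach (string part in SplitAfter(input))
        {
            // Trim removes the space left at the start of each later piece.
            Console.WriteLine("[" + part.Trim() + "]");
        }
        // Prints:
        // [Call me when]
        // [you arrive and]
        // [bring the keys]
    }
}
```

If the delimiter word should start the next piece instead of ending the previous one, a lookahead (?=...) with the same group would be the analogous construct.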
CodePudding user response:
The question isn't specifically tagged RegEx, and you don't say that the split has to happen in a regex operation, so a plain string split may be enough. You wrote:
But I need to use words as a delimiter to split the text.
Multiple words can be used as delimiters to identify where you want to split up your string like so:
string[] delimiters = {"if", "and", "but", "so", "when" };
var parts = srcString.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
Perhaps this approach gets you where you need to be, or perhaps a combination works better (regex first, then this string split technique). Note that String.Split removes the delimiter words from the result, whereas the lookbehind regex keeps them.
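Here is a minimal runnable sketch of the String.Split approach on a made-up sentence. The class name, method name, and input are hypothetical. Padding each delimiter with spaces is an assumption added here so that only whole words in the middle of the text are matched; without the padding, "so" would also split "also". The padding means a delimiter word at the very start or end of the text is not matched.

```csharp
using System;

class SplitOnWords
{
    public static string[] SplitOnDelimiters(string source)
    {
        // Assumption: surrounding spaces restrict matches to whole words.
        string[] delimiters = { " if ", " and ", " but ", " so ", " when " };

        // The delimiters themselves are dropped from the result.
        return source.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string srcString = "Call me when you arrive and bring the keys";

        foreach (string part in SplitOnDelimiters(srcString))
        {
            Console.WriteLine("[" + part + "]");
        }
        // Prints:
        // [Call me]
        // [you arrive]
        // [bring the keys]
    }
}
```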
