I want to remove specific words, with or without a trailing dot (Pvt., Ltd., Pvt, Ltd, Pte., Pte, Co., Co, Private Limited, Inc., Incorporated), from a string and keep the rest of the text.
I have tried using
"\(|\)|-|\.|Pvt|Ltd|Incorporated|Pte|Inc|Co|Private|\s"
but it's not working.
Example text:
0.5Bn FinHealth Pvt. Ltd.Inc. Pte.Co.Private Limited Incorporated,
0.5Bn FinHealth Ltd.,
1MG Technologies Pvt. Ltd.,
I need help to improve the regex.
CodePudding user response:
Maybe give the following pattern a try:
(?:\s*\b(?:(?:Pvt|Ltd|Pte|Co)\.?|Inc\.|Incorporated|Private Limited))
(?: - Open the 1st non-capture group;
\s* - 0+ (greedy) whitespace characters;
\b - A word boundary;
(?: - Open a nested 2nd non-capture group;
(?:Pvt|Ltd|Pte|Co) - A nested 3rd non-capture group holding the alternatives that may be followed by a dot;
\.? - An optional literal dot;
| - Or;
Inc\. - Literally match 'Inc.';
| - Or;
Incorporated - Literally match 'Incorporated';
| - Or;
Private Limited - Literally match 'Private Limited';
)) - Close the non-capture groups; the 1st one is matched exactly once.
Replace matches with empty string.
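A minimal sketch of that replacement, assuming Python's re module (the question does not say which language is in use). The helper name strip_suffixes is made up for illustration:

```python
import re

# Pattern from the answer above, compiled once for reuse.
SUFFIXES = re.compile(
    r"(?:\s*\b(?:(?:Pvt|Ltd|Pte|Co)\.?|Inc\.|Incorporated|Private Limited))"
)

def strip_suffixes(text: str) -> str:
    """Remove every company-suffix match (hypothetical helper name)."""
    return SUFFIXES.sub("", text)

samples = [
    "0.5Bn FinHealth Pvt. Ltd.Inc. Pte.Co.Private Limited Incorporated,",
    "0.5Bn FinHealth Ltd.,",
    "1MG Technologies Pvt. Ltd.,",
]
for s in samples:
    print(strip_suffixes(s))
# 0.5Bn FinHealth,
# 0.5Bn FinHealth,
# 1MG Technologies,
```

One caveat: the pattern has no word boundary after the short alternatives, so "Co" also matches at the start of a longer word ("Acme Company" becomes "Acmempany"). If that matters for your data, one option is to write that part as (?:Pvt|Ltd|Pte|Co)\b\.? instead.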
Note: I was unsure what you meant to do with \(|\)|-|\. but my guess is that you want to remove certain stand-alone characters. If so, you can add a character class, for example [().-], as another alternative in the pattern.
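As a rough illustration of the character-class idea, again assuming Python's re module and a made-up helper name, the class can be added as a top-level alternative:

```python
import re

# Suffix pattern plus [().-] as an extra alternative
# (one possible reading of the note, not a confirmed requirement).
WITH_CHARS = re.compile(
    r"\s*\b(?:(?:Pvt|Ltd|Pte|Co)\.?|Inc\.|Incorporated|Private Limited)|[().-]"
)

def strip_suffixes_and_chars(text: str) -> str:
    """Remove suffixes and stand-alone ( ) . - characters, then trim whitespace."""
    return WITH_CHARS.sub("", text).strip()

print(strip_suffixes_and_chars("1MG Technologies (Pvt.) Ltd."))  # 1MG Technologies
print(strip_suffixes_and_chars("0.5Bn FinHealth Ltd."))          # 05Bn FinHealth
```

Note that the second result loses the dot in "0.5Bn", so leave "." out of the class if decimal numbers must survive.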
