I have some regex code written in C# that adds a space between a number and a unit, with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example inputs:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become:
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
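The exceptions are %, E, e and :. For these, no space should be added after the number, and any whitespace between the number and the symbol should be removed.
I have tried, but my regex knowledge is not top-notch. Could someone help me reduce this code while getting the same results?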
Thank you!
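To compare any shorter version against the current behaviour, here is a small harness that applies the seven replacements above in order, assuming each \s and \d in them carries the + quantifier (\s+, \d+). NormalizeOriginal is a hypothetical name, and the brackets in the output only make trailing spaces visible: with these patterns the percent and exponent samples come out as "10% ", "40e-5 " and "40E-05 ", each with a trailing space, because step 3 appends a space after every number, including the exponent digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical harness: the question's seven replacements, applied in the same order.
string NormalizeOriginal(string dosage_value)
{
    dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");            // drop whitespace after a digit
    dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");          // drop whitespace after "digit%"
    dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");     // space after every number
    dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");         // undo it before %
    dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");          // undo it before :
    dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");          // undo it before e
    dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");          // undo it before E
    return dosage_value;
}

foreach (var s in new[] { "10ANYUNIT", "10:something", "10 : something", "10 %", "40 e-5", "40 E-05" })
{
    // Brackets make any trailing space visible.
    Console.WriteLine($"[{s}] -> [{NormalizeOriginal(s)}]");
}
```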
Answer:
For your example data, you could use 2 capture groups, with the second group inside an optional part.
In the replace callback, check whether capture group 2 participated in the match. If it did, use its value in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
(                Capture group 1
  \d+(?:\.\d+)?  Match 1+ digits with an optional decimal part
)                Close group 1
(?:              Non-capture group to match as a whole
  \s*([%:eE])    Match optional whitespace chars, and capture one of %, :, e or E in group 2
)?               Close the non-capture group and make it optional
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = new string[]
{
    "10ANYUNIT",
    "10:something",
    "10 : something",
    "10 %",
    "40 e-5",
    "40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
// Keep the number (group 1), then either the captured symbol (group 2) or a single space.
var result = strings.Select(s =>
    Regex.Replace(
        s, pattern, m =>
            m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
    )
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Note that in .NET, \d can also match digits from other scripts (any Unicode decimal digit), \s can also match a newline, and without a word boundary the pattern can start matching in the middle of a longer token. A slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
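The three differences can be checked with a short sketch that runs both patterns through the same callback as above. The helper name Apply and the sample strings "10\n%" and "A10mg" are illustrative choices, not part of the answer; U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit outside 0-9.

```csharp
using System;
using System.Text.RegularExpressions;

const string loose = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
const string strict = @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?";

// Hypothetical helper using the answer's callback: keep the number,
// then the captured symbol if there is one, otherwise a space.
string Apply(string input, string pattern) =>
    Regex.Replace(input, pattern, m =>
        m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));

// 1. \d matches any Unicode decimal digit; [0-9] only matches ASCII digits.
Console.WriteLine(Regex.IsMatch("\u0663", @"^\d$"));    // True
Console.WriteLine(Regex.IsMatch("\u0663", @"^[0-9]$")); // False

// 2. \s* can cross a line break between the number and the symbol; [\p{Zs}\t]* cannot.
Console.WriteLine(Apply("10\n%", loose) == "10%");       // True
Console.WriteLine(Apply("10\n%", strict) == "10 \n%");   // True

// 3. Without \b, digits inside a longer word are treated as a number.
Console.WriteLine(Apply("A10mg", loose));   // A10 mg
Console.WriteLine(Apply("A10mg", strict));  // A10mg
```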
Answer:
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
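Group 1: (\d+(\.\d*)?)
Any number, such as 123 or 1241.23. (Group 2 is the optional decimal part nested inside it.)
Group 3: ((E|e|%|:)+)
One or more of the special symbols E, e, % and :.
Group 1 and group 3 can be separated by any amount of whitespace. That whitespace, and any whitespace after the symbol, is dropped, and the replacement $1$3 followed by a space puts the number, the symbol and a single space back together.
If it doesn't work the way you're asking, please provide some samples to test.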
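A quick way to check the one-liner, and the group numbering behind $1$3, is to run it on the samples from the question. This is a sketch; the input "12.5 : x" is an illustrative choice, and the brackets only make trailing spaces visible. The comments show what this pattern produces, which can be compared with the expected output in the question.

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*";

// Group 2 is the decimal part nested inside group 1, so the symbol run is group 3.
var m = Regex.Match("12.5 : x", pattern);
Console.WriteLine($"1='{m.Groups[1].Value}' 2='{m.Groups[2].Value}' 3='{m.Groups[3].Value}'");
// 1='12.5' 2='.5' 3=':'

foreach (var s in new[] { "10ANYUNIT", "10:something", "10 : something", "10 %", "40 e-5", "40 E-05" })
{
    string replaced = Regex.Replace(s, pattern, "$1$3 ");
    Console.WriteLine($"[{s}] -> [{replaced}]");
}
// [10ANYUNIT] -> [10ANYUNIT]
// [10:something] -> [10: something]
// [10 : something] -> [10: something]
// [10 %] -> [10% ]
// [40 e-5] -> [40e -5]
// [40 E-05] -> [40E -05]
```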
Answer:
To me this is too complex to handle with a single regex, so I suggest splitting it into separate checks. In the code example below I use four different regexes; the first is described in detail, and the rest can be deduced from that explanation :)
using System;
using System.Text.RegularExpressions;

var testStrings = new string[]
{
    "10mg",
    "10:something",
    "10 : something",
    "10 %",
    "40 e-5",
    "40 E-05",
};

foreach (var testString in testStrings)
{
    Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}

string RegexReplace(string input)
{
    // First look for exponential notation.
    // Pattern is: match zero or more whitespace characters \s*
    // Then match one or more digits and store them in the first capturing group (\d+)
    // Then match one or more whitespace characters again \s+
    // Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
    // It matches a lowercase or uppercase 'e', an optional (due to the ? operator) minus/plus sign and one or more digits.
    // Then match zero or more whitespace characters \s*
    var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
    if (expForMatch.Success)
    {
        return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
    }

    // Number, optional whitespace, colon, optional whitespace, then a word.
    var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
    if (matchWithColon.Success)
    {
        return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
    }

    // Number, optional whitespace, then a percent sign.
    var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
    if (matchWithPercent.Success)
    {
        return $"{matchWithPercent.Groups[1].Value}%";
    }

    // Number, optional whitespace, then a unit word.
    var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
    if (matchWithUnit.Success)
    {
        return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
    }

    return input;
}
Output:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'
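Since the four checks share one shape (match, then build the result from the groups), they can also be kept in an ordered list of rules where the first match wins. This is a sketch of that restructuring, not part of the answer; Normalize and rules are hypothetical names, and the patterns are the four above. Order matters: the last, catch-all rule would also match "40 e-5" (as 40 followed by the word e), so it has to be tried after the exponent rule.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical table-driven version: same patterns, same order, first match wins.
var rules = new (string Pattern, Func<Match, string> Format)[]
{
    (@"\s*(\d+)\s+([eE][-+]?\d+)\s*", m => m.Groups[1].Value + m.Groups[2].Value),
    (@"\s*(\d+)\s*:\s*(\w+)",         m => $"{m.Groups[1].Value}:{m.Groups[2].Value}"),
    (@"\s*(\d+)\s*%",                 m => $"{m.Groups[1].Value}%"),
    (@"\s*(\d+)\s*(\w+)",             m => $"{m.Groups[1].Value} {m.Groups[2].Value}"),
};

string Normalize(string input)
{
    foreach (var (pattern, format) in rules)
    {
        var match = Regex.Match(input, pattern);
        if (match.Success)
        {
            return format(match);
        }
    }
    // No rule matched: return the input unchanged.
    return input;
}

foreach (var s in new[] { "10mg", "10:something", "10 : something", "10 %", "40 e-5", "40 E-05" })
{
    Console.WriteLine($"Input: '{s}', parsed: '{Normalize(s)}'");
}
```

Adding a new case then means adding one row in the right position rather than another if block.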
