from my past experience and stackoverflow, I learned that String.ToLower() is bad in performance. Now I have the following issue, I need to filter out or do a specific action when specific words are in a giant list.
Old approach, which I want to fix:
if (input.Any(i => i.ToLower() == "alle" || i.ToLower() == "all" || i.ToLower() == "none")
{
// do something
}
I was thinking of using a hashset, but I am questioning the performance and how it handles the case sensitivity, I basically dont care about the case sensitivity. Does it make sense for me to use the hashset?
my current suggestion as a solution:
var unwantedInputsSet = new HashSet<string> {"alle", "all", "none"};
if (input.Any(i => i => unwantedInputsSet.Contains(i)))
{
// do something
}
Is there any better alternative to this or not. Do you have any ideas how to approach this better?
CodePudding user response:
You can pass comparer to the HashSet, for example StringComparer.InvariantCultureIgnoreCase:
var unwantedInputsSet = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase) {"alle", "all", "none"};
if (input.Any(i => unwantedInputsSet.Contains(i)))
{
// do something
}
Or, as suggested in comments, use pattern matching:
if (input.Any(i => i.ToLower() is "alle" or "all" or "none")
{
// do something
}
Which should be turned by compiler into code similar to yours (though ToLower should be called once).
As for performance - it can be highly depended on the actual data and you should measure it using expected datasets. For small search set HashSet can be performing worse than several comparisons like:
var cmpr = StringComparison.InvariantCultureIgnoreCase;
if (input.Any(i => string.Equals(i, "alle", cmpr) || string.Equals(i, "all", cmpr) || string.Equals(i, "none", cmpr)))
{
// do something
}
For such benchmarking I recommend looking into BenchmarkDotNet.
CodePudding user response:
If you want the code to be maximally performant, do the string comparisons using string.Equals() with a StringComparison.OrdinalIgnoreCase parameter.
If you want to make the code more readable and you don't care so much about performance, you can use a simple extension method to compare a string against a number of target strings:
public static class StringExt
{
public static bool EqualsAnyOf(this string value, params string[] targets)
{
return targets.Any(target => target.Equals(value, StringComparison.OrdinalIgnoreCase));
}
}
Then you could write your code like this:
if (input.Any(item => item.EqualsAnyOf("alle", "all", "none")))
{
// ...
}
If you wanted to get really fancy you could also write an AnyEqualsAnyOf() extension method:
public static class StringExt
{
public static bool EqualsAnyOf(this string value, params string[] targets)
{
return targets.Any(target => target.Equals(value, StringComparison.OrdinalIgnoreCase));
}
public static bool AnyEqualsAnyOf(this IEnumerable<string> sequence, params string[] targets)
{
return sequence.Any(item => item.EqualsAnyOf(targets));
}
}
And then your code would just be:
if (input.AnyEqualsAnyOf("alle", "all", "none"))
{
// ...
}
I personally wouldn't think it's worth doing that unless you find yourself writing this sort of code fairly often, but it's certainly an option.
CodePudding user response:
You can specify a IEqualityComparer<T> when declaring a HashSet<T>.
static readonly HashSet<string> unwanted =
new(StringComparer.OrdinalIgnoreCase) { "alle", "all", "none" };
Also, I would make this a static readonly field to a void repeated creation.
The advantage of the HashSet solution is that it can be easily extended to more cases. It also allows you to read the unwanted words from a configuration file.
Since C# 9.0 you can use Target-typed new expressions. If you are using a version prior to C# 9.0:
static readonly HashSet<string> unwanted =
new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "alle", "all", "none" };
Usage:
if (input.Any(i => unwanted.Contains(i)))
{
// do something
}
without ToLower()!
