Home > Enterprise >  What is a better alternative to comparing string.tolower?
What is a better alternative to comparing string.tolower?

Time:02-02

from my past experience and stackoverflow, I learned that String.ToLower() is bad in performance. Now I have the following issue, I need to filter out or do a specific action when specific words are in a giant list.

Old approach, which I want to fix:

if (input.Any(i => i.ToLower() == "alle" || i.ToLower() == "all" || i.ToLower() == "none") 
{
    // do something
}

I was thinking of using a hashset, but I am questioning the performance and how it handles the case sensitivity, I basically dont care about the case sensitivity. Does it make sense for me to use the hashset?

my current suggestion as a solution:

var unwantedInputsSet = new HashSet<string> {"alle", "all", "none"};
if (input.Any(i => i => unwantedInputsSet.Contains(i))) 
{
    // do something
}

Is there any better alternative to this or not. Do you have any ideas how to approach this better?

CodePudding user response:

You can pass comparer to the HashSet, for example StringComparer.InvariantCultureIgnoreCase:

var unwantedInputsSet = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase) {"alle", "all", "none"};
if (input.Any(i => unwantedInputsSet.Contains(i))) 
{
    // do something
}

Or, as suggested in comments, use pattern matching:

if (input.Any(i => i.ToLower() is "alle" or "all" or "none") 
{
    // do something
}

Which should be turned by compiler into code similar to yours (though ToLower should be called once).

As for performance - it can be highly depended on the actual data and you should measure it using expected datasets. For small search set HashSet can be performing worse than several comparisons like:

var cmpr = StringComparison.InvariantCultureIgnoreCase;
if (input.Any(i => string.Equals(i, "alle", cmpr) || string.Equals(i, "all", cmpr) || string.Equals(i, "none", cmpr)))
{
    // do something
}

For such benchmarking I recommend looking into BenchmarkDotNet.

CodePudding user response:

If you want the code to be maximally performant, do the string comparisons using string.Equals() with a StringComparison.OrdinalIgnoreCase parameter.

If you want to make the code more readable and you don't care so much about performance, you can use a simple extension method to compare a string against a number of target strings:

public static class StringExt
{
    public static bool EqualsAnyOf(this string value, params string[] targets)
    {
        return targets.Any(target => target.Equals(value, StringComparison.OrdinalIgnoreCase));
    }
}

Then you could write your code like this:

if (input.Any(item => item.EqualsAnyOf("alle", "all", "none")))
{
    // ...
}

If you wanted to get really fancy you could also write an AnyEqualsAnyOf() extension method:

public static class StringExt
{
    public static bool EqualsAnyOf(this string value, params string[] targets)
    {
        return targets.Any(target => target.Equals(value, StringComparison.OrdinalIgnoreCase));
    }

    public static bool AnyEqualsAnyOf(this IEnumerable<string> sequence, params string[] targets)
    {
        return sequence.Any(item => item.EqualsAnyOf(targets));
    }
}

And then your code would just be:

if (input.AnyEqualsAnyOf("alle", "all", "none"))
{
    // ...
}

I personally wouldn't think it's worth doing that unless you find yourself writing this sort of code fairly often, but it's certainly an option.

CodePudding user response:

You can specify a IEqualityComparer<T> when declaring a HashSet<T>.

static readonly HashSet<string> unwanted =
    new(StringComparer.OrdinalIgnoreCase) { "alle", "all", "none" };

Also, I would make this a static readonly field to a void repeated creation.

The advantage of the HashSet solution is that it can be easily extended to more cases. It also allows you to read the unwanted words from a configuration file.

Since C# 9.0 you can use Target-typed new expressions. If you are using a version prior to C# 9.0:

static readonly HashSet<string> unwanted =
    new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "alle", "all", "none" };

Usage:

if (input.Any(i => unwanted.Contains(i))) 
{
    // do something
}

without ToLower()!

  •  Tags:  
  • Related