The Microsoft documentation I found did not answer my question because it does not clearly indicate what ToHashSet() does with duplicates.
Are they included, are they removed, or does the call trigger an error?
In my searches and testing, it appears that duplicates are silently stripped. The other questions on StackOverflow assume that duplicates should be stripped.
This appears to be true for all objects as long as .Equals() and .GetHashCode() are correctly overridden.
Am I correct or did I miss something? Assumptions create bugs.
A simple answer of Yes with a link to documentation is all I need.
Two triggers for this question are:
- The documentation for ToDictionary specifically indicates that it will raise an error on duplicates.
- Seeing code that calls .Distinct() or .GroupBy() before calling .ToHashSet(). That implies that the developer either did not understand .ToHashSet() or was afraid of creating a bug.
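To make the two triggers concrete, here is a minimal sketch of what I mean. The list contents and the class name TriggerDemo are made up for illustration; the only behaviour I am relying on is the documented ArgumentException from ToDictionary on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TriggerDemo
{
    public static void Main()
    {
        // Hypothetical input with one duplicate ("a").
        var words = new List<string> { "a", "b", "a" };

        // Trigger 1: ToDictionary is documented to throw on duplicate keys.
        try
        {
            _ = words.ToDictionary(w => w);
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine($"ToDictionary threw: {ex.GetType().Name}");
        }

        // Trigger 2: the defensive pattern I keep seeing in code.
        HashSet<string> defensive = words.Distinct().ToHashSet();
        Console.WriteLine(defensive.Count); // 2
    }
}
```

The question is whether the .Distinct() call in the second part is doing anything useful.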
CodePudding user response:
A set cannot contain duplicate elements, by definition.
.NET methods document which exceptions they can raise. ToHashSet() lists only ArgumentNullException (thrown when the source is null) and nothing related to duplicates, so we can safely assume that duplicates do not make it throw.
There really are two possible implementations for ToHashSet():
- It delegates the work to the HashSet<T> constructor that takes a collection. The documentation's "Remarks" section states:
  "If collection contains duplicates, the set will contain one of each unique element. No exception will be thrown. Therefore, the size of the resulting set is not identical to the size of collection."
Therefore, identical elements are simply skipped. This is the implementation that is indeed used if you read the source.
- It enumerates the source sequence and repeatedly calls Add on an initially empty set. Add does not throw, since it is documented to return:
  "true if the element is added to the HashSet<T> object; false if the element is already present."
Either way, duplicates never cause an exception. They are ignored, and the resulting set contains only the unique elements.
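The two possible implementations described above can be compared side by side with a small sketch. The Point class and the sample values are hypothetical and only serve to illustrate the behaviour; Point overrides Equals and GetHashCode so that value-equal instances count as duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with value-based equality.
public sealed class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj) =>
        obj is Point p && p.X == X && p.Y == Y;

    // Equal points must produce equal hash codes.
    public override int GetHashCode() => (X, Y).GetHashCode();
}

public static class Demo
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 2, 3, 3, 3 };

        // ToHashSet() on a sequence with duplicates.
        HashSet<int> viaToHashSet = numbers.ToHashSet();

        // First candidate implementation: the collection constructor.
        var viaCtor = new HashSet<int>(numbers);

        // Second candidate implementation: repeated Add calls.
        // Add returns false for a duplicate instead of throwing.
        var viaAdd = new HashSet<int>();
        foreach (int n in numbers)
        {
            viaAdd.Add(n);
        }

        Console.WriteLine(viaToHashSet.Count); // 3
        Console.WriteLine(viaToHashSet.SetEquals(viaCtor)
                          && viaToHashSet.SetEquals(viaAdd)); // True

        // Custom objects are deduplicated through Equals/GetHashCode.
        var points = new[] { new Point(1, 2), new Point(1, 2), new Point(3, 4) };
        Console.WriteLine(points.ToHashSet().Count); // 2
    }
}
```

If Point did not override Equals and GetHashCode, the default reference equality would apply and the last set would contain all three instances, so a preceding .Distinct() would not help there either.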
