James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 137 , comments - 1099 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer in Seattle, WA. I've been doing C++/C#/Java development for over 18 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

MCC Logo MVP Logo

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

Post Categories

C#/.NET Toolbox: Adding a ToHashSet() Extension Method

This post is another in a series that will just contain little GUCs (Generic Utility Classes) I’ve developed along the way. If these already exist as part of the framework and I’ve overlooked them, feel free to let me know! And if you know of a better way to implement them, do the same! I’m never too old to learn something new (I hope!).

I've blogged in the past about how useful the overlooked HashSet class can be (here) and about the very useful ToDictionary(), ToList(), and ToLookup() LINQ extension methods (here and here).  Unfortunately, the .NET Framework doesn't provide a ToHashSet() method, so I set out to create one of my own.

Yes, you can always call the form of the constructor of HashSet which takes an IEnumerable<T> to add in the set, but this lacks some of the grace and fluidity that makes ToList(), ToDictionary(), and ToLookup() so easy to read and use.

A Quick Recap of Sets

Remember that the set is a very useful data structure for keeping track of a unique collection of (typically) unordered items.  In addition, with set arithmetic you can perform intersections, unions, differences, subset, superset, and membership operations.  The HashSet, in particular, is very efficient at performing these operations because it uses hash codes (much like the Dictionary) to achieve an amortized O(1) efficiency for lookups.

There are many times I've seen folks use a Dictionary to store the same key and value just so they can efficiently check membership (that is, is item A in the set {A, B, C, D, … }), when in reality this is exactly what the HashSet is geared for.  It has the same efficient O(1) lookup, but is more efficient because it only has a value component and not a key and value pair component.

A Quick Recap of ToDictionary()

Remember with the ToDictionary() extension method (in the System.Linq namespace) you can take any IEnumerable<T> and store it in a Dictionary given several options.  For the purposes of all future usage examples, we'll assume this simple POCO exists…

   1: public sealed class Product
   2: {
   3:     public int Id { get; set; }
   4:     public string Name { get; set; }
   5:     public string Category { get; set; }
   6:     public double Value { get; set; }
   7: }

Now that we've defined that POCO, let's say it's loaded up like so:

   1: var products = new List<Product>
   2:     {
   3:         new Product { Id = 10521, Name = "LCD TV", Category = "Electronics", Value = 735.99 },
   4:         new Product { Id = 19254, Name = "DVD Player", Category = "Electronics", Value = 239.15 },
   5:         new Product { Id = 13241, Name = "Treadmill", Category = "Fitness", Value = 390.92 },
   6:         new Product { Id = 91312, Name = "Bread Machine", Category = "Kitchen", Value = 192.14 },
   7:         new Product { Id = 91342, Name = "Toaster", Category = "Kitchen", Value = 39.14 }
   8:     };

So given this sample data set, we could use ToDictionary() to store the contents of the list in a key/value lookup structure:

   1: // creates a dictionary where the key is ID (int) and value is the Product
   2: var productsById = products.ToDictionary(p => p.Id);
   3:  
   4: // creates a dictionary where the key is ID (int) and the value is the Name (string)
   5: var productNamesById = products.ToDictionary(p => p.Id, p => p.Name);
   6:  
   7: // creates a dictionary where the key is Name (case insensitive) and value is product
   8: var productsByName = products.ToDictionary(p => p.Name, 
   9:     StringComparer.InvariantCultureIgnoreCase);
  10:  
  11: // creates a dictionary where key is Name and value is Value
  12: var productValuesByName = products.ToDictionary(p => p.Name, p => p.Value, 
  13:     StringComparer.InvariantCultureIgnoreCase);

Notice that ToDictionary() has four overloads.  You always have to at least specify a key selector, but you can optionally also specify a value selector or IEqualityComparer or both (or neither). 

Developing the ToHashSet() Extension Method

The ToHashSet() will be very similar to ToDictionary(), but also somewhat simplified since that we won't have a separate key selector and value selector (since the key and value of a set are one and the same).

So for our ToHashSet() let's define the following overloads:

  1. ToHashSet() - takes nothing, this will use the default comparer and identity delegate (the item will be the set member).
  2. ToHashSet(selector) - takes a selector which converts the item to a different item for set, default comparer.
  3. ToHashSet(comparer) - takes a comparer which determines equality and hash code, uses identity delegate (the item will be set member).
  4. ToHashSet(selector, comparer) - takes both a selector to determine the set member from the item, and a comparer which determines equality and hash code.

Because I'm a firm believer in cross-calling method overloads to avoid duplicating logic, let's start with the most complex overload first.  All others are a general case of this method, so it will be easy to set up the cross call.  Now, all four of these methods will be defined in the same static class, but I'll omit the others while discussing each one for brevity -- obviously the final class will contain all of the four overloads together.

   1: public static class EnumerableExtensions
   2: {
   3:     // creates HashSet from IEnumerable given selector and comparer
   4:     public static HashSet<TElement> ToHashSet<TSource, TElement>(this IEnumerable<TSource> source, 
   5:         Func<TSource, TElement> elementSelector, IEqualityComparer<TElement> comparer)
   6:     {
   7:         if (source == null) throw new ArgumentNullException("source");
   8:         if (elementSelector == null) throw new ArgumentNullException("elementSelector");
   9:     
  10:         // you can unroll this into a foreach if you want efficiency gain, but for brevity...
  11:         return new HashSet<TElement>(source.Select(elementSelector), comparer);
  12:     }
  13: }

Okay, now that we have the most specific version, we can create the other overloads as simplifications of this call.  Note that  we must always have a selector even if it's identity (that is, item => item) but we can pass a null comparer to HashSet<T>, in which case HashSet<T> will use EqualityComparer<T>.Default.

Now let's look at the simplest call where everything is defaulted -- in this one the we pass a comparer of null and the selector of the identity function to the most specific overload.

   1: public static class EnumerableExtensions
   2: {
   3:     // other ToHashSet() overloads...
   4:     
   5:     // Creates a HashSet of TSource from an IEnumerable of TSource using the identity 
   6:     // selector and default equality comparer.
   7:     public static HashSet<TSource> ToHashSet<TSource>(this IEnumerable<TSource> source)
   8:     {
   9:         // key selector is identity fxn and null is default comparer
  10:         return source.ToHashSet<TSource, TSource>(item => item, null);
  11:     }
  12:  
  13:     // other ToHashSet() overloads...
  14: }

And then we'll have the overload with just the comparer and the selector is the identity function.

   1: public static class EnumerableExtensions
   2: {
   3:     // other ToHashSet() overloads...
   4:  
   5:     // Creates a HashSet of TSource from an IEnumerable of TSource using the identity 
   6:     // selector and specified equality comparer.
   7:     public static HashSet<TSource> ToHashSet<TSource>(this IEnumerable<TSource> source, 
   8:         IEqualityComparer<TSource> comparer)
   9:     {
  10:         return source.ToHashSet<TSource, TSource>(item => item, comparer);
  11:     }
  12:     
  13:     // other ToHashSet() overloads...
  14: }

And finally the overload with just the selector and the default equality comparer.

   1: public static class EnumerableExtensions
   2: {
   3:     // other ToHashSet() overloads...
   4:  
   5:     // Creates a HashSet of TElement from an IEnumerable of TSource using the specified 
   6:     // element selector and default equality comparer.
   7:     public static HashSet<TElement> ToHashSet<TSource, TElement>(this IEnumerable<TSource> source, 
   8:         Func<TSource, TElement> elementSelector)
   9:     {
  10:         return source.ToHashSet<TSource, TElement>(elementSelector, null);
  11:     }
  12: }

Using the New ToHashSet() Extension Method

Now that we have these extension methods in our library, we can use them in much the same way as the LINQ ToDictionary() extension method:

   1: // creates a set with just product IDs
   2: var productIds = products.ToHashSet(p => p.Id);
   3:  
   4: // create a set containing the product names, case insensitive
   5: var productNames = products.ToHashSet(p => p.Name, StringComparer.InvariantCultureIgnoreCase);
   6:  
   7: // or for queries where you already have the item narrowed down and want to store item itself
   8: var namesLongerThanFive = products.Select(p => p.Name).Where(n => n.Length > 5).ToHashSet();
   9:  
  10: // or for queries where you have it narrowed down, want to store item itself, but with 
  11: // non-default equality comparer
  12: var caseInsensitiveNames = products.Select(p => p.Category + p.Name)
  13:     .ToHashSet(StringComparer.InvariantCultureIgnoreCase);

Typically, you'll use the two overloads that do not take a selector when you want the type and value of the HashSet to be the same type held by the IEnumerable.  It is possible you'd use a selector even if the types are identical if you want to put a different computed value in the set.  For example, you could have an IEnumerable<string> to put in a HashSet<string> but want the HashSet to have a substring instead of the original value.  As for the comparer, typically you'll use the default unless you are putting custom classes in the HashSet that don't correctly implement Equals() and GetHashCode(), or if the default implementation of them is not what you need (for example if you want case-insensitive string equality, etc.).

Summary

If you love using the ToDictionary(), ToList(), and ToLookup() extension methods and use the HashSet consider adding this extension method class to your toolbox!  It can come in handy for creating quick O(1) lookup sets for checking membership and other set operations!

Click here to download the source for the EnumerableExtensions.ToHashSet

Print | posted on Thursday, March 31, 2011 6:21 PM | Filed Under [ My Blog C# Software .NET Toolbox ]

Feedback

Gravatar

# re: C#/.NET Toolbox: Adding a ToHashSet() Extension Method

What about suggesting this extension method be added to the BCL? Surely they should be able to make it to .NET 5. And although I know it's pretty simply code, there's no reason for everybody to reproduce the same implementation all over.
4/1/2011 5:40 AM | Anders Borum
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 

Powered by: