James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 137 , comments - 1100 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer in Seattle, WA. I've been doing C++/C#/Java development for over 18 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

MCC Logo MVP Logo

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

Post Categories

C#/.NET Fundamentals: Returning Zero or One Item As IEnumerable<T>

The C#/.NET Fundamentals series is geared towards examining fundamental concepts in using C# (and .NET in general) to produce effective solutions. 

There are times when we are writing a method that returns a sequence of items, that it occasionally becomes necessary in base-class, interface implementation, error, or default conditions to return a sequence of only one or even zero items.  There are many ways to do this, of course, which begs the question of which way is best, in terms of readability, maintainability, and performance. 

The Situation

For our situation, we are talking about returning a sequence from a method that returns IEnumerable<T>.  If your method directly returns List<T> or T[] then your choice on how to do that will be a bit obvious, but in the case of just returning a sequence of, IEnumerable<T> we have a lot of flexibility.

There are many reasons we may want to do this, perhaps we have a method that returns IEnumerable<T> in a virtual method, and perhaps in some sub-class implementations we want it to return an empty (or 1-length) sequence. 

Similarly, we could be implementing an interface that has a method that returns a sequence, and our implementation needs to return only a default empty (or 1-length) sequence. 

Alternatively, perhaps we have a method that returns a sequence of data from a database, but that it’s possible that the database may be offline or the data in the table is optional, in which case we don’t want to throw, but want to return an empty (or 1-length) sequence.

Regardless of the use case, there are times when a zero or one-length sequence is desirable.  For the purposes of the examples below and in the interest of brevity, let’s assume that we are dealing with the former case where we want to provide an empty (or 1-length) sequence for an interface implementation, such as the one below:

   1: public interface IParser
   2: {
   3:     // a sequence of items that can start a comment
   4:     IEnumerable<string> CommentStarters { get; }
   5:  
   6:     ...
   7: }

In the interface above, perhaps there is a property called CommentStarters that returns a sequence of characters that can tell the parser to ignore the line.  For the purposes of our example, we’ll hard code the sequences, but obviously the sequences of one item could be determined at run-time as necessary.

Returning an Empty Sequence

So, for whatever reason in our design we’ve decided that if certain situations exist in our code, we want to return an empty sequence instead of an error, etc.   Of course, one could argue for returning null, though sometimes this option isn’t always desirable because it adds a level of checking that must be done every time you wish to query the contents. 

Array:

So how can we create empty collections to represent an IEnumerable<T>?  Perhaps one of the easiest ways is to simply return a zero-length array:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters 
   5:     {
   6:         get
   7:         {
   8:             return new string[0];
   9:         }
  10:     }
  11:  
  12:     ...
  13: }

The nice thing about returning zero-length arrays is that it’s very easy to do, and relatively performant.  In addition, a zero-length array is immutable by nature, it can not have its size or contents changed.

List<T>:

Of course, we could just as easily return a List<T> of zero length:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters 
   5:     {
   6:         get
   7:         {
   8:             return new List<string>();
   9:         }
  10:     }
  11:  
  12:     ...
  13: }

This works as well, but List<T> is not immutable and can be changed if the caller happens to know it’s a List<T> and casts it back: 

   1: // if we know it returns a List<string>, we can cast it back...
   2: var commentStarters = (List<string>)p.CommentStarters;
   3:  
   4: // which lets us mutate, this could be a problem...
   5: commentStarters.Add("///");

Also, the List<T> is a heavier construct, which would add more overhead than needed. 

Iterator:

Another way to return zero items in a sequence is to use an iterator.  Thus a simple method that returns no items would be a simple yield break:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters 
   5:     {
   6:         get
   7:         {
   8:             yield break;
   9:         }
  10:     }
  11:  
  12:     ...
  13: }

This is very clean, and is immutable, but it does create an iterator, which can be a bit heavier.  In addition, if you are wanting to return zero item sequence inside a method that also returns a sequence directly, this will not work since you can’t mix iterator returns and non iterator returns in a method.

Enumerable.Empty<T>():

Finally, there is a handy static method off of the Enumerable class called Empty<T>() which returns a singleton stand-in for an empty enumerable sequence of the given type.  This is nice in that it will not allocate memory every time you need to return an empty sequence:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters
   5:     {
   6:         get
   7:         {
   8:             return Enumerable.Empty<string>();
   9:         }
  10:     }
  11: }

This has the benefit of being immutable, and being a singleton.  In other words, the caller can’t modify what you return, and it will not allocate any new memory every time this is called (it creates the empty singleton for the sequence of type T once).

Results:

So how do these measure up, performance wise?  Let’s perform all of these over 1 billion iterations:

Array List Iterator Empty
6,654 ms 11,107 ms 9,522 ms 4,926 ms

So as you can see, the fastest solution is to use the Enumerable.Empty<T>(), which also has the most benefits:

  • It’s the fastest.
  • It doesn’t generate as much memory overhead.
  • It is extremely readable in it’s intention.

As always, keep in mind that these time measurements are over 1 billion iterations, and thus the actual time for any single one of these calls is somewhat negligible.  However, they can be useful for comparing the relative performance.  In the end, I’d say stick with Enumerable.Empty<T>() not just because it is the fastest, but also because it returns an immutable sequence and is easy to read and understand.

Returning a Sequence of One

Now, what if our method is to return a sequence that contains only one item?  Unfortunately, there’s no implicit way to just convert a single item into a sequence of one item, but there are several options, very similar to producing empty sequences…

Array:

Just as we can easily create an empty array, we can also just as easily create an array of a single item:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters 
   5:     {
   6:         get
   7:         {
   8:             return new[] { "//" };
   9:         }
  10:     }
  11:  
  12:     ...
  13: }

We don’t have to explicitly type the array, since it can tell from the initialization list that the type is string.  This is a very simple way to create a single-item sequence, however it does have an issue in that the contents of the first item are mutable if cast back to an array:

   1: var p = new Parser();
   2:  
   3: // if we know it returns a string[], we can cast it back...
   4: var commentStarters = (string[])p.CommentStarters;
   5:  
   6: // which lets us mutate, this could be a problem...
   7: commentStarters[0] = "/*";

This mutability of the array contents could be an issue, so let’s explore other options…

List<T>:

Of course, we could just as easily return a List<T> of one item:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters 
   5:     {
   6:         get
   7:         {
   8:             return new List<string>(1) { "//" };
   9:         }
  10:     }
  11:  
  12:     ...
  13: }

Note that for efficiency, we pre-size the List<string> to 1 in the constructor to avoid unnecessary over-allocation.  The main problems with List<T> for this purpose is that it’s heavier than T[] and a little less safe, because once again if they can cast it back to List<T> then they can not only change the contents at an index, but they can add or remove items:

   1: // if we know it returns a List<string>, we can cast it back...
   2: var commentStarters = (List<string>)p.CommentStarters;
   3:  
   4: // which lets us mutate, this could be a problem...
   5: commentStarters[0] = "/*";
   6: commentStarters.Add("///");

Iterator:

Just as you can use an iterator to return no items in the sequence, you can also use it to return a sequence of only one item by performing a yield return on the item directly:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters 
   5:     {
   6:         get
   7:         {
   8:             yield return "//";
   9:         }
  10:     }
  11:  
  12:     ...
  13: }

This is very clean, and is completely immutable, but it does create an iterator, which can be a bit heavier.  In addition, as mentioned before you cannot mix yield return and return in a method, so if you wanted to use this trick in a method that also has a return you’d need to create a helper method to return a singleton sequence, like:

   1: public static class EnumerableExtensions
   2: {
   3:     public static IEnumerable<T> AsEnumerable<T>(this T value)
   4:     {
   5:         yield return value;
   6:     }
   7: }

With this utility extension method, you could take advantage of this solution even in methods that use return.

Enumerable.Repeat<T>():

While there isn’t a direct way in the Enumerable class to generate a single-item sequence, the Repeat<T>() method comes very close when you pass a repeat value of 1:

   1: public class Parser
   2: {
   3:     // a sequence of items that can start a comment
   4:     public IEnumerable<string> CommentStarters
   5:     {
   6:         get
   7:         {
   8:             return Enumerable.Repeat("//", 1);
   9:         }
  10:     }
  11: }

Once again, since this uses iterators under the covers, it is a bit heavier than the arrays, but it is immutable.  As a side note, though, using Repeat() with a count of 1 is probably less readable and could be seen as more of a “trick” than an improvement.

Results:

So how do these measure up, performance wise? Let’s perform all of these over 1 billion iterations:

Array List Iterator Repeat
7,093 ms 14,502 ms 9,848 ms 10,847 ms

So here, the fastest solution is the T[] array, however, it is also mutable which is a concern.  Given that I would probably hedge towards using the very close second and third place iterator or Enumerable.Repeat() to gain the safety of immutability.  Of all of these, it seems like the iterator is the most readable, especially if used with something like the AsEnumerable() extension method proposed.

What About Creating a Singleton?

Now, you could create a singleton for a single-item sequence.  The concern with doing this is that you really have to make the singleton immutable or altering it becomes even worse, as it allows you to alter what every subsequent call produces.  You could make the singleton immutable by returning it by iterator (via Skip(0)), but this is slower than just using the iterator directly. 

You could also wrap the singleton in a ReadOnlyCollection<T> class, but this starts to get less maintainable, though it would prevent future allocations and thus would probably have an overall smaller memory footprint and be very fast (timed at 3,153 ms over 1 billion iterations).

In either case, this does not help in the case where you want to return single-item sequences that can vary at run-time.  That is, all of the examples above used a fixed, hard-coded sequence for simplicity, however it’s possible to have cases where the single-item sequence to return may vary based on the call or other logic.

Summary

So what does this all mean?  Really I was just experimenting to find out which way would be best, and really there’s a ton of options.  It seems quite clear the Enumerable.Empty<T>() is the best choice for zero-item sequences in terms of readability, performance, etc.

However it’s less clear-cut on one-item sequences.  In that case I’d probably say create an extension method like AsEnumerable() that takes advantage of the iterator method (or invoke yield return directly), though you can certainly keep a singleton if the value of the one-item list will always be the same.

Whether or not mutability of the returned sequence is a concern to you is purely a matter for your design consideration.  It’s much more a consideration if you are caching singletons for these sequences, but if you are generating the sequences at every call, it may not be as much of a concern.

Technorati Tags: , , , , , , , , , , , ,

Print | posted on Thursday, December 8, 2011 6:32 PM | Filed Under [ My Blog C# Software .NET Fundamentals ]

Feedback

Gravatar

# re: C# Fundamentals: Returning Zero or One Item As IEnumerable<T>

If you have a few moments to run your test again I'd be interested to see the difference in performance if you used a cached empty array in the empty case as opposed to creating a new one each time. I'd still use Empty<T> afterwards though. 8oP
12/9/2011 4:25 AM | Kevin Watkins
Gravatar

# re: C# Fundamentals: Returning Zero or One Item As IEnumerable<T>

You've missed one option on the single-element sequence, and that's creating a dedicated class which implements IEnumerable<T>. I've put up an example test with this approach at https://gist.github.com/1451071

The timings on my laptop for this are:

Custom class: 00:00:00.4003948 (10,000,000)
Iterator : 00:00:00.5607078 (10,000,000)
Array : 00:00:02.1428748 (10,000,000)
12/9/2011 4:54 AM | Mark Rendle
Gravatar

# re: C# Fundamentals: Returning Zero or One Item As IEnumerable<T>

I think you'll find that Enumerable.Empty uses a lazy-instantiated singleton typed array which is why you'll see the perf difference!
12/9/2011 5:13 AM | Richard Hopton
Gravatar

# re: C# Fundamentals: Returning Zero or One Item As IEnumerable<T>

Good feedback guys!

@Kevin: the cached empty array would be a good choice too, since it's immutable. I had originally created my own EmptyEnumerable class that actually used an empty array singleton as well.

@Mark: Sure, you can create a custom single-element sequence (might be a good re-usable data struct). I'll run it next to mine for comparissons. I'm not sure why your array is so much higher than an iterator though. Are you using Stopwatch, or DateTime to mark your times? That seems high on the Array side.

@Richard: I think that's correct but I'll have to check when I get into work. I have Reflector there and can peek at the actual implementation.

Thanks again for all the feedback!
12/9/2011 7:23 AM | James Michael Hare
Gravatar

# re: C# Fundamentals: Returning Zero or One Item As IEnumerable<T>

Cool, thanks for the tip. I'm a fan of the null object pattern in general, and certainly so for enumerations or lists or any other type of aggregate. I generally would default to returning a new List, and never gave a lot of thought to the mutability and performance angles. I like the static Enumerable solutions for IEnumerable -- I think I'm going to adopt that. :)
12/9/2011 10:15 AM | Erik
Gravatar

# re: C# Fundamentals: Returning Zero or One Item As IEnumerable<T>

Thanks Man for your tips and towards for all your experiments. It's really helpful to us in development always :)
12/9/2011 1:18 PM | Parwej
Gravatar

# re: C# Fundamentals: Returning Zero or One Item As IEnumerable<T>

You might find this post and library interesting. It has a Maybe<T> structure that more concisely represents 0 or 1 items (values).

http://blog.jordanterrell.com/post/Maybe-The-Uh-Stuff-That-Dreams-Are-Made-Of.aspx

http://nuget.org/packages/iSynaptic.Commons
12/10/2011 9:14 PM | Jordan Terrell
Gravatar

# re: C#/.NET Fundamentals: Returning Zero or One Item As IEnumerable<T>

"This is nice in that it will not allocate memory every time you need to return an empty sequence"

But it WILL allocate memory every time you call Array.GetEnumerator(). Furthermore, it is a function call, when a static readonly will do. If you need to return an enumerable as opposed to an array, make a singleton class (new it up in a public static readonly field) that implements both IEnumerable<T> and IEnumerator<T> and return this for GetEnumerator() and false for MoveNext(). No new heap allocations after that, whatsoever.
5/17/2012 4:47 PM | Joshua A. Schaeffer
Gravatar

# re: C#/.NET Fundamentals: Returning Zero or One Item As IEnumerable<T>

@Joshua: You are correct, GetEnumerator() does allocate memory (true in Enumerable<T>.Empty, plain Array of zero, etc.). And yes, we could create a class to represent both and return false for MoveNext(), though I think that's more of a micro-optimization as the goal isn't to remove *all* memory allocations.

The main point was to avoid allocating the memory for the same empty structure that you really don't care about using handy built-in classes. In fact, it's highly possible in the future they could further optimize Enumerable.Empty() to do just that.

Thus I feel it's sufficient for folks needing to do so to use the simple, built-in Enumerable.Empty() construct and then consider going further into a custom empty enumerator if it turns out the memory from that enumerator is an issue.


5/17/2012 6:48 PM | James Michael Hare
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 

Powered by: