C#/.NET Little Wonders: Five Easy Sequence Aggregators

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.

Today we will look at five easy ways to aggregate sequences. Often when we’re looking at a sequence of objects, we want to perform some sort of aggregation across that sequence to find a calculated result.

The methods we will be looking at are LINQ extension methods from the Enumerable static class which do just that. Like most of the LINQ extension methods discussed before, these operate on IEnumerable<TSource> sequences.

Sum() – calculate the total value across sequence

So, as you’d expect, the Sum() method in the Enumerable static class is useful for computing the total of values in a sequence. This can be done in one of two ways depending on the form of the extension method invoked (of course, ignoring the implicit source parameter):

Sum()
- Sums the values of the sequence.
- Source type must be one of the following types: int, long, float, double, decimal, or a Nullable-wrapped variant of one of these.
Sum(Func<TSource, X> projection)
- Sums the results of a projection on items in the sequence.
- Technically, X must be one of the following types: int, long, float, double, decimal, or a Nullable-wrapped variant of one of these.
- However, if the projection is a lambda or anonymous delegate, narrower numeric types (such as short) are implicitly widened to int.

Notice a few things here. First of all, even though many types in C# support addition, the Sum() method without a projection only supports int, long, float, double, and decimal (and their Nullable counterparts).

    // works for any sequence of an allowed numeric type
    double[] data = { 3.14, 2.72, 1.99, 2.32 };
    var result = data.Sum();

    // does NOT work for sequences of disallowed numeric types
    short[] shortData = { 1, 2, 5, 7 };

    // Compiler ERROR: no form of Sum() exists for short, uint, etc.
    var shortResult = shortData.Sum();
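If you do need to total one of the unsupported element types, one option is to go through the projection form of Sum() and widen each element yourself. The following is only a sketch: the class and method names (SumWidening, SumShorts, SumUInts) are made up for illustration, and it assumes that widening to int or long is acceptable for your data.

```csharp
using System;
using System.Linq;

public static class SumWidening
{
    // Sums a short[] by widening each element to int in a projection.
    public static int SumShorts(short[] values) => values.Sum(v => (int)v);

    // Sums a uint[] by widening to long, which can hold every uint value.
    public static long SumUInts(uint[] values) => values.Sum(v => (long)v);

    public static void Main()
    {
        short[] shortData = { 1, 2, 5, 7 };
        uint[] uintData = { 10u, 20u, 30u };

        Console.WriteLine(SumShorts(shortData)); // 15
        Console.WriteLine(SumUInts(uintData));   // 60
    }
}
```

The explicit cast makes the chosen result type visible at the call site, which can be easier to read than relying on the compiler to pick an overload.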

Also notice that you can operate on the Nullable versions of the allowed types. Now we know that nullable math can be a tricky thing in .NET, but with Sum() we need not worry, because all null values are excluded from the summation:

    // notice a nullable sequence of int
    var data = new List<int?> { 1, 3, 9, 13, null, 7, 12, null };

    // answer will be 45 since nulls are disregarded
    var result = data.Sum();

The second, projection form is a bit more interesting. Sometimes we do have sequences of numbers to add up directly, but often we have sequences of more complex objects, and we want to sum up a particular field. This is where the second form that takes a projection comes in handy. By specifying a projection, we are telling Sum() how to extract the field we want to sum from each item in the sequence.

To illustrate, let’s assume a simple sequence of a simple POCO called Employee:

    public sealed class Employee
    {
        public string Name { get; set; }
        public double Salary { get; set; }
        public short Dependents { get; set; }

        // etc...
    }

Now, let’s say that we have a sequence of these employees:

    var employees = new List<Employee>
        {
            new Employee { Name = "Bob", Salary = 35000.00, Dependents = 0 },
            new Employee { Name = "Sherry", Salary = 75250.00, Dependents = 1 },
            new Employee { Name = "Kathy", Salary = 32000.50, Dependents = 0 },
            new Employee { Name = "Joe", Salary = 17500.00, Dependents = 2 },
            // etc
        };

We can then extract the Salary property using a projection and take a sum as follows:

    // once again, the projection result type MUST be a supported type
    var totalSalary = employees.Sum(e => e.Salary);

While the projection form of Sum() seems on the surface to be restricted to the same types (int, long, float, double, decimal), it WILL accept projections of narrower types if we use lambda expressions or anonymous delegates.

    // these work because the lambda and anonymous delegate return types are inferred
    employees.Sum(e => e.Dependents);
    employees.Sum(delegate(Employee e) { return e.Dependents; });

This is because a lambda expression or anonymous delegate has no fixed delegate type of its own: the compiler checks whether its body can be converted to the return type each overload expects, and since a short converts implicitly to int, the int overload is chosen. If we tried to do this with a method group or a delegate variable defined with a non-supported return type, however, we would get a compiler error, because their return type is already fixed and hence there is no suitable match.
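To make the method group case concrete, here is a small sketch. It uses a pared-down copy of Employee, and GetDependents and TotalDependents are hypothetical helpers invented for this example. The failing call is left commented out so the rest compiles; wrapping the method in a lambda is one way (not the only way) to get back to a working overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Employee
{
    public short Dependents { get; set; }
}

public static class MethodGroupDemo
{
    // Hypothetical helper whose return type (short) has no matching Sum() overload.
    public static short GetDependents(Employee e) => e.Dependents;

    public static int TotalDependents(IEnumerable<Employee> employees)
    {
        // employees.Sum(GetDependents);  // compiler error: method group returns short

        // Wrapping the call in a lambda lets the short result convert to int.
        return employees.Sum(e => GetDependents(e));
    }

    public static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { Dependents = 1 },
            new Employee { Dependents = 2 },
        };
        Console.WriteLine(TotalDependents(employees)); // 3
    }
}
```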

Average() – returns average value in sequence

The Average() method works just like its Sum() counterpart, except that it takes the sum and divides it by the number of items involved in the sum. What do I mean by involved? Remember that Sum() does not include null values – and Average() does exactly the same. That is, Average() takes the average of all non-null values, so those null values do not count as part of the total nor the number of items the total is divided by. For example:

    var intList = new int?[] { 10, 20, 30, null };

    // returns 20, not 15, since there are only 3 non-null values
    Console.WriteLine(intList.Average());

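Average() accepts the same element types and projections as Sum(), but the two differ on empty sequences: Sum() returns 0, whereas Average() throws an InvalidOperationException for non-nullable element types and returns null for Nullable ones. Note also that the average of an int or long sequence comes back as a double. Beyond that, we won’t dive into it in any more detail.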
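It is worth seeing how Sum() and Average() treat an empty sequence before relying on either. The sketch below compares results rather than printing raw numbers; EmptySequenceDemo and AverageOfEmptyThrows are names made up for this example.

```csharp
using System;
using System.Linq;

public static class EmptySequenceDemo
{
    // Returns true when Average() rejects an empty, non-nullable sequence.
    public static bool AverageOfEmptyThrows()
    {
        try
        {
            Enumerable.Empty<int>().Average();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Sum() of nothing is zero.
        Console.WriteLine(Enumerable.Empty<int>().Sum() == 0);          // True

        // Average() of an empty Nullable sequence has no value to report.
        Console.WriteLine(Enumerable.Empty<int?>().Average() == null);  // True

        // Average() of an empty non-nullable sequence throws instead.
        Console.WriteLine(AverageOfEmptyThrows());                      // True

        // Averaging ints yields a double, so the result is not truncated.
        Console.WriteLine(new[] { 1, 2 }.Average() == 1.5);             // True
    }
}
```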

Min() – returns smallest value in sequence

The Min() extension method is useful for examining a sequence and returning the smallest value from it. The basic format of Min() looks like (ignoring, again, the implicit source parameter):

Min()
- Finds the smallest value in the sequence.
- Throws if no object in the sequence implements IComparable or IComparable<T>.
- Throws an exception if the sequence is empty and the source type is a non-nullable value type.
- Returns null if the sequence is empty and the source type is a reference type or a Nullable-wrapped value type.
Min(Func<TSource, X> projection)
- Finds the smallest value in the results of a projection on the sequence.
- Throws if no result of the projection to X implements IComparable or IComparable<T>.
- Throws an exception if the sequence is empty and X is a non-nullable value type.
- Returns null if the sequence is empty and X is a reference type or a Nullable-wrapped value type.

The Min() method works largely as you’d expect, with a few quirks. First of all, Min() tries to support virtually any type as long as that type implements IComparable or IComparable<T>. Thus it is not restricted to numeric types as Sum() is, and can be used on any comparable objects (including value types such as DateTime and TimeSpan):

    var shortList = new short[] { 1, 3, 7, 9, -9, 33 };

    // returns -9
    var smallest = shortList.Min();

    // find smallest number of dependents
    var minDependents = employees.Min(e => e.Dependents);
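The same call works on the non-numeric comparable types mentioned above. A brief sketch with DateTime and TimeSpan follows; the wrapper names (MinComparableDemo, Earliest, Shortest) and the sample values are invented for illustration.

```csharp
using System;
using System.Linq;

public static class MinComparableDemo
{
    // Min() relies on the type's own comparison, so dates order chronologically.
    public static DateTime Earliest(DateTime[] dates) => dates.Min();

    // TimeSpan values order by duration.
    public static TimeSpan Shortest(TimeSpan[] spans) => spans.Min();

    public static void Main()
    {
        var dates = new[]
        {
            new DateTime(2011, 8, 25),
            new DateTime(2010, 1, 15),
            new DateTime(2012, 3, 1),
        };
        var spans = new[] { TimeSpan.FromMinutes(90), TimeSpan.FromSeconds(45) };

        Console.WriteLine(Earliest(dates) == new DateTime(2010, 1, 15)); // True
        Console.WriteLine(Shortest(spans) == TimeSpan.FromSeconds(45));  // True
    }
}
```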

Also, Min() does not use a generic constraint to limit the type parameters to those that support the IComparable interfaces. Instead it throws a run-time exception – but only if the sequence is non-empty and no object in it implements one of the IComparable interfaces. Thus, given our definition of Employee above, the first call below would return null (sequence is empty), and the second call will throw (sequence non-empty but contains no IComparable objects).

    // Succeeds: empty sequences yield null for reference and Nullable wrapped value types
    var result1 = Enumerable.Empty<Employee>().Min();

    // Throws: employees is non-empty, and Employee does not implement IComparable
    var result2 = employees.Min();

The same thing is true for the result of projections. If a projection results in no values that are IComparable, and the sequence is not empty it will throw. Similarly if a projection results in a value type but the sequence is empty, it will throw.

Finally, note that for value types, if the sequence is empty an exception will be thrown, since it can’t determine a valid minimum from no items and can’t return null:

    // throws: int is comparable, but since the sequence is empty it can't determine a min value.
    var result3 = Enumerable.Empty<int>().Min();

    // also would throw: even though Employee is a reference type, the result of the
    // projection is a value type and the sequence is empty.
    var result4 = Enumerable.Empty<Employee>().Min(e => e.Dependents);
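If an empty sequence is a real possibility, there are a couple of ways to sidestep the exception. The sketch below shows two, under the assumption that either a null or a caller-supplied fallback is an acceptable answer; MinOrNull and MinOrDefault are hypothetical helper names, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SafeMinDemo
{
    // Projecting to int? selects the Nullable overload, which returns null when empty.
    public static int? MinOrNull(IEnumerable<int> values) => values.Min(v => (int?)v);

    // DefaultIfEmpty() substitutes a single fallback element for an empty sequence.
    public static int MinOrDefault(IEnumerable<int> values, int fallback) =>
        values.DefaultIfEmpty(fallback).Min();

    public static void Main()
    {
        var empty = Enumerable.Empty<int>();
        var some = new[] { 4, 2, 9 };

        Console.WriteLine(MinOrNull(empty).HasValue); // False
        Console.WriteLine(MinOrNull(some));           // 2
        Console.WriteLine(MinOrDefault(empty, 0));    // 0
        Console.WriteLine(MinOrDefault(some, 0));     // 2
    }
}
```

The Nullable approach keeps "no data" distinguishable from a genuine minimum, whereas the fallback approach is convenient when a sentinel value is already meaningful to the caller.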

Max() – returns largest value in the sequence

Max() behaves exactly like Min() except that it returns the largest value as opposed to the smallest value. So, we can use it to get the largest value in a sequence or the largest value from a projection over a sequence:

    // returns 33
    var biggestShort = shortList.Max();

    // returns 75250.0
    var highestSalary = employees.Max(e => e.Salary);

The behavior for Max() in terms of exceptions and null returns exactly matches the Min() method, so let’s not dally on it but instead move on to a really interesting one…

Aggregate() – returns a custom aggregation over a sequence

So what happens if you have a sequence of values, and you want to perform your own custom aggregation, and none of the other four aggregation methods will do?

The answer is the Aggregate() method. This handy extension method lets you perform a custom aggregation upon a sequence.

There are three forms of Aggregate():

Aggregate(Func<TSource, TSource, TSource> function)
- Applies a function that takes an accumulator value and a next value and returns the result.
- The accumulator and the sequence elements are of the same type.
- The seed value is the first value in the sequence.
Aggregate(TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> function)
- Applies a function that takes an accumulator value and the next item in a sequence and returns a result.
- The value and sequence type can be different or same.
- A seed value must be supplied to initialize the result.
Aggregate(TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> function, Func<TAccumulate, TResult> resultProjection)
- Same as above, but also takes a function that extracts the final result from the accumulator.

This all probably looks quite complex. Just remember the basic premise here: the delegate applied to the sequence always takes the current total and the next value, and returns the new total. This is then applied iteratively down the sequence.

For example, if we wanted to do a product (multiplication) of all numbers in a sequence:

    var numbers = new int[] { 1, 3, 9, 2 };

    // the function takes the current total and multiplies it by next to get the new total.
    var product = numbers.Aggregate((total, next) => total * next);

Notice that in the code above, our delegate’s first parameter gives us the current total, which we then multiply against the next value and return. This return value becomes the new total that will be passed into the same delegate called on the next value and so forth. Also note that the seed value in this form of the aggregate is the first value in the sequence (in our case 1).
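If it helps to see the iteration spelled out, the seedless form behaves roughly like the loop below. This is a hand-written approximation for illustration, not the framework’s actual source; Fold is a name invented here. Like the real method, it has nothing to seed the total with when the sequence is empty, so it throws.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateByHand
{
    // Mirrors the seedless Aggregate(): the first element seeds the total,
    // then every later element is folded in by the supplied function.
    public static T Fold<T>(IEnumerable<T> source, Func<T, T, T> function)
    {
        using (var items = source.GetEnumerator())
        {
            if (!items.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            T total = items.Current;
            while (items.MoveNext())
                total = function(total, items.Current);
            return total;
        }
    }

    public static void Main()
    {
        var numbers = new[] { 1, 3, 9, 2 };

        // Steps: 1 * 3 = 3, then 3 * 9 = 27, then 27 * 2 = 54.
        Console.WriteLine(Fold(numbers, (total, next) => total * next));     // 54
        Console.WriteLine(numbers.Aggregate((total, next) => total * next)); // 54
    }
}
```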

But what if the sequence we are using contains more complex objects and we want to perform an aggregation over one of their fields? We can use the 2nd or 3rd form, where the accumulator type and item type can differ. These versions do not use a projection like the other aggregation methods, but instead just give you the whole item, and it’s up to you to decide how that item is combined with the total.

    // perhaps we want to get a total value equal to each employee's
    // salary divided by the number of dependents (and self) they have
    var weirdCalculation = employees.Aggregate(0.0,
        (result, next) => result + next.Salary / (next.Dependents + 1));

So you see, we can do quite complex aggregate calculations across a sequence if we choose! The key thing to remember is that the function you provide leaves you in charge of taking the next “item” and applying it to the running “total”.
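The third form adds one more step: a function that turns the final accumulator into whatever result type you want. As a sketch, here is the same per-head salary calculation with the total formatted as a string at the end. The pared-down Employee, the AggregateWithResult class, and the PerHeadTotal method are assumptions made for this example, as is the choice of a two-decimal invariant-culture format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public sealed class Employee
{
    public double Salary { get; set; }
    public short Dependents { get; set; }
}

public static class AggregateWithResult
{
    // Accumulates a double, then converts the final total to a string exactly once.
    public static string PerHeadTotal(IEnumerable<Employee> employees) =>
        employees.Aggregate(
            0.0,                                                          // seed: TAccumulate is double
            (total, next) => total + next.Salary / (next.Dependents + 1), // fold each item in
            total => total.ToString("F2", CultureInfo.InvariantCulture)); // TResult is string

    public static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { Salary = 35000.00, Dependents = 0 },
            new Employee { Salary = 75250.00, Dependents = 1 },
            new Employee { Salary = 32000.50, Dependents = 0 },
            new Employee { Salary = 17500.00, Dependents = 2 },
        };

        // 35000 + 37625 + 32000.50 + 5833.33... formatted to two decimals
        Console.WriteLine(PerHeadTotal(employees)); // 110458.83
    }
}
```

Keeping the formatting in the result function means the accumulator stays a plain double while the loop runs, and the conversion happens only after the last item has been processed.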

In fact, I usually like to name my lambda parameters result and next, or total and item, to help distinguish what each parameter is instead of relying on single-letter lambda names, since this is definitely one of the more confusing aggregate methods in terms of readability.

Summary

So there you have it, five very easy aggregation methods. Well, four easy and one that may be a little more complex, yet quite powerful! These methods make it easy to perform aggregations on your sequences so that you don’t have to code the loops and calculations yourself. They’re quick, they’re easy to use, they’re easy to read, and they’re fully tested. Enjoy!

Posted on Thursday, August 25, 2011 9:05 PM | Filed under: My Blog, C#, Software, .NET, Little Wonders

James Michael Hare
