Thursday, May 17, 2012
Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.
We’ve talked about the Select() and Where() LINQ extension methods before. The Select() method lets you project from the source type to a new type, and the Where() method lets you filter the list of items to the ones you are interested in.
Most people know of these methods in their simplest form, where they take a projection and a predicate, respectively, that operate on just an element. However, both methods have overloads that take a delegate operating on both the element and the index of the element. So let’s take a look at these and see what we can do with them.
Select() – Projects elements
The Select() method is responsible for projecting a sequence into a new sequence, which may or may not have different types and/or values. As an extension method, its most common form is:
- Select(Func<T, TResult> projection)
- Projects the source sequence into a new resulting sequence consisting of the results of each item passed through the projection delegate.
As you know, this gives us a lot of power. We can use it to change values (and keep types the same):
// a list of 1 to 10
var numbers = Enumerable.Range(1, 10).ToList();

// projects to a list of 2, 4, 6, ..., 20
var doubles = numbers.Select(i => i * 2).ToList();
So that’s an example of a projection that changes the values. You can also project to a different type:
// the numbers from 1 to 10
var numbers = Enumerable.Range(1, 10).ToList();

// project to list of strings "1: ", "2: ", ...
var lineNumbers = numbers.Select(i => i.ToString() + ": ").ToList();
Now that we’ve reviewed the basic form, let’s look at a lesser-used overload of Select():
- Select(Func<T, int, TResult> projection)
- Projects the source sequence into a new resulting sequence consisting of the results of each item and its index in source passed through the projection delegate.
Note the difference here: in this overload the delegate takes not just an element, but an element and the index of that element in the source sequence this method is invoked upon.
Thus, if you had a list of items and wanted to take advantage of the item’s index as well in the projection, you can:
// say that racers consists of the names of racers in the order of finish:
IEnumerable<string> racers = ...;

// get a projection of new objects consisting of placing and name
// (the index is zero-based, so add 1 to get the finishing place)
var finishers = racers.Select((r, index) => new { Place = index + 1, Name = r }).ToList();
The code snippet above will take a sequence of string and convert it into a list of anonymous objects representing the racer’s name and their place. So given this form of the Select(), you can use the index as part of the projection and either store it, or use it to build a new value.
One very important note: the index provided to the projection is the index from the sequence the Select() was immediately called on. This is an important distinction because you can chain together multiple operations which may alter the number of items in the sequence (or their order) before the Select() is invoked. This is very similar to the way that SkipWhile() and TakeWhile() (click here for a discussion of the Skip() and Take() families) work with their index overloads.
For example:
// fibs under 100
var numbers = new[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 };

// even fibs and their indexes?
var evenFibs = numbers
    .Where(f => (f % 2) == 0)
    .Select((f, index) => new { Number = f, Index = index })
    .ToList();
So looking at this code, you might think you’d get a list that contains Number = 2 at Index = 2, Number = 8 at Index = 5, and Number = 34 at Index = 8. But instead you’ll get Number = 2 at Index = 0, Number = 8 at Index = 1, and Number = 34 at Index = 2.
Why? Because the Where() filters the sequence from (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89) down to just a sequence of the evens (2, 8, 34), and that is the sequence the Select() is operating on, so that sequence is the one whose indexes are used. Make sense? The Where() returns a new, shorter sequence, and that is the sequence that Select() uses for its elements and bases its indexes on.
So, how would you get the results with the original indexes? Well, one way would be to perform the Select() first, and then filter the results with the Where() second:
// fibs under 100
var numbers = new[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 };

// yes, this works...
var evenFibs = numbers
    .Select((f, index) => new { Number = f, Index = index })
    .Where(r => (r.Number % 2) == 0)
    .ToList();
So this one first projects the sequence into a sequence of an anonymous type holding both the number and the index, and then it filters down to only those that have an even number.
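To see the two orderings side by side, here is a minimal, self-contained sketch. The class and method names (IndexOrderDemo, IndexesAfterFilter, IndexesBeforeFilter) are my own for illustration and are not part of LINQ; only Where() and Select() are the real framework calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexOrderDemo
{
    // fibs under 100 (same data as the examples above)
    private static readonly int[] Fibs = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 };

    // Filter first: the index handed to Select() is the position
    // within the already-filtered sequence (2, 8, 34).
    public static List<int> IndexesAfterFilter()
    {
        return Fibs.Where(f => f % 2 == 0)
                   .Select((f, index) => index)
                   .ToList();
    }

    // Project first: the index is captured from the original array,
    // and the filter runs afterwards on the projected pairs.
    public static List<int> IndexesBeforeFilter()
    {
        return Fibs.Select((f, index) => new { Number = f, Index = index })
                   .Where(r => r.Number % 2 == 0)
                   .Select(r => r.Index)
                   .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", IndexesAfterFilter()));  // 0, 1, 2
        Console.WriteLine(string.Join(", ", IndexesBeforeFilter())); // 2, 5, 8
    }
}
```

The only difference between the two methods is where the indexed Select() sits in the chain, which is exactly what decides which sequence the index refers to.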
Where() – filters elements
As you know, Where() filters based on a predicate. Its basic form is:
- Where(Func<T, bool> predicate)
- Filters the source sequence to a new sequence that contains only elements that return true from the predicate applied to each element.
So, of course, as you’ve seen in the examples above, we can use Where() to filter a sequence of numbers to just the evens:
// 1 through 10
var numbers = Enumerable.Range(1, 10);

// 2, 4, 6, 8, 10
var evens = numbers.Where(n => (n % 2) == 0).ToList();
Just like Select(), the Where() extension method also has an overload that passes an index to the predicate:
- Where(Func<T, int, bool> predicate)
- Filters the source sequence to a new sequence that contains only elements that return true from the predicate applied to each element and its index in source.
For example, let’s say you wanted to pull every other item in a sequence. You could do this with the overload:
// fibs under 100
var numbers = new[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 };

// get every other fib number
var everyOtherFib = numbers.Where((n, index) => (index % 2) == 0).ToList();
The resulting sequence from the above will yield (1, 2, 5, 13, 34, 89). These may seem like trivial uses, but you could also use them to coordinate across sequences. For example, let’s say you kick off multiple service requests in various threads and have an array of bool representing whether each result came back successfully. We could then filter the requests to get the list of failed requests:
bool[] wasSuccessful = new ...;

// get the list of requests that had bad return codes:
var failedRequests = requests.Where((r, index) => !wasSuccessful[index]).ToList();
So the code above goes through the requests, takes the index of each request, and checks to see if wasSuccessful at that same index is false.
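Here is a small runnable sketch of that idea. The request names and the success flags are made up purely for illustration, and FailedRequestsDemo and FailedRequests are hypothetical names of mine; the sketch also assumes the flag array has at least as many entries as there are requests, otherwise the indexer would throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FailedRequestsDemo
{
    // Returns the requests whose flag at the same position is false.
    // Assumes wasSuccessful has at least as many entries as requests.
    public static List<string> FailedRequests(IEnumerable<string> requests, bool[] wasSuccessful)
    {
        return requests.Where((r, index) => !wasSuccessful[index]).ToList();
    }

    public static void Main()
    {
        // hypothetical request names and outcomes
        var requests = new[] { "GetUser", "GetOrders", "GetInvoices" };
        var wasSuccessful = new[] { true, false, false };

        // prints: GetOrders, GetInvoices
        Console.WriteLine(string.Join(", ", FailedRequests(requests, wasSuccessful)));
    }
}
```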
Finally, once again it is important to note that the index passed to the Where() clause is the index of the item in the sequence Where() is immediately called upon. So, once again, if you need the index of the item in the original sequence, make sure that none of the clauses before your indexed Select(), Where(), SkipWhile(), or TakeWhile() alter the sequence by re-ordering, adding, or removing items.
Sidebar: Naming style for index?
When you use the form of Select(), Where(), SkipWhile(), or TakeWhile() where you provide a lambda expression that takes an index, it is often useful to name the lambda variables such that it’s clear which item is the index.
// Ummm, is j the number and k the index? or vice-versa...?
var evenFibs = numbers
    .Where((j, k) => (k % 2) == 0)
    .ToList();
In the code above, because j and k don’t really have any derivable meaning from the context, we have to know from experience (or Intellisense) that k is the index. However, many folks reading this code may not know that.
This is why typically, and this is just my personal style, I like to name the index variable explicitly index, or at least x so it’s fairly clear that we’re talking about an index and not the value:
// Ah... it was the index!
var evenFibs = numbers
    .Where((j, index) => (index % 2) == 0)
    .ToList();
Summary
The Select() and Where() LINQ extension methods provide powerful ways to manipulate lists. The Select() clause is great for projecting from one sequence to another sequence containing different types/values, and the Where() clause is useful for filtering a sequence down to only those that match a given predicate. Both of these extension methods allow you to not only project and filter based on each element in the sequence, but also based on the element’s index in the sequence. Care should be taken to realize that the index of the item may be altered by chained expressions.
Thursday, May 03, 2012
Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.
So last week we covered the Enumerable.Range() method in the System.Linq namespace, which gives us a handy way to generate a sequence of integers to either use directly, or to feed into a more complex expression. Today we’re going to look at another static method called Enumerable.Repeat() that allows us to repeat an element the specified number of times.
Using Repeat() to generate a sequence of a repeated value
Again, if we take a peek into the Enumerable static class in the System.Linq namespace, we see the vast array of LINQ extension methods, as well as a few static methods. One of these is a method called Repeat(), whose purpose is to generate a sequence of repeated occurrences of a given element. Its syntax looks like this:
- Repeat(element, count)
- Repeats the given element for the specified count.
So given this, if we wanted to create an array of 5 occurrences of the same string, we could do so by coding:
// Creates a sequence of 5 occurrences of "Hello World!" and then stores it in an array
var repeatedValues = Enumerable.Repeat("Hello World!", 5).ToArray();
The main thing to notice in Repeat() is that it repeats the given element; it does not (at least not directly) repeatedly call an expression to generate multiple separate values.
For example, consider the following code:
var ran = new Random();

// does this generate 100 random numbers?
// Or repeat first random number 100 times?
var repeatedRandom = Enumerable.Repeat(ran.Next(), 100);
What do we get? Do we get a sequence of 100 random numbers? Or a sequence of the first random number repeated 100 times?
The answer is the latter. The ran.Next() is resolved to get the value of the parameter, and then that result is the element that is repeated 100 times. That is, if the first random number generated was 103242, then the sequence would contain { 103242, 103242, 103242, …, 103242 }.
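A quick way to convince yourself of this is to count the distinct values in the result. The sketch below is only an illustration; RepeatOnceDemo and DistinctCount are names I made up for it.

```csharp
using System;
using System.Linq;

public static class RepeatOnceDemo
{
    // ran.Next() is evaluated once, before Repeat() is even called,
    // so every element of the sequence is that same value.
    public static int DistinctCount(int count)
    {
        var ran = new Random();
        return Enumerable.Repeat(ran.Next(), count).Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctCount(100)); // 1
    }
}
```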
So, we can repeat a single element several times, right? Obviously if that’s what you needed to do then it’s right up your alley, but once again if we look past the simple examples, we see some more interesting other things that this also enables us to do.
Using Repeat() to turn a single item into a sequence of one item
One of the nice things about Repeat() is it allows us to easily represent a single item as a sequence of a single item. For example, if you have a method that takes in an IEnumerable<T>, but you only have one T to give it, Repeat() makes it easy to pass in a sequence of length 1.
For example, say we had this method that takes an IEnumerable<string> sequence:
// method that takes a sequence of string
public void AddElements(IEnumerable<string> elementSequence)
{
    // ...
}
If we just wanted to pass the single string “EOF” into the method, we could use Repeat() to do so:
// makes "EOF" into a sequence of 1 containing "EOF"
AddElements(Enumerable.Repeat("EOF", 1));
So that’s a handy feature, though we could also, of course, do this by creating an explicit string[1] array or a List<string> as well. In fact I have a blog post on Returning Zero or One Item As IEnumerable<T> (here) where I talk about this in more detail, including weighing the performance and mutability ramifications.
We could also create a sequence of length 0, though this is a less interesting use of Repeat() as the element value would be unused and the Enumerable.Empty<T>() singleton generator is more efficient if we know the sequence is intended to be empty (again, see the above mentioned blog post for more details).
Using Repeat() to repeat a generator to create sequences
We said before that Repeat() can be used to repeat a given element a specific number of times. This makes it sound like this can only be used to generate sequences of a repeated value. While this is generally true when you consider Repeat() by itself, this is not necessarily true if you think beyond the element representing a simple value.
For example, let’s revisit the idea of creating a sequence of 100 random numbers, except this time we will make the element a Func<int> that returns a random number, and repeat that Func<int> 100 times. But wait, how does that help us? Won’t we just get the same reference to the delegate repeated 100 times? Well, yes, but that’s not where we’ll end. We’ll then use Select() to invoke that delegate and return the result, giving us 100 random numbers:
var ran = new Random();

// create a delegate from the method group,
// repeat that delegate 100 times,
// for each delegate, project it to its result,
// convert to array.
var results = Enumerable.Repeat<Func<int>>(ran.Next, 100)
    .Select(f => f())
    .ToArray();
Notice that we have to tell Repeat() that the type of element is a Func<int>. This is because the compiler cannot infer a delegate type such as Func<int> from a bare method group here. Also notice our element is ran.Next, which is a method group and roughly equivalent to the lambda () => ran.Next().
So, that delegate reference is in a sequence 100 times, and for each item, the Select() projects it to the result from executing the delegate, which gives us a nice sequence of 100 random values.
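Since random output is hard to check by eye, here is a sketch of the same pattern with a deterministic counter standing in for ran.Next. RepeatGeneratorDemo and Generate are hypothetical names of mine, not framework APIs; the point is simply that the one delegate is invoked once per element.

```csharp
using System;
using System.Linq;

public static class RepeatGeneratorDemo
{
    // Repeats the same delegate reference count times, then invokes it
    // once per element to produce count separately generated values.
    public static int[] Generate(Func<int> generator, int count)
    {
        return Enumerable.Repeat(generator, count)
                         .Select(f => f())
                         .ToArray();
    }

    public static void Main()
    {
        // a counter in place of a random generator, so the output is predictable
        int next = 0;
        Func<int> counter = () => ++next;

        Console.WriteLine(string.Join(", ", Generate(counter, 5))); // 1, 2, 3, 4, 5
    }
}
```

Because the generator is already typed as Func<int> when it reaches Repeat(), no explicit type argument is needed in this version.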
Note that we could have gotten a similar result by using Enumerable.Range() as follows:
var ran = new Random();

// create a sequence from 0 to 99 (100 items),
// for each int, project to the next random number,
// convert to array
var results = Enumerable.Range(0, 100)
    .Select(i => ran.Next())
    .ToArray();
So in this method, we create a sequence of 100 items (from 0 to 99) which we really don’t want to use; we’re just using them to drive the Select() projection to return 100 random numbers. For more information on Range() see last week’s post on The Enumerable.Range() Static Method (here).
Both of these are similar ways to get the same results. Some may prefer the first because Repeat() seems a more natural idiom than Range() for repeating a delegate, but it does have the baggage of creating two delegates (one for the generator, one for the projection) instead of the one for Range() (only requires the one for projection).
Which is better? Execution-wise, the use of Range above is a hair faster (but really, the difference is very minor and of course that’s subject to change as the framework changes). Other than that, it’s really up to you which way you prefer as they both accomplish the same goal.
Summary
The Enumerable.Repeat() method performs the simple task of creating a sequence by repeating an element a specific number of times. While this is, in and of itself, a trivial need, it can also be used to drive more useful results such as repeating a generator delegate, or creating sequences out of single items.
Thursday, April 26, 2012
Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.
Thanks for all of your patience while I’ve been dealing with other matters these last few weeks. I didn’t want to let my post slide a third week, so I decided to say a few words about a small static method in the Enumerable class from System.Linq.
Using Range() to generate a sequence of consecutive integers
So, if we look in the Enumerable static class, where most of the LINQ extension methods are defined, we will also see a static method called Range() whose purpose is to generate a range of integers from a given start value and for a given count. Its syntax looks like:
- Range(int start, int count)
- Returns a sequence of int from start to (start + count – 1).
So, for example, the code snippet below will create a List<int> of the numbers from 1 to 10:
var numbers = Enumerable.Range(1, 10).ToList();
So, this seems simple enough, right? Well, yes, it is a handy way to create a sequence of consecutive int values that you can use directly, but when coupled with other constructs, it has many other uses as well.
Using Range() to feed a more complex LINQ expression
For example, if we wanted a list of the first 5 even numbers, we could start with the number of expected items and multiply up by our step factor in a Select():
// take sequence 0, 1, 2, 3, 4 and multiply each by two...
var evens = Enumerable.Range(0, 5).Select(n => n * 2).ToList();
Or, you could create a range over the total range of values and use Where() to filter it down:
// generates sequence from 0..9, but only selects even ones
var evens = Enumerable.Range(0, 10).Where(n => (n % 2) == 0).ToList();
But the great thing about Range() is you don’t have to use it to just produce numbers, you can use the sequence it generates either directly or as the starting point for a more complex LINQ expression.
For example, if we wanted to generate a series of strings for the font sizes we want to allow in a Windows Forms application, we could do that easily:
// takes the range from 1 to 10, multiplies each by 10, and puts " pt" on the end.
var fontSizes = Enumerable
    .Range(1, 10)
    .Select(i => (i * 10) + " pt")
    .ToArray();
This would give us an array of strings containing “10 pt”, “20 pt”, “30 pt”, … “100 pt”.
So Range() comes in handy for any expression you can create that starts with a simple sequence of integers. But what else?
Using Range() to generate multiple complex objects
One other use of Range() that can come in handy is for repeating some action multiple times. For example, say we want to create and start several Task instances for some parallel work.
We could do this using a standard array allocation and for loop:
// first construct the array of the appropriate size
Task[] tasks = new Task[NumConsumers];

// then loop through each index and create the new instance
for (int i = 0; i < NumConsumers; i++)
{
    tasks[i] = Task.Factory.StartNew(SomeAction);
}
Or we could do it in one LINQ expression:
// constructs an array of NumConsumers tasks that will each
// perform SomeAction in parallel
Task[] tasks = Enumerable.Range(1, NumConsumers)
    .Select(i => Task.Factory.StartNew(SomeAction))
    .ToArray();
Now, you may be wondering why we can’t do this instead:
// Enumerable.Repeat() repeats a value the specified number of times
Task[] tasks = Enumerable.Repeat(Task.Factory.StartNew(SomeAction), NumConsumers)
    .ToArray();
The answer is that Enumerable.Repeat() repeats the given value the specified number of times, so in the example above, it would call Task.Factory.StartNew() once and then repeat the resulting reference NumConsumers times, which is clearly not what we want! There are ways to do similar things with Repeat(), but they’re not quite as straightforward as using Range().
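The same effect is easy to observe without starting any tasks. In this sketch plain object instances stand in for the Task references, and the class and method names are mine, chosen just for the illustration.

```csharp
using System;
using System.Linq;

public static class RepeatVsRangeDemo
{
    // new object() runs once; Repeat() hands back that one reference count times.
    public static int DistinctInstancesWithRepeat(int count)
    {
        return Enumerable.Repeat(new object(), count).Distinct().Count();
    }

    // The lambda runs once per element, so each element is a separate instance.
    public static int DistinctInstancesWithRange(int count)
    {
        return Enumerable.Range(0, count).Select(i => new object()).Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctInstancesWithRepeat(5)); // 1
        Console.WriteLine(DistinctInstancesWithRange(5));  // 5
    }
}
```

Distinct() compares plain objects by reference here, so the counts show directly how many separate instances each approach created.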
Summary
The Enumerable.Range() method performs a very simple function, but its results can be used to drive much more complex LINQ expressions. Feel free to use it for generating simple int sequences all the way to generating sequences of a repeated action. Either way, it’s a good tool to keep handy.
Thursday, April 12, 2012
I won't be able to create a post this week due to an upcoming medical issue with a member of the family next week, and because I am prepping a spreadsheet of my contributions since receiving an MVP award last year, in the hope that I'll get my MVP renewed this year <crossesFingers/>.
Happy coding!
Thursday, March 29, 2012
Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.
I’ve covered many valuable methods from the System.Linq class library before, so you already know it’s packed with extension-method goodness. Today I’d like to cover two small families I’ve neglected to mention before: Skip() and Take(). While these methods seem so simple, they are an easy way to create sub-sequences for IEnumerable<T>, much the way GetRange() creates sub-lists for List<T>.
Skip() and SkipWhile()
The Skip() family of methods is used to ignore items in a sequence until either a certain number are passed, or until a certain condition becomes false. This makes the methods great for starting a sequence at a point possibly other than the first item of the original sequence.
The Skip() family of methods contains the following methods (shown below in extension method syntax):
- Skip(int count)
- Ignores the specified number of items and returns a sequence starting at the item after the last skipped item (if any).
- SkipWhile(Func<T, bool> predicate)
- Ignores items as long as the predicate returns true and returns a sequence starting with the first item to invalidate the predicate (if any).
- SkipWhile(Func<T, int, bool> predicate)
- Same as above, but passes not only the item itself to the predicate, but also the index of the item.
For example:
var list = new[] { 3.14, 2.72, 42.0, 9.9, 13.0, 101.0 };

// sequence contains { 2.72, 42.0, 9.9, 13.0, 101.0 }
var afterFirst = list.Skip(1);
Console.WriteLine(string.Join(", ", afterFirst));

// sequence contains { 42.0, 9.9, 13.0, 101.0 }
var afterFirstDoubleDigit = list.SkipWhile(v => v < 10.0);
Console.WriteLine(string.Join(", ", afterFirstDoubleDigit));
Note that the SkipWhile() stops skipping at the first item that returns false and returns from there to the rest of the sequence, even if further items in that sequence also would satisfy the predicate (otherwise, you’d probably be using Where() instead, of course).
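To make the contrast with Where() concrete, here is a small sketch using whole numbers so the output is easy to read. The class and method names are hypothetical; only SkipWhile() and Where() come from LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipWhileVsWhereDemo
{
    // Skips the leading run of small values, then keeps everything after it,
    // including later values that are small again.
    public static List<int> FromFirstDoubleDigit(IEnumerable<int> values)
    {
        return values.SkipWhile(v => v < 10).ToList();
    }

    // Tests every element, so all small values are removed wherever they appear.
    public static List<int> OnlyDoubleDigits(IEnumerable<int> values)
    {
        return values.Where(v => v >= 10).ToList();
    }

    public static void Main()
    {
        var values = new[] { 3, 2, 42, 9, 13, 101 };

        Console.WriteLine(string.Join(", ", FromFirstDoubleDigit(values))); // 42, 9, 13, 101
        Console.WriteLine(string.Join(", ", OnlyDoubleDigits(values)));     // 42, 13, 101
    }
}
```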
If you do use the form of SkipWhile() which also passes an index into the predicate, then you should keep in mind that this is the index of the item in the sequence you are calling SkipWhile() from, not the index in the original collection.
That is, consider the following:
var list = new[] { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

// Skip the first two items, then keep skipping
// while the item is greater than its index
var whatAmI = list
    .Skip(2)
    .SkipWhile((i, x) => i > x);
For this example the result above is 2.4, and not 1.2, 2.2, 2.3, 2.4 as some might expect. The key is knowing what the index is that’s passed to the predicate in SkipWhile(). In the code above, because Skip(2) skips 1.0 and 1.1, the sequence passed to SkipWhile() begins at 1.2 and thus it considers the “index” of 1.2 to be 0 and not 2. This same logic applies when using any of the extension methods that have an overload that allows you to pass an index into the delegate, such as SkipWhile(), TakeWhile(), Select(), Where(), etc.
It should also be noted that it’s fine to Skip() more items than exist in the sequence (an empty sequence is the result), or even to call Skip(0), which results in the full sequence. So why would it ever be useful to return Skip(0) deliberately? One reason might be to expose a List<T> as a read-only sequence.
Consider this class:
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public class MyClass
{
    private List<int> _myList = new List<int>();

    // works on surface, but one can cast back to List<int> and mutate the original...
    public IEnumerable<int> OneWay
    {
        get { return _myList; }
    }

    // works, but still exposes Add() etc. via IList<int>, which throw at runtime if accidentally called
    public ReadOnlyCollection<int> AnotherWay
    {
        get { return new ReadOnlyCollection<int>(_myList); }
    }

    // read-only, can't be cast back to List<int>, doesn't have methods that throw at runtime
    public IEnumerable<int> YetAnotherWay
    {
        get { return _myList.Skip(0); }
    }
}
This code snippet shows three (among many) ways to return an internal sequence in varying levels of immutability. Obviously if you just try to return as IEnumerable<T> without doing anything more, there’s always the danger the caller could cast back to List<T> and mutate your internal structure. You could also return a ReadOnlyCollection<T>, but this still exposes the mutating methods through IList<T>; they just throw at runtime when called instead of giving compiler errors. Finally, you can return the internal list as a sequence using Skip(0), which skips no items and just runs an iterator through the list. The result is an iterator, which cannot be cast back to List<T>.
Of course, there are many ways to do this (including just cloning the list, etc.), but the point is that it illustrates a potential use of an explicit Skip(0).
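Here is a rough sketch of what a caller can and cannot do with each form. SkipZeroDemo and CanCastBackToList are names I invented for it, and the behavior shown is what I would expect from the LINQ to Objects implementations I know of rather than a documented guarantee. Note also that the wrapped sequence is a live view over the list and not a snapshot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipZeroDemo
{
    // True when the caller could cast the sequence back to the mutable list.
    public static bool CanCastBackToList(IEnumerable<int> sequence)
    {
        return sequence is List<int>;
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        IEnumerable<int> direct = list;          // same object, just typed as a sequence
        IEnumerable<int> wrapped = list.Skip(0); // a separate sequence object over the list

        Console.WriteLine(CanCastBackToList(direct));  // True
        Console.WriteLine(CanCastBackToList(wrapped)); // False

        // Deferred execution: later changes to the list still show through.
        list.Add(4);
        Console.WriteLine(wrapped.Count()); // 4
    }
}
```

If you need a true snapshot that later changes cannot affect, copying the list (for example with ToList() or ToArray()) is the safer choice.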
Take() and TakeWhile()
The Take() and TakeWhile() methods can be thought of as somewhat of the inverse of Skip() and SkipWhile(). That is, while Skip() ignores the first X items and returns the rest, Take() returns a sequence of the first X items and ignores the rest.
Since they are somewhat of an inverse of each other, it makes sense that their calling signatures are identical (beyond the method name obviously):
- Take(int count)
- Returns a sequence containing up to the specified number of items. Anything after the count is ignored.
- TakeWhile(Func<T, bool> predicate)
- Returns a sequence containing items as long as the predicate returns true. Anything from the point the predicate returns false and beyond is ignored.
- TakeWhile(Func<T, int, bool> predicate)
- Same as above, but passes not only the item itself to the predicate, but also the index of the item.
So, for example, we could do the following:
var list = new[] { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

// sequence contains 1.0 and 1.1
var firstTwo = list.Take(2);

// sequence contains 1.0, 1.1, 1.2
var underTwo = list.TakeWhile(i => i < 2.0);
The same considerations for SkipWhile() with index apply to TakeWhile() with index, of course.
Using Skip() and Take() for sub-sequences
A few weeks back, I talked about The List<T> Range Methods and showed how they could be used to get a sub-list of a List<T>. This works well if you’re dealing with List<T>, or don’t mind converting to List<T>. But if you have a simple IEnumerable<T> sequence and want to get a sub-sequence, you can also use Skip() and Take() to much the same effect:
var list = new List<double> { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

// results in List<T> containing { 1.2, 2.2, 2.3 }
var subList = list.GetRange(2, 3);

// results in sequence containing { 1.2, 2.2, 2.3 }
var subSequence = list.Skip(2).Take(3);
I say “much the same effect” because there are some differences. First of all, GetRange() will throw if the requested range extends past the end of the list (or if the index or count is negative), but Skip() and Take() do not. Also, GetRange() is a method off of List<T>, so it can use direct indexing to get to the items much more efficiently, whereas Skip() and Take() operate on sequences and may actually have to walk through the items they skip to create the resulting sequence.
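The difference in out-of-range behavior can be sketched like this. SubSequenceDemo and its two methods are names of my own, used only to wrap the real GetRange(), Skip(), and Take() calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubSequenceDemo
{
    // Reports whether List<T>.GetRange() rejects the requested range.
    public static bool GetRangeThrows(List<int> list, int index, int count)
    {
        try
        {
            list.GetRange(index, count);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Skip()/Take() simply return whatever items are available.
    public static List<int> SkipTake(IEnumerable<int> source, int skip, int take)
    {
        return source.Skip(skip).Take(take).ToList();
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        Console.WriteLine(GetRangeThrows(list, 2, 5));                // True
        Console.WriteLine(string.Join(", ", SkipTake(list, 2, 5)));   // 3
        Console.WriteLine(SkipTake(list, 10, 5).Count);               // 0
    }
}
```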
So each has their pros and cons. My general rule of thumb is if I’m already working with a List<T> I’ll use GetRange(), but for any plain IEnumerable<T> sequence I’ll tend to prefer Skip() and Take() instead.
Summary
The Skip() and Take() families of LINQ extension methods are handy for producing sub-sequences from any IEnumerable<T> sequence. Skip() will ignore the specified number of items and return the rest of the sequence, whereas Take() will return the specified number of items and ignore the rest of the sequence.
Similarly, the SkipWhile() and TakeWhile() methods can be used to skip or take items, respectively, until a given predicate returns false.