Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.
Today I’m going to look at 5 different ways of combining two sequences together using LINQ extension methods. For the purposes of this discussion, I will split these 5 methods into homogeneous and heterogeneous methods (for lack of better terms).
- “Homogeneous” combinations (“same” type): Concat() and Union()
- “Heterogeneous” combinations (“different” types): Zip(), Join(), and GroupJoin()
To be sure, I use the terms homogeneous and heterogeneous very loosely here. The point isn’t that the heterogeneous methods can’t combine two sequences of the same type (they can). The point is that they also support combining sequences of completely unrelated types.
Similarly, the homogeneous methods can combine sequences of different types as long as those types are directly related. This works because IEnumerable<T> is covariant: an IEnumerable<YourSubClass> can be returned or passed wherever an IEnumerable<YourBaseClass> is expected. In our case, this means you could combine an IEnumerable<Dog> and an IEnumerable<Animal> and get an IEnumerable<Animal> as the result. Note that covariance applies only to reference types, so an IEnumerable<int> cannot stand in for an IEnumerable<object>.
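Here is a small sketch of that covariance point. It assumes C# 4 or later, where IEnumerable<out T> is covariant. To keep it self-contained, it borrows the framework’s own ArgumentException : Exception relationship in place of the hypothetical Dog and Animal classes. The variable names are purely illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ArgumentException derives from Exception, standing in for Dog : Animal.
IEnumerable<ArgumentException> specific = new List<ArgumentException>
{
    new ArgumentException("bad argument")
};
IEnumerable<Exception> general = new List<Exception>
{
    new InvalidOperationException("bad state")
};

// Because IEnumerable<out T> is covariant, IEnumerable<ArgumentException> converts
// implicitly to IEnumerable<Exception>. The compiler infers Concat<Exception>, and
// the combined sequence is typed as IEnumerable<Exception>.
IEnumerable<Exception> combined = general.Concat(specific);

foreach (Exception ex in combined)
{
    Console.WriteLine(ex.GetType().Name + ": " + ex.Message);
}
// Prints:
// InvalidOperationException: bad state
// ArgumentException: bad argument
```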
Concat() – Concatenate sequences
The simplest sequence combination method is Concat(). It takes two sequences and returns a sequence containing the items of the first sequence followed by the items of the second:
```csharp
using System.Collections.Generic;
using System.Linq;

var healthFoods = new List<string> { "fruits", "vegetables", "grains", "proteins" };
var myFoods = new List<string> { "grains", "proteins", "M&Ms", "soda" };

// returns sequence containing fruits, vegetables, grains, proteins, grains, proteins, M&Ms, soda
var healthyFirst = healthFoods.Concat(myFoods);

// returns sequence containing grains, proteins, M&Ms, soda, fruits, vegetables, grains, proteins
var mineFirst = myFoods.Concat(healthFoods);
```
Note that the order of Concat() matters! Either way the result contains exactly the same elements. Their order, however, depends on which sequence comes first, because Concat() simply iterates through the first sequence and then the second, yielding one combined sequence.
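The “first sequence, then the second” behavior can be sketched as a tiny iterator. MyConcat is a hypothetical name. This is a minimal illustration of the semantics, not the framework’s actual source. Like Concat(), it is lazy: nothing is read until the result is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: yields every item of 'first', then every item of 'second'.
// Duplicates are kept and order is preserved, which is why argument order matters.
static IEnumerable<T> MyConcat<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    foreach (T item in first)
    {
        yield return item;
    }
    foreach (T item in second)
    {
        yield return item;
    }
}

var healthFoods = new List<string> { "fruits", "vegetables", "grains", "proteins" };
var myFoods = new List<string> { "grains", "proteins", "M&Ms", "soda" };

var sketch = MyConcat(healthFoods, myFoods).ToList();
var builtIn = healthFoods.Concat(myFoods).ToList();

// fruits, vegetables, grains, proteins, grains, proteins, M&Ms, soda
Console.WriteLine(string.Join(", ", sketch));
Console.WriteLine(sketch.SequenceEqual(builtIn)); // True
```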
Union() – Concatenate sequences without duplicates
I talked about Union() before in my post on the LINQ Set Operations (here), but it’s worth repeating in brief here because it is also a valid way to combine sequences.
Union() is a set-theory operation that combines two sets with no duplicates, and it works just as well on any two sequences. It returns the items of the first sequence followed by the items of the second, skipping any item that has already appeared. This includes repeats within the same sequence. In other words, each distinct item appears once, at the position of its first occurrence:
```csharp
// returns sequence containing fruits, vegetables, grains, proteins, M&Ms, soda
var healthyFirst = healthFoods.Union(myFoods);

// returns sequence containing grains, proteins, M&Ms, soda, fruits, vegetables
var mineFirst = myFoods.Union(healthFoods);
```
Note that in each case, the first occurrence for any duplicated item is the one that will end up in the resulting sequence, and any remaining duplicates are removed.
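Here is a rough sketch of the “first occurrence wins” rule. It assumes a HashSet<T> is an acceptable way to remember what has already been yielded. MyUnion is a hypothetical helper for illustration and is not necessarily how the framework implements Union(). The sketch is checked against the real Union() at the end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper approximating Union(): walk the first sequence, then the second,
// and yield an item only the first time it is seen. HashSet<T>.Add returns false for
// an item that is already present, so later duplicates are skipped.
static IEnumerable<T> MyUnion<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    var seen = new HashSet<T>(EqualityComparer<T>.Default);
    foreach (T item in first.Concat(second))
    {
        if (seen.Add(item))
        {
            yield return item;
        }
    }
}

var healthFoods = new List<string> { "fruits", "vegetables", "grains", "proteins" };
var myFoods = new List<string> { "grains", "proteins", "M&Ms", "soda" };

var sketch = MyUnion(healthFoods, myFoods).ToList();

Console.WriteLine(string.Join(", ", sketch)); // fruits, vegetables, grains, proteins, M&Ms, soda
Console.WriteLine(sketch.SequenceEqual(healthFoods.Union(myFoods))); // True
```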
Duplicates are determined using an IEqualityComparer<T>. If you do not provide one, the default equality comparer for the type is used. If T is a custom reference type, you therefore need to do one of two things. Either override Equals() and GetHashCode() to match your type’s notion of equality, or provide a custom IEqualityComparer<T> implementation.
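As an illustration of supplying a comparer, this sketch passes the built-in StringComparer.OrdinalIgnoreCase to the Union() overload that accepts one. StringComparer.OrdinalIgnoreCase implements IEqualityComparer<string>. It is used here in place of a hand-written comparer. The shoutedFoods list is made-up data that differs from healthFoods only in casing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var healthFoods = new List<string> { "fruits", "vegetables", "grains", "proteins" };
// Hypothetical data: overlapping foods with different casing.
var shoutedFoods = new List<string> { "GRAINS", "Proteins", "soda" };

// Default comparer: string equality is case-sensitive, so nothing counts as a duplicate.
var caseSensitive = healthFoods.Union(shoutedFoods).ToList();

// Supplying an IEqualityComparer<string> changes what counts as a duplicate.
var caseInsensitive = healthFoods.Union(shoutedFoods, StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(string.Join(", ", caseSensitive));
// fruits, vegetables, grains, proteins, GRAINS, Proteins, soda
Console.WriteLine(string.Join(", ", caseInsensitive));
// fruits, vegetables, grains, proteins, soda
```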
Zip() – Simple one-for-one combination of sequences
The Zip() method is the simplest of the heterogeneous combination methods. Given any two sequences, it pairs the first item of the first sequence with the first item of the second. It then pairs the second item of each, and so on. If one sequence is shorter than the other, Zip() stops as soon as it reaches the end of the shorter sequence.
To use Zip(), all we need to do is supply the two sequences and a result selector. The result selector combines an element from each sequence into a new element of the resulting sequence.
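The lockstep, stop-at-the-shorter-sequence behavior can be sketched with two enumerators. MyZip is a hypothetical helper written for illustration, not the framework’s source. The integer IDs simply mirror the employee and seat IDs used in this section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper sketching Zip(): advance both enumerators in lockstep and stop
// as soon as either is exhausted, so extra items in the longer sequence are ignored.
static IEnumerable<TResult> MyZip<TFirst, TSecond, TResult>(
    IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)
{
    using (IEnumerator<TFirst> e1 = first.GetEnumerator())
    using (IEnumerator<TSecond> e2 = second.GetEnumerator())
    {
        while (e1.MoveNext() && e2.MoveNext())
        {
            yield return resultSelector(e1.Current, e2.Current);
        }
    }
}

var ids = new[] { 13, 42, 99 };
var seatIds = new[] { 1, 2, 3, 4, 5, 6 };

var pairs = MyZip(ids, seatIds, (e, s) => $"{e}->{s}").ToList();
Console.WriteLine(string.Join(", ", pairs)); // 13->1, 42->2, 99->3
Console.WriteLine(pairs.SequenceEqual(ids.Zip(seatIds, (e, s) => $"{e}->{s}"))); // True
```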
Let’s say, for instance, that we had the following classes defined:
```csharp
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public double Salary { get; set; }
}

public class Seat
{
    public int Id { get; set; }
    public double Cost { get; set; }
}
```
And then we defined the following sequences:
```csharp
var employees = new List<Employee>
{
    new Employee { Id = 13, Name = "John Doe", Salary = 13482.50 },
    new Employee { Id = 42, Name = "Sue Smith", Salary = 98234.13 },
    new Employee { Id = 99, Name = "Jane Doe", Salary = 32421.12 }
};

var seats = new List<Seat>
{
    new Seat { Id = 1, Cost = 42 },
    new Seat { Id = 2, Cost = 42 },
    new Seat { Id = 3, Cost = 100 },
    new Seat { Id = 4, Cost = 100 },
    new Seat { Id = 5, Cost = 125 },
    new Seat { Id = 6, Cost = 125 },
};
```
We could zip these together to provide each employee with a seat:
```csharp
// Note: I'm using an anonymous type here, but could use an existing type as well
var seatingAssignments = employees.Zip(seats, (e, s) => new
    { EmployeeId = e.Id, SeatId = s.Id });

foreach (var seat in seatingAssignments)
{
    Console.WriteLine("Employee: " + seat.EmployeeId + " has seat " + seat.SeatId);
}
```
Which results in:
```
Employee: 13 has seat 1
Employee: 42 has seat 2
Employee: 99 has seat 3
```
Notice that the results stop once Zip() runs out of employees, so seats 4 through 6 are left unassigned.
Join() – Logical combinations of sequences
The Join() method can combine sequences of any two types by matching on a value returned by a key selector for each sequence. You can think of Zip() as a special case of a join where the key is each item’s index.
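To make the “Zip() is a join on the index” idea concrete, here is a sketch that produces the same pairs both ways. It uses Select()’s (item, index) overload to attach positions. Plain integer arrays and value tuples stand in for the Employee and Seat classes to keep it self-contained.

```csharp
using System;
using System.Linq;

var employeeIds = new[] { 13, 42, 99 };
var seatIds = new[] { 1, 2, 3, 4, 5, 6 };

// Zip pairs items purely by position.
var zipped = employeeIds.Zip(seatIds, (e, s) => (EmployeeId: e, SeatId: s)).ToList();

// The same pairing expressed as a Join whose key is each item's index.
// Select's (item, index) overload supplies the position of every element.
var joined = employeeIds.Select((id, index) => (id, index))
    .Join(seatIds.Select((id, index) => (id, index)),
          e => e.index,
          s => s.index,
          (e, s) => (EmployeeId: e.id, SeatId: s.id))
    .ToList();

Console.WriteLine(zipped.SequenceEqual(joined)); // True
foreach (var pair in joined)
{
    Console.WriteLine("Employee: " + pair.EmployeeId + " has seat " + pair.SeatId);
}
```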
To use Join(), we specify a key extractor for each sequence to get the item key we want to compare from each sequence, and then – like Zip() – we specify a result selector that combines the two elements into one element for the result sequence.
Matches are determined by applying an IEqualityComparer<T> to the keys returned from the key extractors. If you don’t specify one, the default equality comparer is used. That is sufficient for nearly all the built-in types and for any custom structs or enums you write. If your keys are custom reference types, however, you have two options. Either ensure the class has valid implementations of Equals() and GetHashCode() for your definition of equality, or provide a custom IEqualityComparer<T>.
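Here is a sketch of the comparer overload of Join(), assuming string keys with inconsistent casing. The employee and department data is hypothetical. StringComparer.OrdinalIgnoreCase is a built-in IEqualityComparer<string>, used here in place of a custom comparer.

```csharp
using System;
using System.Linq;

// Hypothetical data: department codes typed with inconsistent casing.
(string Name, string Dept)[] employees = { ("John Doe", "ENG"), ("Sue Smith", "ops") };
(string Code, string Title)[] departments = { ("eng", "Engineering"), ("OPS", "Operations") };

// The default key comparison for string is case-sensitive, so no keys match here.
var strict = employees.Join(departments, e => e.Dept, d => d.Code,
    (e, d) => e.Name + " works in " + d.Title).ToList();

// Passing an IEqualityComparer<string> for the keys changes what counts as a match.
var relaxed = employees.Join(departments, e => e.Dept, d => d.Code,
    (e, d) => e.Name + " works in " + d.Title,
    StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(strict.Count); // 0
foreach (var line in relaxed)
{
    Console.WriteLine(line);
}
// John Doe works in Engineering
// Sue Smith works in Operations
```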
Let’s use our Employee class from the previous example, and add a Badge class to represent a security badge assigned to each employee:
```csharp
public class Badge
{
    public int EmployeeId { get; set; }
    public int BadgeNumber { get; set; }
}
```
Now let’s imagine these sequences were defined:
```csharp
var employees = new List<Employee>
{
    new Employee { Id = 13, Name = "John Doe", Salary = 13482.50 },
    new Employee { Id = 42, Name = "Sue Smith", Salary = 98234.13 },
    new Employee { Id = 99, Name = "Jane Doe", Salary = 32421.12 }
};

var badges = new List<Badge>
{
    new Badge { EmployeeId = 10, BadgeNumber = 1 },
    new Badge { EmployeeId = 13, BadgeNumber = 2 },
    new Badge { EmployeeId = 20, BadgeNumber = 3 },
    new Badge { EmployeeId = 25, BadgeNumber = 4 },
    new Badge { EmployeeId = 42, BadgeNumber = 5 },
    new Badge { EmployeeId = 10, BadgeNumber = 6 },
    new Badge { EmployeeId = 13, BadgeNumber = 7 },
};
```
We can then do a join on these two sequences to find the names of the people assigned to each badge:
```csharp
// Note: I'm using an anonymous type here, but could use an existing type as well
var badgeAssignments = employees.Join(badges, e => e.Id, b => b.EmployeeId,
    (e, b) => new { e.Name, b.BadgeNumber });

foreach (var badge in badgeAssignments)
{
    Console.WriteLine("Name: " + badge.Name + " has badge " + badge.BadgeNumber);
}
```
Which returns:
```
Name: John Doe has badge 2
Name: John Doe has badge 7
Name: Sue Smith has badge 5
```
This gives us exactly the combinations we would expect from an inner join on two database tables. Notice a few things here:
- Join() will have an entry for every match – since John Doe was assigned two badges he appears in the resulting sequence twice.
- Join() will not have any items that didn’t match – since Jane Doe had no matching badge, she doesn’t appear in the resulting sequence. Neither do badges 1, 3, 4, and 6, whose employee IDs (10, 20, and 25) aren’t in the employee list.
This comes in very handy for 1:1 matches, or when you don’t mind multiple results for 1:many matches.
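Both behaviors noted above fall out naturally if you picture Join() as two steps. First, index the second sequence by key. Then walk the first sequence and emit one result per match. MyJoin is a hypothetical helper built on ToLookup() that assumes this model. It ignores edge cases such as null keys, which the real Join() skips. Value tuples carry the same IDs and badge numbers as the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper approximating Join(): index the inner sequence by key once,
// then, for each outer item in order, emit one result per matching inner item.
// Outer items with no match produce nothing (inner-join semantics).
static IEnumerable<TResult> MyJoin<TOuter, TInner, TKey, TResult>(
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector)
{
    ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);
    foreach (TOuter o in outer)
    {
        // An ILookup returns an empty sequence for a key it does not contain.
        foreach (TInner i in lookup[outerKeySelector(o)])
        {
            yield return resultSelector(o, i);
        }
    }
}

(int Id, string Name)[] employees = { (13, "John Doe"), (42, "Sue Smith"), (99, "Jane Doe") };
(int EmployeeId, int BadgeNumber)[] badges =
    { (10, 1), (13, 2), (20, 3), (25, 4), (42, 5), (10, 6), (13, 7) };

var sketch = MyJoin(employees, badges, e => e.Id, b => b.EmployeeId,
    (e, b) => e.Name + " has badge " + b.BadgeNumber).ToList();
var builtIn = employees.Join(badges, e => e.Id, b => b.EmployeeId,
    (e, b) => e.Name + " has badge " + b.BadgeNumber).ToList();

foreach (var line in sketch)
{
    Console.WriteLine(line);
}
// John Doe has badge 2
// John Doe has badge 7
// Sue Smith has badge 5
Console.WriteLine(sketch.SequenceEqual(builtIn)); // True
```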
GroupJoin() – Logical join for sequences with one-to-many matches
So what happens if you do have those 1:many matches and you want them grouped together instead of returned as multiple results? In that case, GroupJoin() is your weapon of choice. With GroupJoin() you specify a key extractor for each sequence as before. Now, however, your result selector receives each item from the first sequence along with a sequence of all its matches from the second sequence, and you can do with them what you want:
```csharp
// Note: I'm using an anonymous type here, but could use an existing type as well
var badgeAssignments = employees.GroupJoin(badges, e => e.Id, b => b.EmployeeId,
    (e, bList) => new { Name = e.Name, Badges = bList.ToList() });

foreach (var assignment in badgeAssignments)
{
    Console.WriteLine(assignment.Name + " has badges:");

    if (assignment.Badges.Count > 0)
    {
        foreach (var badge in assignment.Badges)
        {
            Console.WriteLine("\tBadge: " + badge.BadgeNumber);
        }
    }
    else
    {
        Console.WriteLine("\tNo badges.");
    }
}
```
Which results in:
```
John Doe has badges:
	Badge: 2
	Badge: 7
Sue Smith has badges:
	Badge: 5
Jane Doe has badges:
	No badges.
```
Notice how each item in the first sequence gets a list of matching items from the second sequence, even if there are no matches. In that case the list is simply empty.
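GroupJoin() can be pictured in two steps. First, index the second sequence by key. Then hand each item of the first sequence its whole group of matches, so every item in the first sequence produces exactly one result. MyGroupJoin is a hypothetical ToLookup()-based sketch of that model, not the framework’s code. It relies on the fact that an ILookup returns an empty sequence for a key it doesn’t contain, and it ignores null-key edge cases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper approximating GroupJoin(): the result selector receives ALL
// matching inner items at once, so each outer item yields exactly one result,
// even when its group of matches is empty.
static IEnumerable<TResult> MyGroupJoin<TOuter, TInner, TKey, TResult>(
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)
{
    ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);
    foreach (TOuter o in outer)
    {
        yield return resultSelector(o, lookup[outerKeySelector(o)]);
    }
}

(int Id, string Name)[] employees = { (13, "John Doe"), (42, "Sue Smith"), (99, "Jane Doe") };
(int EmployeeId, int BadgeNumber)[] badges =
    { (10, 1), (13, 2), (20, 3), (25, 4), (42, 5), (10, 6), (13, 7) };

// Formats one line per employee: their badge numbers, or "no badges".
Func<(int Id, string Name), IEnumerable<(int EmployeeId, int BadgeNumber)>, string> describe =
    (e, bs) => e.Name + ": " +
        (bs.Any() ? string.Join(", ", bs.Select(b => b.BadgeNumber)) : "no badges");

var summary = MyGroupJoin(employees, badges, e => e.Id, b => b.EmployeeId, describe).ToList();
var builtIn = employees.GroupJoin(badges, e => e.Id, b => b.EmployeeId, describe).ToList();

foreach (var line in summary)
{
    Console.WriteLine(line);
}
// John Doe: 2, 7
// Sue Smith: 5
// Jane Doe: no badges
Console.WriteLine(summary.SequenceEqual(builtIn)); // True
```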
Summary
These five simple sequence combining methods can be used to combine two sequences of items in different ways.
The two “homogeneous” methods (Concat() and Union()) can be used to concatenate two like sequences together, with or without duplicates respectively.
The three “heterogeneous” methods can combine two sequences of like or unlike types. They do so either on a one-for-one basis (Zip()) or by matching key fields from each item in each sequence (Join() and GroupJoin()).
Join() is simpler for 1:1 matches, or when you don’t mind duplicate results for items with multiple matches. GroupJoin(), in contrast, returns every element of the first sequence together with the sequence of items from the second sequence that match it.
In all of these methods where equality matters (Union(), Join(), and GroupJoin()), you may pass an optional IEqualityComparer<T> implementation. Do so when the default equality comparison for the type in question is insufficient.