James Michael Hare

...hare-brained ideas from the realm of software development...
Welcome to my blog! I'm a Sr. Software Development Engineer in Seattle, WA. I've been doing C++/C#/Java development for over 18 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

C#/.NET Little Wonders: The String Split() and Join() methods

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past Little Wonders posts can be found here.

This post continues a series of Little Wonders in the BCL String class.  Yes, we all work with strings in .NET daily, so perhaps you already know most of these.  However, there are a lot of little fun things that the String class can do that often get overlooked.

Today we are going to look at a pair of String method families, Split() and Join(), for dividing and combining string data.

Background

Many times when dealing with string data – especially data coming to or from files – we may need to divide a string based on a set of delimiters (like commas in CSV files) or join a sequence of strings together using a delimiter. 

Many people know about the Split() method for busting apart strings, but fewer tend to know of its counterpart, the Join() method for putting a sequence of strings together.  So let’s look at both.

Split() – split string into parts based on delimiters

Split() is familiar to most people, but it actually has some interesting options that we’ll discuss for a bit before heading to Join().

The Split() method is useful for taking a string that is logically divided by delimiters and busting it into an array of the strings between the delimiters.  The resulting array will have one entry for every string separated by the delimiters, and all the delimiters will be removed.

The delimiters to split on can logically take one of several forms:

  • An array of one character:
    • For example, ',' or new[] { ',' } as in: "A,B,C,D,E,F"
  • An array of many characters:
    • For example, ',', '-', '*' or new[] { ',', '-', '*' } as in: "A,B-C,D*E,F"
  • An array of one string:
    • For example, new[] { "=>" } as in: "A=>B=>C=>D=>E=>F"
  • An array of many strings:
    • For example, new[] { "=>", "<=" } as in: "A=>B<=C=>D<=E=>F"

There are a couple of quick things to notice here.  First of all, you typically pass an array to Split(), though on the form of Split() that just takes a params char[], you can pass the char(s) directly.  Secondly, calling Split() on { '=', '>' } is not the same as calling Split() on { "=>" }.  The former treats either '=' or '>' alone as a delimiter, whereas the latter only treats the combination "=>" as a delimiter.
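
To make that second point concrete, here is a small sketch that splits the same string both ways.  The class name, method names, and sample input are mine and purely illustrative:

```csharp
using System;

public static class DelimiterKindDemo
{
    // Splits on '=' OR '>' individually; each character is its own delimiter.
    public static string[] SplitOnEitherChar(string input)
    {
        return input.Split(new[] { '=', '>' });
    }

    // Splits only where the two characters appear together as "=>".
    public static string[] SplitOnWholeString(string input)
    {
        return input.Split(new[] { "=>" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        const string input = "A=>B=>C";

        // Char form yields 5 entries: "A", "", "B", "", "C"
        // (an empty entry sits between each '=' and the '>' that follows it).
        Console.WriteLine(string.Join("|", SplitOnEitherChar(input)));

        // String form yields 3 entries: "A", "B", "C"
        Console.WriteLine(string.Join("|", SplitOnWholeString(input)));
    }
}
```

The extra empty entries in the char form are usually the first hint that a multi-character delimiter was accidentally passed as separate chars.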

So let’s look at these in action.  When splitting on a single char:

    string testString = "James Hare,1001 Broadway Ave,St. Louis,MO,63101";

    // can pass the array explicitly...
    string[] results = testString.Split(new[] { ',' });

    // or pass it implicitly since it's a params array, as long as you are not
    // using the overloads that take string[], int, or StringSplitOptions
    results = testString.Split(',');

We’d get results split only on the ‘,’ character:

    Element 0: "James Hare"
    Element 1: "1001 Broadway Ave"
    Element 2: "St. Louis"
    Element 3: "MO"
    Element 4: "63101"

Whereas, if we split on both ',' and ' ' (space):

    string testString = "James Hare,1001 Broadway Ave,St. Louis,MO,63101";

    // once again, you can pass the array of char explicitly
    string[] results = testString.Split(new[] { ',', ' ' });

    // or implicitly since it's a params array, as long as you are not using
    // the overloads that take string[], int, or StringSplitOptions
    results = testString.Split(',', ' ');

We’d get the results split on both ‘,’ and ‘ ‘ (space):

    Element 0: "James"
    Element 1: "Hare"
    Element 2: "1001"
    Element 3: "Broadway"
    Element 4: "Ave"
    Element 5: "St."
    Element 6: "Louis"
    Element 7: "MO"
    Element 8: "63101"

And as you can imagine, Split() with string delimiters works much the same, except that the whole delimiter string has to appear together:

    string testString = "James Hare,,1001 Broadway Ave,St. Louis,MO,63101";

    // none of the string[] overloads take a params parameter, so the array
    // of string delimiters must be explicit, even if there is only one.
    string[] results = testString.Split(new[] { ",," }, StringSplitOptions.None);

Which yields:

    Element 0: "James Hare"
    Element 1: "1001 Broadway Ave,St. Louis,MO,63101"

Notice that on the string delimiter version of Split(), you must provide a StringSplitOptions value.  The same argument is available, but optional, on the char delimiter versions, and it gives you flexibility in how you handle back-to-back delimiters (that is, two delimiters with an empty string between them).

This enum has two defined values:

  • StringSplitOptions.None: default, if an empty string would result between two delimiters, it is returned as an empty string in the result.
  • StringSplitOptions.RemoveEmptyEntries: if an empty string would result between two delimiters, it is omitted from the result.

So basically, what this does is let you decide if back-to-back delimiters should return an empty entry (string.Empty) or just “skip” the empties from the results:

    string testString = "James Hare,,1001 Broadway Ave,,,St. Louis,MO,63101";

    // Perform the split and keep empty entries.
    // These overloads do not support params array (since must be last arg).
    string[] resultsWithEmpties = testString.Split(new[] { ',' },
        StringSplitOptions.None);

    // perform the split and discard empty entries
    string[] resultsWithoutEmpties = testString.Split(new[] { ',' },
        StringSplitOptions.RemoveEmptyEntries);

So notice, the first call with StringSplitOptions.None treats back-to-back delimiters as an empty entry, whereas StringSplitOptions.RemoveEmptyEntries does not have those empty entries in the results:

    With StringSplitOptions.None:
            Element 0: "James Hare"
            Element 1: ""
            Element 2: "1001 Broadway Ave"
            Element 3: ""
            Element 4: ""
            Element 5: "St. Louis"
            Element 6: "MO"
            Element 7: "63101"

    With StringSplitOptions.RemoveEmptyEntries:
            Element 0: "James Hare"
            Element 1: "1001 Broadway Ave"
            Element 2: "St. Louis"
            Element 3: "MO"
            Element 4: "63101"

Finally, there are forms of Split() for both char and string that let you limit the number of results returned to a maximum count of entries.  In these forms, the first n-1 entries in the result are the first n-1 delimited strings, and the nth entry is the remainder of the string:

    string testString = "James Hare,,1001 Broadway Ave,,,St. Louis,MO,63101";

    // returns at most 2 items.  The first is the first delimited string, and
    // the second is the rest.  Again, no params array support since not last arg
    string[] results = testString.Split(new[] { ',' }, 2, StringSplitOptions.None);

    for (int i = 0; i < results.Length; i++)
    {
        Console.WriteLine("\tElement {0}: \"{1}\"", i, results[i]);
    }

Using the max count of 2 above, the first delimited entry is James Hare, and the second is the remainder of the string, as below:

    Element 0: "James Hare"
    Element 1: ",1001 Broadway Ave,,,St. Louis,MO,63101"

Notice that in this case only one delimiter was removed.  Since there were back-to-back delimiters, the first one was consumed because it ended the first entry.  The second entry is simply the remainder of the string, so it begins with the second delimiter and runs unchanged to the end.
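
One place the max count form comes in handy is key/value text where the value may itself contain the delimiter.  The following is only a sketch; the class name, method name, and sample line are hypothetical and not from any particular file format:

```csharp
using System;

public static class KeyValueSplitDemo
{
    // Splits "key=value" into at most two parts so that any '=' inside
    // the value is left untouched.
    public static string[] SplitKeyValue(string line)
    {
        return line.Split(new[] { '=' }, 2);
    }

    public static void Main()
    {
        string[] parts = SplitKeyValue("connection=Server=localhost;Database=test");

        // parts[0] is "connection"
        // parts[1] is "Server=localhost;Database=test"
        Console.WriteLine("Key:   \"{0}\"", parts[0]);
        Console.WriteLine("Value: \"{0}\"", parts[1]);
    }
}
```

If the line contains no '=' at all, the result has only one element, so real code should check the array's Length before reading parts[1].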

So as you can see, while you may have known about Split() in the basic sense, it has a lot of other options to explore for altering the way your split is performed.

Join() – joining multiple items together into one string

The counterpart of Split() is Join(), a static method of the String class that joins a sequence of items together into one string, adding a delimiter between each pair of items.

Now, I’ve actually met many people who knew about Split() but didn’t know there was a string.Join() static method that did the opposite, and so they end up writing code that looks like this:

    string[] parts = { "Apple", "Orange", "Banana", "Pear", "Peach" };

    var builder = new StringBuilder();
    for (int i = 0; i < parts.Length; i++)
    {
        builder.Append(parts[i]);

        // don't want a comma after last item...
        if (i != parts.Length - 1)
        {
            builder.Append(", ");
        }
    }

    // result is "Apple, Orange, Banana, Pear, Peach"
    var result = builder.ToString();

But there’s no need to do this, string.Join() already does this for you:

    string[] parts = { "Apple", "Orange", "Banana", "Pear", "Peach" };

    // result is "Apple, Orange, Banana, Pear, Peach"
    var result = string.Join(", ", parts);

The logic is all written for you!  Your delimiter can be any string you want.  If you want them all concatenated with no delimiter, you can pass String.Empty or null as the delimiter.

    // result is "AppleOrangeBananaPearPeach"
    var result = string.Join("", parts);

Also, many people don’t realize that Join() can create a joined string of sequences of any type.  That is, you can String.Join() sequences of int, DateTime, double, or any other custom type! 

So what happens if you do that?  If you call string.Join() on any non-string sequence, it will invoke the ToString() method on each item in the sequence.  This, of course, only works well if the type of item in the sequence has a meaningful ToString() defined – though most of the BCL types do, and you can provide a meaningful one for any custom types you create.

For example:

    // On a sequence of ints ==> "1,2,3,4,5,6,7,8,9,10"
    var numsFromOneToTen = string.Join(",", Enumerable.Range(1, 10));

    // On a sequence of objects ==> "1-3.1415927-9/8/2011 4:20:32 PM" -- (at the time I am writing this)
    var variousObjects = string.Join("-", new object[] { 1, 3.1415927, DateTime.Now });
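
The same holds for your own types.  As a minimal sketch, assume a hypothetical Point class (invented here just for illustration) that overrides ToString(); Join() then uses that representation for each item:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type used only for this illustration.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }

    // Join() uses whatever ToString() returns for each item.
    public override string ToString()
    {
        return string.Format("({0},{1})", X, Y);
    }
}

public static class JoinCustomTypeDemo
{
    // Joins any sequence of points using the generic IEnumerable<T> overload.
    public static string Describe(IEnumerable<Point> points)
    {
        return string.Join(" -> ", points);
    }

    public static void Main()
    {
        var path = new List<Point>
        {
            new Point { X = 0, Y = 0 },
            new Point { X = 3, Y = 4 },
            new Point { X = 6, Y = 8 }
        };

        // Prints: (0,0) -> (3,4) -> (6,8)
        Console.WriteLine(Describe(path));
    }
}
```

Without the ToString() override, each item would fall back to the default Object.ToString(), which just returns the type name, so the joined string would not be very useful.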

Finally, Join() has overloads that take IEnumerable<T>, IEnumerable<string>, object[], and string[].  The forms that take object[] and string[] are variable argument lists (params), which means that you don’t have to pass an explicit array but can just list all the things to join in the call.  Note that the params and IEnumerable forms require .NET 4.0; earlier versions only accept an explicit string[].  Just keep in mind that the first argument is the delimiter, and all the other arguments are the implicit array of things to join:

    // You can pass in the string[] as a variable argument list.  Just remember
    // that the first argument is the delimiter, and the others are the list.
    // This yields ==> "A,B,C,D,E"
    var numsFromOneToTen = string.Join(",", "A", "B", "C", "D", "E");

    // This is also true on object[] as a variable argument list.  So if
    // your types are mixed, it will use the params object[] form which
    // calls ToString() on each one.  Same as if the array were passed explicitly.
    // This yields ==> "1-3.1415927-9/8/2011 4:20:32 PM" -- (at the time I am writing this)
    var variousObjects = string.Join("-", 1, 3.1415927, DateTime.Now);

So as you can see, string.Join() can be very useful in summarizing a sequence of any type, provided the ToString() representation is what you want. 

Summary

So, in conclusion, if you need to break apart a string into delimited entries, or join together a sequence of items into a delimited string, check out Split() and Join().  Beyond their basic uses, they both have features that make them quite useful.

Posted on Thursday, September 8, 2011 7:21 PM | Filed Under [ My Blog C# Software .NET Little Wonders ]

Feedback


# re: C#/.NET Little Wonders: The String Split() and Join() methods

I am surprised you didn't mention you can use split with single characters not in an array:

Your first example can be re-written like this:

string testString = "James Hare,1001 Broadway Ave,St. Louis,MO,63101";
string[] results = testString.Split(',');

Your second example can be written like this:

string testString = "James Hare,1001 Broadway Ave,St. Louis,MO,63101";
string[] results = testString.Split(',', ' ');

I do enjoy reading your blog.
9/9/2011 2:08 AM | Tom Evans

# re: C#/.NET Little Wonders: The String Split() and Join() methods

I was thinking the same thing. I think people often miss that the char[] is a params array, so you don't have to explicitly pass an array parameter.

Also, in my current job, I have taken over from an ex VB programmer... one of my pet peeves is that (I know I shouldn't generalize but) VB programmers tend to overuse string.Split, never validate inputs, and always assume that it's safe to do someString.Split("\\")[0].

This guy's code is also sprinkled with string concatenations using hard-coded backslashes... and his Windows Forms code has hundreds of Application.DoEvents()

Perhaps you can write about the wonders of Path.Combine, and the evils of Application.DoEvents()?

Love your series btw; it's always concise, well-written, easy to follow and interesting.
9/9/2011 5:34 AM | Jerome

# re: C#/.NET Little Wonders: The String Split() and Join() methods

@Tom:

You are absolutely right, on the one form of string.Split() which only takes a char array, it is a params array and you can just pass in the chars w/o an explicit array, all the other forms of Split() do require the explicit array.

@Jerome:

Yes, very true, many people do tend to immediately assume the number of split entries w/o validating. Can lead to some nasty consequences :-)

Actually, I did do Path.Combine() in one of my posts here: http://blackrabbitcoder.net/archive/2010/09/09/c.net-five-final-little-wonders-that-make-code-better-3.aspx
9/9/2011 6:46 AM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

@Tom: I updated to show usage of the variable arguments lists for both Split() and Join(), thanks for the catch!
9/9/2011 9:26 AM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

Before LINQ I often found myself wishing that string.Join took an IEnumerable instead of an array, to be more general. Now it's a bit of a moot point since the LINQ ToArray() extension method can be used on the IEnumerable<T>, but it is still a headache dealing with older non-generic IEnumerables. Why hasn't an overload been added to let string.Join take a System.Collections.IEnumerable instead of object[]?
9/9/2011 10:33 AM | creldredge

# re: C#/.NET Little Wonders: The String Split() and Join() methods

The non-generic IEnumerable is largely considered deprecated and mostly just used by the older non-generic collections. Microsoft themselves recommend to avoid usage of those older collections and use the new generics instead (as do I).

But yes, you can use ToArray() to perform the call to string.Join() - if you forget to use ToArray() it interprets it as one argument and calls ToString() on the ArrayList.

But you can also use the Cast<T>() extension method that casts an IEnumerable into an IEnumerable<T>.

I timed both these methods, however, and the ToArray() is the most performant, most likely because the cast is completely unnecessary since all objects implement ToString().
9/9/2011 10:56 AM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

Got an error message when trying to run this line: var numsFromOneToTen = string.Join(",", "A", "B", "C", "D", "E");

Error 1 No overload for method 'Join' takes '6' arguments C:\Users\Alex\Documents\Visual Studio 2008\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 46 36 ConsoleApplication1

Should make a string array first before use it, right? string[] arr =
9/9/2011 4:38 PM | Alex Song

# re: C#/.NET Little Wonders: The String Split() and Join() methods

@Alex: the params overload on string.Join() is only available in .NET 4.0, I'm afraid. I'll clarify that in the post. The params on Split() are available much further back.
9/9/2011 5:49 PM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

I work with a database that has several stored procedures that accept a list of IDs as a parameter. I often couple Join() with a LINQ statement or lambda expression to build those lists. It makes things very easy.

Given the following class:
public class Customer { public int CustomerID { get; set; } }

Build a list of CustomerIDs:

List<Customer> customers = new List<Customer> {
new Customer {CustomerID = 1},
new Customer {CustomerID = 2},
new Customer {CustomerID = 3}
};
//LINQ
string customerIDLinq = string.Join(",", (from customer in customers select customer.CustomerID).ToArray());
//Lambda
string customerIDLambda = string.Join(",", customers.Select(customer => customer.CustomerID));

The above statements result in this string: "1,2,3"
9/14/2011 10:44 AM | WaywardMage

# re: C#/.NET Little Wonders: The String Split() and Join() methods

@WaywardMage:

Yep, LINQ queries make it easy to produce IEnumerable<T> results to join. Whether you use the LINQ query expression syntax, or the LINQ extension methods (like Select(), Where(), etc) they work great!
9/14/2011 10:51 AM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

this gooooooooooooood article.
9/15/2011 5:24 AM | Mkhurana

# re: C#/.NET Little Wonders: The String Split() and Join() methods

You should be careful when using Split() on large text, as memory use tends to explode. Using IndexOf() and comparison is more appropriate on large text.

Great blog you've got there, keep up the good work.
9/27/2011 4:41 PM | etil

# re: C#/.NET Little Wonders: The String Split() and Join() methods

@Etil: It really depends what you are trying to do. If you have a large string and you truly want to split it into its constituent parts, then yes string.Split() is probably the best method to do it.

However, if you just want ONE field inside a string, then you really don't want to split, you want to pattern match or search and substring.

So I wouldn't say "never" use split on large strings, I'd say use split when you want to split a string (because all or nearly all parts are usable).

It's like string builder vs concat vs format. Sometimes people latch onto a piece of advice like StringBuilder is more efficient, and this isn't always true. In general StringBuilder is better for BUILDING strings over multiple statements (like in a loop), string format is better for FORMATTING complex string formats, and string concat is the absolute fastest way to concat two to multiple strings in the same statement.

So, really, it's more a question of using the correct tool for the job depending on what you need to do. And, when in doubt, profile :-)
9/27/2011 8:31 PM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

Thanks, I didn't know about Join! I have been using LINQ's Aggregate. (I believe it throws an exception on null -- perhaps Join avoids this?)
10/31/2011 2:43 PM | Jared

# re: C#/.NET Little Wonders: The String Split() and Join() methods

Jared:

string.Join will throw if your IEnumerable<T> is null too, but you can get around that with:

string.Join(", ", myEnumerableToJoin ?? Enumerable.Empty<YourType>());

Or, just surround in a if (myEnumerableToJoin != null) check.
11/1/2011 10:28 AM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

Great article. I just have a tip:
In the code which joins the strings using a for loop, you have put a condition inside the loop which checks if the index value is not pointing to the last element and then appending the comma. Instead, you can avoid this check (which i think slows down the loop) completely and just trim the last two characters - comma and the space after the loop. What do you say?
1/10/2012 3:29 PM | badmaash

# re: C#/.NET Little Wonders: The String Split() and Join() methods

@badmaash: That's definitely one way to do it, though the for loop example was more to show roughly the algorithm being used, and then I launched into using string.Join() instead because it does all that for you.
1/10/2012 4:26 PM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

@badmaash: And actually there's a better solution than either:

if (parts.Length == 0) return string.Empty;

var builder = new StringBuilder(parts[0]);

for (int i=1; i<parts.Length; i++)
{
    builder.Append(separator);
    builder.Append(parts[i]);
}

return builder.ToString();



1/10/2012 4:32 PM | James Michael Hare

# re: C#/.NET Little Wonders: The String Split() and Join() methods

Ah, better than my method because trimming will resize the string (which is a bit costly :-). Thank you James. Cheers.
1/11/2012 2:53 PM | badmaash

# re: C#/.NET Little Wonders: The String Split() and Join() methods

This is effective for my work. Thanks.
5/27/2012 4:42 PM | mdrayhankhan