James Michael Hare

...hare-brained ideas from the realm of software development...
Welcome to my blog! I'm a Sr. Software Development Engineer in the Seattle area, who has been performing C++/C#/Java development for over 20 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

C#/.NET Little Wonders: The String Remove() and Replace() Methods

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past Little Wonders posts can be found here.

This post continues a series of Little Wonders in the BCL String class.  Yes, we all work with strings in .NET daily, so perhaps you already know most of these.  However, there are a lot of little fun things that the String class can do that often get overlooked.

Today we are going to look at a pair of String method families, Remove() and Replace(), which remove or replace parts of a string.

Background

When manipulating string data, many times you want to either remove parts of a String or replace characters in a String.  You can do this yourself, of course, by building a new String while manually inspecting each char in turn, but this gets to be a bit expensive and is less maintainable.

Instead, there are two methods in the String class, ready to use, that let you do this easily: Remove() and Replace().

Remove() – removes part of a String

Of course, we all know that String.Substring() can be used to return a part of a String, so you may think Remove() sounds redundant, but really the two are nearly exact opposites.  Substring() is used when you want to keep a portion of a String and discard the rest, whereas Remove() is used when you want to discard a portion of a String and keep the rest.

The Remove() method has two forms:

  • Remove(int startIndex)
    • Removes all characters from startIndex to end, returns remaining string.
  • Remove(int startIndex, int count)
    • Removes count characters starting at startIndex, returns remaining string.

Both Remove() and Substring() are range checked, so if you specify an invalid starting index or an invalid length, either method will throw an ArgumentOutOfRangeException.
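To make the two overloads and the range checking concrete, here is a minimal sketch. The sample string and the ThrowsOutOfRange helper are my own illustrative names (the helper is hypothetical, not part of the BCL); only Remove() and Substring() themselves come from the framework.

```csharp
using System;

public static class RemoveBasics
{
    // Hypothetical helper: returns true when the action throws
    // ArgumentOutOfRangeException, false when it completes normally.
    public static bool ThrowsOutOfRange(Action action)
    {
        try
        {
            action();
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string sample = "Hello, World!";   // 13 characters

        // Remove(startIndex): drops everything from index 5 to the end.
        Console.WriteLine(sample.Remove(5));      // Hello

        // Remove(startIndex, count): drops 7 characters starting at index 5.
        Console.WriteLine(sample.Remove(5, 7));   // Hello!

        // Strings are immutable, so the original is unchanged.
        Console.WriteLine(sample);                // Hello, World!

        // Out-of-range arguments are rejected rather than clamped.
        Console.WriteLine(ThrowsOutOfRange(() => sample.Remove(50)));        // True
        Console.WriteLine(ThrowsOutOfRange(() => sample.Substring(5, 50)));  // True
    }
}
```

Note that neither method modifies the original String; each returns a new one, so the result has to be assigned or used.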

Between Remove() and Substring(), which should you use?  It really depends on what you want.  If you just want to keep the first 10 characters of a String, both work equally well:

    string test = "Now is the time for all good men to come to the aid of their country.";

    // takes 10 characters starting at offset zero.
    var sliceUsingSubstring = test.Substring(0, 10);

    // removes all characters from offset 10 onward
    var sliceUsingRemove = test.Remove(10);

Notice that if all we want is the first 10 characters, both methods work fine, but Remove() is a little more succinct since you don’t have to specify a starting offset. But what if you want the last 10 characters (assuming the string is long enough)?

    string test = "Now is the time for all good men to come to the aid of their country.";

    // takes from length - 10 to end.
    var sliceUsingSubstring = test.Substring(test.Length - 10);

    // removes from zero to length - 10, keeps end
    var sliceUsingRemove = test.Remove(0, test.Length - 10);

Here we see the situation is kind of reversed, in that Substring() is a little bit more succinct than Remove().

Now, truthfully, these are six-of-one, half-a-dozen-of-the-other scenarios where you can just pick whichever seems more readable to you.  But it does give you two tools to take a part of a String: one where you specify what you want to keep (Substring()), and one where you specify what you want to throw away (Remove()).

The place where this difference actually matters most, though, is when we are talking about keeping or removing a part of the middle of the string.  For example, what if we wanted to keep only the 10 characters at indexes 10 through 19?

    // takes 10 characters starting at index 10
    var sliceUsingSubstring = test.Substring(10, 10);

    // remove from index 20 onward, then remove the first 10
    var sliceUsingRemove = test.Remove(20).Remove(0, 10);

So here, when we want to keep the middle portion, Substring() is much more concise, readable, and performant because it can be done in a single operation and only allocates one final String for the result.  Remove(), in this case, means we must remove the end and then remove from the front until we are left with what we want, allocating an intermediate String along the way.

But what if we want to remove those 10 characters from the middle instead?

    // take substring at zero for 10 characters, and concat with substring at 20 till end
    var sliceUsingSubstring = test.Substring(0, 10) + test.Substring(20);

    // Starting at index 10, remove 10 characters
    var sliceUsingRemove = test.Remove(10, 10);

In this case, the Remove() is much simpler, more readable, and performant because it takes only one operation and generates only one String for the result.

So, if you need to remove from the middle of a String, use Remove(), and if you want to keep a middle portion of a String, use Substring().  For front and end portions they both work equally well, so just choose the one you feel is more readable.
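The keep-versus-discard relationship can be seen side by side in a small sketch. KeepMiddle and DropMiddle are hypothetical wrapper names I made up for illustration; they simply forward to Substring() and Remove() with the same arguments.

```csharp
using System;

public static class KeepOrDiscard
{
    // Keeps 'count' characters starting at 'start' and discards the rest.
    public static string KeepMiddle(string text, int start, int count)
    {
        return text.Substring(start, count);
    }

    // Discards 'count' characters starting at 'start' and keeps the rest.
    public static string DropMiddle(string text, int start, int count)
    {
        return text.Remove(start, count);
    }

    public static void Main()
    {
        string sample = "0123456789ABCDEFGHIJ0123456789";

        Console.WriteLine(KeepMiddle(sample, 10, 10));   // ABCDEFGHIJ
        Console.WriteLine(DropMiddle(sample, 10, 10));   // 01234567890123456789

        // The kept part and the dropped part together account for every character.
        int combined = KeepMiddle(sample, 10, 10).Length + DropMiddle(sample, 10, 10).Length;
        Console.WriteLine(combined == sample.Length);    // True
    }
}
```

Given the same start and count, one call returns exactly what the other throws away, which is why each needs only a single operation for its own case.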

Replace() – replace char or String

So what if you have a String and you need to replace all occurrences of one char with another, or all occurrences of a substring with another, or even remove all occurrences of a substring or char?

This is where the Replace() method comes in very handy.  Basically Replace() has two forms that allow you to do all these activities:

  • Replace(char oldChar, char newChar)
    • Replaces all instances of oldChar with newChar in the resulting String.
  • Replace(string oldValue, string newValue)
    • Replaces all instances of oldValue with newValue in the resulting String.

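Notice that there are no options for different comparers here: both forms perform an ordinal comparison, so the String or char must match exactly, including case.  You could write an extension method for case-insensitive replacements yourself, but it’s not built into the .NET Framework (.NET Core 2.0 and later added an overload that takes a StringComparison).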
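As a rough idea of what such an extension method might look like, here is a minimal sketch. The name ReplaceIgnoreCase is hypothetical (it is not a BCL method), and I am assuming ordinal, case-insensitive matching is what you want; culture-sensitive matching would need more care because a match may not be the same length as the search text.

```csharp
using System;
using System.Text;

public static class StringReplaceExtensions
{
    // Hypothetical helper, not part of the BCL: replaces every occurrence of
    // oldValue with newValue, ignoring case (ordinal comparison).
    public static string ReplaceIgnoreCase(this string source, string oldValue, string newValue)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (oldValue == null) throw new ArgumentNullException("oldValue");
        if (oldValue.Length == 0) throw new ArgumentException("oldValue cannot be empty.", "oldValue");

        int matchIndex = source.IndexOf(oldValue, StringComparison.OrdinalIgnoreCase);
        if (matchIndex < 0)
        {
            return source;   // nothing to replace
        }

        var result = new StringBuilder(source.Length);
        int copyFrom = 0;

        while (matchIndex >= 0)
        {
            // Copy the untouched text before the match, then the replacement.
            result.Append(source, copyFrom, matchIndex - copyFrom);
            result.Append(newValue);

            // Resume searching just past the match.
            copyFrom = matchIndex + oldValue.Length;
            matchIndex = source.IndexOf(oldValue, copyFrom, StringComparison.OrdinalIgnoreCase);
        }

        // Copy whatever follows the last match.
        result.Append(source, copyFrom, source.Length - copyFrom);
        return result.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "The cat saw THE dog and the bird.";

        // Prints: a cat saw a dog and a bird.
        Console.WriteLine(text.ReplaceIgnoreCase("the", "a"));
    }
}
```

Like the built-in Replace(), this sketch treats a null newValue as a removal, since appending a null string to a StringBuilder adds nothing.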

Really, the only one of these that is extremely strict is the char form because it always replaces exactly one char with another char.  However, the string form really gives you a lot of flexibility, because it allows you to replace char with String, String with char, or remove char or String. 

The key is that for a single char you’d just create a one-char String, and for removing you’d replace with an empty String:

    string test = "Now is the time for all good men to come to the aid of their country.";

    // "Now is the time for all good people to come to the aid of their country."
    var politicallyCorrect = test.Replace("men", "people");

    // "Now|is|the|time|for|all|good|men|to|come|to|the|aid|of|their|country."
    var spacesToPipes = test.Replace(' ', '|');

    // "Now is time for all good men to come to aid of their country."
    var withoutThe = test.Replace("the ", string.Empty);
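The listing above covers String-for-String, char-for-char, and String removal. The remaining shapes (char to String, String to char, and removing a single char) all go through the string overload too, as this small sketch shows. The sample values and the RemoveChar helper are hypothetical names of my own.

```csharp
using System;

public static class ReplaceShapes
{
    // Hypothetical helper: removes every occurrence of a single char by
    // turning it into a one-char string and replacing it with string.Empty.
    public static string RemoveChar(string text, char unwanted)
    {
        return text.Replace(unwanted.ToString(), string.Empty);
    }

    public static void Main()
    {
        string phone = "555-123-4567";

        // char -> String: wrap the char in a one-char string.
        Console.WriteLine(phone.Replace("-", " dash "));    // 555 dash 123 dash 4567

        // String -> char: the replacement is a one-char string.
        Console.WriteLine("a, b, c".Replace(", ", "|"));    // a|b|c

        // Removing a char outright.
        Console.WriteLine(RemoveChar(phone, '-'));          // 5551234567
    }
}
```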

 

So as you can see, this little method can give us quite a bit of power for manipulating a String!  These may seem trivial and academic exercises, but they come in handy when cleansing data.  For example, say you have a data blob that has “<BR/>” HTML tags denoting breaks, and you want to replace this with the new line as appropriate for your runtime environment:

    string test = "Some data &amp; markup was loaded from a data source.<BR/>&nbsp;Oh look, we started a new line!";

    // replaces all instances of <BR/> with the new line as appropriate for this environment.
    var cleansedData = test.Replace("<BR/>", Environment.NewLine);

    // You can even chain replacements... but could potentially create several strings to GC.
    var moreCleansedData = test.Replace("&amp;", "&")
        .Replace("&nbsp;", " ")
        .Replace("<BR/>", Environment.NewLine);

 

On a whim, I attempted to do the same thing using StringBuilder, and the StringBuilder actually performed slower, which surprised me somewhat.

    // load the StringBuilder with the data, manipulate it, then call ToString() for the result
    var morePerformantCleansedData = new StringBuilder(test)
        .Replace("&amp;", "&")
        .Replace("&nbsp;", " ")
        .Replace("<BR/>", Environment.NewLine)
        .ToString();

 

It is very interesting that chaining multiple String.Replace() calls outperformed the equivalent StringBuilder.Replace() calls in my test, and perhaps I’ll dig into why in another blog post.  Regardless, String.Replace() is very handy for replacing all occurrences of a char or String with another, or removing all occurrences outright.
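If you want to try the comparison yourself, a rough timing harness might look like the sketch below. This is not the measurement behind the result above; the method names, the iteration count, and the use of Stopwatch are my own assumptions. Numbers will vary by runtime, input size, and machine, so treat it as a starting point rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ReplaceTiming
{
    // Chained String.Replace() calls; each call returns a new string.
    public static string WithString(string input)
    {
        return input.Replace("&amp;", "&")
            .Replace("&nbsp;", " ")
            .Replace("<BR/>", Environment.NewLine);
    }

    // The same replacements applied in place on a StringBuilder.
    public static string WithStringBuilder(string input)
    {
        return new StringBuilder(input)
            .Replace("&amp;", "&")
            .Replace("&nbsp;", " ")
            .Replace("<BR/>", Environment.NewLine)
            .ToString();
    }

    // Runs the cleansing function repeatedly and returns elapsed milliseconds.
    public static long TimeIt(Func<string, string> cleanse, string input, int iterations)
    {
        cleanse(input);   // warm-up call so JIT compilation is not measured

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            cleanse(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string test = "Some data &amp; markup was loaded from a data source.<BR/>&nbsp;Oh look, we started a new line!";
        const int iterations = 100000;   // arbitrary count, chosen only for illustration

        Console.WriteLine("String.Replace:        {0} ms", TimeIt(WithString, test, iterations));
        Console.WriteLine("StringBuilder.Replace: {0} ms", TimeIt(WithStringBuilder, test, iterations));
    }
}
```

Both methods should produce the same cleansed text, so any difference in timing comes from how the replacements are carried out rather than from what they produce.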

Summary

In conclusion, the Remove() method can be handy for trimming a portion out of the middle of a String.  Consider it the inverse of Substring(), and use it where discarding a range is what you really mean; in those cases it can also be the more performant choice.

In addition, the Replace() method is very handy for scrubbing data by replacing or removing unwanted markup to create a new, clean String.


Posted on Thursday, September 15, 2011 6:31 PM | Filed Under [ My Blog C# Software .NET Little Wonders ]
