James Michael Hare

...hare-brained ideas from the realm of software development...
Welcome to my blog! I'm a Sr. Software Development Engineer in Seattle, WA. I've been doing C++/C#/Java development for over 18 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

C#/.NET Little Wonders: Static Char Methods

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.

Oftentimes in our code we deal with the bigger classes and types in the BCL, and occasionally forget that there are some nice methods on the primitive types as well. Today we will discuss some of the handy static methods that exist on the char (the C# alias of System.Char) type.

The Background

I was examining a piece of code this week where I saw the following:

    // need to get the 5th (offset 4) character in upper case
    var type = symbol.Substring(4, 1).ToUpper();

    // test to see if the type is P
    if (type == "P")
    {
        // ... do something with P type...
    }

Is there really any error in this code? No, but it still struck me as wrong because it allocates two very short-lived throw-away strings just to store and manipulate a single char:

  1. The call to Substring() generates a new string of length 1
  2. The call to ToUpper() generates a new upper-case version of the string from Step 1.

In my mind this is similar to using ToUpper() to do a case-insensitive compare: it isn’t wrong, it’s just much heavier than it needs to be (for more info on case-insensitive compares, see #2 in 5 More Little Wonders).
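
As a rough illustration of that comparison, here is a minimal sketch of the two approaches side by side. The class and method names are made up for this example. Note also one assumption: ToUpper() uses the current culture while StringComparison.OrdinalIgnoreCase does not, so the two can disagree for some cultures; pick the StringComparison value that matches your needs.

```csharp
using System;

public static class CaseInsensitiveCompareDemo
{
    // Heavier: allocates two upper-cased copies just to compare them.
    public static bool EqualsViaToUpper(string a, string b)
    {
        return a.ToUpper() == b.ToUpper();
    }

    // Lighter: compares the original strings in place, no temporary strings.
    public static bool EqualsIgnoreCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(EqualsViaToUpper("msft", "MSFT"));
        Console.WriteLine(EqualsIgnoreCase("msft", "MSFT"));
    }
}
```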

One of my favorite books is the C++ Coding Standards: 101 Rules, Guidelines, and Best Practices by Sutter and Alexandrescu.  True, it’s about C++ standards, but there’s also some great general programming advice in there, including two rules I love:

        8. Don’t Optimize Prematurely
        9. Don’t Pessimize Prematurely

We all know what #8 means: don’t optimize when there is no immediate need, especially at the expense of readability and maintainability. I firmly believe this and in the axiom: it’s easier to make correct code fast than to make fast code correct. Optimizing code to the point that it becomes difficult to maintain often gives you little bang for the buck.

But what about #9?  Well, for that they state:

“All other things being equal, notably code complexity and readability, certain efficient design patterns and coding idioms should just flow naturally from your fingertips and are no harder to write than the pessimized alternatives. This is not premature optimization; it is avoiding gratuitous pessimization.”

Or, if I may paraphrase: “where it doesn’t increase code complexity or hurt readability, prefer the more efficient option”.

The example code above is, I feel, one of those times where we are violating a tacit C# coding idiom: avoid creating unnecessary temporary strings. The code creates temporary strings to hold one char, which is just unnecessary. I think the original coder thought he had to do this because ToUpper() is an instance method on string but not on char. What he didn’t know, however, is that ToUpper() does exist on char; it’s just a static method instead (though you could write an extension method to make it look instance-ish).
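
For the curious, such an extension method might look something like the sketch below. The name ToUpperChar() and the sample symbol value are hypothetical, chosen only for this example; the method simply forwards to the static char.ToUpper().

```csharp
using System;

public static class CharExtensions
{
    // Hypothetical helper (not part of the BCL): forwards to the static char.ToUpper().
    public static char ToUpperChar(this char value)
    {
        return char.ToUpper(value);
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string symbol = "MSFTp";               // sample value, assumed for illustration
        char type = symbol[4].ToUpperChar();   // reads like an instance call, allocates no strings
        Console.WriteLine(type);               // prints P
    }
}
```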

This leads me (in a long-winded way) to my Little Wonders for the day…

Static Methods of System.Char

So let’s look at some of these handy, and often overlooked, static methods on the char type:

  • IsDigit(), IsLetter(), IsLetterOrDigit(), IsPunctuation(), IsWhiteSpace()
    • Methods to tell you whether a char (or position in a string) belongs to a category of characters.
  • IsLower(), IsUpper()
    • Methods that check if a char (or position in a string) is lower or upper case
  • ToLower(), ToUpper()
    • Methods that convert a single char to the lower or upper equivalent.
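
To get a feel for these, a small sketch like the following prints what several of the methods report for a handful of characters. The Describe() helper and the sample characters are invented for this example.

```csharp
using System;

public static class CharCategoryDemo
{
    // Builds a one-line summary of what the static char methods report for a character.
    public static string Describe(char c)
    {
        return string.Format(
            "'{0}': digit={1}, letter={2}, punctuation={3}, whitespace={4}, lower={5}, upper={6}",
            c, char.IsDigit(c), char.IsLetter(c), char.IsPunctuation(c),
            char.IsWhiteSpace(c), char.IsLower(c), char.IsUpper(c));
    }

    public static void Main()
    {
        // Sample characters: a lower-case letter, a digit, a space,
        // a punctuation mark and an upper-case letter.
        foreach (char c in "a7 !Z")
        {
            Console.WriteLine(Describe(c));
        }
    }
}
```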

For example, if you wanted to see if a string contained any lower case characters, you could do the following:

    if (symbol.Any(c => char.IsLower(c)))
    {
        // ...
    }

Incidentally, we could use a method group to shorten the expression to:

    if (symbol.Any(char.IsLower))
    {
        // ...
    }

Or, if you wanted to verify that all of the characters in a string are digits:

    if (symbol.All(char.IsDigit))
    {
        // ...
    }

Also, for the IsXxx() methods, there are overloads that take either a char, or a string and an index. This means that these two calls are logically identical:

    // check given a character
    if (char.IsUpper(symbol[0])) { /* ... */ }

    // check given a string and index
    if (char.IsUpper(symbol, 0)) { /* ... */ }

Obviously, if you just have a char, then you’d just use the first form.  But if you have a string you can use either form equally well.
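
As a small sketch of the string-and-index form in a loop, the example below counts upper-case characters without ever pulling a char out of the string by hand. CountUpper() and the sample value are hypothetical names used only here.

```csharp
using System;

public static class OverloadDemo
{
    // Counts upper-case characters using the (string, index) overload.
    public static int CountUpper(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsUpper(text, i))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string symbol = "MsftP";   // sample value, assumed for illustration

        Console.WriteLine(CountUpper(symbol));   // prints 2

        // Both overloads give the same answer for the same position.
        Console.WriteLine(char.IsUpper(symbol[0]) == char.IsUpper(symbol, 0));   // prints True
    }
}
```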

As a side note, care should be taken when examining all the available static methods on the System.Char type, as some seem to be redundant but actually have very different purposes. 

For example, there are IsDigit() and IsNumber() methods, which sound the same on the surface, but give you different results. IsDigit() returns true only for decimal digit characters (‘0’, ‘1’, … ‘9’, plus the decimal digits of other scripts, such as the Arabic-Indic digits), whereas IsNumber() returns true for any numeric character, including the characters for ½, ¼, etc.
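
A quick sketch makes the difference concrete. The sample characters are chosen for illustration, and IsAsciiDigit() is a hypothetical helper showing one way to accept only ‘0’ through ‘9’ if that is what you actually need.

```csharp
using System;

public static class DigitVersusNumberDemo
{
    // Hypothetical helper: true only for the ASCII digits '0' through '9'.
    public static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    public static void Main()
    {
        char ascii = '7';
        char half = '\u00BD';          // the vulgar fraction one half character
        char arabicIndic = '\u0663';   // ARABIC-INDIC DIGIT THREE

        // Each line prints: IsDigit, IsNumber, IsAsciiDigit
        Console.WriteLine("{0} {1} {2}", char.IsDigit(ascii), char.IsNumber(ascii), IsAsciiDigit(ascii));                    // True True True
        Console.WriteLine("{0} {1} {2}", char.IsDigit(half), char.IsNumber(half), IsAsciiDigit(half));                       // False True False
        Console.WriteLine("{0} {1} {2}", char.IsDigit(arabicIndic), char.IsNumber(arabicIndic), IsAsciiDigit(arabicIndic));  // True True False
    }
}
```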

Summary

To come full circle back to our opening example, I would have preferred the code be written like this:

    // grab 5th char and take upper case version of it
    var type = char.ToUpper(symbol[4]);

    if (type == 'P')
    {
        // ... do something with P type...
    }

Not only is it just as readable (if not more so), but it performs over 3x faster on my machine:

   1,000,000 iterations of char method took: 30 ms, 0.000030 ms/item.
   1,000,000 iterations of string method took: 101 ms, 0.000101 ms/item.

It’s not only immediately faster because we don’t allocate temporary strings, but as an added bonus there is less garbage to collect later as well. To me this qualifies as a case where we are using a common C# performance idiom (don’t create unnecessary temporary strings) to make our code better.
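
If you want to try a comparison like this yourself, a rough timing harness could look like the sketch below. This is not the code that produced the numbers above; the class name, method names and sample symbol are assumptions made for this example, and the timings will vary with machine, runtime and build configuration.

```csharp
using System;
using System.Diagnostics;

public static class CharVersusStringBenchmark
{
    // The original approach: two temporary strings per call.
    public static bool IsTypeViaString(string symbol)
    {
        return symbol.Substring(4, 1).ToUpper() == "P";
    }

    // The char-based approach: no allocations.
    public static bool IsTypeViaChar(string symbol)
    {
        return char.ToUpper(symbol[4]) == 'P';
    }

    public static void Main()
    {
        const int Iterations = 1000000;
        string symbol = "MSFTp";   // sample value, assumed for illustration
        int hits = 0;              // consumed below so the loops do real work

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            if (IsTypeViaChar(symbol)) hits++;
        }
        watch.Stop();
        Console.WriteLine("{0:N0} iterations of char method took: {1} ms", Iterations, watch.ElapsedMilliseconds);

        watch.Restart();
        for (int i = 0; i < Iterations; i++)
        {
            if (IsTypeViaString(symbol)) hits++;
        }
        watch.Stop();
        Console.WriteLine("{0:N0} iterations of string method took: {1} ms", Iterations, watch.ElapsedMilliseconds);

        Console.WriteLine("hits: {0}", hits);
    }
}
```

For anything more serious than a quick sanity check, run a release build and repeat the measurement several times, since a single Stopwatch pass includes JIT warm-up.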

Posted on Thursday, October 4, 2012 6:51 PM | Filed under: My Blog, C#, Software, .NET, Little Wonders

Feedback


# re: C#/.NET Little Wonders: Static Char Methods

Incidentally I've written a very small app that illustrates these concepts : http://blog.andrei.rinea.ro/2012/06/15/character-test-utility/
10/5/2012 10:07 AM | Andrei Rinea

# re: C#/.NET Little Wonders: Static Char Methods

"“where it doesn’t increase the code complexity and readability, prefer the more efficient option”."

why junior developer cannot understand this..?
10/5/2012 4:46 PM | nyelvoktatas

# re: C#/.NET Little Wonders: Static Char Methods

Could you have skipped the ToUpper() call altogether and compare with 'p' or 'P'? Or is the string culture sensitive?
10/5/2012 10:43 PM | chris

# re: C#/.NET Little Wonders: Static Char Methods

@chris: yes obviously you can check both as well. Really as long as you're avoiding making the unnecessary temporary strings it will perform well.
10/7/2012 7:31 AM | james michael hare

# re: C#/.NET Little Wonders: Static Char Methods

Great article - thank you.

A very minor point: if 1,000,000 iterations took 30ms then 1 iteration would have taken 0.000030 ms not 0.000050 ms as stated.

Keep up the good work

Mark
10/7/2012 9:29 AM | PTRMark

# re: C#/.NET Little Wonders: Static Char Methods

Great piece! Sometimes you forget the little things!
10/8/2012 7:26 AM | Chaz Gatian

# re: C#/.NET Little Wonders: Static Char Methods

Nice post!

Though, I believe you have misspelled the method name for char.IsNumber with char.IsNumeric.

I see the performance gain with using char.IsDigit and char.IsNumber, it's even faster then int.TryParse when you iterate each char of a string to check .IsNumeric() or .IsDigit(). This works well with integers, but how about decimals or doubles, is there anything "faster" than .TryParse()?
10/8/2012 11:15 AM | Mario

# re: C#/.NET Little Wonders: Static Char Methods

It is wrong, if you throw culture into the mix. Beware the Turkish I!
10/8/2012 6:52 PM | Mark S

# re: C#/.NET Little Wonders: Static Char Methods

@Mario: Thanks, I'll correct that.
@Mark: Culture definitely adds twists to anything you do, but the main point is to avoid creating substrings when the substring isn't necessary.
10/11/2012 9:24 AM | James Michael Hare

# re: C#/.NET Little Wonders: Static Char Methods

Great article and very good helpful information I like it I am a blogger so I can imagine that how much effort did you put in this, thanks buddy keep up the good work
10/16/2012 9:16 AM | loveclip99

# re: C#/.NET Little Wonders: Static Char Methods

very informative and precise and on the spot
applied this and this works in most of cases and easy for newcomers as well to understand
10/25/2012 1:34 PM | wikijohn

# re: C#/.NET Little Wonders: Static Char Methods

To expand on Mark S' comment, IsDigit will match more than just 0-9. It will also match digits in other languages, such as Arabic. See http://blogs.msdn.com/b/oldnewthing/archive/2004/03/09/86555.aspx for more info.
11/19/2012 6:34 PM | Bryan Bates