James Michael Hare

...hare-brained ideas from the realm of software development...

C# Fundamentals: Parameters Passing Nuances

Last week I went into quite a bit of detail on C# struct (here) and the consequences  of using a struct versus a class to represent complex data types.  In the course of that article, I had a section describing the differences between value types and reference types and thought I would expand upon one of the ideas in there which seems to confuse some folks who are new to C#: parameter passing nuances.

But first, let's lay the groundwork with some definitions:

  • Parameter - the variable defined in a method signature that accepts a value from the caller.
  • Argument - the actual value passed to a method from the caller.
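  • Value Types - types whose variables hold their data directly, so assignment copies the data; these are the primitives (such as int and double), enums, and structs.
  • Reference Types - types whose variables hold a reference to an object on the managed heap, which the garbage collector reclaims once it is no longer reachable; these include all classes (string among them), as well as interfaces, delegates, and arrays.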
  • Pass-by-value - The argument passed in is copied into the parameter variable in the method.
  • Pass-by-reference - The parameter in the method refers back to the original argument and is not a copy.

At first mention, parameter passing sounds like it should be a no-brainer, but C# (and, to be fair, Java as well) has a lot of nuances to parameter passing, due to the dichotomy of types into reference and value, that developers new to those languages may not be aware of.

So first, let's begin with the high level question: how are parameters passed in C#?  It's a simple question to ask, but the answer, while equally simple to answer, is very subtle.  So the answer?  Parameters in C# are, by default, passed by value.

Passing Reference Types

Now some of you may chafe at this a bit and say that is incorrect, that value types are passed by value and reference types are passed by reference, but this is strictly speaking incorrect as well.  The key is you have to remember what actually gets passed for each type.

Let's take a look at some sample code to illustrate:

    using System;

    public static class Program
    {
        public static void Main()
        {
            string greetings = "Hi";

            Console.WriteLine("Greetings before call: " + greetings);
            ReferenceParameter(greetings);
            Console.WriteLine("Greetings after call: " + greetings);
        }

        // p receives a copy of the reference held by greetings.
        public static void ReferenceParameter(string p)
        {
            Console.WriteLine("string was passed to ReferenceParameter() as: " + p);
            p = "Bye";  // rebinds the local copy only
            Console.WriteLine("string is leaving ReferenceParameter() as: " + p);
        }
    }

So, what do we expect this output to be?  Well, if you take the very strong hint that all C# parameters are passed by value, you'd probably say:

Greetings before call: Hi
string was passed to ReferenceParameter() as: Hi
string is leaving ReferenceParameter() as: Bye
Greetings after call: Hi

And you'd be right.  Notice that even though we change the parameter p in the method ReferenceParameter(), that change is not reflected in the original argument greetings that was passed in.  This is because when you pass a reference type in C#, you aren't passing the object itself, but the reference to the object.

Now at this point some of you may say, "Ah HAH!  It is pass by reference."  Well, no, it's not.  We must remember that the argument being passed is the reference, not the object itself.  This is a subtle distinction but it makes all the difference in the world.

Let's look at the line above where we create and assign the string to the reference greetings:

    string greetings = "Hi";

It's tempting here to say we've created an object named greetings which is a string.  This is technically incorrect.  What we've done is create a string containing the text "Hi" that is referred to by the reference called greetings.  See the distinction?  A reference variable is not the object itself, only a reference to it.

What if we had the following:

    string name = "James";
    string alsoName = name;

How many objects did I create?  Only one, the string "James", and two references to the same object.  Hence, you can see that when we call a method with a reference-type argument, we are not really passing in the object itself, but a reference to it.
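
One way to check this for yourself is object.ReferenceEquals(), which reports whether two references refer to the very same object.  The sketch below is only an illustration; the class AliasDemo and the helper SameObject() are names I made up for it:

```csharp
using System;

public static class AliasDemo
{
    // Hypothetical helper: true when both references refer to the same object.
    public static bool SameObject(object first, object second)
    {
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        string name = "James";
        string alsoName = name;   // copies the reference, not the string

        Console.WriteLine(SameObject(name, alsoName));   // True
    }
}
```

Assigning name to alsoName never created a second string; it only gave us a second way to reach the first one.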

Let's assume in our original example that when we said:

    string greetings = "Hi";

it created a string object at mythical location 1000 in .NET's heap.  In truth, memory allocation in .NET is more complex than this, but to put a simple face on the illustration go with me on this.

    greetings                  System.String     Ref Ct
    +------+                   +----------------------+
    | 1000 |   --------------> | "Hi"            | 1  |
    +------+                   +----------------------+

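So as you can see, we created a System.String somewhere in memory, and one reference (greetings) refers to it.  The Ref Ct column in these diagrams is purely a teaching aid that counts how many variables currently refer to the object; the CLR does not actually keep reference counts, and its garbage collector instead works out which objects are still reachable.  This is a simplification in another way too, because string literals in .NET are interned to avoid having them duplicated in memory, but go with me once again on this for the sake of a simplified example.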

Now, when we pass the reference greetings into the method ReferenceParameter(), what happens?  The answer is we pass the reference greetings by value into the method.  Remember that pass-by-value says that we make a copy of the value passed in.  This means that the parameter named p will be a local reference variable in ReferenceParameter() that is a copy of greetings.  Does this mean we have a copy of the string "Hi" passed in?  No, it means we have a copy of the location 1000 in p as illustrated below:

    greetings                  System.String     Ref Ct
    +------+                   +----------------------+
    | 1000 |   --------------> | "Hi"            | 2  |
    +------+                   +----------------------+
                                         ^
    p                                    |
    +------+                             |
    | 1000 |-----------------------------+
    +------+

See the difference?  We now have two references (greetings and p) that both refer to the original object.  The object itself was never passed (since it's a reference type), but the reference was and that reference was copied (pass-by-value).  Now let's look at what happens after the re-assignment in the ReferenceParameter() method:

    p = "Bye";

Let's assume the string "Bye" lives at a theoretical location 2000.  If so, we'd get:

    greetings                  System.String     Ref Ct
    +------+                   +----------------------+
    | 1000 |   --------------> | "Hi"            | 1  |
    +------+                   +----------------------+

    p                          System.String     Ref Ct
    +------+                   +----------------------+
    | 2000 |   --------------> | "Bye"           | 1  |
    +------+                   +----------------------+

Note what happened?  At the point p was reassigned, it stopped referring to the object containing "Hi" (so in our diagram the count of references to it drops back to 1) and was assigned the value 2000, which refers to a different string object holding "Bye".  Now, when the method ends, the local variable p goes away (parameters are local variables of the method), which leaves us back with:

    greetings                  System.String     Ref Ct
    +------+                   +----------------------+
    | 1000 |   --------------> | "Hi"            | 1  |
    +------+                   +----------------------+

See, the original argument greetings was never changed!  This is because the argument was not the string "Hi", but a reference to it, and you can't change that original argument because the parameter is a copy of it.  To some extent, this seems like a point-of-view argument, but you have to remember that the original argument was a reference, not an object.  This is why reassigning the reference did not take in the original code.

To make things even more confusing, consider this:

    using System;
    using System.Collections.Generic;

    public static class Program
    {
        public static void Main()
        {
            var fruits = new List<string> { "apple", "banana", "peach" };

            ChangeFruitsToVegetables(fruits);

            Console.WriteLine("Fruits after call: ");
            fruits.ForEach(fruit => Console.WriteLine("\t{0}", fruit));
        }

        // Mutates the list that both fruits and fruitList refer to.
        public static void ChangeFruitsToVegetables(List<string> fruitList)
        {
            fruitList.Clear();
            fruitList.Add("carrot");
            fruitList.Add("asparagus");
            fruitList.Add("broccoli");
        }
    }

What do we expect from this?  The answer is we'd get back a list of vegetables:

    Fruits after call:
            carrot
            asparagus
            broccoli

Now your first instinct may be to cry out that I said you couldn't modify the original argument, and I stand by that remark fully and say we didn't.  But, you might say, the List<string> referred to by fruits completely changed!  And, I'd answer, no: the reference named fruits never changed (it still refers to the same List<string> object at the same location); only that object's contents changed.

Now, if that's mind blowing, let's look at the example drawn at the time of the call:

    fruits                     System.Collections.Generic.List<string>   Ref Ct
    +------+                   +-----------------------------------------------+
    | 1000 |   --------------> | "apple", "banana", "peach"               | 2  |
    +------+              +--> +-----------------------------------------------+
                          |
    fruitList             |                        
    +------+              |                         
    | 1000 |   -----------+                        
    +------+
                  

Notice, once again we are not making a copy of the list object, just a copy of the reference to it.  This means that even though the reference was passed by value and the method can't change the argument itself, the reference and its copy both refer to the same object, so we can change the underlying object.  By calling:

    fruitList.Clear();
    fruitList.Add("carrot");
    fruitList.Add("asparagus");
    fruitList.Add("broccoli");

we are calling Clear() and Add() on the List<string> at location 1000.  No matter which reference we use (the parameter or the supplied argument), they refer to the same object, and that object we can mutate.  Hence after these calls we have:

    fruits                     System.Collections.Generic.List<string>   Ref Ct
    +------+                   +-----------------------------------------------+
    | 1000 |   --------------> | "carrot", "asparagus", "broccoli"        | 2  |
    +------+              +--> +-----------------------------------------------+
                          |
    fruitList             |                        
    +------+              |                         
    | 1000 |   -----------+                        
    +------+                  

And then when we leave ChangeFruitsToVegetables(), the parameter (a local variable) is destroyed which gives us:

    fruits                     System.Collections.Generic.List<string>   Ref Ct
    +------+                   +-----------------------------------------------+
    | 1000 |   --------------> | "carrot", "asparagus", "broccoli"        | 1  |
    +------+                   +-----------------------------------------------+

Thus with reference types, the thing to remember is that we pass a reference by value.  This means the method cannot change which object the caller's variable refers to, but it can change the contents of that object.

To illustrate, once again, let's look at that same code another way.  You could try to argue that this should be functionally the same as the first fruits list code:

    using System;
    using System.Collections.Generic;

    public static class Program
    {
        public static void Main()
        {
            var fruits = new List<string> { "apple", "banana", "peach" };

            ChangeFruitsToVegetables(fruits);

            Console.WriteLine("Fruits after call: ");
            fruits.ForEach(fruit => Console.WriteLine("\t{0}", fruit));
        }

        public static void ChangeFruitsToVegetables(List<string> fruitList)
        {
            // Rebinds the local copy of the reference; fruits is untouched.
            fruitList = new List<string> { "carrot", "asparagus", "broccoli" };
        }
    }

However, this one won't work, because we are attempting to change what the parameter refers to.  The parameter is a local variable, so reassigning it has no effect on the original argument!  Let's look at what we have at the start of the call again:

    fruits                     System.Collections.Generic.List<string>   Ref Ct
    +------+                   +-----------------------------------------------+
    | 1000 |   --------------> | "apple", "banana", "peach"               | 2  |
    +------+              +--> +-----------------------------------------------+
                          |
    fruitList             |                        
    +------+              |                         
    | 1000 |   -----------+                        
    +------+  

And now at the end of the call after we assign the new list (theoretical memory location 2000 again for the new list):   

    fruits                     System.Collections.Generic.List<string>   Ref Ct
    +------+                   +-----------------------------------------------+
    | 1000 |   --------------> | "apple", "banana", "peach"               | 1  |
    +------+                   +-----------------------------------------------+
    fruitList                  System.Collections.Generic.List<string>   Ref Ct
    +------+                   +-----------------------------------------------+
    | 2000 |   --------------> | "carrot", "asparagus", "broccoli"        | 1  |
    +------+                   +-----------------------------------------------+

See why this code fails to work?  At the point we assign a new list, the local reference (fruitList) which was a copy of the reference argument (fruits) has now changed to refer to a new object, and that new List<string> is the one that gets updated, not the original since they're two completely separate objects.
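
If you really do want the caller to end up with the new list, the simplest fix is to hand it back as the return value.  Here is a minimal sketch of that idea; the class ReturnNewListDemo and the method CreateVegetables() are names I invented for illustration and are not part of the example above:

```csharp
using System;
using System.Collections.Generic;

public static class ReturnNewListDemo
{
    // Hypothetical variant: returns the new list instead of relying on the parameter.
    public static List<string> CreateVegetables(List<string> fruitList)
    {
        // Rebinds only the local copy of the reference.
        fruitList = new List<string> { "carrot", "asparagus", "broccoli" };
        return fruitList;
    }

    public static void Main()
    {
        var fruits = new List<string> { "apple", "banana", "peach" };
        var vegetables = CreateVegetables(fruits);

        // Two separate objects: the original list is untouched.
        Console.WriteLine(object.ReferenceEquals(fruits, vegetables));   // False
        Console.WriteLine(string.Join(", ", fruits));       // apple, banana, peach
        Console.WriteLine(string.Join(", ", vegetables));   // carrot, asparagus, broccoli
    }
}
```

The caller decides what to do with the result, for example by assigning it back to fruits.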

Passing Value Types

Now, compare this to using structures (structs).  Remember that structures (and all primitive types) are value types in C#.  This means that every time they are assigned they are copied.  This is true in parameter passing as well.  Thus, let's look at what happens if we were to use struct to represent a Point instead:

    using System;

    // Point is a struct with int X and Y properties (see the Polygon example).
    public static class Program
    {
        public static void Main()
        {
            var thePoint = new Point { X = 5, Y = 5 };
            ChangePointToOrigin(thePoint);

            Console.WriteLine("The point is now at [{0},{1}].", thePoint.X, thePoint.Y);
        }

        // pointToMove is a full copy of thePoint.
        public static void ChangePointToOrigin(Point pointToMove)
        {
            pointToMove.X = 0;
            pointToMove.Y = 0;
        }
    }

Notice, I'm not reassigning the point (like we tried and failed to do with the reference type), I'm just changing the object itself.  However, because structures are value types, this doesn't work: the parameter does not refer back to the original, it is a copy of the original made at the time of the call:

    thePoint
    +--------+
    | X = 5  |
    | Y = 5  |
    +--------+
    pointToMove
    +--------+
    | X = 5  |
    | Y = 5  |
    +--------+

See?  Unlike reference types, value types passed by value are always copied in their entirety.  Thus, obviously, if we change the contents of the copy, we don't affect the original:

    thePoint
    +--------+
    | X = 5  |
    | Y = 5  |
    +--------+
    pointToMove
    +--------+
    | X = 0  |
    | Y = 0  |
    +--------+
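
If the caller needs the moved point, one option that keeps pass-by-value is to return the modified copy.  The following is a sketch under my own naming (ReturnPointDemo and MoveToOrigin() are hypothetical), with a Point struct shaped like the one used in this article:

```csharp
using System;

public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class ReturnPointDemo
{
    // Hypothetical helper: works on its own copy and returns the result.
    public static Point MoveToOrigin(Point pointToMove)
    {
        pointToMove.X = 0;    // changes the local copy only
        pointToMove.Y = 0;
        return pointToMove;   // the caller receives a copy of the modified value
    }

    public static void Main()
    {
        var thePoint = new Point { X = 5, Y = 5 };
        var moved = MoveToOrigin(thePoint);

        Console.WriteLine("[{0},{1}]", thePoint.X, thePoint.Y);   // [5,5]
        Console.WriteLine("[{0},{1}]", moved.X, moved.Y);         // [0,0]
    }
}
```

The original stays at [5,5] unless the caller explicitly writes thePoint = MoveToOrigin(thePoint).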

As a side note, interestingly enough, if the structure contains a reference type and you change the contents of the referred-to object, the change will be visible to the caller, because only the reference is copied and that copy still refers to the original object:

    using System;
    using System.Collections.Generic;

    public struct Polygon
    {
        public List<Point> Points { get; set; }
    }

    public struct Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    public static class Program
    {
        public static void Main()
        {
            var myPoly = new Polygon
                {
                    Points = new List<Point>
                        {
                            new Point { X = 5, Y = 5 },
                            new Point { X = 4, Y = 4 },
                            new Point { X = 3, Y = 3 }
                        }
                };

            AddOriginToPolygon(myPoly);

            myPoly.Points.ForEach(point => Console.WriteLine("[{0},{1}]", point.X, point.Y));
        }

        // polygon is a copy, but its Points reference refers to the same list.
        public static void AddOriginToPolygon(Polygon polygon)
        {
            polygon.Points.Add(new Point { X = 0, Y = 0 });
        }
    }

This will correctly add the origin to the polygon, because in reality the Polygon contains only a reference to the List<Point> and that is the only thing that is copied:

    myPoly
    +------------+
    | Points     |
    | +--------+ |
    | | 1000   |---------------->List<Point>
    | +--------+ |               +-----------------------------+
    +------------+          +--->| [5,5], [4,4], [3,3], [0,0]  |
                            |    +-----------------------------+
                            |
    polygon                 |
    +------------+          |
    | Points     |          |
    | +--------+ |          |
    | | 1000   |------------+              
    | +--------+ |
    +------------+
                                

So, keep in mind that when I say structures are passed by value, I mean that the structure and its members are copied, and what that means depends on the types of those members.  If the members are value types (like the X and Y ints in Point), their values are copied as well, but if the members are reference types (classes), only the reference is copied, which means both copies of the structure will refer to the same member object.

If a structure contains other structures, obviously those other structures are copied as well.  It's only reference types that have the caveat that just the location of the object is copied.
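
To see all three cases side by side, here is a small sketch.  The Id property on Polygon and the Tamper() method are additions of mine purely for illustration; they do not appear in the example above:

```csharp
using System;
using System.Collections.Generic;

public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public struct Polygon
{
    public int Id { get; set; }               // hypothetical value-type member
    public List<Point> Points { get; set; }   // reference-type member
}

public static class ShallowCopyDemo
{
    // Hypothetical method that pokes at a by-value copy of the struct.
    public static void Tamper(Polygon polygon)
    {
        polygon.Points.Add(new Point { X = 0, Y = 0 });   // visible: the list is shared
        polygon.Id = 99;                                   // not visible: value member of the copy
        polygon.Points = new List<Point>();                // not visible: rebinds the copy's reference
    }

    public static void Main()
    {
        var myPoly = new Polygon
        {
            Id = 1,
            Points = new List<Point> { new Point { X = 5, Y = 5 } }
        };

        Tamper(myPoly);

        Console.WriteLine(myPoly.Id);             // 1
        Console.WriteLine(myPoly.Points.Count);   // 2
    }
}
```

Only the Add() call reaches the caller, because it is the only statement that goes through the shared reference rather than changing the copy itself.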

Passing By Reference

So, as we said in the beginning, all parameters are passed by value by default in C#.  So what happens if you do want to alter the original argument to a method?  That is, what if you want to change the original value of a value-type argument, or you want to change the original reference of a reference-type argument?  The answer is the C# ref and out keywords. 

The ref and out keywords are extremely similar in C# with one minor distinction:

  • out - argument is not required to be initialized before the call but must be assigned a value before exiting the method.
  • ref - argument is required to be initialized before the call but need not be assigned a value before exiting the method.

You can think of it this way: out parameters are used only to return a value from a method through a parameter, while ref parameters are a kind of in/out parameter that can take in a value and/or return a value.

So how does this work in practice?  You add the out or ref keyword in front of the parameter type in the method declaration and before the argument in the call:

    using System;

    public static class Program
    {
        public static void Main()
        {
            string greetings = "Hi";

            Console.WriteLine("Greetings before call: " + greetings);
            ReferenceParameter(ref greetings);
            Console.WriteLine("Greetings after call: " + greetings);
        }

        // p is now an alias for the caller's variable greetings.
        public static void ReferenceParameter(ref string p)
        {
            Console.WriteLine("string was passed to ReferenceParameter() as: " + p);
            p = "Bye";  // rebinds greetings itself
            Console.WriteLine("string is leaving ReferenceParameter() as: " + p);
        }
    }

Notice the addition of the ref keyword both in the method declaration and in the call.  This now means that the parameter named p is not merely a copy of the original reference greetings, but is a reference to the reference called greetings.

    greetings                  System.String     Ref Ct
    +------+                   +----------------------+
    | 1000 |   --------------> | "Hi"            | 1  |
    +------+                   +----------------------+
        ^                                 
    p   |                                 
    +------+                             
    |      |
    +------+

Since p is no longer a copy of greetings but refers back to the original reference, we can now change it in the method and end up with:

    greetings                  System.String     Ref Ct
    +------+                   +----------------------+
    | 2000 |   --------------> | "Bye"           | 1  |
    +------+                   +----------------------+
        ^                                 
    p   |                                 
    +------+                             
    |      |
    +------+

Notice greetings now refers to an entirely different object?  The out parameter works the same way, except that out parameters are treated as uninitialized and can't be read until they have been assigned a value in the method (so the first Console.WriteLine() in the method would have been a compile error).  And the argument need not be initialized before the call.
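
As a rough sketch of what an out parameter looks like in practice, consider the following.  TryDivide() and the class OutDemo are hypothetical names of my own, chosen only to show the assignment rules:

```csharp
using System;

public static class OutDemo
{
    // Hypothetical helper: reports success and hands the result back through an out parameter.
    public static bool TryDivide(int dividend, int divisor, out int quotient)
    {
        if (divisor == 0)
        {
            quotient = 0;   // an out parameter must be assigned on every path
            return false;
        }

        quotient = dividend / divisor;
        return true;
    }

    public static void Main()
    {
        int result;   // deliberately not initialized; that is allowed for out arguments

        if (TryDivide(10, 2, out result))
        {
            Console.WriteLine(result);   // 5
        }
    }
}
```

Leaving out either assignment to quotient would stop the method from compiling, which is the compiler enforcing the "must be assigned before exiting" rule.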

Summary

Now, my general rule-of-thumb is to avoid ref and out unless you have a very good need for them.  For the most part you can accomplish whatever you need through the return value of a method; returning values through parameters is rarely your best choice and makes your methods harder to use.  In fact, Microsoft's static code analysis tool in Visual Studio (FxCop) will warn you against using ref and out.  Now, that said, there are times when they do come in handy.  So I won't say don't use them outright, but be careful where you do, and make sure it's not a place where a return value would serve you better and be less confusing.

So, in summary, remember the rules about how parameters are passed in C#:

  • reference types are passed by value by default; this means the reference is copied and you can't alter the original reference argument in the method, but you can alter the object it refers to (if the object is not immutable).
  • value types are passed by value by default; this means the entire value is copied and you can't alter the original value argument in the method.
  • reference types passed by reference (using ref/out) are not copied but are passed as a reference to the original reference which means you can change the reference itself and alter the object it refers to (if the object is not immutable).
  • value types passed by reference (using ref/out) are not copied but are passed as a reference to the original value which means you can change the original value itself.
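
The last rule is the one case we haven't seen in code, so here is a minimal sketch of it: the earlier ChangePointToOrigin() idea with ref added.  The class name RefPointDemo is my own, and the Point struct is repeated so the snippet stands alone:

```csharp
using System;

public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class RefPointDemo
{
    // With ref, pointToMove is an alias for the caller's variable, not a copy.
    public static void ChangePointToOrigin(ref Point pointToMove)
    {
        pointToMove.X = 0;
        pointToMove.Y = 0;
    }

    public static void Main()
    {
        var thePoint = new Point { X = 5, Y = 5 };
        ChangePointToOrigin(ref thePoint);

        Console.WriteLine("The point is now at [{0},{1}].", thePoint.X, thePoint.Y);
        // The point is now at [0,0].
    }
}
```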

 


 

Posted on Thursday, August 5, 2010 6:18 PM | Filed Under [ My Blog C# Software .NET Fundamentals ]
