5 December 2011

Beware of excessive string objects

Of all the basic built-in types in C# (int, bool, char and so on), string is special in two respects:

  1. It is a reference type (class), whereas the others are value types (ValueType).
  2. It is immutable, i.e. once a string is created, its contents cannot be altered.

Since string is a reference type, a string object is created on the managed heap and has to be garbage collected once it is orphaned. That by itself is fine: sooner or later we have to create managed objects, and all managed objects have to be garbage collected. But look at the second point above, which says that strings are immutable. What if you want to manipulate the contents of a string (append a substring, convert it to lower case, etc.)? Of course you can do that. But the original string is never altered; instead, a new string object is created with the new value. What does that mean? Take a look at the code below.

    string s1 = "Hello World";
    string temp = s1;           // store the reference of s1 into temp
    s1 = "New World";           // modify s1.

First I created a string s1 with the value "Hello World". Then I stored the reference of s1 in temp, i.e. both temp and s1 point to the same object. You can verify that by calling Object.ReferenceEquals(s1, temp), which returns true if both references are the same. Then I modify s1 to "New World". At this point I am not changing the original string; s1 now refers to a different string object, while temp still refers to "Hello World". You can verify that by comparing the contents of temp and s1. So what is the impact of this? To find out, see the code below.
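The reference checks described above can be written out as a small program. This is only a sketch: the class and method names (StringReferenceDemo, OriginalSurvivesReassignment, ToLowerCreatesNewString) are made up for illustration, and ToLowerInvariant stands in for the "convert to lower case" case mentioned earlier.

```csharp
using System;

public static class StringReferenceDemo
{
    // Returns true when reassigning s1 leaves the string held by temp untouched.
    public static bool OriginalSurvivesReassignment()
    {
        string s1 = "Hello World";
        string temp = s1;                                   // temp and s1 refer to the same object
        bool sameBefore = Object.ReferenceEquals(s1, temp);

        s1 = "New World";                                   // s1 now refers to a different string object
        bool sameAfter = Object.ReferenceEquals(s1, temp);

        return sameBefore && !sameAfter && temp == "Hello World";
    }

    // Returns true when lower-casing yields a new object and leaves the original as it was.
    public static bool ToLowerCreatesNewString()
    {
        string original = "Hello World";
        string lower = original.ToLowerInvariant();
        return !Object.ReferenceEquals(original, lower)
            && original == "Hello World"
            && lower == "hello world";
    }

    public static void Main()
    {
        Console.WriteLine(OriginalSurvivesReassignment());  // prints True
        Console.WriteLine(ToLowerCreatesNewString());       // prints True
    }
}
```

With that confirmed, the loop that follows is the one used to measure the impact.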

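    // Needs: using System; using System.Diagnostics;
    string s = string.Empty;
    Stopwatch watch = Stopwatch.StartNew();   // start timing
    Int32 count = 0;
    while (true)
    {
        // each pass allocates a new, longer string; the old one becomes garbage
        s = s + (count++).ToString();

        if (watch.ElapsedMilliseconds >= 1000)
            break;                            // assumed: stop after one second
        //Thread.Sleep (1);
    }
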
When I ran this code on my i3 machine running Windows 7, the loop ran approximately 3 million times in one second, and Gen 0 garbage collection ran 100 times! That is a huge impact on performance. It happens because every iteration creates a new string and the previous one becomes a candidate for garbage collection. Also, the average size of the string created in each iteration was about 3 MB. If garbage collection had not occurred, just calculate the total memory all those strings would have occupied after one second. Thanks to the GC for averting a disaster. When I uncommented the Sleep statement, the GC collection count dropped to 0.1 per second, as the number of strings created per second fell from 3 million to just 300!
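The listing does not show how the collection count was read. One possible way, assuming GC.CollectionCount(0) is sampled before and after the loop, is sketched below; the GcCountDemo class and its Measure method are hypothetical names, and the numbers it prints will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;

public static class GcCountDemo
{
    // Runs the concatenation loop for durationMs milliseconds and reports
    // how many iterations completed and how many Gen 0 collections occurred.
    public static void Measure(int durationMs, out int iterations, out int gen0Collections)
    {
        int gen0Before = GC.CollectionCount(0);
        Stopwatch watch = Stopwatch.StartNew();
        string s = string.Empty;
        int count = 0;

        while (watch.ElapsedMilliseconds < durationMs)
        {
            s = s + (count++).ToString();   // allocates a new, longer string each time
        }

        iterations = count;
        gen0Collections = GC.CollectionCount(0) - gen0Before;
    }

    public static void Main()
    {
        int iterations, gen0;
        Measure(1000, out iterations, out gen0);
        Console.WriteLine("Iterations: " + iterations + ", Gen 0 collections: " + gen0);
    }
}
```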

The demo above clearly shows that creating too many strings is not good. So, what can we do about it?

  1. Prefer StringBuilder over String in cases involving heavy string manipulation.
  2. Avoid string manipulation inside loops wherever possible.
  3. Avoid too many class-level/global string fields.
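
To illustrate the first point, here is a minimal sketch that builds the same text both ways. The StringBuilderDemo class and its method names are invented for this example; the concatenation version allocates a new string on every pass, while the StringBuilder version appends into one growing buffer and creates the final string only once.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Appends the numbers 0..count-1 by repeated string concatenation.
    public static string BuildWithConcat(int count)
    {
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            s = s + i.ToString();           // new string object on every pass
        }
        return s;
    }

    // Produces the same text using a single StringBuilder.
    public static string BuildWithStringBuilder(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i);                   // appends into the existing buffer
        }
        return sb.ToString();               // one final string is created here
    }

    public static void Main()
    {
        Console.WriteLine(BuildWithConcat(5));          // prints 01234
        Console.WriteLine(BuildWithStringBuilder(5));   // prints 01234
    }
}
```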