29 December 2011

CLR Optimizations - String Interning

In my previous post, I explained how the immutable nature of strings can hurt the performance of your application. In this post, I am going to explain how the CLR optimizes string handling through a technique called String Interning.

To begin with, let's take a look at the simple program below.

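using System;

class Program
{
    static void Main()
    {
        String s1 = "tiger";
        String s2 = "tiger";

        //compare the values of s1 and s2
        bool valuesEqual = String.Equals(s1, s2);

        //compare the references of s1 and s2
        bool referenceEqual = Object.ReferenceEquals(s1, s2);

        Console.WriteLine("valuesEqual = " + valuesEqual + ", referenceEqual = " + referenceEqual);
    }
}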

When you execute this program, valuesEqual will be true, which is expected since both s1 and s2 contain the same value "tiger". What about the referenceEqual variable? Here is a twist. You might expect referenceEqual to be false, because s1 and s2 look like two completely different objects and hence their references should differ. But the value of referenceEqual will also be true!

To dig further, change the value of s2 to something else, say "lion", and run the program again. Now valuesEqual is false, as expected, and referenceEqual is false too. So why was referenceEqual true in the first case? The answer is String Interning, an optimization technique adopted by the CLR for string handling. So, let's understand what string interning is.

When you run your application (say, the above program itself), the CLR maintains an internal hash table, often called the intern pool. In this simplified picture, the hash table starts out empty. Then the string s1 is created on the heap with the value "tiger", and an entry is made in the hash table whose key is "tiger" and whose value is a reference to the string object created on the heap.




As you can see, s1 is created on the heap and the address (reference) of the object is stored in the hash table under "tiger". Then the CLR sees the second statement, String s2 = "tiger";. Instead of creating a new string object on the heap, it first searches the hash table for the key "tiger", and it will definitely find an entry. This means that a string "tiger" already exists on the heap, with reference 0x100 in this example. Hence, the CLR simply stores the reference from the hash table into s2. This way, creating a new object is avoided, which saves memory.



Later, if s2 is assigned a different value, say "lion", the CLR will first search for the key "lion" in the hash table. It will not find any entry, so a new String object is created on the heap for "lion". A new entry is also made in the hash table, with the key "lion" and the value being a reference to the new object on the heap.
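To make the lookup described above concrete, here is a toy model of the intern pool built on a Dictionary<string, string>. It is a hypothetical sketch: ToyInternPool is an illustrative name, and this is not the CLR's actual implementation, which lives inside the runtime. It only mimics the "look up by contents, reuse the stored reference" behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of the intern pool; NOT the CLR's real implementation.
class ToyInternPool
{
    // key: the string contents, value: the single shared instance with those contents
    private readonly Dictionary<string, string> table =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public string Intern(string value)
    {
        string existing;
        if (table.TryGetValue(value, out existing))
        {
            return existing;      // found: reuse the reference already stored in the table
        }
        table.Add(value, value);  // not found: this instance becomes the shared one
        return value;
    }
}

class Program
{
    static void Main()
    {
        var pool = new ToyInternPool();

        // Two distinct heap objects with the same contents.
        string a = new string(new[] { 't', 'i', 'g', 'e', 'r' });
        string b = new string(new[] { 't', 'i', 'g', 'e', 'r' });

        string s1 = pool.Intern(a);
        string s2 = pool.Intern(b);

        Console.WriteLine(Object.ReferenceEquals(a, b));   // False: separate allocations
        Console.WriteLine(Object.ReferenceEquals(s1, s2)); // True: both come from the table
    }
}
```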



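Pretty interesting, right? By adopting the string interning mechanism, the CLR efficiently controls the creation of strings. Keep in mind that this automatic interning applies to string literals in your code. Strings built at run time (for example, by concatenating variables or with a StringBuilder) are not interned automatically. If you find this feature useful and want to take advantage of it for such strings, refer to MSDN for the String class's static methods Intern and IsInterned.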
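Here is a minimal sketch of String.Intern and String.IsInterned. It assumes a "tiger" literal appears in the code, so that value is already in the pool. The string built with StringBuilder is a separate object until it is passed through String.Intern, which returns the pooled reference instead.

```csharp
using System;
using System.Text;

class InternDemo
{
    static void Main()
    {
        string literal = "tiger";                                         // literals are interned automatically
        string built = new StringBuilder("ti").Append("ger").ToString();  // built at run time, not interned

        Console.WriteLine(Object.ReferenceEquals(literal, built));                // False
        Console.WriteLine(Object.ReferenceEquals(literal, String.Intern(built))); // True

        // IsInterned returns the pooled reference if an equal string is in the pool, otherwise null.
        Console.WriteLine(String.IsInterned(built) != null);                      // True
    }
}
```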

Apart from the advantages mentioned above, this mechanism also has some disadvantages. The additional overhead of creating and maintaining the hash table, plus repeated hash table lookups, can hurt the performance of your application. If you think string interning hurts your application's performance, you can ask the CLR to turn this feature off by applying the assembly-level attribute CompilationRelaxationsAttribute with the value CompilationRelaxations.NoStringInterning (both in the System.Runtime.CompilerServices namespace). But there is a catch: even if you supply this attribute, the CLR may ignore it and use String Interning anyway. So, just be aware of this.
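Applying the attribute looks roughly like the sketch below. As far as I know, the C# compiler already emits this attribute with this value by default, so declaring it explicitly mostly documents intent. Either way, the runtime may still intern literals. The reflection call only confirms that the attribute is present in the assembly metadata. ReadRelaxations is an illustrative helper name.

```csharp
using System;
using System.Runtime.CompilerServices;

// Requests that the runtime not intern string literals in this assembly.
// The runtime is free to ignore this request.
[assembly: CompilationRelaxations(CompilationRelaxations.NoStringInterning)]

class Program
{
    // Hypothetical helper: reads the CompilationRelaxations value recorded in the assembly.
    public static int ReadRelaxations()
    {
        object[] attrs = typeof(Program).Assembly
            .GetCustomAttributes(typeof(CompilationRelaxationsAttribute), false);
        return attrs.Length == 0
            ? 0
            : ((CompilationRelaxationsAttribute)attrs[0]).CompilationRelaxations;
    }

    static void Main()
    {
        Console.WriteLine("CompilationRelaxations = " + ReadRelaxations());
    }
}
```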

28 December 2011

WBEMTEST - A simple tool to access WMI classes

Windows Management Instrumentation (WMI) is a repository of Windows information. Basically, WMI contains a whole bunch of classes, where each class holds complete information about some part of Windows. For example, there is a class called Win32_Printer, which represents a printer installed in the system. Similarly, there is Win32_Product, which represents a product installed in the system. You can find complete information on all WMI classes here.

Whenever you decide to use WMI to access any piece of information, the first thing to know is which class contains the information and which properties and methods that class has. Initially, I always used to visit MSDN to search for the right class and its members. This is usually quite time consuming. Then I came across a handy tool that ships with the Windows operating system itself: wbemtest.exe. With it you can quickly open a class to see which properties it contains. You can also access the existing instances of these classes. For example, there are as many instances of the Win32_Processor class as there are processors in the system. If you are interested in any information about a particular processor, you can open the corresponding instance. Here is how you can use wbemtest.exe to access the existing instances of the Win32_Processor class.

1. Run "wbemtest.exe" either from the command prompt or directly from the \System32\wbem folder. This will open the window shown in the figure below.



2. Then click the "Connect" button. This will open a window (Figure B) where you have to enter the namespace that contains your desired class. A WMI namespace is much like a C# namespace in that it contains many classes. You can also think of a namespace as a category, like CimV2, SecurityCenter etc. Find more info on these namespaces here.



3. I am interested in the Win32_Processor class, which is in the cimv2 namespace. So, I will enter root\cimv2 and click Connect. This will load cimv2, and all the buttons in the first figure will be enabled. Click the "Enum Instances" button, which will ask you to enter the class name. It looks like the image below.



4. Click OK. This will open another window listing all the instances of Win32_Processor. There is only one processor in my system, so I get only one entry.



5. Double-click the entry. This will open the details of the processor, as shown in the image below. You can see that the window lists all Win32_Processor class properties with their values loaded. So, you can get information like the processor architecture, name, address width, data width etc.
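If you later want the same data from code, the System.Management namespace offers a managed way to do roughly what "Enum Instances" does. Below is a minimal sketch. It assumes a Windows machine and a project reference to System.Management, which is a NuGet package on .NET Core/.NET 5+. ListProcessors is a hypothetical helper name. Name, AddressWidth and DataWidth are among the Win32_Processor properties shown in the wbemtest window.

```csharp
using System;
using System.Collections.Generic;
using System.Management; // requires a reference to System.Management (Windows only)

static class ProcessorInfo
{
    // Hypothetical helper: one line per Win32_Processor instance, similar to "Enum Instances".
    public static List<string> ListProcessors()
    {
        var results = new List<string>();
        using (var searcher = new ManagementObjectSearcher(@"root\cimv2", "SELECT * FROM Win32_Processor"))
        using (ManagementObjectCollection instances = searcher.Get())
        {
            foreach (ManagementBaseObject processor in instances)
            {
                results.Add(string.Format("{0} (address width {1}, data width {2})",
                    processor["Name"], processor["AddressWidth"], processor["DataWidth"]));
            }
        }
        return results;
    }

    static void Main()
    {
        foreach (string line in ListProcessors())
            Console.WriteLine(line);
    }
}
```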


Isn't this a nice tool? One drawback is that you need to know which class you want to access, which means you cannot simply loop through all WMI classes. Still, it is very handy because it is right at your fingertips. I hope you find it useful too.

22 December 2011

Obtaining Culture Information through FormatProviders

Often we come across situations where we need to read culture-specific information like the date/time format, decimal separator, currency symbol, number styles etc. Mostly we need this information to format output according to the user's culture. In this post, I will show you how you can obtain such culture-specific information.
Without spending much time on theory, let's see where we find this information. As you might know, the System.Globalization.CultureInfo class is the one to target for this purpose. CultureInfo provides a public method called GetFormat (declared in the IFormatProvider interface), whose signature looks like this:
 public virtual object GetFormat(Type formatType);
The formatType parameter can be either typeof(NumberFormatInfo) or typeof(DateTimeFormatInfo), both of which are defined in the System.Globalization namespace. For any other type, GetFormat returns null. The NumberFormatInfo class provides information about the number system, such as PositiveSign, NegativeSign, NumberDecimalSeparator, CurrencySymbol, NativeDigits etc. On the other hand, DateTimeFormatInfo provides information about date and time formats, like DateSeparator, FirstDayOfWeek, LongDatePattern, ShortDatePattern, TimeSeparator etc.
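Here is a quick sketch of that behavior, using fr-FR only as an example culture. GetFormat returns a DateTimeFormatInfo when asked for one, and null for a type it does not support. GetDateFormat is an illustrative helper name.

```csharp
using System;
using System.Globalization;

class GetFormatDemo
{
    // Hypothetical helper: asks the culture (as an IFormatProvider) for its DateTimeFormatInfo.
    public static DateTimeFormatInfo GetDateFormat(string cultureName)
    {
        IFormatProvider provider = CultureInfo.GetCultureInfo(cultureName);
        return provider.GetFormat(typeof(DateTimeFormatInfo)) as DateTimeFormatInfo;
    }

    static void Main()
    {
        DateTimeFormatInfo dateInfo = GetDateFormat("fr-FR");
        Console.WriteLine("DateSeparator: " + dateInfo.DateSeparator);
        Console.WriteLine("FirstDayOfWeek: " + dateInfo.FirstDayOfWeek);
        Console.WriteLine("ShortDatePattern: " + dateInfo.ShortDatePattern);

        // Any type other than NumberFormatInfo or DateTimeFormatInfo yields null.
        object unsupported = CultureInfo.GetCultureInfo("fr-FR").GetFormat(typeof(string));
        Console.WriteLine("GetFormat(typeof(string)) is null: " + (unsupported == null));
    }
}
```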
 
For example, here is how I get the currency symbol for the culture fr-FR.
 Type formatType = typeof(NumberFormatInfo);  
 CultureInfo frCulture = CultureInfo.GetCultureInfo("fr-FR");  
 NumberFormatInfo info = frCulture.GetFormat(formatType) as NumberFormatInfo;  
 String currencySymbol = info.CurrencySymbol;
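CultureInfo also exposes the same data directly through its NumberFormat and DateTimeFormat properties. Because CultureInfo implements IFormatProvider, it can also be passed straight to ToString. The sketch below reuses the fr-FR culture; GetCurrencySymbol and FormatCurrency are illustrative names. The exact formatted text (for example, the group separator character) depends on the platform's culture data, so no expected output is given.

```csharp
using System;
using System.Globalization;

class CurrencyDemo
{
    // Shortcut: NumberFormat gives the same data as GetFormat(typeof(NumberFormatInfo)).
    public static string GetCurrencySymbol(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).NumberFormat.CurrencySymbol;
    }

    // CultureInfo is an IFormatProvider, so "C" (currency) formatting uses that culture's rules.
    public static string FormatCurrency(decimal amount, string cultureName)
    {
        return amount.ToString("C", CultureInfo.GetCultureInfo(cultureName));
    }

    static void Main()
    {
        Console.WriteLine(GetCurrencySymbol("fr-FR"));
        Console.WriteLine(FormatCurrency(1234.5m, "fr-FR"));
    }
}
```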
The fr-FR example gives you the whole idea of how to get culture-specific information. Below is code that returns all the DateTimeFormatInfo properties available for a given culture. You can use this code for NumberFormatInfo as well (just replace DateTimeFormatInfo with NumberFormatInfo).

 // Requires: using System; using System.Globalization; using System.Linq;
 //           using System.Reflection; using System.Text;
 public String GetDateTimeFormatInfo(String cultureName)  
 {  
   CultureInfo cultInfo = CultureInfo.GetCultureInfo(cultureName);  
   var formatInfo = (DateTimeFormatInfo)cultInfo.GetFormat(typeof(DateTimeFormatInfo));  
   PropertyInfo[] properties = formatInfo.GetType().GetProperties();  
   // StringBuilder avoids creating a new string on every append.
   StringBuilder info = new StringBuilder();  
   foreach (var prop in properties)  
   {  
      // Read each property value only once.
      object value = prop.GetValue(formatInfo, null);
      Array array = value as Array;
      if (array != null)  
      {  
         // Join array elements (e.g. DayNames) with commas; also safe for empty arrays.
         info.Append(prop.Name + " : " + String.Join(",", array.Cast<object>()));  
      }  
      else  
      {  
         info.Append(prop.Name + " : " + value);  
      }  
      info.AppendLine();  
   }  
   return info.ToString();  
 }  
Note: You can also find much of this information, as the current user's regional settings, in the registry. If you are keen, you can examine the registry key below.

 HKEY_CURRENT_USER\Control Panel\International  
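Here is a minimal sketch for reading that key from C#. It assumes Windows, because the registry APIs are Windows-only. It lists whatever value names the key contains rather than assuming specific ones. ReadSettings is an illustrative helper name.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Win32;

class InternationalSettings
{
    // Hypothetical helper: every value under HKCU\Control Panel\International as "name = value".
    public static List<string> ReadSettings()
    {
        var lines = new List<string>();
        // OpenSubKey opens the key read-only and returns null if it does not exist.
        using (RegistryKey key = Registry.CurrentUser.OpenSubKey(@"Control Panel\International"))
        {
            if (key == null)
                return lines;
            foreach (string name in key.GetValueNames())
                lines.Add(name + " = " + key.GetValue(name));
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in ReadSettings())
            Console.WriteLine(line);
    }
}
```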

5 December 2011

Handling System Shutdown / Logoff events in C#.NET

On .NET forums, I often come across people who want to be notified when something changes in the system: for example, whether Windows is shutting down, whether the user is logging off, whether the power mode has changed, or whether a system font has changed. For instance, you may want to perform some operation when your application is terminating. In that case, what are the possible ways the application can be terminated?
  1. Closing the application by clicking the close ('x') button
  2. Due to an unhandled exception
  3. The user kills the process
  4. The user logs off or the system shuts down
The first two are common scenarios that the developer can handle easily. But what about the 3rd and 4th cases? I don't think we can control the 3rd case. However, the 4th one is interesting. When a system shutdown is started, either by the user or by some automated process, the shutdown process terminates all running processes. Before that, it sends a signal to all running processes that the system is going to shut down. Upon receiving this signal, a process can do any wind-up operation to prevent data loss. This is very important, because you might be in the middle of some important operation, and sudden application termination may cause valuable data to be lost.

So, how does a .NET application receive this signal? Whenever it comes to system-related things, many people think of native OS DLLs or P/Invoke (some unmanaged way). But wait, there are managed classes that do the work. There is a class called SystemEvents in the Microsoft.Win32 namespace that offers many system-level events. You can search online or refer to MSDN for more info on the SystemEvents class. Here, I will just take an example of how system shutdown and user logoff events are handled.

Below is a simple console application that displays a message box whenever a system shutdown starts or the user logs off. In either case, the SessionEnding event is raised before the session actually ends.

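using System;
using System.Windows.Forms;   // MessageBox; requires a reference to System.Windows.Forms
using Microsoft.Win32;         // SystemEvents, SessionEndingEventArgs, SessionEndReasons

class Program
{
    static void Main(string[] args)
    {
        SystemEvents.SessionEnding += SystemEvents_SessionEnding;
        Console.ReadLine();  //This is needed to keep the application running.

        // SystemEvents events are static, so detach the handler to avoid leaks.
        SystemEvents.SessionEnding -= SystemEvents_SessionEnding;
    }

    static void SystemEvents_SessionEnding(object sender, SessionEndingEventArgs e)
    {
        switch (e.Reason)
        {
            case SessionEndReasons.Logoff:
                MessageBox.Show("User logging off");
                break;

            case SessionEndReasons.SystemShutdown:
                MessageBox.Show("System is shutting down");
                break;
        }
    }
}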
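A MessageBox is handy for a demo, but a handler may have little time during shutdown. A common alternative is to persist state quickly and return. Below is a minimal sketch under that assumption. The file name and the OnSessionEnding handler name are hypothetical. On .NET Core/.NET 5+, SystemEvents comes from the Microsoft.Win32.SystemEvents package and works only on Windows.

```csharp
using System;
using System.IO;
using Microsoft.Win32;

class ShutdownAwareApp
{
    // Hypothetical location used to persist state when the session ends.
    internal static readonly string StatePath =
        Path.Combine(Path.GetTempPath(), "shutdown-demo-state.txt");

    static void Main()
    {
        SystemEvents.SessionEnding += OnSessionEnding;
        Console.WriteLine("Running. Press Enter to exit.");
        Console.ReadLine();
        // Detach the static event handler before exiting.
        SystemEvents.SessionEnding -= OnSessionEnding;
    }

    internal static void OnSessionEnding(object sender, SessionEndingEventArgs e)
    {
        // Keep the handler short: write the state and return instead of blocking on UI.
        File.WriteAllText(StatePath, "Session ending: " + e.Reason + " at " + DateTime.Now.ToString("o"));
    }
}
```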

Beware of excessive string objects

Of all the primitive (built-in) types in C#, string is special in the following respects:

  1. It is a reference type (a class), whereas the other primitive types are value types.
  2. It is immutable, i.e. once a string is created, its contents cannot be altered.

Since string is a reference type, a string object must be created on the managed heap, and it has to be garbage collected once it is no longer referenced. This by itself is fine: we can't avoid creating managed objects, and all managed objects have to be garbage collected. But look at the second point above, which states that strings are immutable. What if you want to manipulate a string's contents (like appending a substring or converting to lower case)? Of course you can do that. But the original string is never altered. Instead, a new string object is created with the new value. What does that mean? Take a look at the code below.


    string s1 = "Hello World";
    string temp = s1;           // store the reference of s1 into temp
    s1 = "New World";           // s1 now refers to a different string object; temp is unchanged
 

First I created a string s1 with the value "Hello World". Then I stored the reference of s1 in temp, i.e. both temp and s1 point to the same object. You can verify that by calling Object.ReferenceEquals(s1, temp), which returns true if the references are the same. Then I assigned "New World" to s1. At this point, s1 refers to a different String object; the original object is not modified, and temp still holds "Hello World". You can verify that by comparing the contents of temp and s1. So what is the impact of this? To find out, look at the code below.
 

    string s = string.Empty;
    Stopwatch watch = new Stopwatch();
    Int32 count = 0;
       
    watch.Start();
    while (true)
    {
        s = s + (count++).ToString();

        if (watch.ElapsedMilliseconds >= 1000)
        {
            break;
        }
        //Thread.Sleep (1);
    }
 
When I ran this code on my i3 machine running Windows 7, the loop ran approximately 3 million times. Also, garbage collection (Gen0) ran 100 times! This is a really huge impact on performance. It happens because in every iteration a new string is created and the old one becomes a candidate for garbage collection. Also, if you look at the average size of the string created in each iteration, it is 3 MB. If garbage collection had not occurred, just calculate the total memory all these strings would have occupied in one second. Thanks to the GC for avoiding a disaster. When I uncommented the Sleep statement, the GC collection count dropped to 0.1 per second, as the number of strings created per second went down from 3 million to just 300!
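If you want to repeat this kind of measurement, GC.CollectionCount(0) reports how many Gen0 collections have happened so far. Sampling it before and after the loop gives the count for the loop. Below is a minimal sketch of the same concatenation loop with that instrumentation added. Run is an illustrative name. The numbers will vary with hardware and runtime version, so no expected output is given.

```csharp
using System;
using System.Diagnostics;

class ConcatBenchmark
{
    // Runs the concatenation loop for the given duration and reports the iteration
    // count and the number of Gen0 collections that happened meanwhile.
    public static void Run(TimeSpan duration, out int iterations, out int gen0Collections)
    {
        string s = string.Empty;
        int count = 0;
        int gen0Before = GC.CollectionCount(0);
        Stopwatch watch = Stopwatch.StartNew();

        while (watch.Elapsed < duration)
        {
            s = s + (count++).ToString(); // allocates a brand-new string every iteration
        }

        iterations = count;
        gen0Collections = GC.CollectionCount(0) - gen0Before;
        GC.KeepAlive(s);
    }

    static void Main()
    {
        int iterations, gen0;
        Run(TimeSpan.FromSeconds(1), out iterations, out gen0);
        Console.WriteLine("Iterations: " + iterations + ", Gen0 collections: " + gen0);
    }
}
```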
 

The demo above clearly shows that having too many strings in memory is not good. So, what can you do?

  1. Prefer StringBuilder over String in cases involving heavy string manipulation.
  2. Avoid string manipulation inside loops wherever possible.
  3. Avoid too many class-level/global string fields.
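To illustrate the first point, here is a minimal sketch that builds the same text both ways. WithConcat and WithBuilder are illustrative names, and n = 10000 is an arbitrary size. StringBuilder appends into a growable internal buffer and creates the final string once, instead of allocating a new string on every iteration. Timings depend on the machine, so none are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Text;

class BuilderComparison
{
    // Builds "0123...(n-1)" by repeated concatenation: a new string per iteration.
    public static string WithConcat(int n)
    {
        string s = string.Empty;
        for (int i = 0; i < n; i++)
            s = s + i.ToString();
        return s;
    }

    // Builds the same text in one growable buffer, then creates a single string.
    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.Append(i);
        return sb.ToString();
    }

    static void Main()
    {
        const int n = 10000;
        Stopwatch watch = Stopwatch.StartNew();
        string a = WithConcat(n);
        long concatMs = watch.ElapsedMilliseconds;

        watch.Restart();
        string b = WithBuilder(n);
        long builderMs = watch.ElapsedMilliseconds;

        Console.WriteLine("Same result: " + (a == b));
        Console.WriteLine("Concat: " + concatMs + " ms, StringBuilder: " + builderMs + " ms");
    }
}
```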