Categories: "Development"

C# - Using references as keys in a Dictionary

by Fredrik Ljung  

A Dictionary is used to do quick* lookups of a value using a specific key. Any type can be used as both value and key, but if you use your own custom type as a key there are a few things you need to think about.

*Quicker than, say, a LINQ query on a list like MyList.Where(x => x.id == id).Single();

The default GetHashCode() and Equals() only consider references to the exact same object equal

For the following to make sense you need to understand the difference between the same object and an equal object. The same object is a reference to the exact same object in memory, whereas different objects in memory are only considered equal if you, the programmer, define them as such:

Person A = new Person()
{
    FirstName = "Gypsy",
    LastName = "Danger"
};

Person B = new Person()
{
    FirstName = "Gypsy",
    LastName = "Danger"
};

Person C = B;

if (B.Equals(C)) // true
    Console.WriteLine("These references are the same");

if (A.Equals(B)) // not true unless Person implements Equals()
    Console.WriteLine("These references are equal");

In the above code Person A and Person B are equal but not the same. They are two different instances of the same type. Person B and Person C, on the other hand, are the same: they reference the exact same object. A comparison between A and B will not return true unless you override Equals in the Person class. Until then they will only be considered equal if they are also the same. This is because of how comparison works on Object: Object.Equals() only returns true if the compared references point to the exact same object. The seemingly subtle difference between Object.Equals() and Object.GetHashCode() is that GetHashCode() must* return the same value for two objects that are equal.

*By must, I mean should, or you will end up with all sorts of pain trying to use Dictionaries, HashSets, and anything else that relies on GetHashCode() to function properly.

GetHashCode() can return the same value for unequal objects.

Unequal objects usually won’t return the same value from GetHashCode(), but nothing forbids them from doing so. This means a comparison of hash codes can be used to decide whether an actual equality check needs to be performed: if the hash codes differ, it’s safe to assume the objects are not equal, but if the hash codes are the same, an actual equality check has to follow. This is why it’s usually good to override Equals in your custom classes, at least if you intend to do any equality comparisons*.

* If you haven’t provided an override for Equals in your objects, if you are using objects you haven’t created, or if you just need a different meaning of equality, you can also use IEqualityComparer<T> where it is supported. Dictionary, for instance, supports equality checks through an IEqualityComparer<TKey> passed to its constructor.
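
To make the pre-filter idea concrete, here is a minimal sketch of my own of a hash-then-equals lookup (the real Dictionary buckets entries by hash code rather than scanning a list, but the principle is the same):

static TValue Find<TKey, TValue>(List<KeyValuePair<TKey, TValue>> entries, TKey key)
{
    int hash = key.GetHashCode();
    foreach (var entry in entries)
    {
        // Cheap check first: different hash codes guarantee "not equal", skip.
        if (entry.Key.GetHashCode() != hash)
            continue;

        // The same hash code only *might* mean equal; Equals() decides.
        if (entry.Key.Equals(key))
            return entry.Value;
    }
    return default(TValue); // not found
}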
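
As for the comparer route, the framework ships ready-made comparers you can plug straight into the constructor; here, for example, a Dictionary treats its string keys case-insensitively without the key type changing at all:

var scores = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
scores["gypsy"] = 1;
Console.WriteLine(scores["GYPSY"]); // prints 1; the comparer, not the key, defines equality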

Override Equals(Object other) to bring equality to all

So GetHashCode() has returned the same value; now it’s time to prove equality, or inequality if that’s the case. Microsoft’s guideline is to override Equals(object), and, for an added speed bonus, to also define an Equals overload for your specific type:

public class Person : IEquatable<Person>
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override bool Equals(object other)
    {
        if (other == null || !(other is Person))
            return false;

        return Equals(other as Person);
    }

    public bool Equals(Person other)
    {
        if (other == null)
            return false;

        return FirstName.Equals(other.FirstName, StringComparison.Ordinal) &&
               LastName.Equals(other.LastName, StringComparison.Ordinal);
    }

    // I found the following pattern on StackOverflow. The numbers are
    // prime numbers which magically (or mathematically if you believe
    // in that stuff) create a fairly good distribution of the
    // numbers, which is good for hashing (apparently).
    public override int GetHashCode()
    {
        int hash = 17;
        hash = hash * 23 + FirstName.GetHashCode();
        hash = hash * 23 + LastName.GetHashCode();
        return hash;
    }
}
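
With both overrides in place, two separate instances with the same names now act as one and the same key. A quick illustration of my own:

var pilots = new Dictionary<Person, string>();
pilots[new Person { FirstName = "Gypsy", LastName = "Danger" }] = "found it";

// A different instance, but Equals()/GetHashCode() make it count as the same key.
var lookup = new Person { FirstName = "Gypsy", LastName = "Danger" };
Console.WriteLine(pilots[lookup]); // found it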

The cost of breaking the guidelines of GetHashCode()

Breaking the rules usually comes with a price; in programming it just about always does. In the case of references as keys, implementing GetHashCode() and then changing the values of the properties used to calculate the hash code is a bad thing: the key still sits in the bucket chosen by its old hash code, so lookups with the new one will miss. However, there are a few things you can do to recover. For a Dictionary, a HashSet, and I’m sure others, you can supply an IEqualityComparer<T> to bypass your own, perfectly proper, GetHashCode(). If you know your key will change but not how, then perhaps falling back on the default object.GetHashCode() is an option, as long as you never try to get values from the Dictionary with a key that’s Equal but not Same.

class PersonComparer : IEqualityComparer<Person>
{
    public bool Equals(Person A, Person B)
    {
        if (A.FirstName.Equals("Connor", StringComparison.Ordinal) &&
            A.LastName.Equals("MacLeod", StringComparison.Ordinal))
            return false; // There can be only one

        return A.Equals(B);
    }

    public int GetHashCode(Person obj)
    {
        // IEqualityComparer<Person> requires this too; delegate to the class.
        return obj.GetHashCode();
    }
}
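
And the fallback mentioned above, where a key keeps its identity no matter how its properties change, can be packaged as a comparer too. A minimal sketch of my own, using reference equality and the default object hash code (newer frameworks ship a built-in ReferenceEqualityComparer for this):

using System.Runtime.CompilerServices;

class ReferencePersonComparer : IEqualityComparer<Person>
{
    public bool Equals(Person A, Person B)
    {
        return ReferenceEquals(A, B); // same, not merely equal
    }

    public int GetHashCode(Person obj)
    {
        // Bypasses the overridden GetHashCode(), so the hash stays
        // stable even if FirstName/LastName change after insertion.
        return RuntimeHelpers.GetHashCode(obj);
    }
}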

I accidentally (this is how I defend that I messed up, an “accident”) fell into the key change trap recently. I was using Entity Framework entities as keys before I called SaveChanges(). When SaveChanges() was called, all temporary Ids were replaced by the Ids generated in the database. The Dictionary stopped working properly since my override of Equals had the Id as part of the comparison. Since none of the other properties made the object unique, I couldn’t use an IEqualityComparer. The two solutions I came up with were to either wrap the entity in an object that doesn’t override GetHashCode(), or to recreate the dictionary using the same references as keys. I went with the latter since it made the smallest change to my code, and since I was dealing with database calls, which are slow, I figured the CPU hit of recreating the Dictionary was small in comparison.

var newMap = new Dictionary<MyEntity, MyValue>(); // same generic types as entityMap
foreach (var entity in entityMap)
{
    newMap.Add(entity.Key, entity.Value);
}
entityMap = newMap;
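
Incidentally, since the generic type arguments have to be written out anyway, the Dictionary copy constructor does the same rebuild in one line, because copying rehashes every key as it is inserted:

entityMap = new Dictionary<MyEntity, MyValue>(entityMap); // MyEntity/MyValue stand in for the real types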

The importance of using - well, using

by Fredrik Ljung  

In our company, like many others I'm sure, we have a lot of small utility apps filled with legacy code. For us, legacy code means code that lacks unit tests and is full of anti-patterns. Global variables, functions with 100+ lines of code, static classes doing the brunt of the work, and a naming strategy that includes such marvels as MyFunction() litter the code. It's very easy to judge the code on these failures, but it has been in production for many years. It might be held together with staples and duct tape, but years of small necessary tweaks mean these apps work well for what they are used for. Occasionally, however, I get reminded of just how fragile this code base is.

The update loop

About a year ago I was out on site at a customer, helping out with the implementation of our application. The application is an average-size handheld application running on Windows CE. While testing I came across some issues which were easily solved by using newer hardware drivers that the OEM ships for the handheld. I decided to update the drivers, rebuild the solution, and publish the new files through our update solution. The update solution is a rather simple service that basically keeps an application folder on the handheld in sync with a dedicated folder on the server. The client application achieves this by firing up an update utility, then shutting itself down and letting the updater handle it from there. The update utility has been used in plenty of our solutions for years without any problem, and it had worked fine for the past six months at the current customer location. However, after applying the new update to the client it stopped working. Something in the process failed and the updater initiated a rollback and restarted the client. The problem, though, was that the client would restart the update and the entire process would repeat itself. The error messages were vague at best and no logging existed. My only clues were that the first two files updated OK, then the process would freeze, and after 30 seconds or so it would fail and roll back. As my day at the customer was all but over, I didn't have time to find out why it wasn't working. I decided to update all handhelds manually and promised myself I would look at the problem at the first available moment.

Reminded of my procrastination

Fast forward one year and obviously I never got that available moment to fix the update problem. It never seemed important enough to take care of. Instead, necessity hit me at a most inconvenient time. The customer had been stuck in a delayed change of ERP systems, so our small sub-project also got put on hold. Recently things started moving again and I returned to the project to get the application up to speed with the rest of the project. As I'm sure you have guessed by now, I came to a point where I needed to publish updates to the application that consisted of more than two files. Since I hadn't touched the update utility in the past year, the update yet again failed as it tried to update more than two files. I realized I had to put my current work on hold and go digging around in some legacy code.

Disposing of the culprits

The code was a shining example of the state of the legacy code we have amassed. The update application's Main function was about 60 lines, and the rest of the code was all in static functions ranging from 20 to 50 lines of code. It's not a large project by any means, consisting of one class, Program, at about 400 lines, with an easy enough structure to understand. I spotted the first hint of the problem on the third line of Main(), where the creation of the reference to the web service was not wrapped in a using() statement, with no Dispose() in sight. I spent about an hour refactoring the code and found another few undisposed objects. Those were just a reflection of the actual problem, and below is a snippet from the function that broke the utility.

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(new Uri(url));
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream s = response.GetResponseStream();

using (FileStream fs = File.Create(file))
{
    ...
}

The code above resided in a function that was called for every file to be updated. Once I added using() for the HttpWebResponse response and the Stream s, the problem disappeared. My assumption is that the lack of calls to Dispose() for response and s kept the connections to the server active. With a limit of two concurrent connections, the application timed out trying to create the third, and then initiated a rollback.
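
For completeness, the repaired version of the snippet looks roughly like this (the copy loop stands in for the code elided above):

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(new Uri(url));
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream s = response.GetResponseStream())
using (FileStream fs = File.Create(file))
{
    // Dispose() now runs even if an exception is thrown, so the
    // connection is released back to the server after every file.
    byte[] buffer = new byte[4096];
    int read;
    while ((read = s.Read(buffer, 0, buffer.Length)) > 0)
    {
        fs.Write(buffer, 0, read);
    }
}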

And that is why using using() is best practice whenever possible.