.NET Nested Loops vs Hash Lookups

Recently I was tasked with improving the performance for a number of processes we have at work as some of these processes where taking as long as 25 mins to complete. After some initial investigation I noticed all 3 processes suffered from the same issue… Nested Loops. Basically the slow running code could be represented by the following:  

foreach (var a in _listA)
{
    foreach (var b in _listB)
    {
        if (a.String1 == b.String1) continue;
        // Do some work
    }
}

As you can hopefully spot straight away, this code would take O(N2) time for which means the time taken to complete the outer loop exponentially grows as the 2 collections of items got bigger, which can be seen in the graph below. In my tests, when both collections where at 100k, it took almost 9 minutes to complete with a single string comparison. 2015-03-02 03_45_20-Book1 - ExcelWhen you see such code, you straight away should see the low hanging fruit of where time could be shaved off in that loop, which is by removing the inner foreach by using a Hashed collection class such as the HashSet/Dictionary/Lookup to create an Index of the properties of interest.so that foreach element in A, you don’t have to iterate over each element in B. You first create an Index for the property you will be looking up, and then for each element in A, you calculate the hash for the required property and check if the hash exists in your Index (may also need to do direct comparisons in case of hash collisions). Therefore you only iterate once over List B in order to generate the Index, and once over List A in order to check if the property exists in your index, therefore taking O(n) time (linear).

 
// Could be HashSet or Dictionary or any other Hashbased collection 
var lookup = _listB.ToLookup(b => b.String1);
 
foreach (var a in _listA) 
{ 
    if (lookup.Contains(a.String1)) 
        continue; 
    // Do some work 
} 

Implementing this in my tests, made the A[100k], B[100k] test case go from 8min to 250ms which is what allowed me to make the process run in matter of seconds compared to 25 minutes!

AlgorithmA[1k] , B[1k]A[100k] , B[10k]A[10k] , B[100k]A[100k] , B[100k]
Nested Loop500ms50s50s500s
Hash Lookup0.058ms50ms125ms256ms

As you can see, using a Hash Lookup takes exponentially more time than a Nested Loop. It is also worth noting that it is quicker to build a lookup for the smaller collection and iterate over the larger collection as can be seen by the highlighted values in the table above. This is actually a topic which I will be blogging about in the near future! A few things to be aware of are:

  • Immutable Keys – A fundamental assumption by most if not all hashed collections (yet to see an implementation with handlers to some sort of HashCodeChanged event) is that the HashCode/key must not change. As if it changed, it will no longer be detected by the collection as it will be looking up the object using the new HashCode and not finding it, as it would still be in the bucket for the old HashCode. Again, this is a topic I will probably blog about in the near future.
  • Hash Algorithm – Make sure there is a performant hash algorithm being if you have rolled your own.
  • Hash Collisions – It is usual for 2 separate properties to map to a single hash, therefore you need to ensure Equals() is overloaded in the custom key struct/class or provide a custom Equality comparer.
  • Hash the Smaller Collection – If you have a choice of which collection to build an Index for, it is generally better to build it for the smaller collection. I will be blogging about this in the near future!

You can find all the source code on GitHub @ michal-ciechan/codePerf/20150302-NestedLoopsVsHashLookups

One thought on “.NET Nested Loops vs Hash Lookups

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

This site uses Akismet to reduce spam. Learn how your comment data is processed.