SQL – Union vs Join Performance

Precursor

Following on from a comment on the answer I gave for this question on StackOverflow, @seebiscuit asked me to comment on the following claim made in someone else's answer:

a well written join will be faster than a union

So me being me, I decided to look into this issue and do some testing myself.

Environment

So first off I will be running this on the following:

  • Local version of SQL Server Express 2014 (12.0.2269.0)
  • Using SSMS to run my queries.
  • Windows 10 x64
  • i7 930 1st Gen CPU | 16GB Ram | Samsung Evo 950 Pro SATA3 SSD

I decided to start off with the trusted Microsoft Adventure Works SQL Server 2014 Sample Database, available here:

Test Method

In my SSMS session, I ran the following to turn on statistics:

SET STATISTICS TIME ON

then recorded the CPU time and elapsed time for each run. Each query was run 10 times; I discarded the single minimum and single maximum and averaged the remaining 8 runs.
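That discard-the-extremes average can be sketched as follows (a hypothetical helper, not part of the benchmark itself; it assumes only the list of recorded timings):

```csharp
using System;
using System.Linq;

public static class TrimmedMean
{
    // Drop the single lowest and single highest timing, average the rest.
    public static double Average(double[] timingsMs)
    {
        if (timingsMs.Length < 3)
            throw new ArgumentException("Need at least 3 samples to trim.");

        return timingsMs
            .OrderBy(t => t)
            .Skip(1)                        // discard the minimum
            .Take(timingsMs.Length - 2)     // discard the maximum
            .Average();
    }
}
```

So for 10 recorded timings this averages the middle 8, which is what the results below report.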

Results

[Screenshot: Excel summary of the results; see the results table below]

Queries – PK Clustered Index <int>

I wrote the following queries against the Person.Person table, which was the biggest (~20k records / ~30 MB of data).

Query 1 – UNION ALL:

SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person
UNION ALL
SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person    


Query 2 – UNION

SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person
UNION
SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person    


Query 3 – INNER JOIN

SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
    ,  p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid 
FROM Person.Person p1 
INNER JOIN Person.Person p2 
    ON p2.BusinessEntityID = p1.BusinessEntityID


Query 4 – LEFT JOIN

SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
    ,  p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid 
FROM Person.Person p1 
LEFT JOIN Person.Person p2 
    ON p2.BusinessEntityID = p1.BusinessEntityID    


Query 5 – RIGHT JOIN

SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
    ,  p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid 
FROM Person.Person p1 
RIGHT JOIN Person.Person p2 
    ON p2.BusinessEntityID = p1.BusinessEntityID    


Query 6 – FULL OUTER JOIN

SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
    ,  p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid 
FROM Person.Person p1 
FULL OUTER JOIN Person.Person p2 
    ON p2.BusinessEntityID = p1.BusinessEntityID


Results – Table

Method              Median      StdDev     CPU Time (ms)
1. Union All        21.8937 ms  6.7085 ms  18
2. Union            19.1815 ms  2.2459 ms  16
3. Inner Join       24.5083 ms  4.8172 ms  21
4. Left Join        23.3757 ms  4.2216 ms  23
5. Right Join       25.7348 ms  5.0168 ms  23
6. Full Outer Join  21.8262 ms  2.2162 ms  20

Comments

I was planning to try more and different types of queries comparing unions vs joins, as well as much larger datasets (close to 2 GB), but this has already taken longer than I hoped (a week). If there is more interest, please leave a comment and it will motivate me to delve further into the issue.

In the end, it is hard to think of a scenario where you could use one in place of the other, as they serve very different purposes! You are best off using the right tool for the job.

If someone has an interesting scenario where you could actually use both, let me know!

.NET No-Op Delegate vs if vs Rx

Recently I've been wondering what the most performant way is to do optional logging (tracing in particular), where by default it is turned off, but some sort of switch or command-line argument can turn it on for extra info when debugging something on a user's machine, where recompiling is not an option.

In this post I will be concentrating on comparing the performance for the following methods of achieving this:

  • IObservable // RxIObservable
  • if (LogLevel <= LogLevel.Trace) // IfLessThanOrEq
  • if (LogLevel < LogLevel.Debug) // IfLessThanEnum
  • TraceHandler = delegate { } // NoOpDelegate
  • if (TraceHandler != null) // IfTraceDelegate
  • if (IsTraceEnabled) // IfIsTraceProp (property)
  • if (IsTraceEnabled) // IfIsTraceField (field)
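The variants share a Log delegate, a LogLevel enum and a TraceLog sink that aren't defined in this post. Here is a minimal sketch of what I assume them to look like (the actual definitions in the GitHub code may differ):

```csharp
using System.Collections.Generic;

// Delegate type used by the delegate-based variants.
public delegate void Log(int x);

// Conventional severity ordering: lower value = more verbose.
public enum LogLevel
{
    Trace = 0,
    Debug = 1,
    Info  = 2,
    Warn  = 3,
    Error = 4
}

// Sink that the Trace(int) methods write into when logging is on.
public static class TraceLog
{
    private static readonly List<int> Entries = new List<int>();

    public static void Add(int x) => Entries.Add(x);

    public static int Count => Entries.Count;
}
```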

The reason I am interested is that situations may arise where I occasionally want to turn on tracing for operations which happen thousands of times per second. Of course, once tracing is on, the choice of method is negligible, as the time taken to output the log is far higher; but ideally this logging would stay off except on rare occasions.

The first suite of tests was run with logging off, so I could see the effect each method has on performance. I ran each method 1,000,000 times, doing a multiplication and keeping a running total to mimic some very light work.

[Chart: relative time per method with logging off]

As you can see, using a boolean field to represent whether it should log is fastest, almost as fast as doing nothing at all, whereas using a property for the same check took the longest. Don't get me wrong: for doing these checks 1 million times it is pretty pointless to go and change your loggers if they don't already work this way, but it is worth noting should you need to optimize some mission-critical code in the future.

I also ran the tests with tracing turned on, for completeness. The following is a relative representation of how long each test method took.

[Chart: relative time per method with logging on]

Let me start off with the code that was run 1,000,000 times:

for (int i = 0; i < Size; i++)
{
    methodToLog(i);
    total += DoSomeWork(i);
}

and

public int DoSomeWork(int i)
{
    return i*2;
}

So as you can see, it was really simple stuff. Of course, normally you would have a heavier workload and the impact would be marginal. But nonetheless, let me continue.

The implementations of each logging method are as follows, in ascending order of performance impact when logging was turned off.

public class IfIsTraceField // 6% Impact
{
    public static bool IsTraceEnabled;

    public static void Trace(int x)
    {
        if (IsTraceEnabled) TraceLog.Add(x);
    }
}

Followed by

public class IfTraceDelegate // 16% Impact
{
    public static Log TraceHandler = null;

    public static void Trace(int x)
    {
        if (TraceHandler != null) TraceHandler(x);
    }
}

In Third Place:

public class RxIObservable // 36% Impact
{
    private static readonly Subject<int> TraceSubject = new Subject<int>();

    public static IObservable<int> TraceObservable
    {
        get { return TraceSubject; }
    }

    public static void Trace(int x)
    {
        TraceSubject.OnNext(x);
    }
}

And the remainder as they were all very close:

public class IfLessThanEnum // 64% Impact
{
    public static LogLevel LogLevel { get; set; }

    public static void Trace(int x)
    {
        if (LogLevel < LogLevel.Debug) TraceLog.Add(x);
    }
}

public class NoOpDelegate // 66% Impact
{
    public static Log TraceHandler = delegate { };

    public static void Trace(int x)
    {
        TraceHandler(x);
    }
}

public class IfLessThanOrEq // 69% Impact
{
    public static LogLevel LogLevel { get; set; }

    public static void Trace(int x)
    {
        if (LogLevel <= LogLevel.Trace) TraceLog.Add(x);
    }
}

public class IfIsTraceProp // 69% Impact
{
    public static bool IsTraceEnabled { get; set; }

    public static void Trace(int x)
    {
        if (IsTraceEnabled) TraceLog.Add(x);
    }
}

Conclusions: a field is fastest, whereas a property is slowest (I am assuming this is due to the stack push/pop of the getter call; I will need to look into the generated IL and blog about it one day).

Anyhow, the code will be available on GitHub shortly.

.NET Nested Loops vs Hash Lookups

Recently I was tasked with improving the performance of a number of processes we have at work, as some of them were taking as long as 25 minutes to complete. After some initial investigation, I noticed all 3 processes suffered from the same issue… nested loops. Basically, the slow-running code could be represented by the following:

foreach (var a in _listA)
{
    foreach (var b in _listB)
    {
        if (a.String1 == b.String1) continue;
        // Do some work
    }
}

As you can hopefully spot straight away, this code takes O(N²) time, which means the time taken to complete the outer loop grows quadratically as the 2 collections get bigger, which can be seen in the graph below. In my tests, when both collections were at 100k items, it took almost 9 minutes to complete with a single string comparison.

[Chart: nested loop completion time vs collection size]

When you see such code, you should straight away see the low-hanging fruit: the inner foreach can be removed by using a hash-based collection class such as a HashSet/Dictionary/Lookup to create an index of the properties of interest, so that for each element in A you don't have to iterate over each element in B. You first create an index for the property you will be looking up; then, for each element in A, you calculate the hash of the required property and check whether it exists in your index (you may also need to do direct comparisons in case of hash collisions). You therefore iterate only once over list B (to build the index) and once over list A (to probe it), taking O(N) (linear) time.

 
// Could be a HashSet, Dictionary or any other hash-based collection 
var lookup = _listB.ToLookup(b => b.String1);
 
foreach (var a in _listA) 
{ 
    if (lookup.Contains(a.String1)) 
        continue; 
    // Do some work 
} 
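If you only need a membership test (as in the snippet above) rather than the grouped elements themselves, a HashSet of the key values works just as well. Here is a tiny self-contained sketch of the same pattern, using made-up sample data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var listA = new[] { "a", "b", "c", "d" };
var listB = new[] { "b", "d" };

// Build the index once from listB: O(|B|).
var index = new HashSet<string>(listB);

// Single pass over listA with O(1) lookups: O(|A|) overall.
var unmatched = listA.Where(a => !index.Contains(a)).ToList();

Console.WriteLine(string.Join(",", unmatched)); // prints "a,c"
```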

Implementing this in my tests made the A[100k], B[100k] test case go from 8 min to 250 ms, which is what allowed me to make the process run in a matter of seconds compared to 25 minutes!

Algorithm    A[1k], B[1k]  A[100k], B[10k]  A[10k], B[100k]  A[100k], B[100k]
Nested Loop  500ms         50s              50s              500s
Hash Lookup  0.058ms       50ms             125ms            256ms

As you can see, the Hash Lookup is orders of magnitude faster than the Nested Loop. It is also worth noting that it is quicker to build a lookup for the smaller collection and iterate over the larger one, as can be seen by comparing the A[100k], B[10k] and A[10k], B[100k] columns in the table above. This is actually a topic which I will be blogging about in the near future! A few things to be aware of are:

  • Immutable Keys – A fundamental assumption made by most, if not all, hashed collections (I have yet to see an implementation with handlers for some sort of HashCodeChanged event) is that the hash code of a key must not change. If it changes, the object will no longer be found by the collection, which will look it up using the new hash code while the object still sits in the bucket for the old one. Again, this is a topic I will probably blog about in the near future.
  • Hash Algorithm – Make sure a performant hash algorithm is being used if you have rolled your own.
  • Hash Collisions – It is possible for 2 separate keys to map to a single hash, so you need to ensure Equals() is overridden in the custom key struct/class, or provide a custom equality comparer.
  • Hash the Smaller Collection – If you have a choice of which collection to build an Index for, it is generally better to build it for the smaller collection. I will be blogging about this in the near future!
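To make the Immutable Keys and Hash Collisions points concrete, here is a minimal sketch of a custom key type (hypothetical names, not from the processes above) that is safe to use in a hashed collection: its fields are readonly, so the hash code cannot change, and Equals() is what resolves collisions within a bucket:

```csharp
using System;

public sealed class NameKey : IEquatable<NameKey>
{
    // Readonly fields keep the hash code stable for the key's lifetime.
    public readonly string First;
    public readonly string Last;

    public NameKey(string first, string last)
    {
        First = first;
        Last = last;
    }

    // Equals resolves hash collisions inside a bucket.
    public bool Equals(NameKey other) =>
        other != null && First == other.First && Last == other.Last;

    public override bool Equals(object obj) => Equals(obj as NameKey);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (First?.GetHashCode() ?? 0);
            hash = hash * 31 + (Last?.GetHashCode() ?? 0);
            return hash;
        }
    }
}
```

Two NameKey instances with the same field values now hash to the same bucket and compare equal, so lookups in a HashSet or Dictionary behave as expected.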

You can find all the source code on GitHub @ michal-ciechan/codePerf/20150302-NestedLoopsVsHashLookups