Precursor
Following on from a comment on the answer I gave for this question on StackOverflow I was asked to comment by @seebiscuit on the following answer by someone else:
a well written join will be faster than a union
So me being me, I decided to look into this issue and do some testing myself.
Enviornment
So first off I will be running this on the following:
- Local version of SQL Server Express 2014 (12.0.2269.0)
- Using SSMS to run my queries.
- Windows 10 x64
- i7 930 1st Gen CPU | 16GB Ram | Samsung Evo 950 Pro SATA3 SSD
I decided to start of with the trusted Microsft Adventure Works SQL Server 2014 Sample Database available here:
Test Method
In my SSMS session run the following to turn on statistics
SET STATISTICS TIME ON
then record the CPU time and elapsed time for each run. Run each query 10 times, discard the 1 Minimum, 1 Maximum and average the rest (8)
Results
Queries – PK Clustered Index <int>
Wrote the following queries on the Person.Person
table which was the biggest ( ~20k Records / 30mb Data )
Query 1 – UNION ALL:
SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person
UNION ALL
SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person
Query 2 – UNION
SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person
UNION
SELECT BusinessEntityID, ModifiedDate, rowguid FROM Person.Person
Query 3 – INNER JOIN
SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
, p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid
FROM Person.Person p1
INNER JOIN Person.Person p2
ON p2.BusinessEntityID = p1.BusinessEntityID
Query 4 – LEFT JOIN
SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
, p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid
FROM Person.Person p1
LEFT JOIN Person.Person p2
ON p2.BusinessEntityID = p1.BusinessEntityID
Query 5 – RIGHT JOIN
SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
, p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid
FROM Person.Person p1
LEFT JOIN Person.Person p2
ON p2.BusinessEntityID = p1.BusinessEntityID
Query 6 – FULL OUTER JOIN
SELECT p1.BusinessEntityID, p1.ModifiedDate, p1.rowguid
, p2.BusinessEntityID, p2.ModifiedDate, p2.rowguid
FROM Person.Person p1
FULL OUTER JOIN Person.Person p2
ON p2.BusinessEntityID = p1.BusinessEntityID
Results – Table
Method | Median | StdDev | CPUTime(ms) |
---|---|---|---|
1. Union All | 21.8937 ms | 6.7085 ms | 18 |
2. Union | 19.1815 ms | 2.2459 ms | 16 |
3. Inner Join | 24.5083 ms | 4.8172 ms | 21 |
4. Left Join | 23.3757 ms | 4.2216 ms | 23 |
5. Right Join | 25.7348 ms | 5.0168 ms | 23 |
6. Full Outer Join | 21.8262 ms | 2.2162 ms | 20 |
Comments
I was planning to do more and different types of queries and compare how Unions vs Joins compare, as well as much larger datasets (close to 2gb) but this has taken longer than I hope already (a week) and if there is more interest oleave a comment and it wil motivate me to delve into the issue.
In the end, it is hard to think of a scenario where you could use 1 or the other as they serve very different purposes! And you are best of using the right tool for the job.
If someone has some interesting scenario’s where you could actually use both then let me know!