Set Me Up: How to Think in Sets

Abstract:

SQL Server is optimized to work in sets, but this is certainly a paradigm shift for someone accustomed to row-by-row operations.

We will look at T-SQL features that can break set-based processing — generally meaning poor performance — and gather empirical evidence to support that assertion. We’ll also look at some constructs to overcome these issues and make your T-SQL scripts run faster and use fewer resources.

  • Slide deck (PDF)
  • Scripts
  • CorpDB Database (19 MB) used by the scripts. This is a backup from SQL Server 2012. If you restore to 2014+ be sure to upgrade the compatibility level for best results. Last updated 4 January 2017.

To run the demos, restore the CorpDB database, set the compatibility level (if needed), then run the “001 – Create DB Objects.sql” script to create some needed database objects. If you are unable or unwilling to enable CLR on your system, skip everything in the script starting with the sp_configure statement.

Before running the SSIS demo, copy the “UpdatedPriceList.csv” file to c:\temp. To run the SSIS demos, execute both the “050 – Price List Update – Command.dtsx” and “051 – Price List Update – Staging.dtsx” packages, either through Visual Studio or through the Execute Package Utility application. Check the results by running the corresponding SQL scripts (“050…” and “051…”).

To run the .NET App demos, execute the “ThinkInSetsTest06x.exe” utility. Modify the “ThinkInSetsTest06x.exe.config” file first to point to your SQL instance and database and to indicate a path on the file system where file names will be collected. Run the app without parameters to execute both tests, or include a single parameter as indicated below to run an individual test. The results will be written to the console.

  • Test 060 (Singleton insert test): “ThinkInSetsTest06x singleton” or “ThinkInSetsTest06x 060”.
  • Test 061 (Bulk copy test): “ThinkInSetsTest06x bulkcopy” or “ThinkInSetsTest06x 061”.

Please be aware that the “Scalar UDF Data Access,” “Multi-Statement TVF” and “CLR” demos will take some time to run (about 15 minutes on my system), and that the “Triangle Join” demo will take a very long time (I never had the patience to let it finish, but my guess is that would take about a week to finish).

Here are the results of the various tests performed during the session:

Test #Test NameRun time, ms
11Cursor16,138
12Cursor ROFF11,036
13While Loop9,372
14Set-based140
20Subquery in SELECT3,311
21Rewrite as APPLY3,046
22Rewrite using #temp874
23Rewrite as RowNum495
30Scalar UDF Data Access183,561
31Scalar UDF Constant140
32Multi-statement TVF194,591
33CLR184,538
34APPLY1,869
35Inline TVF1,757
36RowNum240
40Running Total - Cursor5,118
41Running Total - Triangle Join100,000,000
42Running Total -
Windowing
536
50SSIS - Command Component30,279
51SSIS - Staging223
60C# Singleton Insert3,546
61C# SqlBulkCopy84

Presentation History: