NUMS table creation performance

There is a Stack Overflow posting from 2009 that addresses different ways to create a NUMS table.  The accepted answer from poster KM gives a fascinating summary of seven different ways to accomplish this, including a performance test for creating 10,000 values.  I was curious to know how these different methods scale into the 1,000,000-value range.

To this end, I wrote a program that creates NUMS tables of various sizes and records the time required.  The methodology is the same as that used by KM:  for Methods 1 and 2, four runs are executed; for the remaining methods, eleven runs are executed.  In each case, the slowest run is discarded and the remaining times are averaged for the final result.  (Yes, one could make the case that a “cold cache” run is more applicable to real-world situations, but this methodology will suffice for the purposes of this test.)
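
To make the calculation concrete, discarding the slowest run and averaging the remainder could be expressed in T-SQL along these lines (the Runs table and its columns are hypothetical, purely to illustrate the arithmetic; my actual program captured the times externally):

-- Hypothetical log of captured run times, one row per run:
-- CREATE TABLE Runs (RunId INT, DurationMs FLOAT);
SELECT AVG(DurationMs) AS AvgDurationMs
FROM (
    SELECT DurationMs,
           ROW_NUMBER() OVER (ORDER BY DurationMs DESC) AS rn
    FROM Runs
) AS t
WHERE rn > 1;   -- rn = 1 is the slowest run, which is discarded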

A couple of notes about the individual methods.  KM made an error in Method 1 by creating 100,000 values instead of 10,000.  Method 1 is generally the slowest of the lot, both in my tests and in KM’s, but not by as much as KM’s test indicates; it is pretty much on par with Method 2.
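
For context, Methods 1 and 2 are the slow, iterative approaches (which is presumably why only four runs are practical for them).  A minimal sketch of a simple loop-based insert, showing the general pattern rather than KM’s exact script, and assuming a Nums table with a single column n:

-- Row-at-a-time population; slow at scale
DECLARE @i INT = 1;
WHILE @i <= 10000
BEGIN
    INSERT INTO Nums (n) VALUES (@i);
    SET @i = @i + 1;
END;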

Method 3 uses a recursive common table expression to execute a single insert into the NUMS table.  Recursive CTEs are limited to 32,767 iterations (the maximum value of the MAXRECURSION query hint), so this method will not work for larger NUMS tables.  Accordingly, the program skips Method 3 if the number of values exceeds 32,767.
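
A minimal sketch of the recursive CTE pattern (again assuming a Nums table with a single column n):

-- The MAXRECURSION hint tops out at 32,767; the default of 100
-- would abort this query long before 10,000 rows
WITH cte (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM cte WHERE n < 10000
)
INSERT INTO Nums (n)
SELECT n FROM cte
OPTION (MAXRECURSION 32767);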

Method 4 is limited to creating NUMS tables with 2^n values, where n is an integer, so the program selects a value of n that results in at least the requested number of records.  In most cases, this generates a NUMS table with significantly more values than the other methods.
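
That power-of-two constraint is consistent with a doubling pattern, in which each pass re-inserts the table’s contents shifted by the current row count.  A sketch of the general idea (not necessarily KM’s exact script):

-- Seed one row, then double until at least @target rows exist
DECLARE @target INT = 1000000;
DECLARE @rows INT = 1;
INSERT INTO Nums (n) VALUES (1);
WHILE @rows < @target
BEGIN
    INSERT INTO Nums (n)
    SELECT n + @rows FROM Nums;   -- doubles the row count
    SET @rows = @rows * 2;
END;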

For Method 5, I included an additional “Pass” CTE so that it is capable of generating up to about 4.3 billion values.  Similarly, Method 7’s script was modified to include two additional self-joins to ensure that even databases with a minimal number of sys.objects records are capable of generating at least 1,000,000 records.
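
For reference, the stacked-CTE pattern behind Method 5 cross-joins each pass with itself, squaring the row count at every step; five passes reach 2^32, or roughly 4.3 billion rows.  A condensed sketch (the CTE and column names here are mine, not KM’s):

WITH
Pass0 AS (SELECT 1 AS c UNION ALL SELECT 1),               -- 2 rows
Pass1 AS (SELECT 1 AS c FROM Pass0 a CROSS JOIN Pass0 b),  -- 4 rows
Pass2 AS (SELECT 1 AS c FROM Pass1 a CROSS JOIN Pass1 b),  -- 16 rows
Pass3 AS (SELECT 1 AS c FROM Pass2 a CROSS JOIN Pass2 b),  -- 256 rows
Pass4 AS (SELECT 1 AS c FROM Pass3 a CROSS JOIN Pass3 b),  -- 65,536 rows
Pass5 AS (SELECT 1 AS c FROM Pass4 a CROSS JOIN Pass4 b),  -- ~4.3 billion rows
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM Pass5)
INSERT INTO Nums (n)
SELECT TOP (1000000) n FROM Tally;

The TOP clause matters: only the requested number of rows is ever pulled through the cross joins, so the nominal 4.3 billion-row capacity costs nothing when generating smaller tables.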

I also removed the code from each method that relates to calculating the timing, and instead captured that information externally from the scripts.  Creating and dropping the NUMS table is also handled externally; however, the scripts still include the ALTER TABLE statements to add the clustered primary key index.
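
The constraint statement itself is just the standard form (table and column names assumed, and the n column is assumed to have been declared NOT NULL):

-- Applied after population, so the index is built once rather than
-- maintained row by row during the inserts
ALTER TABLE Nums
ADD CONSTRAINT PK_Nums PRIMARY KEY CLUSTERED (n);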

These tests were executed in July 2013 on a Dell R900 server with a good I/O subsystem.  I get quite different results on desktop systems, and I don’t have any results from other server systems, so bear that limitation in mind and take these results with a grain of salt.

Here are the timing results, in seconds, for each of the seven methods at 10,000, 25,000, 100,000, and 1,000,000 numbers:

[Image: NumsPerfSummaryTable (summary of timing results for all seven methods)]

[Images: NumsPerfMethods1-2Graph and NumsPerfMethods3-7Graph (timing graphs for Methods 1 and 2, and for Methods 3 through 7)]

Clearly, the best performer is Method 7, as KM also found.

It is useful to rescale the results of the second graph.  I scaled the y-axis to show the time required to generate 1,000,000 records.  For instance, Method 7 took 0.062431 seconds to create 10,000 records, so at this same rate it would take 6.2431 seconds to generate one million numbers.  I also scaled the x-axis logarithmically.

[Image: NumsPerfMethods3-7GraphScaledLog (Methods 3 through 7, scaled to 1,000,000 records, with logarithmic x-axis)]

From this, it is apparent that Method 4 shows the best improvement at larger scales, and it may well outperform the other methods at scales well beyond 1,000,000 records.

One final note.  While Method 7 outperforms the other methods in this test, it and Method 6 have the disadvantage of being dependent on system tables that exist for quite different purposes.  While it seems unlikely that these methods will stop working in any future service pack or version of SQL Server, something just bugs me about this system-table “abuse.”  My personal favorite is Method 5.  It performs reasonably well, and with the addition of a “Pass6” CTE, it is capable of generating massive NUMS tables (beyond the reach of the BIGINT limit).  However, it is a bit “clunky” as code, and its purpose may not be readily apparent to those who come along later.
