Category Archives: T-SQL

SELECTing from a VALUES clause

This post falls into the category of something I am always forgetting how to do, so it is a reminder to self.

It is possible to create a virtual table in SQL Server using just the VALUES clause. The secret sauce is proper placement of parentheses (which is where I always go wrong).

select *
from
(
	values
	(1, 'Virgil', 'Davila', '18 E Bonnys Rd', 'Natchez', 'MS'),
	(2, 'Jill', 'White', '675 W Beadonhall Rd', 'Providence', 'RI'),
	(3, 'David', 'Walden', '12663 NE Sevenside St', 'Bloomington', 'IN')
) tbl (CustomerID, FirstName, LastName, Address, City, State);

Now just wrap that up in a CTE or subquery and it can be used pretty much like any SQL table, such as the source for an INSERT or UPDATE statement.
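
For instance, here is a minimal sketch of using the VALUES construct as the source of an INSERT (the Customer target table in this example is hypothetical):

insert Customer (CustomerID, FirstName, LastName, Address, City, State)
select src.CustomerID, src.FirstName, src.LastName, src.Address, src.City, src.State
from
(
	values
	(4, 'Maria', 'Ortiz', '44 Elm St', 'Austin', 'TX')
) src (CustomerID, FirstName, LastName, Address, City, State)
-- only insert rows that do not already exist in the (hypothetical) target table
where not exists (select * from Customer c where c.CustomerID = src.CustomerID);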

How I Create a Nums Table

You need a Nums or Tally table. You really do. OK, you can make do without, but having one really makes a lot of operations easier.

A Nums table is a good candidate for either an admin database or for the user database, depending on how tightly coupled it is with the stored procedures, functions, views, etc. On the one hand, you might want to make sure that the Nums table moves with a backup and restore of your primary database. On the other hand, you may not want the overhead of multiple Nums tables across several databases.

My preferred way to create and populate a Nums table is:

create table Nums
(
	n int not null,
	constraint pk_Nums primary key clustered (n)
);
 
with	l0 as (select 1 v union all select 1),
		l1 as (select a.v from l0 a, l0),
		l2 as (select a.v from l1 a, l1),
		l3 as (select a.v from l2 a, l2),
		l4 as (select a.v from l3 a, l3),
		l5 as (select a.v from l4 a, l4),
		n as (select row_number() over (order by (select null)) n from l5)
insert Nums(n)
select n.n
from n
where n.n <= 1000000;

The downside here is that the code is ugly, and unless you are familiar with what’s going on it may take a bit to decipher the script.

The upside is that the script is fast (about 5 seconds on my machine to load 1,000,000 records) and quite adaptable. If you want more than one million numbers in the Nums table, just increase the value in the final WHERE clause; the six levels of CTEs can generate up to about 4 billion rows. And if for some reason you need more than that, just add another CTE for “l6” following the same pattern, as sketched below.
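
Here is a sketch of that extended version; note that the final row_number CTE now selects from l6:

with	l0 as (select 1 v union all select 1),
		l1 as (select a.v from l0 a, l0),
		l2 as (select a.v from l1 a, l1),
		l3 as (select a.v from l2 a, l2),
		l4 as (select a.v from l3 a, l3),
		l5 as (select a.v from l4 a, l4),
		l6 as (select a.v from l5 a, l5),   -- one more doubling of the cross joins
		n as (select row_number() over (order by (select null)) n from l6)
insert Nums(n)
select n.n
from n
where n.n <= 10000000;   -- ten million rows, for example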

For a million numbers, this occupies less than 13 MB of storage, so in most cases the overhead for a single database is pretty trivial.

Finally, I really like the flexibility of taking this same method and generating a virtual Nums table. Yes, it adds a lot of ugliness to an otherwise nice script, but you also gain independence from a persisted Nums table, which is useful in situations where you can’t have an Admin database and can’t modify the user database.

If I want to find gaps in the IDs in my Customer table, I can come up with something like this:

declare @maxCustomerId int = (select max(CustomerID) from Customer);
 
with	l0 as (select 1 v union all select 1),
		l1 as (select a.v from l0 a, l0),
		l2 as (select a.v from l1 a, l1),
		l3 as (select a.v from l2 a, l2),
		l4 as (select a.v from l3 a, l3),
		l5 as (select a.v from l4 a, l4),
		Nums as (select row_number() over (order by (select null)) n from l5)
select	Nums.n MissingCustomerId
from	Nums left join Customer cust on Nums.n = cust.CustomerID
where	Nums.n <= @maxCustomerId
and		cust.CustomerID is null;

The nice thing is that SQL Server quite efficiently optimizes the query and doesn’t evaluate any more CTEs than are really necessary based on the constraints used on the Nums CTE.

This also makes it easy and efficient to create a table-valued function instead of (or in addition to) a persisted Nums table:

if exists (select * from sys.objects where type = 'IF' and name = 'fnNums')
	drop function dbo.fnNums;
go
create function dbo.fnNums()
returns table
as
return
with	l0 as (select 1 v union all select 1),
		l1 as (select a.v from l0 a, l0),
		l2 as (select a.v from l1 a, l1),
		l3 as (select a.v from l2 a, l2),
		l4 as (select a.v from l3 a, l3),
		l5 as (select a.v from l4 a, l4),
		Nums as (select row_number() over (order by (select null)) n from l5)
select Nums.n
from Nums
go
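
A quick usage sketch, pulling the first ten numbers from the inline function:

select top (10) f.n
from dbo.fnNums() f
order by f.n;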

Convert Local Time to UTC in SQL Server

I have occasionally had to convert from local time to UTC in SQL Server, but unfortunately there is no built-in functionality to do this. I will share one solution in this post to accomplish this conversion.

By far, the hardest part of the conversion has been accounting for daylight saving time. The rules behind DST vary greatly in different parts of the world and change periodically. The solution that I present here works only for the United States, as accounting for every possible combination would be far more complex. Even for the United States, this code doesn’t account for everything; however, for my purposes, this has been more than adequate.

A little research shows that in the United States there was no uniform DST rule prior to 1966, so I am making the simplifying assumption of ignoring DST in 1965 and earlier. One set of rules applied from 1966 to 1986 (last Sunday in April to last Sunday in October); there were some exceptions in 1974 and 1975, but I am ignoring those here. From 1987 to 2006, DST was in effect from the first Sunday in April to the last Sunday in October, and since 2007 it has been in effect from the second Sunday in March until the first Sunday in November. The change always happens at 2:00 am local time.

In order to implement this code, I first create a small code table that tells us when DST starts and ends. This table, as well as all of the functions described in this post, should be created in an administrative database. The table structure is:

create table DstRanges
(
	RangeID int not null identity(1,1) primary key clustered,
	RangeStart datetime null,
	RangeEnd datetime null,
	IsDst bit not null
);
 
create nonclustered index ix_DstRanges__RangeStart_RangeEnd on DstRanges (RangeStart, RangeEnd);
go

Next, I create a helper function that will be used to populate the DstRanges table. This function can be dropped after the load is completed.

create function fn_GetDSTBeginEnd (@date datetime)
returns @dates table (dst_begin datetime, dst_end datetime)
as
begin
	declare	@year int,
			@dst_begin datetime,
			@dst_end datetime;
 
	-- determine if daylight saving time is in effect (all of the following rules based in the US)
 
	select	@year = datepart(year, @date);
 
	-- From 1966 to 1986, daylight saving in effect from:
	-- last Sunday in April until the last Sunday in October
	-- (some exception for 1974-1975, but those will not be accounted for here)
 
	if	@year >= 1966 and @year <= 1986
	begin
		-- start by getting May 1 and November 1
		select	@dst_begin = cast(cast(@year as varchar(20)) + '-05-01' as datetime),
				@dst_end = cast(cast(@year as varchar(20)) + '-11-01' as datetime);
		-- set to first Sunday in May and first Sunday in November
		-- (the outer "% 7" leaves the date unchanged when the 1st already falls on a Sunday)
		select	@dst_begin = dateadd(day, (7 - ((datepart(weekday, @dst_begin) + @@datefirst - 1) % 7)) % 7, @dst_begin),
				@dst_end = dateadd(day, (7 - ((datepart(weekday, @dst_end) + @@datefirst - 1) % 7)) % 7, @dst_end);
		-- back one week to get last Sunday in April and last Sunday in October
		select	@dst_begin = dateadd(day, -7, @dst_begin),
				@dst_end = dateadd(day, -7, @dst_end);
		-- add two hours because the clock change happens at 2:00 am
		select	@dst_begin = dateadd(hour, 2, @dst_begin),
				@dst_end = dateadd(hour, 2, @dst_end);
	end;
 
	-- From 1987 to 2006, daylight saving in effect from:
	-- first Sunday in April until the last Sunday in October
 
	else if	@year >= 1987 and @year <= 2006
	begin
		-- start by getting April 1 and November 1
		select	@dst_begin = cast(cast(@year as varchar(20)) + '-04-01' as datetime),
				@dst_end = cast(cast(@year as varchar(20)) + '-11-01' as datetime);
		-- set to first Sunday in April and first Sunday in November
		-- (the outer "% 7" leaves the date unchanged when the 1st already falls on a Sunday)
		select	@dst_begin = dateadd(day, (7 - ((datepart(weekday, @dst_begin) + @@datefirst - 1) % 7)) % 7, @dst_begin),
				@dst_end = dateadd(day, (7 - ((datepart(weekday, @dst_end) + @@datefirst - 1) % 7)) % 7, @dst_end);
		-- back off end date by one week to get last Sunday in October
		select	@dst_end = dateadd(day, -7, @dst_end);
		-- add two hours because the clock change happens at 2:00 am
		select	@dst_begin = dateadd(hour, 2, @dst_begin),
				@dst_end = dateadd(hour, 2, @dst_end);
	end;
 
	-- From 2007 onward, daylight saving in effect from:
	-- second Sunday in March until the first Sunday in November
 
	else if	@year >= 2007
	begin
		-- start by getting March 1 and November 1
		select	@dst_begin = cast(cast(@year as varchar(20)) + '-03-01' as datetime),
				@dst_end = cast(cast(@year as varchar(20)) + '-11-01' as datetime);
		-- set to first Sunday in March and first Sunday in November
		select	@dst_begin = dateadd(day, (7 - ((datepart(weekday, @dst_begin) + @@datefirst - 1) % 7))%7, @dst_begin),
				@dst_end = dateadd(day, (7 - ((datepart(weekday, @dst_end) + @@datefirst - 1) % 7))%7, @dst_end);
		-- forward begin date by one week to get second Sunday in March
		select	@dst_begin = dateadd(day, 7, @dst_begin);
		-- add two hours because the clock change happens at 2:00 am
		select	@dst_begin = dateadd(hour, 2, @dst_begin),
				@dst_end = dateadd(hour, 2, @dst_end);
	end;
 
	if  @date >= @dst_begin
	and	@date < @dst_end
	begin
		-- spring forward
		select	@date = dateadd(hour, 1, @date);
	end;
 
	insert	@dates (dst_begin, dst_end) values (@dst_begin, @dst_end);
	return;
end
go
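
As a quick sanity check (a sketch of mine, not part of the original load script), the helper can be called directly; for any 2013 date it should report the second Sunday in March and the first Sunday in November of that year:

-- expect dst_begin = 2013-03-10 02:00 and dst_end = 2013-11-03 02:00
select dst_begin, dst_end
from fn_GetDSTBeginEnd('20130704');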

And next, the table can be populated with this code. In this example the table will be loaded through the year 2099, assuming that the current rules are not changed.

create table #year
(
	YearNbr int not null,
	DstStart datetime null,
	DstEnd datetime null
);
 
with l0 as (select 1 v union all select 1), l1 as (select a.v from l0 a, l0), l2 as (select a.v from l1 a, l1),
l3 as (select a.v from l2 a, l2), l4 as (select a.v from l3 a, l3), l5 as (select a.v from l4 a, l4),
nums as (select row_number() over (order by (select null)) n from l5)
insert #year (YearNbr)
select n.n
from nums n
where n.n >= 1966
and n.n <= 2099;
 
update y
set y.DstStart = dst.dst_begin,
	y.DstEnd = dst.dst_end
from #year y cross apply (select * from fn_GetDSTBeginEnd(datefromparts(y.YearNbr, 1, 1))) dst;
 
with Dst as
(
	select	y.YearNbr,
			lag(y.DstEnd, 1, null) over (order by y.YearNbr) PrevDstEnd,
			y.DstStart,
			y.DstEnd
	from	#year y
)
insert DstRanges (RangeStart, RangeEnd, IsDst)
select Dst.PrevDstEnd RangeStart, DstStart RangeEnd, 0
from Dst
union all
select DstStart RangeStart, DstEnd RangeEnd, 1
from Dst
order by RangeStart;
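
To eyeball the result (a quick check of my own, not part of the original script), the first few ranges can be listed:

select top (6) RangeID, RangeStart, RangeEnd, IsDst
from DstRanges
order by RangeID;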

Finally, we define a pair of table-valued functions (one helper function, and one intended to be called by the user). Be sure to change the name of your Admin database in the two places it is referenced (once in fn_GetDSTInfo and once in fn_LocalTimeToUTC).

if exists (select * from sys.objects where name = 'fn_GetDSTInfo')
	drop function fn_GetDSTInfo
go
create function fn_GetDSTInfo (@TargetDate datetime2)
returns table
as
return
	select		td.TargetDate,
				isnull(r.IsDst, 0) IsDst
	from		(select @TargetDate TargetDate) td
	left join	Admin.dbo.DstRanges r
	on			r.RangeStart <= td.TargetDate
	and			r.RangeEnd > td.TargetDate;
go
if exists (select * from sys.objects where name = 'fn_LocalTimeToUTC')
	drop function fn_LocalTimeToUTC;
go
create function fn_LocalTimeToUTC (@LocalTime datetime2, @UtcOffsetHours int, @UsesDst bit)
returns table
as
return
	with OffsetInfo as
	(
		select		@LocalTime LocalTime,
					@UtcOffsetHours UtcOffsetHours,
					case when @UsesDst = 1 and dst.IsDst = 1 then 1 else 0 end DstOffsetHours
		from		Admin.dbo.fn_GetDSTInfo(@LocalTime) dst
	)
	select	OffsetInfo.LocalTime,
            OffsetInfo.UtcOffsetHours,
            OffsetInfo.DstOffsetHours,
			dateadd(hour, -(OffsetInfo.UtcOffsetHours + OffsetInfo.DstOffsetHours), OffsetInfo.LocalTime) UtcTime
	from	OffsetInfo;
go

To use this, we call the fn_LocalTimeToUTC function. It has three parameters:

  • The local time, expressed as a datetime2
  • The offset of local time (ignoring daylight saving) from UTC
  • A bit indicating if daylight saving time is observed locally

For example, I am in the Central Time Zone (-6 hours from UTC) and daylight saving is observed, so I would call:

select * from fn_LocalTimeToUTC(sysdatetime(), -6, 1);

And the output is:

[Screenshot: fn_LocalTimeToUTC output showing LocalTime, UtcOffsetHours, DstOffsetHours, and UtcTime]

Note that there is ambiguity between 1:00 am and 2:00 am when DST ends in the fall, and there is no good way to resolve that ambiguity. This code presumes that DST is still in effect.
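
For instance, assuming the DstRanges table has been populated as above, a call during the ambiguous hour on a fall-back morning (using 2021-11-07, the US DST-end date that year, as an example) resolves in favor of daylight saving time:

select *
from fn_LocalTimeToUTC('2021-11-07 01:30:00', -6, 1);
-- DstOffsetHours comes back as 1, so 01:30 is treated as CDT and converts to 06:30 UTC
-- (rather than CST, which would convert to 07:30 UTC)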

Because fn_LocalTimeToUTC is written as a TVF, it can be efficiently used in set-based operations, usually using an OUTER APPLY operator. For example:

select oh.OrderId, oh.OrderDate, utc.UtcTime
from OrderHeader oh
outer apply Admin.dbo.fn_LocalTimeToUTC(oh.OrderDate, -6, 1) utc
where oh.OrderId >= 50
and oh.OrderId <= 59;

[Screenshot: set-based output of fn_LocalTimeToUTC applied to OrderHeader rows]

NUMS table creation performance

There is a Stack Overflow posting from 2009 that addresses different ways to create a NUMS table.  The accepted answer from poster KM gives a fascinating summary of seven different ways to accomplish this, including a performance test for creating 10,000 values.  I was curious to know how these different methods scaled into the 1,000,000 value range.

To this end, I created a program that created NUMS tables of various sizes and recorded the time required.  The methodology is the same as that used by KM: for methods 1 and 2, four runs are executed; for the remaining methods, eleven runs are executed.  In each case, the slowest run is discarded and the remaining times are averaged for the final result.  (Yes, one could make the case that a “cold cache” run is more applicable to real-world situations, but this methodology will suffice for the purposes of this test.)

A couple of notes about the individual methods.  KM made an error in Method 1 by creating 100,000 values instead of 10,000.  Method 1 is generally the slowest of the lot, both in my tests and in KM’s test, but not by as much as KM’s test indicates.  It is pretty much on par with Method 2.

Method 3 uses a recursive common table expression to execute a single insert into the NUMS table.  Recursive CTEs are limited to 32,767 iterations, so this method will not work for larger NUMS tables.  Accordingly, the program will skip method 3 if the number of values exceeds 32,767.
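
For context, the recursive-CTE style looks roughly like this (a representative sketch, not KM’s exact Method 3 script; it assumes an empty Nums(n) table already exists):

with nums as
(
	select 1 as n
	union all
	select n + 1 from nums where n < 10000
)
insert Nums (n)
select n
from nums
option (maxrecursion 10000);   -- the MAXRECURSION hint is capped at 32767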

Method 4 is limited to creating NUMS tables with 2^n values, where n is an integer, so the program will select a value of n that results in at least the requested number of records.  In most cases, this will generate a NUMS table with significantly more values than the other methods.

For Method 5, I included an additional “Pass” CTE so that it is capable of generating up to about 4.3 billion values.  Similarly, Method 7’s script was modified to include two additional self-joins to ensure that even databases with a minimal number of sys.objects records will be capable of generating at least 1,000,000 records.

I also removed the code from each method that relates to calculating the timing, and instead captured that information externally from the scripts.  Creating and dropping the NUMS table is also handled externally; however, the scripts still include the ALTER TABLE statements to add the clustered primary key index.

These tests were executed in July 2013 on a Dell R900 server with a good I/O subsystem.  I get quite different results on desktop systems, and I don’t have any results from other server systems, so bear that limitation in mind and take these results with a grain of salt.

Here are the timing results, in seconds, for each of the seven methods for 10,000 numbers, 25,000 numbers, 100,000 numbers and 1,000,000 numbers:

[Table: timing summary for the seven methods at each row count]

[Graphs: timing results for Methods 1-2 and for Methods 3-7]

Clearly, the best performer is Method 7, as KM also found.

It is useful to scale the results of the second graph.  I scaled the y-axis to show the time required to generate 1,000,000 records.  For instance, Method 7 took 0.062431 seconds to create 10,000 records, so at this same rate it would take 6.2431 seconds to generate one million numbers.  I also scaled the x-axis logarithmically.

[Graph: Methods 3-7 timings scaled to one million records, with logarithmic x-axis]

From this, it is apparent that Method 4 shows the best improvement at larger scales, and may well perform better than the other methods at much larger scales than 1,000,000 records.

One final note.  While Method 7 out-performs the other methods in this test, it and Method 6 have the disadvantage of being dependent on system tables that exist for quite different purposes.  While it seems unlikely that these methods will stop working in any future service pack or version of SQL Server, something just bugs me about this system table “abuse.”  My personal favorite is Method 5.  It performs reasonably well, and with the addition of a “Pass6” CTE, it is capable of generating massive NUMS tables (beyond the reach of the BIGINT limit).  However, it does have the problem of being a bit “clunky” as code, and its purpose may not be readily apparent to those who come along later.
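
For reference, the system-table approach that Methods 6 and 7 build on looks roughly like this (a representative sketch of mine, not KM’s exact script, here using sys.all_columns rather than sys.objects for simplicity; it assumes an empty Nums(n) table already exists):

insert Nums (n)
select top (1000000) row_number() over (order by (select null))
from sys.all_columns a
cross join sys.all_columns b;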

COUNT DISTINCT not supported in a windowing context (not documented)

As a trivial example, consider the following dataset:

[Image: sample Sales dataset with EmployeeName and CustomerName columns]

Suppose we need to determine the number of customers associated with each employee, and we need this result for each row in a report we are generating.  (The report computes other columns of interest, such as a running total, that we will ignore in this post.)

We could write this as follows:

select	EmployeeName,
	count(CustomerName) over (partition by EmployeeName) CustomerCount
from	Sales;

This works fine for Bob, but for Alice it incorrectly returns a value of 3.  This, of course, happens because Alice is associated with Eve on two different sales.

[Image: query output showing an incorrect CustomerCount of 3 for Alice]

It seems simple and logical to add the distinct keyword:

select	EmployeeName,
	count(distinct CustomerName) over (partition by EmployeeName) CustomerCount
from	Sales;

The first time that I tried this was on a SQL Server 2008 system running at SP2, and it generated the following error:

Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'distinct'.

On a SQL Server 2008 system at SP3, the following improved error is generated instead:

Msg 10759, Level 15, State 1, Line 2
Use of DISTINCT is not allowed with the OVER clause.

The same error happens with SQL Server 2008 R2 (SP1) and with SQL Server 2012 (RTM).

As the second error message makes clear, this particular usage is not supported.  This is fine and well, except that this limitation doesn’t seem to be documented in Books Online, at least not under the COUNT or OVER topics for these three versions of SQL Server.
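
If you do need a distinct count in a windowed context, one common workaround (not from the original post) is to compute a DENSE_RANK per partition and then take its windowed MAX, which yields the distinct count without putting DISTINCT inside the OVER clause:

select	EmployeeName,
		max(dr) over (partition by EmployeeName) CustomerCount
from
(
	select	EmployeeName,
			dense_rank() over (partition by EmployeeName order by CustomerName) dr
	from	Sales
) s;

Note that this sketch counts NULL customer names as a value, whereas COUNT(DISTINCT ...) would ignore them.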