Halloween Protection

I thought it would be appropriate today, on the 40th anniversary of the discovery of the problem, to address the issue of Halloween protection. Describing the problem is nothing new, of course, but nonetheless I thought it would be interesting to illustrate it in my own words.

Reportedly, the Halloween problem was identified on October 31, 1976 by three researchers, and the name stuck. Imagine that we have a table of employees with salary information. We’ll create such a table and make the employee name the primary key and the clustering key, and then put a nonclustered index on salary. Here is such a table with some random data.

create table Employee
(
	FirstName nvarchar(40) not null,
	Salary money not null,
	constraint pk_Employee primary key clustered (FirstName)
);
 
create nonclustered index ix_Employee__Salary on Employee (Salary);
 
insert Employee (FirstName, Salary)
values ('Hector', 22136), ('Hannah', 21394), ('Frank', 28990), ('Eleanor', 29832), ('Maria', 28843),
       ('Maxine', 23603), ('Sylvia', 26332), ('Jeremy', 20230), ('Michael', 27769), ('Dean', 24496);

The task at hand is to give all employees with a salary of less than 25,000 a 10% raise. This can be accomplished with a simple query.

update Employee
set Salary = Salary * 1.10
where Salary < 25000.00;

Now imagine that the SQL Server optimizer decides to use the nonclustered index to identify the rows to update. The index initially contains this data (bearing in mind that the index implicitly contains the clustering key).

employeencidx

Without Halloween protection, we can imagine that SQL walks through the rows in the index and makes the updates. So the Jeremy row is updated first, and the new salary is 22,253.

employeeafterjeremyupdate

However, the index is keyed by Salary and should be sorted on that column, so SQL Server will need to move the row in the index to the appropriate location.

employeeafterjeremyupdateandindexmove

Next, SQL updates the Hannah row and moves it within the index.

employeeafterhannahupdateandindexmove

And then for Hector:

employeeafterhectorupdateandindexmove

However, at this point, the next row is Jeremy again. So the database updates the Jeremy row for a second time.

employeeafterjeremyupdateandindexmove2

Hannah again.

employeeafterhannahupdateandindexmove2

Then Maxine.

employeeaftermaxineupdateandindexmove

Finally Hector for a second time, followed by Jeremy for a third time, and then Dean.

employeeafterhectorjeremydeanupdateandindexmove

At this point the next row is for Hannah again. However, her salary is greater than or equal to 25,000 and therefore doesn’t qualify for update based on the query predicate. Since none of the remaining rows meet the WHERE clause condition, processing stops and query execution is complete.

That, in a nutshell, is the Halloween problem: some rows get processed multiple times because they are physically moved within the index being used to process the query. For comparison, the (incorrect) results produced by the above processing are on the left, and the correct results are on the right.

employeeupdatefinalresultsincorrect employeeupdatefinalresultscorrect

In this particular case, the process results in every employee having a salary of at least 25,000. However, imagine if the query didn’t have the WHERE clause but still chose to use the Salary index to process. The result would be an infinite loop where each employee’s salary would be repeatedly increased by 10% until an overflow condition occurred.

So how does SQL Server protect against the Halloween problem? There are a variety of mechanisms in place for this very purpose.

  • SQL Server may choose an alternate index. In fact, this is precisely what SQL will do in our example by reading from the clustered index to drive the updates. Because the clustered index isn’t sorted by Salary in any way, updates during the processing phase don’t affect the row location and the problem does not arise. This separation of the read and write portion avoids the problem.

    halloweenalternateindexselection

  • But what if scanning the clustered index is a relatively expensive operation? When I load 1,000,000 rows into the Employee table and only 5 employees have salaries below 25,000, SQL Server will naturally favor the nonclustered index.

    halloweennonclusteredindexselection

    Note particularly the highlighted Sort operator. Sort is a blocking operation, that is, the entire input set must be read (and then sorted) before any output is generated. Once again, this separates the read and write phases of query processing.

  • Suppose that the Salary index is the only means of accessing the data. For instance, if I drop the nonclustered Salary index and the primary key on the table, and then create a clustered index on Salary, SQL Server is forced to use the Salary index (see the sketch after this list). In this case, SQL Server will insert an eager spool into the plan, once again separating the reading and writing.

    halloweeneagerspool

  • Memory-optimized (Hekaton) tables avoid the Halloween problem by their very nature. Rows in a memory-optimized table are read-only, and "updates" to a row cause the current version of the row to be invalidated and a new version of the row to be written. Once again, read and write separation happens, in this case organically.
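
Here is the sketch referenced above for the Salary-only scenario. It simply restructures the demo Employee table from earlier in this post so that the Salary index becomes the only access path; after running it, the plan for the earlier UPDATE statement should show the eager spool.

drop index ix_Employee__Salary on Employee;        -- remove the nonclustered Salary index
alter table Employee drop constraint pk_Employee;  -- remove the clustered primary key
 
-- Salary is now the only index, so the optimizer must both read and write through it.
create clustered index ix_Employee__Salary on Employee (Salary);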

I don’t intend this to be a comprehensive list of protections in SQL Server, but it covers the most common scenarios.

To me, the key takeaway here is that Halloween protection is an integral and crucial aspect of query processing that the optimizer simply must take into account. This most certainly affects the plan generation process and likely adds overhead to the plan to ensure correct results.

Alerting on Missed Scheduled SQL Jobs, Part 3

The first two parts of this series addressed the general approach that I use in an SSIS script task to discover and alert on missed SQL Agent jobs. With apologies for the delay in producing this final post in the series, here I bring these approaches together and present the complete package.

To create the SSIS package, start with an empty package and add a data flow task. In the task, add the following transformations.

1. An OLE DB Source. Point it to a connection manager with the instance to be monitored and a SQL command containing this script:

set nocount on
declare @xp_results table
(
			job_id                UNIQUEIDENTIFIER NOT NULL,
			last_run_date         INT              NOT NULL,
			last_run_time         INT              NOT NULL,
			next_run_date         INT              NOT NULL,
			next_run_time         INT              NOT NULL,
			next_run_schedule_id  INT              NOT NULL,
			requested_to_run      INT              NOT NULL,
			request_source        INT              NOT NULL,
			request_source_id     sysname          COLLATE database_default NULL,
			running               INT              NOT NULL,
			current_step          INT              NOT NULL,
			current_retry_attempt INT              NOT NULL,
			job_state             INT              NOT NULL
);
 
/* Values for "job_state":
		0 = Not idle or suspended
		1 = Executing
		2 = Waiting For Thread
		3 = Between Retries
		4 = Idle
		5 = Suspended
		[6 = WaitingForStepToFinish]
		7 = PerformingCompletionActions
*/
 
DECLARE @can_see_all_running_jobs INT = 1;
DECLARE @job_owner sysname = suser_sname();
 
INSERT INTO @xp_results
EXECUTE master.dbo.xp_sqlagent_enum_jobs @can_see_all_running_jobs, @job_owner;
 
select	@@SERVERNAME InstanceName,
		j.name as job_name,
		j.date_created,
		x.last_run_date,
		x.last_run_time,
		x.next_run_date,
		x.next_run_time,
		x.current_step,
		x.job_state,
		x.job_id,
		s.name schedule_name,
		s.freq_type,
		s.freq_interval,
		s.freq_subday_type,
		s.freq_subday_interval,
		s.freq_relative_interval,
		s.freq_recurrence_factor,
		s.active_start_date,
		s.active_end_date,
		s.active_start_time,
		s.active_end_time,
		@@servername ServerName
from	@xp_results x
inner join msdb.dbo.sysjobs j on x.job_id = j.job_id
inner join msdb.dbo.sysjobschedules js on js.job_id = j.job_id
inner join msdb.dbo.sysschedules s on s.schedule_id = js.schedule_id
where j.enabled = 1
and s.enabled = 1
and s.freq_type in (4, 8, 16, 32)
and x.job_state not in (1, 2, 3);

2. Add a sort transformation, with “job_id” as the sort key input column (in ascending order).

3. Add a script component. When prompted, add it as a transformation. On the “Input Columns” tab, select all columns. On the “Inputs and Outputs” tab, select “Output 0” and change SynchronousInputID to “None.”

Expand “Output 0” and select “Output columns.” Add the following as outputs:

  • InstanceName, Unicode string, length 256
  • JobId, unique identifier
  • JobName, Unicode string, length 256
  • ExpectedRunDate, database timestamp with precision
  • LastRunDate, database timestamp with precision

Return to the “Script” tab and “Edit Script.” At the bottom, find the Input0_ProcessInputRow method and delete it, replacing it with the contents of this ZIP file.

At the top of the script, expand the “Namespaces” section and add these statements:

using System.Collections.Generic;
using System.Linq;

Add the following declarations inside the ScriptMain class, such as immediately before the PreExecute method:

private bool _isFirstRow = true;
private RowData _previousRow = null;
private DateTime _previousJobRunDate;
private List<DateTime> _forecast = null;

4. Add a conditional split transformation.

In the first row (order = 1), set Output Name = ExpectedRunDateIsNull, Condition = ISNULL(ExpectedRunDate).

In the second row (order = 2), set Output Name = ExpectedRunDateInFuture, Condition = ExpectedRunDate >= DATEADD("minute", -30, GETDATE()).

Set the default output name to ExpectedRunDateInPast.

5. Add an OLE DB Destination, linking to the ExpectedRunDateInPast output from the condition split. In a database somewhere, create a table to collect the results.

create table MissedScheduleMonitorResults
(
	ResultId int not null identity(1,1),
	MonitorRunTime datetime2 not null default (sysdatetime()),
	InstanceName nvarchar(256) not null,
	JobId uniqueidentifier not null,
	JobName nvarchar(256) not null,
	LastRunDate datetime null,
	ExpectedRunDate datetime null,
	AlertSent bit not null default (0),
	constraint pk_MissedScheduleMonitorResults primary key clustered (ResultId)
);

Create a connection manager to this database, and in the OLE DB destination point to this connection manager and select the MissedScheduleMonitorResults table. Use the default mappings (where column names match).

The data flow task should look something like this. Yeah, no naming standards here.

missedschedulemonitordft

That's it! Now you can create a job with a script that reads from the table where AlertSent = false, sends an appropriate message, and sets AlertSent to true.
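
A minimal sketch of such a job step follows. It assumes Database Mail is configured; the profile name and recipient address are placeholders, and the script should run in (or be fully qualified to) the database holding MissedScheduleMonitorResults.

declare @body nvarchar(max) = N'';
 
-- Build one line per unreported missed schedule.
select @body = @body + InstanceName + N': job "' + JobName + N'" expected to run at '
             + coalesce(convert(nvarchar(30), ExpectedRunDate, 120), N'(unknown)') + nchar(13) + nchar(10)
from dbo.MissedScheduleMonitorResults
where AlertSent = 0;
 
if @body <> N''
begin
	exec msdb.dbo.sp_send_dbmail
		@profile_name = N'DefaultMailProfile',   -- placeholder profile name
		@recipients   = N'dba-team@example.com', -- placeholder recipient
		@subject      = N'Missed SQL Agent schedules detected',
		@body         = @body;
 
	update dbo.MissedScheduleMonitorResults
	set AlertSent = 1
	where AlertSent = 0;
end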

Why TF3604?

I continue to be surprised at how few people in the SQL Server community “get” my username.

About a year ago, in mid-August 2015, I was trying to figure out a Twitter username to use as a professional account. On a lark, I searched to see if "tf3604" was available, figuring that surely someone had already appropriated it. To my surprise it was available, so I grabbed it.

A few weeks passed before it even occurred to me to wonder if “tf3604.com” were available. Surely, surely that was in use somewhere, maybe even for non-SQL Server purposes. But nope, it was still out there. Even then, I hesitated for a day or two before buying the domain. That was September 17, 2015.

So why TF3604?

There are many undocumented commands and features in SQL Server that require trace flag 3604 to be enabled in order to produce useful output. For instance, DBCC PAGE will run fine but appear to do nothing unless TF3604 is turned on, because this trace flag tells SQL Server to route the output to the client. In SSMS, this output is then displayed on the Messages tab.
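
For example, a minimal sketch of the pattern (any database and page will do; CorpDB is just the sample database used elsewhere on this blog):

dbcc traceon(3604);            -- route DBCC output to the client instead of the error log
dbcc page('CorpDB', 1, 1, 3);  -- database name, file id, page id, print option
dbcc traceoff(3604);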

Virtually all of these undocumented commands and features that pair with TF3604 deal with SQL Server internals.

I am fascinated with SQL Server internals. To me, the name TF3604 relates to shining a light on these internals.

So here we are, a year after picking this username on a whim. And I still love the name.

Alerting on Missed Scheduled SQL Jobs, Part 2 (Custom Aggregates in SSIS)

In the previous post in this series, I introduced the goal of finding missed scheduled jobs in SQL Server. The next step in the solution required what essentially amounted to a custom aggregate in SSIS. This just wasn’t something I’d had to deal with before, so I’ll address that part of the problem in this post.

To keep things simple, I will use an example that is decoupled from the problem at hand. In part 3 of this series, I will use these techniques to present the final missed schedules solution.

My contrived example here is to do a string concatenation aggregate on a customer table where I want all of my customers who live in the same city to be aggregated into the same record. My source query is:

select c.City, c.FirstName + ' ' + c.LastName Name
from dbo.Customer c
where c.State = 'KS'
order by c.City, c.LastName, c.FirstName;

The first few records from this query are:

CustomersWithCity

Since we have only one customer in Abilene, I would expect for my output to contain a record with just one name for that city.

Abilene | Joe Wolf

Since there are multiple customers in Andover, the output should have this record:

Andover | Clara Anderson, Shirley Holt, Wanda Kemp, Brandon Rowe, Sandra Wilson
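
For reference, the same aggregation can be written directly in T-SQL with the FOR XML PATH concatenation trick (a sketch against the same hypothetical dbo.Customer table); the point of this post, though, is producing it inside an SSIS data flow.

select c.City,
       stuff((select ', ' + c2.FirstName + ' ' + c2.LastName
              from dbo.Customer c2
              where c2.State = 'KS'
                and c2.City = c.City
              order by c2.LastName, c2.FirstName
              for xml path(''), type).value('.', 'nvarchar(max)'),
             1, 2, '') as CustomerNames
from dbo.Customer c
where c.State = 'KS'
group by c.City
order by c.City;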

Here is how I do the aggregate. I start by creating a data flow task with three components.

CustomAggregateDataFlowTask

In the Source component, I connect to my database and give it the SQL listed above.

Next, I add a Script Component. SSIS prompts me to select the type of script component that this object will be:

ScriptComponentOptions

This will be a transformation script. After I connect my source to the script component and open the script component, I get this dialog.

ScriptTransformationEditor

On the Input Columns tab, I make sure that both City and Name are selected, and then I move to the Inputs and Outputs tab. I click on “Output 0.” Here is the secret sauce: I change the SynchronousInputId property to None. The default is that SSIS will have one output row per input row, but making this change allows me to control when an output row is generated. Finally, I expand “Output 0” and click on “Output Columns.” I press the Add Column button, rename it to City, and set the DataType property to “string [DT_STR].” The default length of 50 is OK for this situation. I then add another column, call it “CustomerNames” and make it a string as well and set the length to 8000.

I then go back to the Script tab and edit the script. I replace the Input0_ProcessInputRow method with the following.

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
	string currentCity = Row.City_IsNull ? null : Row.City;
	if (currentCity == _previousCity)
	{
		if (_customers != null)
		{
			_customers += ", ";
		}
		_customers += Row.Name_IsNull ? string.Empty : Row.Name;
	}
	else
	{
		if (_customers != null)
		{
			Output0Buffer.AddRow();
			Output0Buffer.City = _previousCity;
			Output0Buffer.CustomerNames = _customers;
		}
 
		_customers = Row.Name_IsNull ? string.Empty : Row.Name;
	}
 
	_previousCity = currentCity;
}
 
public override void Input0_ProcessInput(Input0Buffer Buffer)
{
	base.Input0_ProcessInput(Buffer);
	if (Buffer.EndOfRowset())
	{
		if (_customers != null)
		{
			Output0Buffer.AddRow();
			Output0Buffer.City = _previousCity;
			Output0Buffer.CustomerNames = _customers;
		}
	}
}
 
private string _previousCity = null;
private string _customers = null;

This code takes advantage of the fact that our input is sorted by City. It reads through the records one at a time, watching for changes to the City field. As it goes, it keeps track of the customers associated with the current city, and when the city changes, it writes a record to the output. The Input0_ProcessInput override exists just to output the final row.

Finally, I create a destination in my data flow task. The destination can be anything we want; I just output to a table.

When I run the package and select from the output table, I get exactly what I expected.

CustomAggregateResults

Note that this is very similar to how the SQL Server Stream Aggregate operator processes data, and it is absolutely dependent on having the input sorted by the aggregation column. If the data were not sorted, we would have to take a very different approach.

In the final part of this series, I will use this same technique to complete the missed jobs monitor.

Alerting on Missed Scheduled SQL Jobs, Part 1

We had an interesting situation at work recently where SQL Server Agent went into what I would call a "zombie state": the service was still alive and running but not really doing much. Scheduled jobs weren't being executed. Well, mostly. It did appear that some jobs were still running while others just sat there for several days without running. If a backup job doesn't run, it's a big deal.

I still don’t understand quite what happened, and likely never will since so much of SQL Agent is a black box and the various logs didn’t reveal a whole lot. Based on internet searches, this happens to folks from time to time, and rebooting the server is the general solution, which worked for us. But there doesn’t seem to be much out there about how to alert if the problem happens.

Such a monitor initially seemed like it would be a really simple thing. The more I dug in, however, the more involved it turned out to be, primarily for two reasons. The first is that the scheduling metadata is a bit complicated, and unraveling it is far more than just a simple query. Second, as I created the solution in SSIS, I came to realize that what I would need to write basically amounted to a custom aggregate, something I'd not needed to do before and which was a learning process for me.

This is going to be a three-part post. I’ll address the first difficulty in this post, and the aggregation difficulty in the next one. Finally, I’ll pull it all together into the final monitoring package.

Scheduling Metadata

Job metadata is stored in msdb. There is a dbo.sysjobs table that stores one row per job. The dbo.sysschedules table stores the scheduling information for jobs, and dbo.sysjobschedules ties the other two together. All of these tables are well documented in BOL. There is also a sysjobhistory table that stores the history, but we aren't going to need that. We are, however, going to have to use the undocumented xp_sqlagent_enum_jobs procedure to get some live job information.

For now, I'll focus on sysschedules since that contains the meat of the scheduling metadata. The key bit of information here is the freq_type column, which indicates how frequently the schedule is activated. I am only concerned about recurring schedules, so the values I'm interested in are 4 (daily), 8 (weekly), 16 (monthly using days of the month) and 32 (monthly using weeks of the month). Some of the other columns in the table get interpreted in different ways depending on the value of freq_type. Most notably, the freq_interval column gets used in varying ways for different frequency types.
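
For example, a quick exploratory query (not part of the final package) shows the schedule rows and columns we care about:

select s.schedule_id,
       s.name,
       s.freq_type,                -- 4 = daily, 8 = weekly, 16 = monthly (days), 32 = monthly (weeks)
       s.freq_interval,
       s.freq_subday_type,
       s.freq_subday_interval,
       s.freq_relative_interval,
       s.freq_recurrence_factor
from msdb.dbo.sysschedules s
where s.enabled = 1
and s.freq_type in (4, 8, 16, 32);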

Why Do We Even Care?

Why do we even need to parse through the scheduling metadata? It’s because SQL Server doesn’t have any built-in functionality (at least so far as I’ve been able to discover) to determine the next time a schedule should fire after an arbitrary date. There are ways, including with sp_help_job, to identify the next time a job should run, but this basically determines the next time the job should run from the current time. This doesn’t particularly help us.

For example, suppose a job last ran on January 1 and should fire every week. If today is January 20, SQL will tell us that the next run will be on January 22. Without knowing how to read through the schedule metadata, we don’t know that anything was missed. Knowing that the job runs every week, we can compute that the next time should be on January 8. Since this date is in the past, we know that some runs got skipped.

So the general idea for the monitor is this. We will identify all jobs that are enabled and that have at least one recurring schedule that is enabled. We will determine the last time that the job executed, and determine when it should run next. If the next run date is in the past, something happened, and we need to send an alert.

We will not consider any jobs that are currently executing. It is entirely possible to miss a scheduled run because the job is already active, and this monitor doesn't care about that. However, it is a good idea to separately monitor for jobs that get stuck and run beyond what is normal.

Forecasting Schedules

I initially started writing the forecasting logic in T-SQL, but it very quickly got ugly and I switched to C#. It got ugly anyway. But as it stands now, the solution is in C# code that I later integrated into a script task in SSIS.

For now, we will ignore the fact that a job can have multiple schedules, which we address in Part 2 of this article. The code that follows treats one schedule at a time. Later, we will combine any multiple schedules for the job together.

The code consists of a single static class called SqlAgentSchedule with a single public method called Forecast. The signature of Forecast basically takes the values of the relevant columns from sysschedules.

public static List<DateTime> Forecast(
	DateTime startingFromDate,
	int type,
	int interval,
	int subdayType,
	int subdayInterval,
	int relativeInterval,
	int recurrenceFactor,
	int? startDateInt,
	int? endDateInt,
	int? startTimeInt,
	int? endTimeInt)

The method returns a list of the next few times that the job should run. Specifically, it will return, at a minimum, any remaining times that the job should run today as well as any times that the job should run on the next scheduled day. This is definitely overkill, but it guarantees that the returned list contains at least one item, except in the case where the job's end date has passed.

After doing some simple computation to determine the start and end dates and times for the job, we call one of four different functions to compute the schedule based on the freq_type value. Finally, we filter out any run times that are not after the starting-from date passed into the function.

List<DateTime> forecast = null;
switch (type)
{
	case 4:
		forecast = ForecastDailySchedule(
			startingFromDate.Date, 
			interval, 
			subdayType, 
			subdayInterval, 
			startDate.Value, 
			endDate.Value);
		break;
 
	case 8:
		forecast = ForecastWeeklySchedule(
			startingFromDate.Date, 
			interval, 
			subdayType, 
			subdayInterval, 
			recurrenceFactor, 
			startDate.Value, 
			endDate.Value);
		break;
 
	case 16:
		forecast = ForecastMonthlyByDaysSchedule(
			startingFromDate.Date, 
			interval, 
			subdayType, 
			subdayInterval, 
			recurrenceFactor, 
			startDate.Value, 
			endDate.Value);
		break;
 
	case 32:
		forecast = ForecastMonthlyByWeeksSchedule(
			startingFromDate.Date, 
			interval, 
			subdayType, 
			subdayInterval,
			relativeInterval, 
			recurrenceFactor, 
			startDate.Value, 
			endDate.Value);
		break;
 
	default:
		return null;
}
 
forecast = forecast.Where(d => d > startingFromDate).ToList();

The devil is in those schedule type-specific forecast functions. Disclaimer: I don’t by any means guarantee that the code is right, especially for some edge-case scenarios. I am also quite unsure about the exact logic that SQL uses for weekly and monthly schedules where the “relative interval” is greater than 1.

This code is also pretty raw; it could definitely stand a good refactoring. I just haven’t had the time to invest in that effort. That said, the code should be functional as-is.

General Approach to Forecasting Schedules

The idea here is that we first need to find the first day, on or after the "starting from date" value sent into the function, that meets the schedule criteria. Each of the schedule types has a parameter indicating how often the schedule recurs. For daily schedules, this is the freq_interval column; for weekly and monthly schedules it is the freq_recurrence_factor column. I'll call it the Recurrence Interval in the following discussion.

We first compute the number of days (or weeks, or months) that have elapsed since the Start Date for the schedule, and then take this number modulo the Recurrence Interval. If the modulo is non-zero, we add enough days, weeks or months to get to the next multiple of Recurrence Interval. For weekly and monthly schedules, this may result in a date that doesn’t meet some of the other qualifications, so we need to keep advancing the date until all of the criteria are satisfied.

Next, we compute the next date that the schedule should run, either in the current Recurrence Interval or in the next one.

Schedules also have information about when the job should run within a day. This is indicated by the freq_subday_type column. A value of 1 indicates that the schedule does not recur within the day and should just run once. Other values indicate the unit of time counted by the freq_subday_interval column: a freq_subday_type of 2 indicates that the units are seconds; 4 represents minutes; and 8 represents hours.

We also have the active_start_time and active_end_time columns that indicate when in the day the schedule starts and stops. These are integer values that encode the times using the formula hour * 10000 + minute * 100 + second, so we have to reverse this formula to get the times into usable formats. We then begin at the start time each day and step through to the end time in increments of freq_subday_interval seconds, minutes, or hours, adding an entry to the forecast for each one.
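
For example, a quick T-SQL sketch of reversing the encoding (the sample value is made up):

declare @active_start_time int = 143005;  -- encodes 14:30:05
 
select timefromparts(@active_start_time / 10000,         -- hours
                     (@active_start_time / 100) % 100,   -- minutes
                     @active_start_time % 100,           -- seconds
                     0, 0) as DecodedStartTime;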

A special case is when end time is less than the start time. This seems to mean that SQL will run the job into the following day until the end time, so we also have to account for that.

The code to handle the sub-day time logic is as follows.

TimeSpan subDayFrequency = GetSubdayFrequency(subdayType, subdayInterval);
TimeSpan startTime = startDate.TimeOfDay;
TimeSpan endTime = endDate.TimeOfDay;
if (endTime < startTime)
{
	endTime = endTime.Add(_oneDay);
}

With this information we can loop through and find all times that meet the schedule criteria within the time span:

for (TimeSpan time = startTime; time <= endTime; time = time + subDayFrequency)
{
	forecast.Add(nextRunDate + time);
}

GetSubdayFrequency is defined as follows. Note that in the case of a one-time schedule (sub-day type 1) we use a bit of a hack and define the recurrence as the number of seconds in a day. This guarantees that we will only get one hit in the current day.

private static TimeSpan GetSubdayFrequency(int subdayType, int subdayInterval)
{
	int seconds = 0;
	switch (subdayType)
	{
		case 1:
			seconds = 24 * 60 * 60;
			break;
 
		case 2:
			seconds = subdayInterval;
			break;
 
		case 4:
			seconds = subdayInterval * 60;
			break;
 
		case 8:
			seconds = subdayInterval * 60 * 60;
			break;
 
		default:
			throw new ApplicationException("invalid subdayType.");
	}
 
	return new TimeSpan(0, 0, seconds);
}
Daily Schedules

The daily schedule is by far the simplest to implement. Since the job simply runs every x days, all we really need to do is determine the next day that falls on a multiple of x days from the schedule start date.

int daysElapsed = (int)((startingFromDate - startDate.Date).TotalDays);
int daysInCurrentInterval = daysElapsed % interval;
int daysUntilNextRun = daysInCurrentInterval == 0 ? 0 : interval - daysInCurrentInterval;
DateTime nextRunDate = startingFromDate.AddDays(daysUntilNextRun);
Weekly Schedules

With a weekly schedule, things start off pretty much the same as for daily. If a schedule fires every x weeks, it's really the same thing as saying the schedule fires every 7x days, so the logic is pretty much the same as before. Remember, we now need to use the freq_recurrence_factor column instead of the freq_interval column.

int daysElapsed = (int)((startingFromDate - startDate.Date).TotalDays);
int daysInCurrentInterval = daysElapsed % (recurrenceFactor * 7);
int daysUntilNextRun = daysInCurrentInterval == 0 ? 0 : (recurrenceFactor * 7) - daysInCurrentInterval;
DateTime nextRunDate = startingFromDate.AddDays(daysUntilNextRun);

But recall that we can specify which days of the week the job runs, so we have to find the next day within the week that qualifies. The freq_interval column is used as a bitmap identifying the days of the week that are selected. I have a C# flags-based enum called SqlDayOfWeek that maps directly to the bitmap.

SqlDayOfWeek dayOfWeek = (SqlDayOfWeek)interval;
nextRunDate = AdjustToNextMatchingDayOfWeekInValidWeek(nextRunDate, dayOfWeek, startDate, recurrenceFactor);

The function that is called basically just checks the date passed in to see if it matches both the scheduled days and the recurrence interval. If not, we advance to the next day and keep rechecking. It's a brute force approach.

private static DateTime AdjustToNextMatchingDayOfWeekInValidWeek(DateTime currentDate, SqlDayOfWeek sqlDayOfWeek, DateTime startDate, int recurrenceFactor)
{
	bool isDone = false;
	while (isDone == false)
	{
		currentDate = AdjustToNextMatchingDayOfWeek(currentDate, sqlDayOfWeek);
		int weeksElapsed = (int)((currentDate - startDate.Date).TotalDays) / 7;
		if ((weeksElapsed % recurrenceFactor) == 0)
		{
			isDone = true;
		}
		else
		{
			currentDate = currentDate.AddDays(1);
		}
	}
 
	return currentDate;
}
 
private static DateTime AdjustToNextMatchingDayOfWeek(DateTime currentDate, SqlDayOfWeek sqlDayOfWeek)
{
	while (IsDayOfWeekMatch(currentDate.DayOfWeek, sqlDayOfWeek) == false)
	{
		currentDate = currentDate.AddDays(1);
	}
 
	return currentDate;
}
 
private static bool IsDayOfWeekMatch(DayOfWeek dow, SqlDayOfWeek sql)
{
	return (dow == DayOfWeek.Sunday && sql.HasFlag(SqlDayOfWeek.Sunday)) ||
		(dow == DayOfWeek.Monday && sql.HasFlag(SqlDayOfWeek.Monday)) ||
		(dow == DayOfWeek.Tuesday && sql.HasFlag(SqlDayOfWeek.Tuesday)) ||
		(dow == DayOfWeek.Wednesday && sql.HasFlag(SqlDayOfWeek.Wednesday)) ||
		(dow == DayOfWeek.Thursday && sql.HasFlag(SqlDayOfWeek.Thursday)) ||
		(dow == DayOfWeek.Friday && sql.HasFlag(SqlDayOfWeek.Friday)) ||
		(dow == DayOfWeek.Saturday && sql.HasFlag(SqlDayOfWeek.Saturday));
}
Monthly Schedules (Day of Month)

We follow the same basic principle here, with the slight complication of having to deal with rollover into a new year. There is an interesting twist here. Suppose that I schedule a job to fire on the 31st of every month, starting in January, and recurring every three months. The next time the job should run is then “April 31,” a non-existent date. SQL Server will simply skip this invalid date, and next run on July 31, so the code has to handle this odd case.

Monthly Schedules (Week of Month)

This type of schedule takes a little bit more effort to interpret. I define another enumeration called SqlWeek that wraps the relativeInterval parameter, and then a couple of helper methods to determine the days of the week the job should run and to compute the next run date.

SqlDayOfWeek dayOfWeek = DayOfWeekFromRelativeMonthDay(interval);
SqlWeek week = (SqlWeek)relativeInterval;
DateTime nextRunDate = GetRelativeDayInMonth(nextRunYear, nextRunMonth, week, dayOfWeek);

The helper methods are as follows.

private static SqlDayOfWeek DayOfWeekFromRelativeMonthDay(int interval)
{
	switch (interval)
	{
		case 1:
			return SqlDayOfWeek.Sunday;
 
		case 2:
			return SqlDayOfWeek.Monday;
 
		case 3:
			return SqlDayOfWeek.Tuesday;
 
		case 4:
			return SqlDayOfWeek.Wednesday;
 
		case 5:
			return SqlDayOfWeek.Thursday;
 
		case 6:
			return SqlDayOfWeek.Friday;
 
		case 7:
			return SqlDayOfWeek.Saturday;
 
		case 8:
			return SqlDayOfWeek.Day;
 
		case 9:
			return SqlDayOfWeek.Weekday;
 
		case 10:
			return SqlDayOfWeek.WeekendDay;
 
		default:
			// Any other value is unexpected for this schedule type.
			throw new ApplicationException("Invalid interval for a monthly (week of month) schedule.");
	}
}
 
private static DateTime GetRelativeDayInMonth(int year, int month, SqlWeek week, SqlDayOfWeek dayOfWeek)
{
	int lowerDayInclusive;
	int upperDayInclusive;
	int lastDayInMonth = DateTime.DaysInMonth(year, month);
	switch (week)
	{
		case SqlWeek.First:
			lowerDayInclusive = 1;
			upperDayInclusive = 7;
			break;
 
		case SqlWeek.Second:
			lowerDayInclusive = 8;
			upperDayInclusive = 14;
			break;
 
		case SqlWeek.Third:
			lowerDayInclusive = 15;
			upperDayInclusive = 21;
			break;
 
		case SqlWeek.Fourth:
			lowerDayInclusive = 22;
			upperDayInclusive = 28;
			break;
 
		case SqlWeek.Last:
			lowerDayInclusive = lastDayInMonth - 6;
			upperDayInclusive = lastDayInMonth;
			break;
 
		default:
			throw new ApplicationException("Invalid week.");
	}
 
	for (int day = lowerDayInclusive; day <= upperDayInclusive; day++)
	{
		DateTime date = new DateTime(year, month, day);
		if (IsDayOfWeekMatch(date.DayOfWeek, dayOfWeek))
		{
			return date;
		}
	}
 
	// Defensive fallback so that all code paths return a value.
	throw new ApplicationException("No matching day found in the specified week.");
}
Summary

Well, that was an awful lot of work just to get the next run time following an arbitrary date, and it was definitely far more work than I expected.

In Part 2 of this post, I’ll show how I built a custom aggregate in SSIS. In Part 3, I’ll wrap the C# code into an SSIS package and pull everything together.

United States Geographical Data: ZIP Codes

In a previous post, I gave a script to load state and county geographical data into SQL Server. Here, I extend that data to include ZIP code boundaries.

One thing that I have recently learned is that ZIP code data is really a collection of points, in particular, specific places where mail is delivered. These points can be abstracted into a series of lines along a delivery route, for instance, the houses on a residential street. However, a ZIP code is, strictly speaking, not a geographic area. It is possible to draw boundaries around the ZIP code delivery points, but these boundaries are only approximations, and different sources will draw them in different ways.

More specifically, what I present here are known as ZIP code tabulation areas (ZCTAs), which the Census Bureau uses to approximate ZIP codes for census purposes. Here is the source of the data.

One thing you may note is that the ZCTAs don’t cover all the possible area of the United States. For instance, large lakes and wilderness areas tend to not be included in any ZCTA.

As with the state and county data, the ZIP code boundaries can be loaded into a table with this definition:

CREATE TABLE [dbo].[PoliticalRegion](
	[PoliticalRegionId] [int] NOT NULL IDENTITY(1,1),
	[RegionName] [nvarchar](255) NOT NULL,
	[Type] [nvarchar] (30) NOT NULL,
	[ParentRegionName] [nvarchar] (255) NULL,
	[Boundaries] [geography] NOT NULL
);

Here is the ZIP code data. The file is quite large (41 MB). One way that I have been able to get this data to load is via the command line:

sqlcmd -S serverName -d databaseName -E -i ZipCodeInserts.sql
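
Once loaded, the boundaries can be queried like any other geography data. For example, this sketch finds the ZCTA containing a point; the Type value ('ZipCode') is an assumption and should match whatever value you use when loading the rows.

declare @point geography = geography::Point(39.05, -95.68, 4326);  -- latitude, longitude (a point in Kansas)
 
select pr.RegionName
from dbo.PoliticalRegion pr
where pr.[Type] = N'ZipCode'      -- assumed Type value for the ZCTA rows
and pr.Boundaries.STIntersects(@point) = 1;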

What Records Are Included in a Transaction Log Backup?

I had an interesting situation come up recently, and it boiled down to this. Suppose I start a transaction log backup at 10:00:00 and it runs for one minute, until 10:01:00. Is a transaction that happened at 10:00:30 included in the backup? Let’s do an experiment.

I have a sample database containing customers and orders, and an application that fires up a bunch of threads and inserts random data into the database. So I kicked off the application and let it run for several hours so as to generate a bunch of log records. I disabled the normal log backups for the duration of this test.

With the application still running, I then kicked off a transaction log backup, introduced a brief delay, followed by a second log backup.

backup log CorpDB to disk = 'S:\Backup\SQL2014\Log\log1.trn' with init, compression;
waitfor delay '0:00:10';
backup log CorpDB to disk = 'S:\Backup\SQL2014\Log\log2.trn' with init, compression;

Now I want to look in the header of my backup files.

restore headeronly from disk = 'S:\backup\sql2014\Log\log1.trn';
restore headeronly from disk = 'S:\backup\sql2014\Log\log2.trn';

RestoreHeader

I see that the first log backup started at 7:30:17 pm and ended at 7:31:37, and the second backup started and ended at 7:31:47. Unfortunately, this is only stored with one-second resolution, but we can see that the first backup ran for about 80 seconds.

So the question now is which backup contains a transaction that occurred at, say, 7:30:30?

I’m going to load the transaction log records into a couple of temp tables.

select row_number() over (order by [Current LSN]) rn, *
into #log1
from sys.fn_dump_dblog(null, null, 'disk', 1, 'S:\backup\sql2014\Log\log1.trn',
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default);
 
create clustered index ix1_log1 on #log1 (rn);
 
select row_number() over (order by [Current LSN]) rn, *
into #log2
from sys.fn_dump_dblog(null, null, 'disk', 1, 'S:\backup\sql2014\Log\log2.trn',
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default);
 
create clustered index ix1_log2 on #log2 (rn);

Now let’s take a look at the last 10 records in the first log file, the first 10 records in the second log, and the last 10 records in the second log.

select 1 log_nbr, rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log1
where rn >= (select max(rn) from #log1) - 9
union all
select 2 log_nbr, rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log2
where rn <= 10
union all
select 2 log_nbr, rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log2
where rn >= (select max(rn) from #log2) - 9
order by log_nbr, rn;

CombinedLogRecords

The first backup contains log records through 7:30:17.873, and the second backup starts with log records a few milliseconds later at 7:30:17.887 and ends at 7:31:47.327. In other words, the second log contains the records covering the 90 seconds between the start times of the two log backups.

To answer my specific question I can also find a transaction that happened at 7:30:30 pm in the second log file:

select top 1 rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log2
where [Begin Time] >= '2016/04/29 19:30:30'
order by [Begin Time];

SearchForSpecificTxn

From this, I conclude that a log backup includes transactions up to approximately the BackupStartDate contained in the header, and not the BackupEndDate.

Caveat: I only tried this on SQL Server 2014 SP1 Developer Edition. It is entirely possible that other versions / service packs / editions behave differently.

A Catch-All Backup Job (with Automatic Restore Script Generation)

Job one as a DBA is to ensure that the systems under your control are backed up. Actually, job one is to make sure that systems are restorable, but that starts with robust backup jobs.

One job that I always set up on my personal SQL Server instances is a catch-all backup job. A job, that is, that backs up all databases except for ones that are specifically excluded. Why? I don't always think about creating backup jobs just because I've added a new database. A shortcoming, to be sure, but even on those personal systems I want to be protected.

I am a firm believer that backups on enterprise systems should be done via dedicated backup jobs due to the variety of RPO/RTO requirements associated with those systems, but I suppose that a case could be made for a catch-all backup job there as well.

The Catch-All Backup Job

First, we need a table to store a list of databases that will be excluded from the backups. I put this in a database called Admin.

create table dbo.DatabasesExcludedFromBackup
(
	 DatabaseName sysname not null
);
 
insert dbo.DatabasesExcludedFromBackup (DatabaseName) values ('tempdb');

Be sure to add other databases that don't need to be covered by this job, including databases that are backed up through other jobs, or databases like AdventureWorks that you can easily recreate from the original source.

Automated Restore Script Generation

I also like to throw in some code to generate restore scripts automatically. We need a couple of tables to store this (also in the Admin database).

create table dbo.BackupType
(
	BackupType varchar(10) not null,
	Description nvarchar(100) null,
	constraint pk_BackupType primary key clustered (BackupType)
);
 
create table dbo.BackupChainType
(
	BackupChainType varchar(25) not null,
	Description nvarchar(100) null,
	constraint pk_BackupChainType primary key clustered (BackupChainType)
);
 
insert dbo.BackupType (BackupType, Description)
	values ('Diff', 'Differential Backup'), ('Full', 'Full backup'), ('Log', 'Transaction log backup');
 
insert dbo.BackupChainType (BackupChainType, Description)
	values ('Full+Diff', 'Full backup plus differential backups'),
	('Full+Diff+Log', 'Full backup plus differential backups plus transaction log backups'),
	('Full+Log', 'Full backup plus transaction log backups'), 
	('FullOnly', 'Full backup only');
 
create table dbo.RestoreScript
(
	RestoreScriptId int identity(1,1) not null,
	DatabaseName sysname not null,
	BackupChainType varchar(25) not null,
	BackupType varchar(10) not null,
	TimeGenerated datetime2(7) not null constraint df_RestoreScript__TimeGenerated  default (sysdatetime()),
	ScriptText nvarchar(max) not null,
	constraint pk_RestoreScript primary key clustered (RestoreScriptId),
	constraint fk_RestoreScript__BackupChainType foreign key (BackupChainType) references dbo.BackupChainType (BackupChainType),
	constraint fk_RestoreScript__BackupType foreign key (BackupType) references dbo.BackupType (BackupType)
);

The Full Backup Job

Next, create a SQL Agent job. I call my job “Backup all databases – FULL” and add a job step with this script. Be sure to set the location of the backup folder at the beginning of the script, and also change all references to the “Admin” database if necessary.

declare	@dbName nvarchar(256),
		@recoveryModel nvarchar(60),
		@backupTime nvarchar(100),
		@fileName nvarchar(255),
		@sql nvarchar(4000),
		@sqlNoRecovery nvarchar(4000),
		@destinationFolder nvarchar(256);
 
-- Set this to the appropriate backup folder.  ** IMPORTANT ** It must end with a backslash!
select @destinationFolder = 'S:\SQL\Backups\Full\';
 
declare dbCsr cursor for select name, recovery_model_desc from sys.databases where name not in (select DatabaseName from Admin.dbo.DatabasesExcludedFromBackup)
	and source_database_id is null;
open dbCsr;
fetch next from dbCsr into @dbName, @recoveryModel;
while @@FETCH_STATUS = 0
begin
	select	@backupTime = replace(replace(replace(CONVERT(nvarchar(100), getdate(), 120), '-', ''), ':', ''), ' ', '_');
	select	@fileName = @destinationFolder + @dbName	+ '.' + @backupTime + '.bak'
	select	@sql = 'backup database [' + @dbName + '] to disk = ''' + @fileName + ''' with init, compression;';
	exec(@sql);
 
	delete Admin.dbo.RestoreScript where DatabaseName = @dbName;
 
	select @sql = 'restore database [' + @dbName + '] from disk = ''' + @fileName + ''' with replace';
	select @sqlNoRecovery = @sql + ', norecovery;';
	select @sql += ';';
 
	if @recoveryModel = 'SIMPLE'
	begin
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'FullOnly', 'Full', @sql);
 
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'Full+Diff', 'Full', @sqlNoRecovery);
	end
	else
	begin
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'Full+Log', 'Full', @sqlNoRecovery);
 
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'Full+Diff+Log', 'Full', @sqlNoRecovery);
	end
 
	fetch next from dbCsr into @dbName, @recoveryModel;
end
close dbCsr;
deallocate dbCsr;

Set an appropriate schedule on the job. For me, once per day works fine.

The Log Backup Job

Now create another job called “Backup all databases – LOG” and add this step:

declare	@dbName nvarchar(256),
		@backupTime nvarchar(100),
		@fileName nvarchar(255),
		@sql nvarchar(4000),
		@destinationFolder nvarchar(256);
 
-- Set this to the appropriate backup folder.  ** IMPORTANT ** It must end with a backslash!
select @destinationFolder = 'S:\SQL\Backups\Log\';
 
declare dbCsr cursor for
select db.name from sys.databases db
cross apply (select top 1 * from msdb.dbo.backupset bs where bs.database_name = db.name and bs.type = 'D' order by bs.backup_finish_date desc) bs
where db.recovery_model_desc != 'SIMPLE'
and db.name not in (select DatabaseName from Admin.dbo.DatabasesExcludedFromBackup)
and db.source_database_id is null;
 
open dbCsr;
fetch next from dbCsr into @dbName;
while @@FETCH_STATUS = 0
begin
	select	@backupTime = replace(replace(replace(CONVERT(nvarchar(100), getdate(), 120), '-', ''), ':', ''), ' ', '_');
	select	@fileName = @destinationFolder + @dbName	+ '.' + @backupTime + '.trn'
	select	@sql = 'backup log [' + @dbName + '] to disk = ''' + @fileName + ''' with init, compression;';
	exec(@sql);
 
	select @sql = 'restore log [' + @dbName + '] from disk = ''' + @fileName + ''' with norecovery;';
 
	insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
	values (@dbName, 'Full+Log', 'Log', @sql);
 
	insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
	values (@dbName, 'Full+Diff+Log', 'Log', @sql);
 
	fetch next from dbCsr into @dbName;
end
close dbCsr;
deallocate dbCsr;

Be sure to schedule the job according to your needs. I run log backups once per hour.

Additional Steps

You will want to add an extra step to your backup jobs to purge the old backup files after a few days; otherwise they will keep accumulating forever.
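
One option is the undocumented xp_delete_file procedure that maintenance plan cleanup tasks use. The parameter order shown here (file type, folder, extension, cutoff date, subfolder flag) is my understanding of it, so test it in your environment first.

-- Delete full backup (.bak) files older than three days; 0 = backup files, final 0 = do not recurse into subfolders.
declare @cutoff datetime = dateadd(day, -3, getdate());
execute master.dbo.xp_delete_file 0, N'S:\SQL\Backups\Full\', N'bak', @cutoff, 0;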

I also add a step to copy the files off-system because I am definitely a believer in getting backups off to another box as quickly as possible, just in case the machine blows up.

The Restore Script

The backup jobs generate restore scripts as they process the backups. You can find a list of the possible restore paths by executing:

select distinct DatabaseName, BackupChainType
from Admin.dbo.RestoreScript;

Then you can pass in a combination of DatabaseName and BackupChainType into a script such as the following:

select ScriptText
from Admin.dbo.RestoreScript
where DatabaseName = 'Admin'
and BackupChainType = 'Full+Log'
order by TimeGenerated asc;

This will output the commands to restore the database to the last log backup. You will need to recover the database manually, and you can easily select which logs to restore or add a STOPAT clause to recover to a point in time.
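
For example, after the last log restore completes, bringing the database online is a single statement (using the Admin database here):

restore database [Admin] with recovery;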

I even have a small SSIS package which runs a Foreach Loop and extracts each restore script out to a text file. The few times I have needed to use it, these pre-generated scripts have made the restores trivially simple to execute.

Potential Enhancements

Even though the script tables allow for differential backups in the mix, there's nothing here that really uses differentials. For my purposes, that's overkill, but you can certainly add a "Backup all databases – DIFF" job and include logic to generate appropriate restore scripts.

First Impressions of anonymize()

In a recent previous post, I gave a method that I have used to create test data. As I was playing with the new anonymize() function introduced in SQL Server 2016 RC0, it occurred to me that it could be used for a similar purpose.

The Books Online page for anonymize() is somewhat skimpy as of this writing, but there's enough to figure out the general idea. The function always requires a column as input, and this seems to be the case so that the function can sample the data in the column to determine the range of output values, particularly with numeric data types. However, with fewer than three rows, there just doesn't seem to be enough to work with, and anonymize() will simply output the row values.

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31');
 
select	anonymize(FirstName) as FirstName,
	anonymize(LastName) as LastName,
	anonymize(Age) as Age,
	anonymize(DateOfBirth) as DateOfBirth
from @t;

anonymize_2rows

But when we add a third row, the magic starts to happen. I’ll run this statement twice in a row:

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31'),
	('Sara', 'Brown', 31, '1985-02-22');
 
select	anonymize(FirstName) as FirstName,
	anonymize(LastName),
	anonymize(Age) as Age,
	anonymize(DateOfBirth) as DateOfBirth
from @t;

anonymize_3rows_run1

anonymize_3rows_run2

So, a couple of observations. First, the ages that are displayed are in the range of the input data. That is, the inputs in the Age column are between 23 and 31 years old, and the outputs similarly range from 24 to 28 in the first result and from 25 to 30 in the second. Clearly, SQL Server is reading the input column and basing the anonymized output on the data in there. Similarly, the dates output are in the ranges of the input column. However, there is no relationship between Age and DateOfBirth, which makes sense since SQL Server has nothing to tie the two columns together. (Books Online makes it clear, by the way, that anonymize() will not work on computed columns.)

But the real magic happens with the names columns (both first and last). Notice that the function output values are in no way derived from the inputs. I have to admit to being quite impressed that with only three rows, SQL Server can interpret that these columns contain name data. And no, it is not based on the column names; when I rename the columns to c1 and c2, the above example still works fine.

Perhaps more impressive is that it can detect and generate addresses as well, even using obviously fake address data as input.

declare @t table (Address nvarchar(100), City nvarchar(40), State nvarchar(5));
insert @t (Address, City, State)
values ('1234 Main St', 'Anytown', 'PA'),
	('987 E 1st Ave', 'Podunkville', 'KS'),
	('456 S 654 W', 'By the Sea', 'CA');
 
select	anonymize(Address),
	anonymize(City),
	anonymize(State)
from @t;

anonymize_address

As with many other new features, the anonymize() function comes with a host of limitations. As previously mentioned, it doesn’t work on computed columns. As you would expect, you cannot use it in a WHERE clause, and the function is incompatible with most other new SQL Server features, including Always Encrypted, in-memory tables and temporal tables. Interesting things also happen when you use anonymize() with a TOP clause.

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31'),
	('Sara', 'Brown', 31, '1985-02-22');
 
select	top 1 anonymize(FirstName) as FirstName,
	anonymize(LastName) as LastName,
	anonymize(DateOfBirth) as DateOfBirth
from @t;

anonymize_top

How I Create a Nums Table

You need a Nums or Tally table. You really do. OK, you can make do without, but having one really makes a lot of operations easier.

A Nums table is a good candidate for either an admin database or for the user database, depending on how tightly coupled it is with the stored procedures, functions, views, etc. On the one hand, you might want to make sure that the Nums table moves with a backup and restore of your primary database. On the other hand, you may not want the overhead of multiple Nums tables across several databases.

My preferred way to create and populate a Nums tables is:

create table Nums
(
	n int not null,
	constraint pk_Nums primary key clustered (n)
);
 
with	l0 as (select 1 v union all select 1),
		l1 as (select a.v from l0 a, l0),
		l2 as (select a.v from l1 a, l1),
		l3 as (select a.v from l2 a, l2),
		l4 as (select a.v from l3 a, l3),
		l5 as (select a.v from l4 a, l4),
		n as (select row_number() over (order by (select null)) n from l5)
insert Nums(n)
select n.n
from n
where n.n <= 1000000;

The downside here is that the code is ugly, and unless you are familiar with what's going on it may take a bit to decipher the script.

The upside is that the script is fast (about 5 seconds on my machine to load 1,000,000 records) and is quite adaptable. If you want more than one million numbers in the Nums table, just increase the value in the script up to about 4 billion. And if for some reason you need more than that, just add another CTE for "l6" following the same pattern.

For a million numbers, this occupies less than 13 MB of storage, so in most cases the overhead for a single database is pretty trivial.

Finally, I really like the flexibility of taking this same method and generating a virtual Nums table. Yes, it adds a lot of ugliness to an otherwise nice script, but you also get independence from having to have that Nums table, which is useful in situations where you can't have an Admin database and can't modify the user database.

If I want to find gaps in the IDs in my Customer table, I can come up with something like this:

declare @maxCustomerId int = (select max(CustomerID) from Customer);
 
with	l0 as (select 1 v union all select 1),
		l1 as (select a.v from l0 a, l0),
		l2 as (select a.v from l1 a, l1),
		l3 as (select a.v from l2 a, l2),
		l4 as (select a.v from l3 a, l3),
		l5 as (select a.v from l4 a, l4),
		Nums as (select row_number() over (order by (select null)) n from l5)
select	Nums.n MissingCustomerId
from	Nums left join Customer cust on Nums.n = cust.CustomerID
where	Nums.n <= @maxCustomerId
and		cust.CustomerID is null;

The nice thing is that SQL Server quite efficiently optimizes the query and doesn’t evaluate any more CTEs than are really necessary based on the constraints used on the Nums CTE.

This also makes it easy and efficient to create a table-valued function instead of (or in addition to) a persisted Nums table:

if exists (select * from sys.objects where type = 'IF' and name = 'fnNums')
	drop function dbo.fnNums;
go
create function dbo.fnNums()
returns table
as
return
with	l0 as (select 1 v union all select 1),
		l1 as (select a.v from l0 a, l0),
		l2 as (select a.v from l1 a, l1),
		l3 as (select a.v from l2 a, l2),
		l4 as (select a.v from l3 a, l3),
		l5 as (select a.v from l4 a, l4),
		Nums as (select row_number() over (order by (select null)) n from l5)
select Nums.n
from Nums
go
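
A quick usage sketch of the function:

-- First ten numbers from the virtual Nums table.
select n
from dbo.fnNums()
where n <= 10
order by n;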