Halloween Protection

I thought it would be appropriate today, on the 40th anniversary of the discovery of the problem, to address the issue of Halloween protection. Describing the problem is nothing new, of course, but nonetheless, I thought it would be interesting to describe it in my own words.

Reportedly, the Halloween problem was identified on October 31, 1976 by three researchers, and the name stuck. Imagine that we have a table of employees with salary information. We’ll create such a table and make the employee name the primary key and the clustering key, and then put a nonclustered index on salary. Here is such a table with some random data.

create table Employee
(
	FirstName nvarchar(40) not null,
	Salary money not null,
	constraint pk_Employee primary key clustered (FirstName)
);
 
create nonclustered index ix_Employee__Salary on Employee (Salary);
 
insert Employee (FirstName, Salary)
values ('Hector', 22136), ('Hannah', 21394), ('Frank', 28990), ('Eleanor', 29832), ('Maria', 28843),
       ('Maxine', 23603), ('Sylvia', 26332), ('Jeremy', 20230), ('Michael', 27769), ('Dean', 24496);

The task at hand is to give all employees with a salary of less than 25,000 a 10% raise. This can be accomplished with a simple query.

update Employee
set Salary = Salary * 1.10
where Salary < 25000.00;

Now imagine that the SQL Server optimizer decides to use the nonclustered index to identify the rows to update. The index initially contains this data (bearing in mind that the index implicitly contains the clustering key).

[Image: initial contents of the nonclustered Salary index]

Without Halloween protection, we can imagine that SQL walks through the rows in the index and makes the updates. So the Jeremy row is updated first, and the new salary is 22,253.

[Image: index after Jeremy's salary is updated to 22,253]

However, the index is keyed by Salary and should be sorted on that column, so SQL Server will need to move the row in the index to the appropriate location.

[Image: index after the Jeremy row is moved to its new position]

Next, SQL updates the Hannah row and moves it within the index.

[Image: index after Hannah's update and row move]

And then for Hector:

[Image: index after Hector's update and row move]

However, at this point, the next row is Jeremy again. So the database updates the Jeremy row for a second time.

[Image: index after Jeremy's second update and row move]

Hannah again.

[Image: index after Hannah's second update and row move]

Then Maxine.

[Image: index after Maxine's update and row move]

Finally Hector for a second time, followed by Jeremy for a third time, and then Dean.

[Image: index after the Hector, Jeremy, and Dean updates and row moves]

At this point the next row is for Hannah again. However, her salary is greater than or equal to 25,000 and therefore doesn’t qualify for update based on the query predicate. Since none of the remaining rows meet the WHERE clause condition, processing stops and query execution is complete.

That, in a nutshell, is the Halloween problem: some rows get processed multiple times because the rows are physically moved within the index being used to process the query. For comparison, the (incorrect) results produced by the above processing are on the left, and the correct results are on the right.

[Image: incorrect results from the above processing (left) and correct results (right)]

In this particular case, the process results in every employee having a salary of at least 25,000. However, imagine if the query didn’t have the WHERE clause but still chose to use the Salary index to process. The result would be an infinite loop where each employee’s salary would be repeatedly increased by 10% until an overflow condition occurred.
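The faulty processing above can be simulated outside the database. The following Python sketch is a hypothetical model, not SQL Server's actual implementation: it represents the Salary index as a sorted list of (salary, name) keys, walks it with a cursor that re-sorts after each update, and contrasts that with a protected version that finishes reading before it writes.

```python
# Hypothetical in-memory model of the nonclustered Salary index: a sorted
# list of (salary, name) keys, seeded with the table's data from above.
rows = [(22136, 'Hector'), (21394, 'Hannah'), (28990, 'Frank'),
        (29832, 'Eleanor'), (28843, 'Maria'), (23603, 'Maxine'),
        (26332, 'Sylvia'), (20230, 'Jeremy'), (27769, 'Michael'),
        (24496, 'Dean')]

def naive_update(index_rows):
    """Unprotected update: a cursor walks the index in key order while
    qualifying rows are updated and moved ahead of the cursor, so the
    same row can be visited again."""
    index = sorted(index_rows)
    visits = []                         # order in which rows are processed
    last_key = (float('-inf'), '')
    while True:
        # advance the cursor to the smallest key after the last one visited
        nxt = min((k for k in index if k > last_key), default=None)
        if nxt is None:
            break
        last_key = nxt
        salary, name = nxt
        if salary < 25000:              # the WHERE clause predicate
            visits.append(name)
            index.remove(nxt)           # the row moves within the index,
            index.append((round(salary * 1.10, 2), name))  # ahead of the cursor
    return visits, sorted(index)

def protected_update(index_rows):
    """Protected update: materialize every qualifying key first (the read
    phase), then apply the changes (the write phase)."""
    index = sorted(index_rows)
    qualifying = [k for k in index if k[0] < 25000]   # read phase completes
    for salary, name in qualifying:                   # only then do we write
        index.remove((salary, name))
        index.append((round(salary * 1.10, 2), name))
    return sorted(index)

visits, _ = naive_update(rows)
print(visits)   # Jeremy, Hannah, and Hector each appear more than once
```

Running the naive version reproduces the exact processing order described in the walkthrough, while the protected version gives each qualifying employee a single 10% raise.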

So how does SQL Server protect against the Halloween problem? There are a variety of mechanisms in place for this very purpose.

  • SQL Server may choose an alternate index. In fact, this is precisely what SQL Server does in our example, reading from the clustered index to drive the updates. Because the clustered index isn’t sorted by Salary in any way, updates made during processing don’t affect row locations in the index being scanned, and the problem does not arise. This separation of the read and write portions avoids the problem.

    [Image: query plan showing the clustered index scan driving the update]

  • But what if scanning the clustered index is a relatively expensive operation? When I load 1,000,000 rows into the Employee table and only 5 employees have salaries below 25,000, SQL Server will naturally favor the nonclustered index.

    [Image: query plan using the nonclustered index, with a Sort operator highlighted]

    Note particularly the highlighted Sort operator. Sort is a blocking operation; that is, the entire input set must be read (and then sorted) before any output is generated. Once again, this separates the read and write phases of query processing.

  • Suppose that the Salary index is the only means of accessing the data. For instance, if I drop the nonclustered Salary index and the primary key on the table, and then create a clustered index on Salary, SQL Server is forced to use the Salary index. In this case, SQL Server will insert an eager spool into the plan, once again separating the reading and writing.

    [Image: query plan with an eager spool separating the read and write phases]

  • Memory-optimized (Hekaton) tables avoid the Halloween problem by their very nature. Rows in a memory-optimized table are read-only; an “update” invalidates the current version of the row and writes a new version. Once again, read and write separation happens, in this case organically.

I don’t intend this to be a comprehensive list of protections in SQL Server, but it covers the most common scenarios.

To me, the key takeaway here is that Halloween protection is an integral and crucial aspect of query processing that the optimizer simply must take into account. This most certainly affects the plan generation process and likely adds overhead to the plan to ensure correct results.

Alerting on Missed Scheduled SQL Jobs, Part 3

The first two parts of this series addressed the general approach that I use in an SSIS script task to discover and alert on missed SQL Agent jobs. With apologies for the delay in producing this final post in the series, here I bring these approaches together and present the complete package.

To create the package, start with an empty SSIS package and add a data flow task. In the task, add the following transformations.

1. An OLE DB Source. Point it to a connection manager with the instance to be monitored and a SQL command containing this script:

set nocount on
declare @xp_results table
(
			job_id                UNIQUEIDENTIFIER NOT NULL,
			last_run_date         INT              NOT NULL,
			last_run_time         INT              NOT NULL,
			next_run_date         INT              NOT NULL,
			next_run_time         INT              NOT NULL,
			next_run_schedule_id  INT              NOT NULL,
			requested_to_run      INT              NOT NULL,
			request_source        INT              NOT NULL,
			request_source_id     sysname          COLLATE database_default NULL,
			running               INT              NOT NULL,
			current_step          INT              NOT NULL,
			current_retry_attempt INT              NOT NULL,
			job_state             INT              NOT NULL
);
 
/* Values for "job_state":
		0 = Not idle or suspended
		1 = Executing
		2 = Waiting For Thread
		3 = Between Retries
		4 = Idle
		5 = Suspended
		[6 = WaitingForStepToFinish]
		7 = PerformingCompletionActions
*/
 
DECLARE @can_see_all_running_jobs INT = 1;
DECLARE @job_owner sysname = suser_sname();
 
INSERT INTO @xp_results
EXECUTE master.dbo.xp_sqlagent_enum_jobs @can_see_all_running_jobs, @job_owner;
 
select	@@SERVERNAME InstanceName,
		j.name as job_name,
		j.date_created,
		x.last_run_date,
		x.last_run_time,
		x.next_run_date,
		x.next_run_time,
		x.current_step,
		x.job_state,
		x.job_id,
		s.name schedule_name,
		s.freq_type,
		s.freq_interval,
		s.freq_subday_type,
		s.freq_subday_interval,
		s.freq_relative_interval,
		s.freq_recurrence_factor,
		s.active_start_date,
		s.active_end_date,
		s.active_start_time,
		s.active_end_time,
		@@servername ServerName
from	@xp_results x
inner join msdb.dbo.sysjobs j on x.job_id = j.job_id
inner join msdb.dbo.sysjobschedules js on js.job_id = j.job_id
inner join msdb.dbo.sysschedules s on s.schedule_id = js.schedule_id
where j.enabled = 1
and s.enabled = 1
and s.freq_type in (4, 8, 16, 32)
and x.job_state not in (1, 2, 3);

2. Add a sort transformation, with “job_id” as the sort key input column (in ascending order).

3. Add a script component. When prompted, add it as a transformation. On the “Input Columns” tab, select all columns. On the “Inputs and Outputs” tab, select “Output 0” and change SynchronousInputID to “None.”

Expand “Output 0” and select “Output columns.” Add the following as outputs:

  • InstanceName, Unicode string, length 256
  • JobId, unique identifier
  • JobName, Unicode string, length 256
  • ExpectedRunDate, database timestamp with precision
  • LastRunDate, database timestamp with precision

Return to the “Script” tab and “Edit Script.” At the bottom, find the Input0_ProcessInputRow method and delete it, replacing it with the contents of this ZIP file.

At the top of the script, expand the “Namespaces” section and add these statements:

using System.Collections.Generic;
using System.Linq;

Add the following declarations inside the ScriptMain class, such as immediately before the PreExecute method:

private bool _isFirstRow = true;
private RowData _previousRow = null;
private DateTime _previousJobRunDate;
private List<DateTime> _forecast = null;

4. Add a conditional split transformation.

In the first row (order = 1), set Output Name = ExpectedRunDateIsNull, Condition = ISNULL(ExpectedRunDate).

In the second row (order = 2), set Output Name = ExpectedRunDateInFuture, Condition = ExpectedRunDate >= DATEADD("minute", -30, GETDATE()).

Set the default output name to ExpectedRunDateInPast.
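The split logic is simple enough to express as a standalone function. This Python sketch (purely illustrative; the real implementation is the SSIS expression above) mirrors the three outputs, including the 30-minute grace window in the "future" test:

```python
from datetime import datetime, timedelta

def classify(expected_run_date, now=None):
    """Mirror of the conditional split: a null expected date goes to one
    output, a date still in the future (allowing a 30-minute grace window)
    goes to another, and everything else is treated as a missed schedule."""
    now = now or datetime.now()
    if expected_run_date is None:
        return 'ExpectedRunDateIsNull'
    if expected_run_date >= now - timedelta(minutes=30):
        return 'ExpectedRunDateInFuture'
    return 'ExpectedRunDateInPast'
```

Only rows that fall into the third bucket flow on to the destination in the next step.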

5. Add an OLE DB Destination, linking to the ExpectedRunDateInPast output from the conditional split. In a database somewhere, create a table to collect the results.

create table MissedScheduleMonitorResults
(
	ResultId int not null identity(1,1),
	MonitorRunTime datetime2 not null default (sysdatetime()),
	InstanceName nvarchar(256) not null,
	JobId uniqueidentifier not null,
	JobName nvarchar(256) not null,
	LastRunDate datetime2 null,
	ExpectedRunDate datetime2 null,
	AlertSent bit not null default (0),
	constraint pk_MissedScheduleMonitorResults primary key clustered (ResultId)
);

Create a connection manager to this database, and in the OLE DB destination point to this connection manager and select the MissedScheduleMonitorResults table. Use the default mappings (where column names match).

The data flow task should look something like this. Yeah, no naming standards here.

[Image: the completed data flow task]

That’s it! Now, you can create a job with a script that reads from the table where AlertSent = false, sends an appropriate message, and sets AlertSent to true.
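That alerting sweep can be sketched in a few lines. The Python example below is hypothetical: it uses an in-memory SQLite database in place of the monitoring database, and a stand-in send_alert function in place of a real notification step, but the read-alert-flag pattern is the same.

```python
import sqlite3

# Stand-in for the monitoring database; the real table is the
# MissedScheduleMonitorResults table created above.
conn = sqlite3.connect(':memory:')
conn.execute("""
    create table MissedScheduleMonitorResults (
        ResultId   integer primary key,
        JobName    text not null,
        AlertSent  integer not null default 0
    )""")
conn.executemany(
    "insert into MissedScheduleMonitorResults (JobName) values (?)",
    [('NightlyBackup',), ('HourlyETL',)])

def send_alert(job_name):
    # stand-in for whatever actually delivers the message (email, pager, ...)
    print(f"Missed schedule: {job_name}")

# Read the unsent rows first, then flag each one after its alert goes out,
# so a row is never alerted twice.
pending = conn.execute(
    "select ResultId, JobName from MissedScheduleMonitorResults "
    "where AlertSent = 0").fetchall()
for result_id, job_name in pending:
    send_alert(job_name)
    conn.execute(
        "update MissedScheduleMonitorResults set AlertSent = 1 "
        "where ResultId = ?", (result_id,))
conn.commit()
```

Flagging each row individually (rather than with one blanket UPDATE at the end) means a failure partway through doesn't re-alert the rows that already went out.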