A Catch-All Backup Job (with Automatic Restore Script Generation)

Job one as a DBA is to ensure that the systems under your control are backed up. Actually, job one is to make sure that systems are restorable, but that starts with robust backup jobs.

One job that I always set up on my personal SQL Server instances is a catch-all backup job. A job, that is, that backups up all databases except for ones that are specifically excluded. Why? I don’t always think about creating the backup jobs just because I’ve added a new database. A shortcoming, to be sure, but even on those personal system I want to be protected.

I am a firm believer that backups on enterprise systems should be done via dedicated backup jobs due to the variety of RPO/RTO requirements associated with those systems, but I suppose that a case could be made for a catch-all backup job there as well.

The Catch-All Backup Job

First, we need a table to store a list of databases that will be excluded from the backups. I put this in a database called Admin.

create table dbo.DatabasesExcludedFromBackup
(
	 DatabaseName sysname not null
);
 
insert dbo.DatabasesExcludedFromBackup (DatabaseName) values ('tempdb');

Be sure to add other databases that don’t need to be covered this job, including databases that are backed up through other jobs, or databases like AdventureWorks that you can easily recreate from the original source.

Automated Restore Script Generation

I also like to throw in some code to generate restore scripts automatically. We need a couple of tables to store this (also in the Admin database).

create table dbo.BackupType
(
	BackupType varchar(10) not null,
	Description nvarchar(100) null,
	constraint pk_BackupType primary key clustered (BackupType)
);
 
create table dbo.BackupChainType
(
	BackupChainType varchar(25) not null,
	Description nvarchar(100) null,
	constraint pk_BackupChainType primary key clustered (BackupChainType)
);
 
insert dbo.BackupType (BackupType, Description)
	values ('Diff', 'Differential Backup'), ('Full', 'Full backup'), ('Log', 'Transaction log backup');
 
insert dbo.BackupChainType (BackupChainType, Description)
	values ('Full+Diff', 'Full backup plus differential backups'),
	('Full+Diff+Log', 'Full backup plus differential backups plus transaction log backups'),
	('Full+Log', 'Full backup plus transaction log backups'), 
	('FullOnly', 'Full backup only');
 
create table dbo.RestoreScript
(
	RestoreScriptId int identity(1,1) not null,
	DatabaseName sysname not null,
	BackupChainType varchar(25) not null,
	BackupType varchar(10) not null,
	TimeGenerated datetime2(7) not null constraint df_RestoreScript__TimeGenerated  default (sysdatetime()),
	ScriptText nvarchar(max) not null,
	constraint pk_RestoreScript primary key clustered (RestoreScriptId),
	constraint fk_RestoreScript__BackupChainType foreign key (BackupChainType) references dbo.BackupChainType (BackupChainType),
	constraint fk_RestoreScript__BackupType foreign key (BackupType) references dbo.BackupType (BackupType)
);

The Full Backup Job

Next, create a SQL Agent job. I call my job “Backup all databases – FULL” and add a job step with this script. Be sure to set the location of the backup folder at the beginning of the script, and also change all references to the “Admin” database if necessary.

declare	@dbName nvarchar(256),
		@recoveryModel nvarchar(60),
		@backupTime nvarchar(100),
		@fileName nvarchar(255),
		@sql nvarchar(4000),
		@sqlNoRecovery nvarchar(4000),
		@destinationFolder nvarchar(256);
 
-- Set this to the appropriate backup folder.  ** IMPORTANT ** It must end with a backslash!
select @destinationFolder = 'S:\SQL\Backups\Full\';
 
declare dbCsr cursor for select name, recovery_model_desc from sys.databases where name not in (select DatabaseName from Admin.dbo.DatabasesExcludedFromBackup)
	and source_database_id is null;
open dbCsr;
fetch next from dbCsr into @dbName, @recoveryModel;
while @@FETCH_STATUS = 0
begin
	select	@backupTime = replace(replace(replace(CONVERT(nvarchar(100), getdate(), 120), '-', ''), ':', ''), ' ', '_');
	select	@fileName = @destinationFolder + @dbName	+ '.' + @backupTime + '.bak'
	select	@sql = 'backup database [' + @dbName + '] to disk = ''' + @fileName + ''' with init, compression;';
	exec(@sql);
 
	delete Admin.dbo.RestoreScript where DatabaseName = @dbName;
 
	select @sql = 'restore database [' + @dbName + '] from disk = ''' + @fileName + ''' with replace';
	select @sqlNoRecovery = @sql + ', norecovery;';
	select @sql += ';';
 
	if @recoveryModel = 'SIMPLE'
	begin
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'FullOnly', 'Full', @sql);
 
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'Full+Diff', 'Full', @sqlNoRecovery);
	end
	else
	begin
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'Full+Log', 'Full', @sqlNoRecovery);
 
		insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
		values (@dbName, 'Full+Diff+Log', 'Full', @sqlNoRecovery);
	end
 
	fetch next from dbCsr into @dbName, @recoveryModel;
end
close dbCsr;
deallocate dbCsr;

Set an appropriate schedule on the job. For me, once per day works fine.

The Log Backup Job

Now create another job called “Backup all databases – LOG” and add this step:

declare	@dbName nvarchar(256),
		@backupTime nvarchar(100),
		@fileName nvarchar(255),
		@sql nvarchar(4000),
		@destinationFolder nvarchar(256);
 
-- Set this to the appropriate backup folder.  ** IMPORTANT ** It must end with a backslash!
select @destinationFolder = 'S:\SQL\Backups\Log\';
 
declare dbCsr cursor for
select db.name from sys.databases db
cross apply (select top 1 * from msdb.dbo.backupset bs where bs.database_name = db.name and bs.type = 'D' order by bs.backup_finish_date desc) bs
where db.recovery_model_desc != 'SIMPLE'
and db.name not in (select DatabaseName from Admin.dbo.DatabasesExcludedFromBackup)
and db.source_database_id is null;
 
open dbCsr;
fetch next from dbCsr into @dbName;
while @@FETCH_STATUS = 0
begin
	select	@backupTime = replace(replace(replace(CONVERT(nvarchar(100), getdate(), 120), '-', ''), ':', ''), ' ', '_');
	select	@fileName = @destinationFolder + @dbName	+ '.' + @backupTime + '.trn'
	select	@sql = 'backup log [' + @dbName + '] to disk = ''S:\Backup\SQL2014\Log\' + 
		@dbName	+ '.' + @backupTime + '.trn'' with init, compression';
	exec(@sql);
 
	select @sql = 'restore log [' + @dbName + '] from disk = ''' + @fileName + ''' with norecovery;';
 
	insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
	values (@dbName, 'Full+Log', 'Log', @sql);
 
	insert Admin.dbo.RestoreScript (DatabaseName, BackupChainType, BackupType, ScriptText)
	values (@dbName, 'Full+Diff+Log', 'Log', @sql);
 
	fetch next from dbCsr into @dbName;
end
close dbCsr;
deallocate dbCsr;

Be sure to schedule the job according to your needs. I run log backups once per hour.

Additional Steps

You will want to add an extra step into your backup jobs to purge the old backup files after a few days, otherwise they will keep accumulating forever.

I also add a step to copy the files off-system because I am definitely a believer in getting backups off to another box as quickly as possible, just in case the machine blows up.

The Restore Script

The backup jobs generate restore scripts as they process the backups. You can find a list of the possible restore paths by executing:

select distinct DatabaseName, BackupChainType
from Admin.dbo.RestoreScript;

Then you can pass in a combination of DatabaseName and BackupChainType into a script such as the following:

select ScriptText
from Admin.dbo.RestoreScript
where DatabaseName = 'Admin'
and BackupChainType = 'Full+Log'
order by TimeGenerated asc;

This will output the commands to restore the database to the last log backup. You will need to recover the database manually, and you easily select which logs to restore or add a STOPAT clause to recover to a point in time.

I even have a small SSIS package which runs a Foreach Loop and extract each restore script out to a text file. The few times I have needed to use it, these pre-generated scripts have made the restores trivially simple to execute.

Potential Enhancements

Even though the script tables allow for differential backups in the mix, there’s nothing here that really uses differentials. For my purposes, that’s overkill, but you can certainly add a “Backup all databases – DIFF’ and include logic to generate appropriate restore scripts.

First Impressions of anonymize()

In a recent previous post, I gave a method that I have used to create test data. As I was playing with the new anonymize() function introduced in SQL Server 2016 RC0, it occurred to me that it could be used for similar purpose.

The Books Online page for anonymize() is somewhat skimpy as of this writing, but there’s enough to figure out a the general idea. The function always require a column as input, and this seems to be the case so that the function can sample the data in the column to determine the range of output values, particularly with numeric data types. However, with fewer than three rows, there just doesn’t seem to be enough to work with, and anonymize() will simply output the row values.

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31');
 
select	anonymize(FirstName) as FirstName,
	anonymize(LastName) as LastName,
	anonymize(Age) as Age,
	anonymize(DateOfBirth) as DateOfBirth
from @t;

anonymize_2rows

But when we add a third row, the magic starts to happen. I’ll run this statement twice in a row:

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31'),
	('Sara', 'Brown', 31, '1985-02-22');
 
select	anonymize(FirstName) as FirstName,
	anonymize(LastName),
	anonymize(Age) as Age,
	anonymize(DateOfBirth) as DateOfBirth
from @t;

anonymize_3rows_run1

anonymize_3rows_run2

So, a couple of observations. First, the ages that are displayed are in the range of the input data. That is, the input in the age column are between 23 and 31 years old, and the output are similarly from 24 to 28 in the first result, and from 25 to 30 in the second. Clearly, SQL Server is reading the input column and basing the anonymized output on the data in there. Similarly, the dates output are in the ranges of the input column. However, there is no relationship between Age and DateOfBirth, which makes sense since SQL Server has nothing to tie the two columns together. (Books Online makes it clear, by the way, that anonymize() will not work on computed columns.)

But the real magic happens with the names columns (both first and last). Notice that the function output values are in no way derived from the inputs. I have to admit to being quite impressed that with only three rows, SQL Server can interpret that these columns contain name data. And no, it not based on the column names; when I rename the columns to c1 and c2, the above example still works fine.

Perhaps more impressive is that it can detect and generate addresses as well, even using obviously fake address data as input.

declare @t table (Address nvarchar(100), City nvarchar(40), State nvarchar(5));
insert @t (Address, City, State)
values ('1234 Main St', 'Anytown', 'PA'),
	('987 E 1st Ave', 'Podunkville', 'KS'),
	('456 S 654 W', 'By the Sea', 'CA');
 
select	anonymize(Address),
	anonymize(City),
	anonymize(State)
from @t;

anonymize_address

As with many other new features, the anonymize() function comes with a host of limitations. As previously mentioned, it doesn’t work on computed columns. As you would expect, you cannot use it in a WHERE clause, and the function is incompatible with most other new SQL Server features, including Always Encrypted, in-memory tables and temporal tables. Interesting things also happen when you use anonymize() with a TOP clause.

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31'),
	('Sara', 'Brown', 31, '1985-02-22');
 
select	top 1 anonymize(FirstName) as FirstName,
	anonymize(LastName) as LastName,
	anonymize(DateOfBirth) as DateOfBirth

anonymize_top