United States Geographical Data: ZIP Codes

In a previous post, I gave a script to load state and county geographical data into SQL Server. Here, I extend that data to include ZIP code boundaries.

One thing that I have recently learned is that ZIP code data is really a collection of points, in particular, specific places where mail is delivered. These points can be abstracted in a series of lines along a delivery route, for instance, houses on a residential street. However, ZIP code geographic data is, strictly speaking, not a geographic area. It is possible to draw boundaries around the ZIP code delivery points, but these boundaries are only approximations, and different sources will draw the boundaries in different ways.

More specifically, what I present here are known as ZIP code tabulation areas (ZCTAs) used by the Census Bureau to approximate ZIP codes for census purposes. Here is source of the data.

One thing you may note is that the ZCTAs don’t cover all the possible area of the United States. For instance, large lakes and wilderness areas tend to not be included in any ZCTA.

As with the state and county data, the ZIP code boundaries can be loaded into a table with this definition:

CREATE TABLE [dbo].[PoliticalRegion](
	[PoliticalRegionId] [int] NOT NULL IDENTITY(1,1),
	[RegionName] [nvarchar](255) NOT NULL,
	[Type] [nvarchar] (30) NOT NULL,
	[ParentRegionName] [nvarchar] (255) NULL,
	[Boundaries] [geography] NOT NULL
);

Here is the ZIP code data. The file is quite large (41 MB). One way that I been able to get this data to load is via the command line:

sqlcmd -S serverName -d databaseName -E -i ZipCodeInserts.sql

What Records Are Included in a Transaction Log Backup?

I had an interesting situation come up recently, and it boiled down to this. Suppose I start a transaction log backup at 10:00:00 and it runs for one minute, until 10:01:00. Is a transaction that happened at 10:00:30 included in the backup? Let’s do an experiment.

I have a sample database containing customers and orders, and an application that fires up a bunch of threads and inserts random data into the database. So I kicked off the application and let it run for several hours so as to generate a bunch of log records. I disabled the normal log backups for the duration of this test.

With the application still running, I then kicked off a transaction log backup, introduced a brief delay, followed by a second log backup.

backup log CorpDB to disk = 'S:\Backup\SQL2014\Log\log1.trn' with init, compression;
waitfor delay '0:00:10';
backup log CorpDB to disk = 'S:\Backup\SQL2014\Log\log2.trn' with init, compression;

Now I want to look in the header of my backup files.

restore headeronly from disk = 'S:\backup\sql2014\Log\log1.trn';
restore headeronly from disk = 'S:\backup\sql2014\Log\log2.trn';

RestoreHeader

I see that the first log backup started at 7:30:17 pm and ended at 7:31:37, and the second backup ranged from 7:31:47 to 7:31:47. Unfortunately, this is only stored with one-second resolution, but we can see that the first backup ran for about 80 seconds.

So the question now is which backup contains a transaction that occurred at, say, 7:30:30?

I’m going to load the transaction log records into a couple of temp tables.

select row_number() over (order by [Current LSN]) rn, *
into #log1
from sys.fn_dump_dblog(null, null, 'disk', 1, 'S:\backup\sql2014\Log\log1.trn',
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default);
 
create clustered index ix1_log1 on #log1 (rn);
 
select row_number() over (order by [Current LSN]) rn, *
into #log2
from sys.fn_dump_dblog(null, null, 'disk', 1, 'S:\backup\sql2014\Log\log2.trn',
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default, default, default, default, 
default, default, default, default, default, default, default, default);
 
create clustered index ix1_log2 on #log2 (rn);

Now let’s take a look at the last 10 records in the first log file, the first 10 records in the second log, and the last 10 records in the second log.

select 1 log_nbr, rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log1
where rn >= (select max(rn) from #log1) - 9
union all
select 2 log_nbr, rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log2
where rn <= 10
union all
select 2 log_nbr, rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log2
where rn >= (select max(rn) from #log2) - 9
order by log_nbr, rn;

CombinedLogRecords

The first backup contains log records through 7:30:17.873, and the second backup starts with log records a few milliseconds later at 7:30:17.887 and ends at 7:31:47.327. In other words, the second log contains the records covering the 90 seconds between the start times of the two log backups.

To answer my specific question I can also find a transaction that happened at 7:30:30 pm in the second log file:

select top 1 rn, Operation, Context, [Current LSN], [Transaction ID], [Begin Time], [End Time]
from #log2
where [Begin Time] >= '2016/04/29 19:30:30'
order by [Begin Time];

SearchForSpecificTxn

From this, I conclude that a log backup includes transactions up to approximately the BackupStartDate contained in the header, and not the BackupEndDate.

Caveat: I only tried this on SQL Server 2014 SP1 Developer Edition. It is entirely possible that other versions / service packs / editions behave differently.