Category Archives: Uncategorized

Why TF3604?

I continue to be surprised at how few people in the SQL Server community “get” my username.

About a year ago, in mid-August 2015, I was trying to figure a Twitter username to use as a professional account. On a lark, I searched to see if “tf3604” was available, figuring that surely someone had already appropriated it. To my surprise it was available, so I grabbed it.

A few weeks passed before it even occurred to me to wonder if “tf3604.com” were available. Surely, surely that was in use somewhere, maybe even for non-SQL Server purposes. But nope, it was still out there. Even then, I hesitated for a day or two before buying the domain. That was September 17, 2015.

So why TF3604?

There are many undocumented commands and features in SQL Server that require trace flag 3604 to be enabled in order to produce useful output. For instance, DBCC PAGE will run fine but appear to do nothing without TF3604 turned on, but this trace flag tells SQL Server to route the output to the client. In SSMS, this output is then displayed on the Messages tab.

Virtually all of these undocumented commands and features that pair with TF3604 deal with SQL Server internals.

I am fascinated with SQL Server internals. To me, the name TF3604 relates to shining a light on these internals.

So here we are, a year after picking this username on a whim. And I still love the name.

United States Geographical Data: ZIP Codes

In a previous post, I gave a script to load state and county geographical data into SQL Server. Here, I extend that data to include ZIP code boundaries.

One thing that I have recently learned is that ZIP code data is really a collection of points, in particular, specific places where mail is delivered. These points can be abstracted in a series of lines along a delivery route, for instance, houses on a residential street. However, ZIP code geographic data is, strictly speaking, not a geographic area. It is possible to draw boundaries around the ZIP code delivery points, but these boundaries are only approximations, and different sources will draw the boundaries in different ways.

More specifically, what I present here are known as ZIP code tabulation areas (ZCTAs) used by the Census Bureau to approximate ZIP codes for census purposes. Here is source of the data.

One thing you may note is that the ZCTAs don’t cover all the possible area of the United States. For instance, large lakes and wilderness areas tend to not be included in any ZCTA.

As with the state and county data, the ZIP code boundaries can be loaded into a table with this definition:

CREATE TABLE [dbo].[PoliticalRegion](
	[PoliticalRegionId] [int] NOT NULL IDENTITY(1,1),
	[RegionName] [nvarchar](255) NOT NULL,
	[Type] [nvarchar] (30) NOT NULL,
	[ParentRegionName] [nvarchar] (255) NULL,
	[Boundaries] [geography] NOT NULL
);

Here is the ZIP code data. The file is quite large (41 MB). One way that I been able to get this data to load is via the command line:

sqlcmd -S serverName -d databaseName -E -i ZipCodeInserts.sql

First Impressions of anonymize()

In a recent previous post, I gave a method that I have used to create test data. As I was playing with the new anonymize() function introduced in SQL Server 2016 RC0, it occurred to me that it could be used for similar purpose.

The Books Online page for anonymize() is somewhat skimpy as of this writing, but there’s enough to figure out a the general idea. The function always require a column as input, and this seems to be the case so that the function can sample the data in the column to determine the range of output values, particularly with numeric data types. However, with fewer than three rows, there just doesn’t seem to be enough to work with, and anonymize() will simply output the row values.

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31');
 
select	anonymize(FirstName) as FirstName,
	anonymize(LastName) as LastName,
	anonymize(Age) as Age,
	anonymize(DateOfBirth) as DateOfBirth
from @t;

anonymize_2rows

But when we add a third row, the magic starts to happen. I’ll run this statement twice in a row:

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31'),
	('Sara', 'Brown', 31, '1985-02-22');
 
select	anonymize(FirstName) as FirstName,
	anonymize(LastName),
	anonymize(Age) as Age,
	anonymize(DateOfBirth) as DateOfBirth
from @t;

anonymize_3rows_run1

anonymize_3rows_run2

So, a couple of observations. First, the ages that are displayed are in the range of the input data. That is, the input in the age column are between 23 and 31 years old, and the output are similarly from 24 to 28 in the first result, and from 25 to 30 in the second. Clearly, SQL Server is reading the input column and basing the anonymized output on the data in there. Similarly, the dates output are in the ranges of the input column. However, there is no relationship between Age and DateOfBirth, which makes sense since SQL Server has nothing to tie the two columns together. (Books Online makes it clear, by the way, that anonymize() will not work on computed columns.)

But the real magic happens with the names columns (both first and last). Notice that the function output values are in no way derived from the inputs. I have to admit to being quite impressed that with only three rows, SQL Server can interpret that these columns contain name data. And no, it not based on the column names; when I rename the columns to c1 and c2, the above example still works fine.

Perhaps more impressive is that it can detect and generate addresses as well, even using obviously fake address data as input.

declare @t table (Address nvarchar(100), City nvarchar(40), State nvarchar(5));
insert @t (Address, City, State)
values ('1234 Main St', 'Anytown', 'PA'),
	('987 E 1st Ave', 'Podunkville', 'KS'),
	('456 S 654 W', 'By the Sea', 'CA');
 
select	anonymize(Address),
	anonymize(City),
	anonymize(State)
from @t;

anonymize_address

As with many other new features, the anonymize() function comes with a host of limitations. As previously mentioned, it doesn’t work on computed columns. As you would expect, you cannot use it in a WHERE clause, and the function is incompatible with most other new SQL Server features, including Always Encrypted, in-memory tables and temporal tables. Interesting things also happen when you use anonymize() with a TOP clause.

declare @t table (FirstName nvarchar(25), LastName nvarchar(25), Age int, DateOfBirth date);
insert @t (FirstName, LastName, Age, DateOfBirth)
values	('Mark', 'Smith', 25, '1990-09-01'),
	('Emily', 'Jones', 23, '1992-12-31'),
	('Sara', 'Brown', 31, '1985-02-22');
 
select	top 1 anonymize(FirstName) as FirstName,
	anonymize(LastName) as LastName,
	anonymize(DateOfBirth) as DateOfBirth

anonymize_top