United States geographic data: States and counties

I have been interested in SQL Server’s geography data type since its implementation beginning in SQL Server 2008. I store most of my GPS data (primarily track data from my travels) in a SQL database. While it nice to be able query against the data and have the visualizer display a track, that track lacks a lot of visual context about where it actually is.

I have long wanted to obtain basic shape data of states and counties so that I could select against that data in addition to my track data to provide just that context. However, I had difficulty getting the shape data in SQL format and so would quickly abandon the idea. A few months ago I bit the bullet and converted some publicly accessible sources. I believe that I obtained the state-level from this link, and the county data from here.  I had to write a program to extract the data from the KML and convert it to a format that could be loaded into SQL Server.

The purpose of this post is to share that data in SQL Server format. The attached files contain insert statements to add geographic data into tables.

There are some slight anomalies in the data between the county-level data and state-level data.  For instance, Miami County, Kansas, sits on the far eastern edge of the state and borders with Missouri.  If you select the data for Miami County and union it with the data for Missouri, you will observe a slight overlap.

declare @miamiCounty geography = (select Boundaries from PoliticalRegion where RegionName = 'Miami' and ParentRegionName = 'Kansas' and Type = 'County');
declare @missouriState geography = (select Boundaries from PoliticalRegion where RegionName = 'Missouri' and Type = 'USState');
 
select @miamiCounty
union all
select @missouriState;

This can also be seen by performing an intersection operation on the data:

declare @miamiCounty geography = (select Boundaries from PoliticalRegion where RegionName = 'Miami' and ParentRegionName = 'Kansas' and Type = 'County');
declare @missouriState geography = (select Boundaries from PoliticalRegion where RegionName = 'Missouri' and Type = 'USState');
 
select @miamiCounty.STIntersection(@missouriState).STArea();

This indicates an overlap of 9,145,460 square meters, or about 3.5 square miles.

Also, I observe that some states just have some strange things in their boundaries.  For instance, the data seems to imply that the Fox River and Lake Winnebago in Wisconsin is not actually part of Wisconsin.  Perhaps there is some reason for the data being like this, but I claim ignorance on that subject.

Regardless, the data is more than adequate for my purposes.  For instance, about a year ago I drove from Kansas City to Las Vegas. When I select the raw data I can generate this visualization:

VegasTripWithoutStates

(Sorry for the lightness of the image; click on it to enlarge.) We have the latitude and longitude lines as reference points. Only because I know what the data represents, I can tell that Kansas City is at the right end of the line, and Las Vegas is at the left end of the line, and that since I stayed the night in Denver, that represents the point in the middle where the line changes color. Without this previous knowledge, however, it would be difficult to place the line.

When I add state data into the query (via a union), the picture becomes much clearer:

VegasTripWithStates

Here is the data. It is written as a series of inserts that should go into a table defined as:

CREATE TABLE [dbo].[PoliticalRegion](
	[PoliticalRegionId] [int] NOT NULL IDENTITY(1,1),
	[RegionName] [nvarchar](255) NOT NULL,
	[Type] [nvarchar] (30) NOT NULL,
	[ParentRegionName] [nvarchar] (255) NULL,
	[Boundaries] [geography] NOT NULL
);

Now, here is a problem that you are likely to run into. The insert script is about 41 MB in size, and even on machines with plenty of horsepower I run into problems getting a file that large to load into SSMS and execute. A workaround that has been successful for me is to load the script via sqlcmd. For example, assuming you are using integrated security, a command like the following works for me:

sqlcmd -S serverName -d databaseName -E -i PoliticalRegionInserts.sql

I have tested this successfuly on both SQL Server 2008 and 2014.

Note: The state-level data includes the 50 states plus the District of Columbia but does not include any territories. The county-level data includes the 50 states, the District of Columbia, plus Puerto Rico.

One thought on “United States geographic data: States and counties”

Leave a Reply

Your email address will not be published. Required fields are marked *