Cartometric Blog

A scrapbook of GIS tricks, with emphasis on FOSS4G.

Prepare a Shapefile for OpenScales using ogr2ogr and PostGREsql

with one comment

This post explains how to import GIS data (a shapefile, in this case) into a database (PostGREsql) so it can be consumed by most any mapping API. I have OpenScales in mind, but this approach will support any mapping app with functions for rendering feature overlays using geodetic coordinates (i.e Longitude and Latitude). In many cases, you’ll need to translate your feature data out of a projected/cartesian system and into a geodetic/spherical system; so I’ll include a demonstration of that.

Quick and Dirty Summary

This approach has two parts. First, we’ll use GDAL’s ogr2ogr utility to import a shapefile into our database. Second, we’ll use a few SQL commands to translate our data from a projected to a geodetic system, as well as optimize the table for fast query speeds.

Prerequisites

The following prerequisites will need to be met in order to follow along:

1) GDAL is installed. If you need to install GDAL, check out my earlier post titled Install GDAL on Windows. Alternatively, you could install FWTools, which is admittedly easier, but that package is no longer maintained and it’s becoming out-of-date as GDAL/OGR continues to evolve.

2) PostGREsql is installed, and the PostGIS extension is enabled. If you need to install PostGREsql and PostGIS, check out the tutorial at Boston GIS demonstrating how to acquire and install PostGREsql with PostGIS. It won’t hurt to reveiw their entire tutorial, but I deviate from their approach once installation is complete (look for their sub-heading, Loading GIS Data Into the Database).

3) You have some geodata. I think the typical reader will have their own shapefile, pesonal geodatabase, or otherwise, but if you need something to follow along, here’s a US States shapefile projected to NAD83 Albers Equal Area Conic:

http://www.cartometric.com/blog/wp-content/uploads/data/usstates_nad83_aeac.zip

Loading Geodata into PostGREsql / PostGIS

To push shapefile data into your geodatabase, you can run an ogr2ogr script like this:

ogr2ogr -f “PostGreSQL” PG:”host=127.0.0.1 user=youruser dbname=yourdb password=yourpass” “E:\4_GIS\01_tutorials\usstates_nad83_aeac\usstates_albers.shp” -nln usstates -nlt geometry

For deeper reading on ogr2ogr utility flags (like -nln and nlt), check out the usage notes for ogr2ogr. Also, it may be worth you while to peruse the OGR PostgreSQL driver page, as well as the Advanced Driver Information page. In the meantime, here are a few quick notes regarding my script:

-f “PostGreSQL” PG:”host=127.0.0.1 user=youruser dbname=yourdb password=yourpass”  This tells OGR you’re exporting to PostGreSQL with the following connection string. Notice that my connection string is wrapped in double-quotes (“).

“E:\4_GIS\01_tutorials\usstates_nad83_aeac\usstates_albers.shp”   This is the path to my shapefile  input data. Once again, I wrapped this value in double-quotes (“). I do this to prevent the console from introducing linebreaks into the argument value and confusing the parser.

-nln usstates  The -nln flag means “rename the table on export”. In other words, my PostGREsql db will get a new table named usstates, and not one named usstates_albers.

-nlt geometry  This one’s particularly important for polygon data. It tells OGR “accept any geometry you encounter and store it in the feature’s geometry column”. Oftentimes, a polygon dataset will have polygons and multipolygons in the same table. For example here’s a narrow column of Well Known Text (WKT) geometries from the albers shapefile so you can see what I mean:

WKT;STATE_NAME;STATE_FIPS;STATE_ABBR…..
MULTIPOLYGON (((-1827806.2165497246 1227…..
POLYGON ((-1148108.0134327484 649421.311…..
MULTIPOLYGON (((1949555.0881544715 75264…..
POLYGON ((-199237.01920416657 704421.540…..
POLYGON ((-519870.38897088548 372373.616…..

If you run the ogr2ogr script noted above without -nlt geometry, you’ll get an error like this:

ERROR 1: Terminating translation prematurely after failed
translation of layer usstates_albers (use -skipfailures to skip errors)

By default, OGR refuses to mix geometry types in a table, so -nlt geometry allows you to duck that requirement and store both Polygon and Multipolygon features in the same table. You could optionally instruct OGR to “explode” Multipolygons into individual Polygons using the -explodecollections flag, as depicted in the following screenshot, but I don’t recommending that solution for the intended use case. For example, if a map user clicks on Michigan’s Upper Penenssula, I want the whole state to be selected, not just the UP. I’m not saying you can’t make that happen after exploding multifeatures; rather, it’s just not the approach I favor.

Without -nlt geometry, ogr2ogr will throw an error if it attempts to export polygons and multipolygons into the same table. Alternatively, you can use the flag -explodecollections (not recommended in this case) to translate your multipolygons into several polygons.

Assuming you used the script like the one I initially provided, you should be able to open pgAdmin III (the PostGREsql admin GUI that insalls with the database) and see your new usstates table:

Post-Processing your Geodata with SQL Instructions

With pgAdmin III open, expand the Tools menu and launch the Query tool. You’ll use the Query tool and the following SQL instructions to prep your data for production. I’ll start by listing all the queries together, then I’ll provide some deeper explaination in the text that follows.

SELECT SRID(wkb_geometry) FROM usstates;
SELECT * FROM spatial_ref_sys WHERE srid = 900925;
SELECT st_asText(wkb_geometry) FROM usstates;
ALTER TABLE usstates ADD COLUMN wgs84geom GEOMETRY;
UPDATE usstates SET wgs84geom = st_Transform(wkb_geometry, 4326); 
SELECT SRID(wgs84geom) FROM usstates;
SELECT st_asText(wgs84geom) FROM usstates;
VACUUM usstates;
CREATE INDEX usstates_wgs84_idx ON usstates USING GIST(wgs84geom);

Basically, the steps emphasized in blue do the actual work, while the steps in black are more for sanity checks. Steps 4 and 5 perform the geometry transformation, and the last two steps do some house-cleaning and table optimization. Now I’ll provide a one-by-one discussion of each step.

1) SELECT SRID(wkb_geometry) FROM usstates;

Here we’re getting the SRID for the features in this layer (which is 900925 on my system). By default, OGR will store feature geometries in a field called wkb_geometry. Also, your PostGIS installation includes a table named spatial_ref_sys that stores coordinate system definitions necessary for the database to remain “spatially aware” of your new table as well as the other spatial datasets the system is managing. Consider this, if you want to select points from one layer that fall inside polygons from another layer, PostGIS needs to understand the coordinate systems for both datasets so that it can align their features for analysis. So when we run the SRID() function on the table’s geometry field, wkb_geometry, PostGREsql will return the unique identifier for the coordinate system used to define the features in our table.

2) SELECT * FROM spatial_ref_sys WHERE srid = 900925;

In this step we answer the question, “Does the SRS established at import makes sense for the data?” This statement queries the PostGIS spatial_ref_sys table for the coordinate system whose ID is referenced in the previous step. Check the srText field for a readable version of the coordinate system. Mine begins with “PROJCS["North_America_Albers_Equal_Area_Coni.."  That's what I expected, and that's a good thing.

3) SELECT st_asText(wkb_geometry) FROM usstates;

Now I like to do a quick query to see the WKT for some of my features. The geometry field wkb_geometry was created by ogr2ogr when it imported the shapefile into PostGREsql. If you don't like this name, you can use the creation option -lco GEOMETRY_NAME=geom in your ogr2ogr import script to set the name of the geometry field at import time. As shown in the image, the WKT for my features looks like I would expect.

Querying feature geometries using the ST_asText() function on the geometry column, wkb_geometry.

4) ALTER TABLE usstates ADD COLUMN wgs84geom GEOMETRY;

This instruction adds a new column to the table, which I’ll use to store the feature geometries for my US States in geodetic coordinates. The column will be named wgs84geom and will expect data of type GEOMETRY. In other words, this field will store a permanent “cast” of our feature geometries in the WGS84 coordinate system, which is very popular due to its use by the famous Global Positioning System (GPS).

Note: GIS coordinate systems are complex beasts, and it’s easy to get lost in their particulars. Nevertheless, one distinction is very important, and that’s the difference between projected systems and geodetic systems. Projected systems are two-dimensional. —these are the X/Y grids you used for trigonometry exercises in High School. On the other hand, geodetic systems define coordinate geometries within a three-dimensional, spherical space.

Both systems are roses by many names. For instance, projected systems may be called “Cartesian” or “Geometric”. And Geodetic systems may be called “Sexigesimal” or “Geographic”. The PostGIS community may more often refer to features as being geometry, or geography data and mean projected vs. geodetic coordinates, respectively. So if you run across language like this, realize people intend for geometry to imply X and Y coordinates in a cartesian space, and for geography to mean familiar longitude and latitude coordinates.

In this image, state boundaries are drawn in Albers Equal Area Conic, which is a projected coordinate system.

 

Here the state boundaries are "represented" in NAD83 geographic coordinates. NAD83 geographic (EPSG 4269), is similar, if not nearly identical, to WGS84. As you can see, geodetic systems are difficult to represent in 2D space; as such, unprojected maps of large areas tend to look "suspect".

5) UPDATE usstates SET wgs84geom = st_Transform(wkb_geometry, 4326); 

With the new column ready to go, you can now wield an UPDATE statement and the st_Transform() function to translate feature geometries from their projected coordinates to their geographic WGS84 coordinates. The st_Transform() function expects two arguments, the source geometry field to transform, and the EPSG code for the output coordinate system. WGS84 is a fundamentally-popular coordinate system, and it’s EPSG code of 4326 is easy to find. If you do not know the EPSG code for your preferred coordinate system, head over to http://spatialreference.org/ and do some quick research. 

Note: We could optionally perform any coordinate system transformations in our queries by calling st_Transform on the geometry field right in the query. However, by casting our feature geometries in advance, we remove calculation overhead and get a subtle efficiency gain. This can particularly improve response times for spatial queries.

6) SELECT SRID(wgs84geom) FROM usstates;

Like the second step, this query is only intended to confirm whether the SRID established in the previous step makes sense for the data. It should return 4326.

7) SELECT st_asText(wgs84geom) FROM usstates;

Also like the third step, here I’m querying the feature geometrires as WKT to make sure they’re defined by longitude/latitude coordinate pairs.

Well Known Text (WKT) for features transformed into the WGS84 coordinate system.

8) VACUUM usstates;

Once you’re finished with the geometry transformation, call a VACUUM instruction for the usstates table. PostGREsql likes to have a deep knowledge of its feature tables so that it can optimize queries. To this end, the VACUUM command instructs PostGREsql to “gather fresh intel” on your table so that it can make better decisions. This step is particularly necessary for tables with a large number of features as well as tables experiencing a lot of maintenance (i.e. frequent feature INSERT and UPDATE activity).

9) CREATE INDEX usstates_wgs84_idx ON usstates USING GIST(wgs84geom);

Finally —if you intend to perform queries on this table, particularly spatial intersection queries on the new geometry column, you’ll want to create a spatial index referencing that column. Here, usstates_wgs84_idx is just a naming convention that implies TableName_FieldName_ThisIsAnIndex. To create an index, call the GIST() function on a table and pass in the table column you intend to search on —for instance, wgs84geom.

After following along with this post, you should have learned how to 1) use ogr2ogr to populate a PostGREsql database with shapefile data, 2) leverage PostGIS functions to perform a coordinate system transformation in the database, and 3) apply PostGREsql optimization functions to optimize the table for production use.

I hope you found this beneficial. Thanks for reading.

/Elijah

Written by elrobis

November 19th, 2011 at 5:55 pm

Posted in GDAL/OGR

Tagged with , , ,

One Response to 'Prepare a Shapefile for OpenScales using ogr2ogr and PostGREsql'

Subscribe to comments with RSS

  1. Visited yuor web blog through Reddit. You already know I am subscribing to your feed.

    Tyson

    26 Dec 11 at 7:03 pm

Leave a Reply