Fixing invalid SQL geography data

by Grank July 19, 2009 13:16

So you have all this spatial data, and you think "what a great time to use the new geospatial features in SQL Server 2008!"  And you're right to think that, because they're awesome and handy and save you writing a lot of arithmetic and optimization yourself.  (Plus the Spatial Results visualizer in Management Studio is just rad.)

However, to be a valid SQLGeography instance, it's pretty picky.  There are lots of little rules that apply to the geography type that don't apply to the geometry type.  That's mostly because the whole point of the geography type is that it takes the curvature of the earth into account, right?  If you think about it, when you're doing two-dimensional geometry, you have a single flat plane that goes on infinitely in any direction and will never touch itself (tee hee) no matter how far you go, whereas when you're doing three-dimensional sphereoid geometry, that's not the case, it wraps and you reach your start point again.  And that leads to all sorts of unfortunate little side effects.

Polygons are of particular issue.  To illustrate why, there's an old joke that goes something like this:

A manager, an engineer, and a software developer are out for a walk through the countryside.  As they pass a ranch, an elderly rancher calls out to them and asks them for a hand.  He tells them that a tornado blew his fence away and his farmhands all packed up and left.  He's gotta build a new fence to keep his cattle in, but he's old and tired and the storm didn't leave much raw material behind.
The manager speaks first, saying "you should just try to build the old fence along the path of the old fence.  That'll be easiest, because you won't have to think or worry too much about it, you don't have to dig new post holes, and you can probably use a lot of the stuff that's still there.  You know if you replace the fence you had, it'll be enough area and land for the cattle."
The engineer pipes up then and goes "That's an awful plan!  Those posts aren't sturdy anymore, and there isn't enough stuff left to build that shape and size of fence again.  You should put up a smaller new fence just encircling the pasture you need, that'll save you some building materials.  Some of your cattle probably died in the tornado or won't be found again, so if you lose some area, that's not so bad, and you can cultivate the corners of your land outside the new fence anyway."
The rancher listened intently to the two men while they debated, and then noticed the developer has been silent, staring off looking bored and impatient.  "What about you there?  What do you think is the easiest way to build a fence around my cattle?"
The developer rolls his eyes, thinks for a moment, picks up a nearby strand of barbed wire which he twists into a hoop and steps inside, holding it up at waist height. He looks in their confused faces and says "There, done.  I declare myself to be outside.  See you guys later."

So when you draw a closed figure on a sphere, how do you know if you want the shape inside the line or outside the line?  (It's like that lame "is it two faces or one vase?" illusion.)  SQL uses the order in which the points are specified, called "ring orientation", as an indicator of that.  If you get it wrong, your polygon won't instantiate, because it doesn't allow polygons larger than a hemisphere (it screws up all the math, and why would you need it?).

Even if you don't have this problem, you can run into all sorts of other issues.  Polygons that intersect themselves or have rays are no good.  Linestrings that have segments exactly on top of each other (like if you had four points all along the same straight line but put them in a different order) or have points duplicated can be a problem.  Too many little rules that I don't know, actually.

"But I already have all these points, how can I fix this data automatically", you ask?

Well there are fortunately ways around it.  Geometry will let you get away with a lot of the things that Geography will not, so your first step is to see if you can create a valid geometry instance.  If so, then you're probably in luck! 

Geometry will accept shapes that the OGC standards consider invalid, and then has a MakeValid() method you can call that will return a proper geometry definition considered valid.  For instance, if you had a polygon that had one ray out randomly, that won't be valid, but you can create a geometry instance from it, and then if you call MakeValid() it'll return a valid polygon and a valid linestring to make up that figure.  MakeValid() will solve most of the problems you might encounter with your data, except for ring orientation.  

So how do we deal with ring orientation?  Well, the lame thing is that if you already have a valid polygon, and you call any transforming OGC method on it, SQL Server will actually detect incorrect ring orientation and fix it as it goes.  I don't know why it won't do that while creating it, but we can still capitalize on this behavior.  What you do really depends on what the lowest impact operation will be.  If you already have a geometry polygon or a list of coordinates, the suggestion from the web is to call STUnion() on the polygon and its first point (which obviously won't do anything other than fix the ring orientation).  If you're creating the polygon from a string parameter, there might be a better method to call than to have to instantiate the polygon more than once if you're in-line, but I haven't really gone through them to see which ones would work and which ones wouldn't.

So here's a quick UDF definition that you may find makes your polygon problems disappear:

CREATE FUNCTION dbo.CreateGeographyPolygon
    (@CoordinateString varchar(max), @srid int)
RETURNS geography
AS
BEGIN
    DECLARE @geometryPolygon geometry
   
    -- create as geometry, and call MakeValid
    SET @geometryPolygon = GEOMETRY::STGeomFromText('POLYGON((' + @CoordinateString + '))', @srid).MakeValid()
   
    -- do a useless union to enforce ring orientation and return as geography
    RETURN GEOGRAPHY::STGeomFromWKB(@geometryPolygon.STUnion(@geometryPolygon.STStartPoint()).STAsBinary(), @srid)
END

You may still have some trouble, in which case I have one other possible piece of advice for you.  Some shapes that will return as valid in geometry are invalid in geography because the vertices might shift slightly when translating into 3D.  I'm not sure why that would be, but this happened to me with a few LINESTRINGs.  These problems seem to lie in areas where there are large clusters of points, and therefore if you use the Reduce() function to simplify the geometry a little, these issues might go away (they did for me).  I try to avoid doing this unless absolutely neccessary as it does eliminate some vertices and shift other ones, so you're not dealing with the same data as when you started.  However, if that's not a problem for you, you can just add a call to Reduce() into that return statement somewhere behind the STAsBinary() call.  Probably add a parameter to pass in the value as you'll want to experiment with it and use the lowest one possible...  Start at something very low like 0.0001 and work up fom there.

If after all that you're still having trouble, then I have no other knowledge to impart.

Tags: , , , , , ,

Comments

Add comment


(Will show your Gravatar icon)

biuquote
  • Comment
  • Preview
Loading



Powered by BlogEngine.NET 1.5.0.7
Theme by Mads Kristensen | Modified by Mooglegiant

About The Author

I'm a software developer and musician in Edmonton, AB.  I write mostly web-based software, primarily on the Microsoft stack.  I have an MCPD and several MCTS, but I've only been at this whole developer thing for a few years, and the truth is that I'm still learning more than knowing.  So these are my adventures and experiments and some of it will probably be blatantly wrong...  Just warning ya.