Line Simplification

The Douglas-Peucker line simplification algorithm is a wonderful example of a computational geometry approach to solving a GIS question. How can a complex line segment be represented using less vertices? Check out Mike Bostock's awesome D3 implementation of the algorithm to see what the simplification looks like in practice.

The algorithm functions by:

  • Define some ϵ\epsilon as the maximum distance a line can be deformed by removing a vertex.
  • Add the start and stop vertices to the set of vertices to be included (if we remove the start and stop, we change the spatial extent!)

Our initial starting line.

The black dashed line connects the start and stop vertices. The red dashed line is the distance from the black dashed line to the farthest point. For this example, assume that distance(A)>ϵdistance(A) > \epsilon.

  • If the farthest point from the line connecting the start and finish is closer than ϵ\epsilon, we can remove all other points. If the vertex is further than epsilon, we repeat the above process recursively. This means that we draw a new line connecting the start with the most distant point and then the most distance point to the stop.

The light blue line connects the start point with the most distant point (A). We then compute the distance between the light blue line and B. This step should be familiar (this is the recursive part, we keep applying the same operations over and over again). For this example, we assume that distance(B)<ϵdistance(B) < \epsilon and so B is removed. We then repeat the process with a line connecting point A and the stop.

  • Recirsively remove vertices.

The light blue line now connects A with the end node and the furthest point, C is identified. distance(C)>ϵdistance(C) > \epsilon so C is kept.

The light blue line now connects C with the end node and the furthest point, D is identified. distance(D)>ϵdistance(D) > \epsilon so D is kept.

The light blue line now connects C with D and the furthest point, E is identified. distance(E)<ϵdistance(E) < \epsilon so E is removed.

The final line is show (with the path bold). Points B and E were removed as their removal was within the ϵ\epsilon constraint.

A question

Would the algorithm change if you altered the method by which distance was computed?