**Routing Faster Than Dijkstra Thanks to Contraction Hierarchies**

---

When I started working at AISIN 10 years ago, I thought that routing was an old and completely solved problem. *I could not have been more wrong*. Taking a few seconds to compute a route from A to B, with static and fixed costs, may have been acceptable back then, but it is certainly not the case today.

A good router now has to deal with huge maps, with dynamic costs based on traffic information, and even to take the type of motorization into account to compute ecological routes, all within a handful of milliseconds. This is a completely different problem, but fortunately, giant leaps were also accomplished in the scientific community during this last decade.

Today, I would like to present one of those “novel” techniques that can dramatically accelerate your routing time, well beyond the performance of the well-known *Dijkstra* algorithm. But first, in this part, I will go through a recap of this algorithm and present the concepts I will need later to explain *Contraction Hierarchies*.

# Part 1

The eponymous shortest-path algorithm, named after its inventor *Edsger Dijkstra*, is a great classic to be found on every street corner, whether in an algorithm book or in a computer science course. Published in 1959, it applies to a weighted graph to determine the shortest path (in terms of the weights of the edges of the graph) between a source node and any other node of the same graph.

Notes, demonstrations and illustrated examples abound on the Internet to explain its principle, to prove its correctness or to discuss its algorithmic complexity. I will therefore assume that the reader has a good general knowledge of it and will content myself with a quick overall reminder, so as to get faster to the essentials of this article.

I will then explain the limitations of this algorithm and what can be done in practice to get around them. Later, I will switch to an in-depth explanation of “**Contraction Hierarchies**”, which are a real game changer in the world of shortest-path computation.

## Dijkstra and the wonderful world of dynamic programming

The *Dijkstra* algorithm is not only an efficient algorithm; it is possible to show that for a general weighted graph with non-negative weights, it is, from an *asymptotic* point of view, the best one can do. And to understand the reason for such performance, it is useful to recall the close connection it has with *dynamic programming*.

**Dynamic programming** is an optimization method developed by *Richard Bellman* in the 1950s (a contemporary of Dijkstra), which simplifies a problem by recursively decomposing it into smaller structures. Once an elementary size has been reached, the optimal solution of the subproblem is trivial, and it only remains to go back up to build the final global solution. In other words, it is possible to zoom in on a part of a problem to solve it, without worrying about the rest, and then to reassemble each independent piece of the puzzle. This logic avoids any combinatorial explosion by limiting the complexity to the number of elementary subproblems and to their reassembly. It is therefore frequent to see a complexity of the form *O(n log n)* pop up, where *n* comes from the number of elementary structures and *log n* comes from the dichotomous decomposition of the problem.
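To make this principle concrete, here is a minimal sketch (my own illustration, not part of the original article) of the classic textbook example: a recursion whose subproblems are solved once, stored, and then reused when reassembling the global solution.

```python
from functools import lru_cache

# Illustrative only: the naive Fibonacci recursion explores an exponential
# tree of calls, but memoization reduces it to a linear number of
# elementary subproblems, each solved exactly once.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Elementary subproblems: trivially solved base cases.
    if n < 2:
        return n
    # Each subproblem is solved once, then reused on the way back up.
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # → 12586269025, computed instantly thanks to memoization
```

Without the `lru_cache` decorator, this call would take hours; with it, the complexity collapses to the number of distinct subproblems.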

It would, of course, be too good to be true if any problem could be solved this way. In practice, the problem must meet certain conditions for a procedure as effective as *dynamic programming* to be applicable. One of these criteria is to exhibit what is called an **optimal substructure**. This term designates a class of problems such that, for any optimal solution, any sub-part of the solution is also optimal; which allows, by definition, solving it without worrying about the rest.

To illustrate this last sentence, imagine that you have determined the shortest path from *A* to *B*, and that this path goes through the intermediate points *X1* and *X2*. Let us call the portion between them *i*. Then the shortest path from *X1* to *X2* is precisely *i*. If this were not the case (let’s call this hypothetical better path *i’*), then there would also be a better path from *A* to *B* going through *i’* instead of *i*, contradicting our assumption. This is how one can see that finding a shortest path exhibits an *optimal substructure*, and that dynamic programming can be applied to it; this is precisely what the *Dijkstra* algorithm does.

Now for a counterexample that cannot be solved with dynamic programming, let’s say you don’t have enough fuel to complete your entire trip and have to pass through a station along the way. In this case, it is quite possible that the best path goes to *X1*, makes a detour to the station, comes back to *X2* and then continues the rest of the route as before. This detour no longer coincides with the optimal path *i* joining *X1* to *X2*, quite simply because for this simpler subproblem, you do not need to refuel.

**Dijkstra in practice**

Even though dynamic programming fits well with a recursive approach, recursion is not mandatory, and it is not what Dijkstra’s algorithm does in practice. The idea consists rather in an exploratory and iterative traversal of the different nodes of the graph, *while guaranteeing that each node is explored at most once*. To do so, the nodes are classified into three categories:

- the *explored nodes* *{Ei}*, for which a shortest path has already been found;
- the *nodes at the border of the exploration territory* *{Fj}*, which will be treated according to a certain order of priority. We will come back to this soon;
- the *unexplored nodes* *{Ok}*, among which is the destination node that we are still trying to reach.

At the beginning, only the starting node is in the *{Ei}* set, with a cost of 0, and all other nodes are placed in *{Ok}* with an infinite cost (or at the very least a very large value). In addition to its cost, each node is also assigned a pointer to the previous node of the shortest path connecting it to the starting point. All the subtlety of the algorithm then lies in the way the border is extended, and we are going to take an example to fully understand how it works.

By definition, the border nodes are all the unexplored nodes *adjacent* to the nodes of *{Ei}* (i.e. connected to them by a single edge), like *f0*, *f1* and *f2* in green on the drawing. *Explored* nodes, for which a solution exists, are *in blue*, while *unexplored* ones are *in red*. At each blue or green node, an arrow points to the previous node along the shortest path back to the starting point.

The crucial point is then to realize that these arrows are **definitive** for the blue nodes, but **temporary** for the green ones. It is for this deep reason that a priority order is necessary to process the nodes of the border. Indeed, only the explored nodes (for which a shortest path already exists) are taken into account when a new node is added to the border. This means that if we examine the node *f1*, for example, only the nodes *e6*, *e5* and *e7* were taken into account in determining its shortest-path pointer. It is therefore possible to establish that this pointer must point to *e7*, but this solution will not be final until a solution for *f2* has also been found. Indeed, it may happen that the shortest path to reach *f1* goes through *f2*, invalidating the pointer to *e7*.

The problem is that, when Dijkstra’s algorithm processes the next green border node, it moves it into the set of resolved nodes, in blue. This set is, let us remember, built on the postulate that its nodes will never change again, their solution being final. Therefore, among all the green nodes that could be treated, it is crucial to choose one for which we are sure to have already obtained the best solution. And for that, it *is enough* to take the border node with the lowest cost, because going through any other border node could only lengthen the solution.

Let us imagine that *f2* is the node whose cost is minimum among all the nodes in *{Fj}*. It is then possible to move it to the set *{Ei}*, without forgetting to update all its neighbours beforehand. In this way, the node *f1* will be reexamined to determine whether its pointer should keep pointing at *e7*, or whether it should be updated to go through *f2*. This will also be the occasion to extend the border by adding the node *o3* to it.

This iterative procedure clearly illustrates the efficiency of dynamic programming, which consists here in processing a node located on the border while totally ignoring the unexplored nodes, and in reusing the preceding solutions, encoded in the set of explored nodes.
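The whole procedure above fits in a few lines of code. Here is a minimal sketch (my own, with made-up names; a real router is of course far more optimized) where the explored set plays the role of *{Ei}*, a min-heap holds the border *{Fj}*, and a `prev` pointer records the arrows of the drawing.

```python
import heapq

# A minimal Dijkstra sketch. The graph is an adjacency dict:
# node -> list of (neighbour, edge_weight).
def dijkstra(graph, source, target):
    explored = set()                 # the {Ei}: nodes with a final solution
    cost = {source: 0}               # best cost found so far for each node
    prev = {source: None}            # pointer to the previous node on the path
    frontier = [(0, source)]         # the {Fj}, kept as a min-heap

    while frontier:
        c, node = heapq.heappop(frontier)
        if node in explored:         # stale heap entry, skip it
            continue
        explored.add(node)           # this cost is now definitive
        if node == target:
            break
        for neigh, w in graph.get(node, []):
            if neigh not in explored and c + w < cost.get(neigh, float("inf")):
                cost[neigh] = c + w
                prev[neigh] = node   # temporary arrow, may still change
                heapq.heappush(frontier, (c + w, neigh))

    if target not in explored:
        return None, float("inf")
    # Rebuild the path by following the (now definitive) arrows backwards.
    path, node = [], target
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], cost[target]

graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}
print(dijkstra(graph, "A", "D"))  # → (['A', 'B', 'C', 'D'], 4)
```

Note the stale-entry check after each pop: instead of updating keys inside the heap, a node is simply pushed again with its improved cost, and outdated entries are discarded when they surface.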

**The problem of large maps**

Except for special classes of graphs, we have the assurance that there is nothing asymptotically more efficient than Dijkstra to find the shortest path in a weighted graph. It is therefore an algorithm widely used to compute routes in navigation systems, such as the one in your car or in your smartphone.

For this kind of problem, the road network is typically modelled as a graph in which the nodes represent road intersections, while the edges identify with the roads joining these intersections. The weight of each edge (also called a metric) depends on the type of route you are looking for. If you are on foot, or just want to explore the countryside, then you can use the length of each link as the metric to find the shortest route. But more likely you'll just want to get to your destination quickly, in which case link travel time is a preferable metric. This time can simply be based on the legal speed limit, but also on more interesting information such as historical traffic statistics or even real-time traffic information.

In this context, the legitimate question is what performance Dijkstra's algorithm can achieve in practice on a road navigation map. And to tackle a very concrete case, let's deal directly with what we find in a real navigation system: the map of an entire continent, like the map of Europe that we are going to use in this article.

The database of a typical European road map contains around *120 million links* (the edges of the graph) and *90 million nodes* (the intersections). It's a big playing field to explore, even for an efficient algorithm that only traverses each node once. And just because it looks nice, here is a picture below to illustrate it.

Of course, Dijkstra’s algorithm does not need to explore every node of the map for every request. *Since it works by gradually extending its exploration neighbourhood from a starting point*, it goes without saying that it can stop as soon as the destination node is reached. The computation time of a route between two points not too far apart is therefore completely independent of the size of the map itself.

It is therefore interesting to reformulate the performance of this algorithm differently for the purpose at hand. Imagine that we know in advance the shortest path between two points A and B, and that it is composed of *k* links and *k+1* nodes. What we want to know is *how many nodes* the algorithm is going to *visit* before finding this optimal solution. The resulting computation time will then be directly proportional to this number.

It is easy to estimate this number as a first approximation. First of all, the graph of a road network is almost exclusively planar (it can be drawn on a plane without edges crossing each other). Then, we can simplify things by considering that the distribution of nodes is uniform in the plane (which is obviously not the case in reality). Finally, we will also forget the weights of the edges and give them all the same value. The shortest path is then the one that totals the smallest number of edges.

In this somewhat simplified framework, Dijkstra’s algorithm behaves exactly as in the example taken from Wikipedia below. Except when the border hits the gray wall, it extends in all directions like a *circle*.

Our approximation then simply tells us that if the shortest path goes through *k* nodes, the number of nodes explored to find the solution will be proportional to *k²*. The performance of Dijkstra’s algorithm thus degrades **quadratically** with the distance separating the starting point from the ending point.
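This quadratic growth is easy to reproduce on a toy version of our simplified framework. The sketch below (a synthetic experiment of my own, with made-up names, not the article's measurements) runs Dijkstra on a uniform grid with unit edge weights and counts how many nodes are settled before a target *k* steps away is reached.

```python
import heapq

# Count how many nodes Dijkstra settles on a uniform grid (unit weights,
# 4-connectivity) before reaching a target k links away from the source.
def settled_count(k, size=2001):
    src = (size // 2, size // 2)
    dst = (size // 2 + k, size // 2)
    dist = {src: 0}
    frontier = [(0, src)]
    settled = 0
    while frontier:
        d, (x, y) = heapq.heappop(frontier)
        if d > dist.get((x, y), float("inf")):
            continue  # stale heap entry
        settled += 1
        if (x, y) == dst:
            return settled
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size \
                    and d + 1 < dist.get((nx, ny), float("inf")):
                dist[(nx, ny)] = d + 1
                heapq.heappush(frontier, (d + 1, (nx, ny)))

for k in (10, 20, 40):
    print(k, settled_count(k))  # counts grow roughly 4x when k doubles
```

On such a grid the explored region is the diamond of all nodes closer than the target, whose area grows like *k²*: doubling the path length roughly quadruples the number of settled nodes.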

Squared numbers are always a bit scary in algorithms. To give you an idea of what this looks like in practice, here are some numbers I measured on a real router using Dijkstra. They are averages over 100 random requests across Europe, for 3 different categories of optimal route length (in terms of links). The quadratic tendency is clearly present and follows our prediction.

```
+-------------+----------+---------+
| #links      | #visited | Routing |
| in solution | nodes    | time (s)|
+-------------+----------+---------+
|          10 |      491 |   0.005 |
|         100 |    14074 |   0.730 |
|        1000 |  1581057 |   4.861 |
+-------------+----------+---------+
```

The time required to compute routes over long distances is therefore clearly problematic, especially if this computation is performed on a server that potentially has to process thousands of requests per second (think of what Google Maps does). A first way to improve things is to start the search from the starting point and the end point simultaneously, until the two exploration zones meet somewhere. Each ball having its radius halved, this saves us a factor of 2; interesting, but not sufficient to eliminate the quadratic factor.
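This bidirectional idea can be sketched as follows (again my own illustration, not the article's router): one Dijkstra search grows from the source, another from the target on the reversed graph, and the searches stop once the two frontiers can no longer improve the best meeting point found so far.

```python
import heapq

# A minimal bidirectional Dijkstra returning the optimal cost.
# `graph` and `rgraph` are adjacency dicts (rgraph = reversed edges;
# for an undirected graph, pass the same dict twice).
def bidirectional_dijkstra(graph, rgraph, source, target):
    dist = [{source: 0}, {target: 0}]          # forward / backward costs
    frontier = [[(0, source)], [(0, target)]]  # one min-heap per direction
    settled = [set(), set()]
    best = float("inf")                        # best source-target cost so far

    while frontier[0] and frontier[1]:
        # Standard stopping rule: once the two cheapest frontier nodes
        # cannot combine into anything better than `best`, we are done.
        if frontier[0][0][0] + frontier[1][0][0] >= best:
            break
        # Expand the side whose next node is the cheaper one.
        side = 0 if frontier[0][0][0] <= frontier[1][0][0] else 1
        g = graph if side == 0 else rgraph
        d, node = heapq.heappop(frontier[side])
        if node in settled[side]:
            continue
        settled[side].add(node)
        for neigh, w in g.get(node, []):
            if d + w < dist[side].get(neigh, float("inf")):
                dist[side][neigh] = d + w
                heapq.heappush(frontier[side], (d + w, neigh))
            # If the other search already reached `neigh`, this edge
            # closes a candidate source-target path.
            if neigh in dist[1 - side]:
                best = min(best, d + w + dist[1 - side][neigh])
    return best

graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 1), ("D", 4)],
    "C": [("A", 5), ("B", 1), ("D", 1)],
    "D": [("B", 4), ("C", 1)],
}
print(bidirectional_dijkstra(graph, graph, "A", "D"))  # → 4
```

One subtlety worth noting: the search cannot simply stop at the first node touched by both sides, because the optimal path may meet elsewhere; hence the stopping rule comparing the two heap tops against the best candidate found so far. Recovering the actual path would additionally require predecessor pointers on both sides, omitted here for brevity.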

**A heuristic to the rescue**

Now let’s go back in time, before navigation systems became a commodity, in the days of good old paper maps. Imagine you have to plan a route from, say, Madrid to Berlin. Obviously, your strategy would certainly not have been to analyse millions of roads! They weren’t even all represented on a small-scale paper map anyway. *Since the road network is hierarchized in several levels of different importance*, it is eminently easier to focus only on the highways to determine 95% of the trip. Then, all you have to do is find a valid path from the starting point to the nearest highway, and likewise for the end point.

By focusing on the highest hierarchical level, the number of links to analyse drops drastically. For example, imagine that this divides the length of the routes (in links) by a factor of 10. Then the number of nodes that Dijkstra's algorithm has to explore drops to *(k/10)²*, which is 100 times less than before. There is also a quadratic effect here, but in the right direction this time.

This trick only works because the road network is already split into different layers, and because this hierarchy has been designed to minimize travel time. But since most of the time users simply want to reach their destination as quickly as possible, it suits our needs. For those who knew the first personal navigation devices (PNDs), this explains why calculating the fastest route was quite swift, while calculating the shortest route over long distances took forever.

It was therefore very common for the first systems to use this natural decomposition of the road network to create several map layers of different levels of importance. At the bottom, the level-0 layer includes all of the road infrastructure, while at the top, only the major axes are kept. A cartographic rendering engine adopts the same approach to draw maps at different zoom levels.

**A not-so-perfect heuristic**

Although effective, this heuristic unfortunately brings *a problem* that is difficult to get rid of completely. Dijkstra’s algorithm must indeed be modified to operate with a stack of *N* maps instead of just one, and it will constantly have to ask itself at which level it should work. Should it go up in the layers in order to progress faster towards the destination, at the risk of missing solutions that do not exist at such a coarse level of detail? Or is it better to go down, instead, to focus on a potential solution? Because the problem of the upper layers, which exclude the majority of the road network, comes from the assumption that finding the best solution does not require any deeper investigation. This is sometimes true when the search criterion is time and there is no traffic, but it is not always the case.

A motorway can be closed because of road works, or because of a car accident. The algorithm then has to continuously aim at a compromise between the upper layers, which allow computing faster, and the lower layers, which are more exhaustive. As you can imagine, this considerably complicates the initial search procedure because of the many criteria to be incorporated to trigger a level change, sometimes upwards, sometimes downwards.

The second problem comes from *the strict and rigid hierarchy of the road network*, entirely designed to minimize travel time. But in these times troubled by global warming and the imminent advent of electric cars, the situation is changing a bit.

First of all, even setting aside the type of motorization, who would not be inclined to give up a little time if they can save much more in fuel? It is often possible to find an interesting leverage effect by choosing a shorter road at a lower speed. Of course, you will lose a few minutes of additional travel time, but your energy gain will be increased tenfold. I am not talking here about seeking some ultimate energy-saving behaviour (which would consist of everyone driving at 30 km/h everywhere), but rather about a winning compromise.

Secondly, this energy consideration is no longer superfluous at all when we look at an electric motor. Indeed, where a good big diesel is at ease once launched on the highway at 120 km/h, this is not at all the case for an electric motor, which has to over-consume to overcome the aerodynamic drag, very high in this range of velocity since it grows with the square of the speed. On the other hand, such a vehicle is much better at dealing with speed variations thanks to its regenerative braking capability, and its performance excels when the aerodynamic effects are not too strong. Driving more peacefully could not be more important here, because the problem of battery autonomy and recharging time has not been solved yet.

Thus, the need to compute an eco route according to the type of motorization becomes pressing. But this requires giving up the heuristic established on the natural hierarchical decomposition of the road network, since that decomposition is only based on the legal speed limit.

## What if you could have your cake and eat it too?

The general idea of the heuristic presented above is a good one to reduce the size of the graph to explore. Unfortunately, it also increases the complexity of the search procedure and prevents finding the optimal solution. In addition, it is based on a fixed hierarchy that is not necessarily the best one.

But what if I told you that there exists a method that enables, **simultaneously**, to:

- build a *hierarchy* of road levels to reduce the search time;
- have a *mathematical guarantee* of finding the optimal solution;
- customize this hierarchy on demand for *any possible metric* (short, fast, eco route, whatever).

That is precisely the object of the *Contraction Hierarchies* that I will cover in Part 2.