pgRouting

Forum #23 - Topic #24 - Message List

speeding up searching shortest path

I have made some testing on bulk data. Table "routing_roads" contains 328997 rows.

My software environment : WinXP, Postgresql 8.2.3,Postgis 1.2.1,pgRouting 1.0.0a

My hardware environment : Athlon 3 GHz, 2G ram

I have made 3 queries for searching shortest path using 3 algorithms:

1)Dijkstra algorithm

SELECT * FROM shortest_path('SELECT id,source,

target, cost,0 as reverse_cost FROM routing_roads order by 1,2,3',111465,112777,true,false);

Result : 6 rows fetched (4,14 sec)

vertex_id edge_id cost

111 465 1 3

112 766 3 3

112 761 5 3

111 685 7 3

112 773 9 3

112 777 -1 0

2)A* algorithm

SELECT * FROM shortest_path_astar('SELECT id,source,

target, cost,0 as reverse_cost, x1,y1,x2,y2 FROM routing_roads order by 1,2,3',111465,112777,true,false);

Result : 6 rows fetched (7,23 sec).

The output is the same as in Dijkstra algorithm

3)Shooting* algorithm (there are no restrictions)

SELECT * FROM shortest_path_shooting_star('SELECT id,source,

target, cost,0 as reverse_cost, x1,y1,x2,y2,null as rule,null as to_cost FROM routing_roads order by 1,2,3',1,9,true,false);

Result : 5 rows fetched (17,50 sec)

vertex_id edge_id cost

0 1 3

1 3 3

5 5 3

8 7 3

11 9 3

Conclusions: shortest path edges in results where found correctly. That's great.

But the time processing results (for example for Shooting* )for finding the directly connected edges is the same as for the edge which far from each other:

SELECT * FROM shortest_path_shooting_star('SELECT id,source,

target, cost,0 as reverse_cost, x1,y1,x2,y2,null as rule,null as to_cost FROM routing_roads order by 1,2,3',1,3,true,false);

1 rows fetched (18,02 sec)

vertex_id edge_id cost

0 1 3

1 3 3

Adding indexes to table doesn't improve the speed. I'm not c++ guru, but it seems that almost all time is spent for loading data from table into the memory (during processing postgres process takes up to 320Mb). After the processing of query all in memory data ( from "routing_roads" table) is removed and the next query has to wait for getting all data in memory again.

The question:

1)is it possible to make some sort of caching data in memory , so as to use it for next query?

2)Can all in memory data be shared across postgre processes ( for application using connection pools)?

3)Can be added some sort of api for manually removing cached in memory data ( in case of routing table updates)

  • Message #86

    Yes, you're right - loading data is a main time-eater here.

    The are two ways of how to solve it - reduce a number of edges to load or (as you said) load all edges at one time and keep them somewhere in a memory.

    In PostgreSQL each function has a memory context with the same lifetime as function itself. So, even if we won't delete all loaded data after the first run of a function, it is pretty problematic (if possible at all) to find a reference to that data when we will call the function again.

    So, we came by the first way. There are several ways of how to reduce a data to load. In 'routing_postgis.sql' file you can find wrapper functions which clip out a bounding box around you start and end points. That's the easiest way.

    Another way could be data layering. For example, if you want to go from one city to another, you don't need to take all small roads those city have. You probably will choose a highway. And the good new is that highway network is pretty sparse, so it won't take a lot of time to load it.

    • Message #87

      In PostgreSQL each function has a memory context with the same lifetime as function itself. So, even if we won't delete all loaded data after the first run of a function, it is pretty problematic (if possible at all) to find a reference to that data when we will call the function again.

      My offer may sound a little dummy. Is it possible to save all loaded data in shared buffers? so as each new function call can reuse it? You might look at http://archives.postgresql.org/pgsql-patches/2006-07/msg00143.php

      • Message #88

        Thanks a lot!

        I will try it. But I still think that it won't work for huge data sets (like entire NAVTEQ data which is about 30Gb) or you will need a kind of super server to load everything at once.

        But probably we can combine two approaches...