pgRouting

Forum #13 - Topic #15 - Message List

speeding up assign_vertex_id

Any ideas on speeding up assign_vertext_id()? I see in point_to_id() a mote about using && and index someday. Any progress on that?

I have a postgis layer with 30 million (really!) lines. I've been running assign_vertex_id() on that for over a week now (and on a slow machine). Luckily no power outages yet.

Even running it on the kanagawa sample takes hours, and on a much faster machine. (I actually didn't let it finish because I ran it from phppgadmin, and the page timed out, though postgres kept crunching at it in the background)

  • Message #55

    We've got this function as a legacy of pgDijkstra and we almost didn't change it since that time.

    Ok, I'll think on how it can be improved.

  • Message #58

    I don't know all that much about how Postgres does its stuff internally, but I wonder if the transaction is slowing it down at all? Like maybe the overhead of building up the transaction for so many records, then applying the massive update all at once?

    Could chopping it up into smaller transactions help? Like every 1000 or 10000 records (possibly configurable so the user can try to match it with their server configuration).

    It would certainly help make the operation more recoverable - there have been a couple storms in the area recently, and our building's power easily goes down (luckily power stayed up, no UPS on this machine). It would be nice to be able to start where it left off, assuming the source/target ids were null to start with.

  • Message #343

    one problem is in function point_to_id, it needs to use spatial index. currently the algo runs O(n2) so ur 30million record will not finish no matter how many years u wait or powerful machine use.

    i try hack this function and i was able to get much faster result. but for ur data of 30million, u need to overhaul the entire assign_vertex_id program so it runs O(n). should be possible but lots of effort :(

    • Message #453

      hi,

      the function below is more rapid.

      CREATE OR REPLACE FUNCTION point_to_id(p geometry, tolerance double precision) RETURNS BIGINT AS $$ DECLARE

      _r record; _id bigint; _srid integer;

      BEGIN

      _srid := Find_SRID('public','vertices_tmp','the_geom');

      SELECT

      ST_Distance(the_geom,ST_GeomFromText( AsText?(p), _srid)) AS d, id, the_geom AS

      INTO _r FROM vertices_tmp WHERE

      the_geom && ST_Expand(ST_GeomFromText(AsText?(p), _srid), tolerance ) AND ST_Distance(the_geom, ST_GeomFromText(AsText?(p), _srid)) < tolerance

      ORDER BY d LIMIT 1; IF FOUND THEN

      _id:= _r.id;

      ELSE

      INSERT INTO vertices_tmp(the_geom) VALUES (ST_SetSRID(p,_srid)); _id:=lastval();

      END IF;

      RETURN _id;

      END; $$ LANGUAGE 'plpgsql' VOLATILE STRICT;

    • Message #454

      CREATE OR REPLACE FUNCTION assign_vertex_id(geom_table varchar, tolerance double precision, geo_cname varchar, gid_cname varchar) RETURNS VARCHAR AS $$ DECLARE

      _r record; source_id int; target_id int; srid integer;

      BEGIN

      BEGIN

      DROP TABLE vertices_tmp; EXCEPTION

      WHEN UNDEFINED_TABLE THEN

      END; EXECUTE 'CREATE TABLE vertices_tmp (id serial)';

      FOR _r IN EXECUTE 'SELECT srid FROM geometry_columns WHERE f_table_name=' quote_ident(geom_table)';' LOOP

      srid := _r.srid;

      END LOOP;

      EXECUTE 'SELECT addGeometryColumn(vertices_tmp, the_geom, 'srid', POINT, 2)'; CREATE INDEX vertices_tmp_idx ON vertices_tmp USING GIST (the_geom);

      FOR _r IN EXECUTE 'SELECT ' quote_ident(gid_cname) ' AS id,'

      ' ST_StartPoint(' quote_ident(geo_cname) ') AS source,'
      ' ST_EndPoint(' quote_ident(geo_cname) ') as target'
      ' FROM ' quote_ident(geom_table)

      LOOP

      source_id := point_to_id(setsrid(_r.source, srid), tolerance); target_id := point_to_id(setsrid(_r.target, srid), tolerance);

      EXECUTE 'update ' quote_ident(geom_table)

      ' SET source = ' source_id

      ', target = ' target_id

      ' WHERE ' quote_ident(gid_cname) ' = ' _r.id;

      END LOOP; RETURN 'OK';

      END; $$ LANGUAGE 'plpgsql' VOLATILE STRICT;

      • Message #457

        Thank you so much!

        We tested your script and it is amazingly fast. It will be included to the next release.

      • Message #458

        Cool. I'll give it a try when I have a chance. Though it looks like it may still be the O(n2) case jontan6 mentioned.

        • Message #459

          It would rather be something around O(nâ‹…log(n)) as using && uses the index on the geometry, and thus dramatically reducing the number of elements where the distance is effectively measured.

          Note that it will only work if there is an index on the geometry column.

          Also there is a function st_dwithin(geomA, geomB, d) that does strictly the same thing as geomA && st_expand(geomB,d) AND st_distance(geomA, geomB) <= d