Forum #13 - Topic #15 - Message List
Any ideas on speeding up assign_vertext_id()? I see in point_to_id() a mote about using && and index someday. Any progress on that?
I have a postgis layer with 30 million (really!) lines. I've been running assign_vertex_id() on that for over a week now (and on a slow machine). Luckily no power outages yet.
Even running it on the kanagawa sample takes hours, and on a much faster machine. (I actually didn't let it finish because I ran it from phppgadmin, and the page timed out, though postgres kept crunching at it in the background)
-
Message #55
We've got this function as a legacy of pgDijkstra and we almost didn't change it since that time.
Ok, I'll think on how it can be improved.
anton06/06/07 15:14:46 -
Message #58
I don't know all that much about how Postgres does its stuff internally, but I wonder if the transaction is slowing it down at all? Like maybe the overhead of building up the transaction for so many records, then applying the massive update all at once?
Could chopping it up into smaller transactions help? Like every 1000 or 10000 records (possibly configurable so the user can try to match it with their server configuration).
It would certainly help make the operation more recoverable - there have been a couple storms in the area recently, and our building's power easily goes down (luckily power stayed up, no UPS on this machine). It would be nice to be able to start where it left off, assuming the source/target ids were null to start with.
kyngchaos06/09/07 01:16:34 -
Message #343
one problem is in function point_to_id, it needs to use spatial index. currently the algo runs O(n2) so ur 30million record will not finish no matter how many years u wait or powerful machine use.
i try hack this function and i was able to get much faster result. but for ur data of 30million, u need to overhaul the entire assign_vertex_id program so it runs O(n). should be possible but lots of effort :(
jontan612/29/07 00:19:22 -
Message #453
hi,
the function below is more rapid.
CREATE OR REPLACE FUNCTION point_to_id(p geometry, tolerance double precision) RETURNS BIGINT AS $$ DECLARE
_r record; _id bigint; _srid integer;
BEGIN
_srid := Find_SRID('public','vertices_tmp','the_geom');
SELECT
ST_Distance(the_geom,ST_GeomFromText( AsText?(p), _srid)) AS d, id, the_geom AS
INTO _r FROM vertices_tmp WHERE
the_geom && ST_Expand(ST_GeomFromText(AsText?(p), _srid), tolerance ) AND ST_Distance(the_geom, ST_GeomFromText(AsText?(p), _srid)) < tolerance
ORDER BY d LIMIT 1; IF FOUND THEN
_id:= _r.id;
ELSE
INSERT INTO vertices_tmp(the_geom) VALUES (ST_SetSRID(p,_srid)); _id:=lastval();
END IF;
RETURN _id;
END; $$ LANGUAGE 'plpgsql' VOLATILE STRICT;
christiangda04/15/08 05:09:06 -
Message #454
CREATE OR REPLACE FUNCTION assign_vertex_id(geom_table varchar, tolerance double precision, geo_cname varchar, gid_cname varchar) RETURNS VARCHAR AS $$ DECLARE
_r record; source_id int; target_id int; srid integer;
BEGIN
BEGIN
DROP TABLE vertices_tmp; EXCEPTION
WHEN UNDEFINED_TABLE THEN
END; EXECUTE 'CREATE TABLE vertices_tmp (id serial)';
FOR _r IN EXECUTE 'SELECT srid FROM geometry_columns WHERE f_table_name='
quote_ident(geom_table) ';' LOOP srid := _r.srid;
END LOOP;
EXECUTE 'SELECT addGeometryColumn(vertices_tmp, the_geom, '
srid ', POINT, 2)'; CREATE INDEX vertices_tmp_idx ON vertices_tmp USING GIST (the_geom); FOR _r IN EXECUTE 'SELECT '
quote_ident(gid_cname) ' AS id,' ' ST_StartPoint(' quote_ident(geo_cname) ') AS source,' ' ST_EndPoint(' quote_ident(geo_cname) ') as target' ' FROM ' quote_ident(geom_table) LOOP
source_id := point_to_id(setsrid(_r.source, srid), tolerance); target_id := point_to_id(setsrid(_r.target, srid), tolerance);
EXECUTE 'update '
quote_ident(geom_table) ' SET source = '
source_id ', target = '
target_id ' WHERE '
quote_ident(gid_cname) ' = ' _r.id; END LOOP; RETURN 'OK';
END; $$ LANGUAGE 'plpgsql' VOLATILE STRICT;
christiangda04/15/08 06:30:51-
Message #457
Thank you so much!
We tested your script and it is amazingly fast. It will be included to the next release.
anton04/15/08 13:35:11 -
Message #458
Cool. I'll give it a try when I have a chance. Though it looks like it may still be the O(n2) case jontan6 mentioned.
kyngchaos04/15/08 23:25:48 -
Message #459
It would rather be something around O(nâ‹…log(n)) as using && uses the index on the geometry, and thus dramatically reducing the number of elements where the distance is effectively measured.
Note that it will only work if there is an index on the geometry column.
Also there is a function st_dwithin(geomA, geomB, d) that does strictly the same thing as geomA && st_expand(geomB,d) AND st_distance(geomA, geomB) <= d
Tristram04/16/08 01:43:22
Powered by Trac 0.10.4
By Edgewall Software.
pgRouting is a project of PostLBS
This site is maintained by Orkney, Inc. -
