Short version: I have a way to get accurate PostGIS function costs and

will be doing a PR.

In an effort to allow setting better function costs, I ran some benchmarks.

PostgreSQL function costs are defined in units of cpu_operator_cost and

default to 1 for C functions and 100 for other language functions. Some

PostGIS functions are set to a cost of 100, but I'm not aware of any

prior efforts to measure the cost.

For a representative dataset, I used a table of

(id bigint, geom geometry(Geometry,3857)) with the geometries being from

a UK OSM extract. There were 5504680 rows and the planner estimated

5515609 rows. The test polygons were 1401 MB.

PostgreSQL 9.4.6 on Ubuntu 14.04 was used, and the full PostGIS version

was POSTGIS="2.1.8 r13780" GEOS="3.4.2-CAPI-1.8.2 r3921"

PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.1, released 2013/08/26"

LIBXML="2.9.1" LIBJSON="UNKNOWN" RASTER. The machine has 32GB RAM, and

an Intel i7-4770 @ 3.40GHz

To establish costs, I looked at costs and execution times for functions

a known cost, and compared it with PostGIS functions. This was done by

executing the statement EXPLAIN ANALYZE SELECT <function> FROM test_polys;

five times after first warming up the cache. From the EXPLAIN output,

I got planner costs and execution times.

First, I had to find out the time/cost for a known function. I started

with bigint addition, comparing the difference in time and cost between

<function> = id and <function> = id + 1∷bigint. I could then get a few

values

- Cost (from addition)

- Time (from addition)

- Cost per est. row

- Time per actual row

- Time per cost (row basis)

Repeating with <function> = geom and <function> = ST_Buffer(geom,0) let

me get time per actual row for ST_Buffer, which could then be divided by

time per cost of addition and cpu_operator_cost, giving me function cost

in units of cpu_operator_cost.

For greater accuracy, I averaged over bigint addition, bigint subtraction,

bigint modulo 3, bigint divided by 3, bigint squared, and square root.

Selected results

ST_Buffer(geom, 0): 740

ST_Buffer(geom, 10): 2102

ST_Boundary(geom): 59

ST_CoordDim(geom): 5

ST_Dimension(geom): 9

ST_IsValid(geom): 903

ST_IsValidDetail(geom): 1043

I’m going to prepare a PR which sets costs for most functions. These

initial costs will probably need adjustment over time, but I’d like to

get some evidence-based costs in place, even if they'll need later

adjustment.

_______________________________________________

postgis-devel mailing list

[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel