PostGIS function costs


PostGIS function costs

Paul Norman
Short version: I have a way to get accurate PostGIS function costs and
will be doing a PR.

In an effort to allow setting better function costs, I ran some benchmarks.

PostgreSQL function costs are defined in units of cpu_operator_cost and
default to 1 for C functions and 100 for other language functions. Some
PostGIS functions are set to a cost of 100, but I'm not aware of any
prior efforts to measure the cost.
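For reference, a function's current cost can be inspected in pg_proc and changed with ALTER FUNCTION; the statements below are an illustrative sketch (the cost value shown is arbitrary, not a recommendation):

```sql
-- Inspect the planner cost currently assigned to a PostGIS function.
-- procost is expressed in units of cpu_operator_cost.
SELECT proname, procost
FROM pg_proc
WHERE proname = 'st_isvalid';

-- Change it; the value 100 here is purely illustrative.
ALTER FUNCTION ST_IsValid(geometry) COST 100;
```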

For a representative dataset, I used a table of
(id bigint, geom geometry(Geometry,3857)) with the geometries being from
a UK OSM extract. There were 5504680 rows and the planner estimated
5515609 rows. The test polygons were 1401 MB.

PostgreSQL 9.4.6 on Ubuntu 14.04 was used, and the full PostGIS version
was POSTGIS="2.1.8 r13780" GEOS="3.4.2-CAPI-1.8.2 r3921"
PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.1, released 2013/08/26"
LIBXML="2.9.1" LIBJSON="UNKNOWN" RASTER. The machine has 32 GB RAM and
an Intel i7-4770 @ 3.40 GHz.

To establish costs, I looked at planner costs and execution times for
functions with a known cost and compared them with PostGIS functions.
This was done by executing the statement
EXPLAIN ANALYZE SELECT <function> FROM test_polys;
five times after first warming up the cache. From the EXPLAIN output,
I got planner costs and execution times.

First, I had to find out the time/cost for a known function. I started
with bigint addition, comparing the difference in time and cost between
<function> = id and <function> = id + 1::bigint. I could then get a few
values:

-    Cost (from addition)
-    Time (from addition)
-    Cost per est. row
-    Time per actual row
-    Time per cost (row basis)
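Concretely, the baseline measurement amounts to comparing statements like the following (test_polys being the table described above); the cost and time delta between the two is attributed to the addition:

```sql
-- Bare column scan: baseline planner cost and execution time.
EXPLAIN ANALYZE SELECT id FROM test_polys;

-- Same scan plus one bigint addition per row.
EXPLAIN ANALYZE SELECT id + 1::bigint FROM test_polys;
```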

Repeating with <function> = geom and <function> = ST_Buffer(geom,0) let
me get time per actual row for ST_Buffer, which could then be divided by
time per cost of addition and cpu_operator_cost, giving me function cost
in units of cpu_operator_cost.
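As a sketch of the arithmetic with hypothetical per-row numbers (chosen only to illustrate the calculation, not taken from the measurements):

```sql
-- Hypothetical inputs: the addition added 2.5 ns/row against a planner
-- cost delta of 0.0025 units/row, and ST_Buffer(geom, 0) added 1850 ns/row.
SELECT (1850.0 / (2.5 / 0.0025))  -- ST_Buffer time expressed in planner cost units
       / 0.0025                   -- divide by cpu_operator_cost
       AS cost_in_cpu_operator_cost_units;  -- 740 with these made-up inputs
```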

For greater accuracy, I averaged over bigint addition, bigint subtraction,
bigint modulo 3, bigint divided by 3, bigint squared, and square root.

Selected results

ST_Buffer(geom, 0): 740
ST_Buffer(geom, 10): 2102
ST_Boundary(geom): 59
ST_CoordDim(geom): 5
ST_Dimension(geom): 9
ST_IsValid(geom): 903
ST_IsValidDetail(geom): 1043

I’m going to prepare a PR which sets costs for most functions. These
initial costs will probably need adjustment over time, but I’d like to
get some evidence-based costs in place, even if they'll need later
adjustment.

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel

Re: PostGIS function costs

Sandro Santilli-2
On Thu, May 05, 2016 at 11:35:37PM -0700, Paul Norman wrote:
> Short version: I have a way to get accurate PostGIS function costs and
> will be doing a PR.

Good job, Paul.

My concern is that the same function (say ST_Buffer) may cost very
few cycles or a lot of cycles depending on the input (a point vs. a
line with hundreds or thousands of vertices). I guess this is a common
problem with variable-size types.

I wonder if it would make sense to have the work you did packaged
as an extension to run on the *real* data for a given database
and tweak costs accordingly.

--strk;

Re: PostGIS function costs

Paul Ramsey
Since costs are only roughly used at the moment, I’d like to see them used in an order-of-magnitude sense… 10, 100, 1000, 10000

P.

> On May 6, 2016, at 11:10 AM, Sandro Santilli <[hidden email]> wrote:
>
> On Thu, May 05, 2016 at 11:35:37PM -0700, Paul Norman wrote:
>> Short version: I have a way to get accurate PostGIS function costs and
>> will be doing a PR.
>
> Good job, Paul.
>
> My concern is that the same function (say ST_Buffer) may cost very
> few cycles or a lot of cycles depending on the input (a point vs. a
> line with hundreds or thousands of vertices). I guess this is a common
> problem with variable-size types.
>
> I wonder if it would make sense to have the work you did packaged
> as an extension to run on the *real* data for a given database
> and tweak costs accordingly.
>
> --strk;

Re: PostGIS function costs

Paul Norman
In reply to this post by Sandro Santilli-2
On 5/6/2016 11:10 AM, Sandro Santilli wrote:
> My concern is that the same function (say ST_Buffer) may cost very
> few cycles or a lot of cycles depending on the input (a point vs. a
> line with hundreds or thousands of vertices). I guess this is a common
> problem with variable-size types.

Not just variable-size types, but any function where the cost depends on the input. ST_Buffer(geom, 10) is much more expensive than ST_Buffer(geom, 0), and I could probably make it an order of magnitude more expensive with suitable parameters.

Once 9.6 lands and these parameters start to matter more, we can see if it's best to have a cost from the low or high end of the possible range.

> I wonder if it would make sense to have the work you did packaged
> as an extension to run on the *real* data for a given database
> and tweak costs accordingly.

It's difficult to do this. It requires an idle system, a dataset that can be entirely CPU bound, and a fair amount of time to run. Given how few people tune the cost GUCs, I don't think this is worth it.
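For context, the GUC in question can be inspected and tuned per session; the default shown is PostgreSQL's stock value:

```sql
SHOW cpu_operator_cost;           -- 0.0025 by default
SET cpu_operator_cost = 0.005;    -- example of a per-session tweak
```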


Re: PostGIS function costs

Paul Norman
In reply to this post by Paul Norman
On 5/5/2016 11:35 PM, Paul Norman wrote:
> I’m going to prepare a PR which sets costs for most functions. These
> initial costs will probably need adjustment over time, but I’d like to
> get some evidence-based costs in place, even if they'll need later
> adjustment.

I've done an initial pass adding costs to functions which take a single
polygon, or a single polygon and other non-geometry arguments. You can
see the diff in the form of a PR at
https://github.com/postgis/postgis/pull/104. Some comments would be
appreciated before I look at non-polygon functions and functions which
take multiple geometries.
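For anyone following along, the change amounts to adding a COST clause to the SQL function definitions; a rough sketch of the shape (the C entry-point name and cost value here are illustrative, not copied from the PR):

```sql
CREATE OR REPLACE FUNCTION ST_IsValid(geometry)
    RETURNS boolean
    AS 'MODULE_PATHNAME', 'isvalid'   -- C symbol name illustrative
    LANGUAGE 'c' IMMUTABLE STRICT
    COST 100;  -- illustrative; see the measured figures upthread
```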