Advice on space efficient grid indexing

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Advice on space efficient grid indexing

Arnaud L.
Hi all

We have quite large tables containing geometries (one table per geometry
type), and we need to use them in a legacy application that has it's own
locking system.
Basically, rows are locked on a grid basis (i.e. all geometries in a
square cell are locked at once, the size of the cell cannot be changed).
Geometries are attributed to a cell according to the coordinates of the
center of their bounding boxes, so for a given geometry we have only one
and exactly one matching cell.

For now, we have added a CHAR column with the coordinates of the
lower/left corner of the cell formated to always fit the column width (x
coordinate left padded with zeros up to 9 characters, underscore, y
coordinate left padded with zeros up to 9 characters).

It works, and computing this string is very fast (min/max coordinates of
bouding box divided by two and "floored" to the closest grid point), but
it takes a lot of storage (the column is a char(19)), and we now need to
index it for faster access (it's not indexed for the moment).

I thought about a geohash, but postgis' geohash implementation is the
standard one with a grid size that cannot be changed (and our data is
not in lat/lon, it's in UTM coordinates).

What would be a more efficient way to store this cell coordinates so
that disk usage would be limited and accessing the rows would be using
the index efficiently ?
Using a point geometry would not be space efficient I believe, and it
seems overkill since we do not need to do any spatial operation on this
column (only equality queries, i.e. retrieving all rows in a given cell).
I also thought about a bigint array of two elements, but our database is
mainly accessed through psqlodbc and I'm not sure that arrays are very
efficient in this case.

Any idea or advice would be very welcome.

Regards
Arnaud
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Advice on space efficient grid indexing

Arnaud L.
Le 29/09/2016 à 10:11, Arnaud L. a écrit :
> What would be a more efficient way to store this cell coordinates so
> that disk usage would be limited and accessing the rows would be using
> the index efficiently ?
> Using a point geometry would not be space efficient I believe, and it
> seems overkill since we do not need to do any spatial operation on this
> column (only equality queries, i.e. retrieving all rows in a given cell).
> I also thought about a bigint array of two elements, but our database is
> mainly accessed through psqlodbc and I'm not sure that arrays are very
> efficient in this case.

I should also have pointed out that there is a third parameter, because
we have two overlapping grids (but a given geomtry can only be in one of
them). Sot it is more a 3 columns reference than a 2 columns one.
Also, the obvious choice of adding 3 columns (gridx, gridy, gridlevel)
would be that last choice, even after the ugly "char" column. We have
functions returning this "grid id", and it would be very inconvenient if
they should return setof bigints or records.
I really like the idea that a grid cell has a unique "id" that can be
easily computed, and that a geometry can be also easily retrieved from
this id with having to write complex sql like (gridx, gridy, gridlevel)
= (..., ..., ...).

Regards
--
Arnaud
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Advice on space efficient grid indexing

Martijn Meijers
Hi Arnaud,

Would the idea of the Spatial Location Code help you?

http://www.gdmc.nl/oosterom/slc.pdf


Martijn


On 29-09-16 10:49, Arnaud L. wrote:

> Le 29/09/2016 à 10:11, Arnaud L. a écrit :
>> What would be a more efficient way to store this cell coordinates so
>> that disk usage would be limited and accessing the rows would be using
>> the index efficiently ?
>> Using a point geometry would not be space efficient I believe, and it
>> seems overkill since we do not need to do any spatial operation on this
>> column (only equality queries, i.e. retrieving all rows in a given
>> cell).
>> I also thought about a bigint array of two elements, but our database is
>> mainly accessed through psqlodbc and I'm not sure that arrays are very
>> efficient in this case.
>
> I should also have pointed out that there is a third parameter,
> because we have two overlapping grids (but a given geomtry can only be
> in one of them). Sot it is more a 3 columns reference than a 2 columns
> one.
> Also, the obvious choice of adding 3 columns (gridx, gridy, gridlevel)
> would be that last choice, even after the ugly "char" column. We have
> functions returning this "grid id", and it would be very inconvenient
> if they should return setof bigints or records.
> I really like the idea that a grid cell has a unique "id" that can be
> easily computed, and that a geometry can be also easily retrieved from
> this id with having to write complex sql like (gridx, gridy,
> gridlevel) = (..., ..., ...).
>
> Regards
> --
> Arnaud
> _______________________________________________
> postgis-users mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Advice on space efficient grid indexing

Arnaud L.
Le 29/09/2016 à 10:59, Martijn Meijers a écrit :
> Would the idea of the Spatial Location Code help you?
> http://www.gdmc.nl/oosterom/slc.pdf


Hi Martijn,
thanks a lot for your answer.

That's quite a long read but it looks very interesting, I'll look at
this tomorrow. Thanks in advance !

I have to mention that I made some progress in my quest by finding out
about pairing functions. https://en.wikipedia.org/wiki/Pairing_function

I'll let you know how it went.

Regards
--
Arnaud





>
>
> Martijn
>
>
> On 29-09-16 10:49, Arnaud L. wrote:
>> Le 29/09/2016 à 10:11, Arnaud L. a écrit :
>>> What would be a more efficient way to store this cell coordinates so
>>> that disk usage would be limited and accessing the rows would be using
>>> the index efficiently ?
>>> Using a point geometry would not be space efficient I believe, and it
>>> seems overkill since we do not need to do any spatial operation on this
>>> column (only equality queries, i.e. retrieving all rows in a given
>>> cell).
>>> I also thought about a bigint array of two elements, but our database is
>>> mainly accessed through psqlodbc and I'm not sure that arrays are very
>>> efficient in this case.
>>
>> I should also have pointed out that there is a third parameter,
>> because we have two overlapping grids (but a given geomtry can only be
>> in one of them). Sot it is more a 3 columns reference than a 2 columns
>> one.
>> Also, the obvious choice of adding 3 columns (gridx, gridy, gridlevel)
>> would be that last choice, even after the ugly "char" column. We have
>> functions returning this "grid id", and it would be very inconvenient
>> if they should return setof bigints or records.
>> I really like the idea that a grid cell has a unique "id" that can be
>> easily computed, and that a geometry can be also easily retrieved from
>> this id with having to write complex sql like (gridx, gridy,
>> gridlevel) = (..., ..., ...).
>>
>> Regards
>> --
>> Arnaud
>> _______________________________________________
>> postgis-users mailing list
>> [hidden email]
>> http://lists.osgeo.org/mailman/listinfo/postgis-users
>

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users