Raster time-series data

Raster time-series data

Tom Cook
I'm trying to import (what I think of as) a large time series of gridded data into PostGIS 2.0 (on PostgreSQL).  The data comes as HDF-EOS files (basically HDF4).  Each file has the whole grid, which is 22680 points, and 24 bands, one for each hour of the day.  Each file covers one day, and I'm trying to import 16 years (5,844 files).

My strategy at present is to put each day into a raster with 24 bands, really because this is the easiest to implement.  So for the first file:

raster2pgsql -I -t auto -c 'HDF4_EOS:EOS_GRID:"file1.hdf":EOSGRID:SWGDN' swgdn | psql
psql -c "alter table swgdn add raster_date date;"
psql -c "update swgdn set raster_date = '20000101';"

And then for subsequent files:

raster2pgsql -I -t auto -a 'HDF4_EOS:EOS_GRID:"fileX.hdf":EOSGRID:SWGDN' swgdn | psql
psql -c "update swgdn set raster_date = 'XXXXXXXX' where raster_date is null;"

In a word, it's slow.  I've so far been running the import script for about an hour and it's processed 146 input files.  I can't really quantify this, but it feels like it's getting slower.
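
For scale, here is a rough estimate worked out from the figures above. It assumes the rate stays constant, which may be optimistic if the import really is slowing down:

```shell
# Rough estimate of total import time from the numbers in this post.
files_total=5844      # one file per day for 16 years
files_per_hour=146    # files processed during the first hour
# Shell arithmetic is integer-only, so this rounds down to whole hours.
hours=$(( files_total / files_per_hour ))
echo "roughly $hours hours at a constant rate"
```

If the per-file time keeps growing, the real total would be higher than this figure.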

Is this a reasonable strategy for storing this data, and are my performance expectations just unrealistic?  Or is there a better structure to use for this?

Thanks for any suggestions,
Tom

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Re: Raster time-series data

Pierre Racine-2
Sounds reasonable to me. You could try splitting the process across your 4 or 8 processors, each importing a subset of the files. This might help, or it might make the whole process even slower, depending on whether the bottleneck is the CPU or the disk.
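
Splitting the files across several workers could be sketched roughly as below. This is only an illustration: import_one, run_worker and parallel_import are hypothetical helper names, and import_one just prints the command it would run, so the sketch can be tried without a database. Note also that the "where raster_date is null" update from the original post would no longer identify a single file's rows once several imports run at the same time, so the date would have to be assigned some other way.

```shell
#!/bin/sh
# Sketch only: split a list of files across N background workers.
# ASSUMPTION: import_one is a hypothetical helper. Here it only prints the
# command it would run; swap the printf for the real pipeline to import.
import_one() {
    printf '%s\n' "raster2pgsql -I -t auto -a 'HDF4_EOS:EOS_GRID:\"$1\":EOSGRID:SWGDN' swgdn | psql"
}

# Worker number w (0-based) out of total handles every total-th file.
run_worker() {
    w=$1; total=$2; shift 2
    idx=0
    for f in "$@"; do
        if [ $(( idx % total )) -eq "$w" ]; then
            import_one "$f"
        fi
        idx=$(( idx + 1 ))
    done
}

# Start n workers in the background and wait for all of them to finish.
parallel_import() {
    n=$1; shift
    k=0
    while [ "$k" -lt "$n" ]; do
        run_worker "$k" "$n" "$@" &
        k=$(( k + 1 ))
    done
    wait
}
```

Calling something like "parallel_import 4 *.hdf" would hand every fourth file to the same worker. The order in which the workers' output appears is not deterministic.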

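If you don't necessarily need all the pixel values in the DB, try the -R option when importing. It registers the files as out-of-db rasters, so only the metadata and the file path are stored in the database. This is definitely faster.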

Pierre

> -----Original Message-----
> From: postgis-users [mailto:[hidden email]] On Behalf
> Of Tom Cook
> Sent: Wednesday, August 31, 2016 12:46 PM
> To: [hidden email]
> Subject: [postgis-users] Raster time-series data
Raster time-series data

Robert Burgholzer-2
I wonder if it is getting slower because of the date query that you're doing?  You might try indexing the raster_date field.  An index on that might slow down inserts, but it would improve updates.  I would certainly expect your updates to get slower over time with a non-indexed date match.
Hth - robert
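
A minimal sketch of the index suggested above. The index name swgdn_raster_date_idx is made up for illustration; the table and column names are the ones used in the thread. PostgreSQL B-tree indexes include NULL entries, so a "raster_date is null" condition can make use of such an index.

```shell
# ASSUMPTION: the index name is hypothetical; swgdn and raster_date come
# from the commands earlier in the thread.
index_sql='CREATE INDEX swgdn_raster_date_idx ON swgdn (raster_date);'
# Print the statement; to apply it, run: psql -c "$index_sql"
printf '%s\n' "$index_sql"
```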


--
Robert W. Burgholzer
 'Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity.'  - Charles Mingus


Re: Raster time-series data

Robert Burgholzer-2
And further, waiting to set the date on the null rows until after the entire import has completed could help too (if your requirements allow that).
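
One way to defer the date assignment is to record which file each row came from and fill in raster_date in a single pass at the end. The sketch below assumes that raster2pgsql's -F flag (which adds a filename column) is used on every import, including the first, and that each file name contains its date as eight digits (YYYYMMDD). The thread does not show the real file names, so that pattern is a guess, and what exactly lands in the filename column for an HDF4 subdataset string has not been checked here. import_cmd and final_update_sql are hypothetical helper names that only print what would be run.

```shell
# Print the per-file import command, with -F so each row records its source file.
import_cmd() {
    printf '%s\n' "raster2pgsql -I -F -t auto -a 'HDF4_EOS:EOS_GRID:\"$1\":EOSGRID:SWGDN' swgdn | psql"
}

# Print the one-off UPDATE to run after all files are loaded.
# ASSUMPTION: the filename column contains the date as YYYYMMDD.
final_update_sql() {
    cat <<'SQL'
UPDATE swgdn
SET raster_date = to_date(substring(filename from '[0-9]{8}'), 'YYYYMMDD')
WHERE raster_date IS NULL;
SQL
}

import_cmd fileX.hdf
final_update_sql
```

The output of final_update_sql could be piped to psql once the last file has been imported, replacing the per-file update.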

On Friday, September 2, 2016, Robert Burgholzer <[hidden email]> wrote:
I wonder if it is getting slower because of the date query that you're doing?  You might try indexing the raster_date field.  An index on that might slow down inserts, but it would improve updates.  I would certainly expect your updates to get slower over time with a non-indexed date match.
Hth - robert



Re: Raster time-series data

Tumasgiu Rossini
Also, using COPY instead of INSERT statements (the -Y switch) could speed up the process a bit.
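
As a sketch, the append command from the original post with -Y added would look like this. copy_cmd is a hypothetical helper that only prints the command, and fileX.hdf is the placeholder name used earlier in the thread.

```shell
# Print the append command with -Y, which makes raster2pgsql emit COPY
# statements instead of INSERT statements.
copy_cmd() {
    printf '%s\n' "raster2pgsql -I -Y -t auto -a 'HDF4_EOS:EOS_GRID:\"$1\":EOSGRID:SWGDN' swgdn | psql"
}

copy_cmd fileX.hdf
```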

2016-09-02 22:02 GMT+02:00 Robert Burgholzer <[hidden email]>:
I wonder if it is getting slower because of the date query that you're doing?  You might try indexing the raster_date field.  An index on that might slow down inserts, but it would improve updates.  I would certainly expect your updates to get slower over time with a non-indexed date match.
Hth - robert

