[gdal-dev] ReadRaster memory hit

[gdal-dev] ReadRaster memory hit

Nicolas Cadieux
Hi,

I am writing a Python script and I need to read certain pixels from a huge hyperspectral image with over 200 bands. If I use the method below from the tutorial, do I understand correctly that memory use will only come from what I place in the buffer, in this case only one line of pixels? So even if the raster is huge, only that one line of pixels will actually be in memory?

Thanks
Nicolas


scanline = band.ReadRaster(xoff=0, yoff=0,
                           xsize=band.XSize, ysize=1,
                           buf_xsize=band.XSize, buf_ysize=1,
                           buf_type=gdal.GDT_Float32)
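
As a rough sanity check on the size of that buffer, the arithmetic can be sketched as below. The raster width of 10000 pixels is a made-up figure and `scanline_buffer_bytes` is a hypothetical helper, not part of GDAL. It only accounts for the buffer handed back to Python, not for any block cache the driver may keep internally.

```python
# Hypothetical helper: size in bytes of the buffer returned by one ReadRaster call.
def scanline_buffer_bytes(buf_xsize, buf_ysize, bytes_per_sample):
    return buf_xsize * buf_ysize * bytes_per_sample


FLOAT32_BYTES = 4  # buf_type=gdal.GDT_Float32 means 4 bytes per sample

width = 10000  # assumed raster width, for illustration only
one_line = scanline_buffer_bytes(width, 1, FLOAT32_BYTES)
print(one_line)  # 40000
```

So one Float32 line of an (assumed) 10000-pixel-wide band is about 40 kB, whatever the height of the raster.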
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: ReadRaster memory hit

Even Rouault-2
On Tuesday 12 March 2019 01:56:34 CET Nicolas Cadieux wrote:
> Hi,
>
> I am writing a python script and I need to read certain pixels from a huge
> hyperspectral image with over 200 bands.  If I use the method below from
> the tutorial,  if I understand correctly, memory hits will only come from
> what I place in the buffer? In this case, only one line of pixels?
> Therefore, even if the raster is huge, only that one line of pixels will
> actually be in memory?

In theory yes, but that might depend on the driver and the actual data
organization. If the data is pixel-interleaved, then the driver might buffer
the data for the line for all the bands of the raster, even if you request
just one band, as you do.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: ReadRaster memory hit

Nicolas Cadieux
Thanks again Even,

Is there a way I can test for this data organization and warn the user? I know very little about this aspect but I enjoy the learning curve.

As a follow-up (if I may impose!), I have one file with thousands of points and one raster with hundreds of bands, both huge.

I have been querying the raster one band at a time, using the geotransform parameters to locate the pixels and a pool of workers (“p.imap()”) to call a function (Python 2.7). This way, I can process 6 bands in parallel.
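
For readers unfamiliar with this kind of query, the point-to-pixel step can be sketched as follows. This assumes a north-up raster with no rotation terms. `world_to_pixel` is a hypothetical helper, and the geotransform values are made up, in the same order that `GetGeoTransform()` returns them.

```python
import math


def world_to_pixel(gt, x, y):
    """Map georeferenced (x, y) to integer (col, row) for a north-up raster.

    gt is a 6-tuple in GetGeoTransform() order:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
    """
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    col = int(math.floor((x - gt[0]) / gt[1]))
    row = int(math.floor((y - gt[3]) / gt[5]))
    return col, row


# Made-up geotransform: 2 m pixels, origin at the top-left corner.
gt = (500000.0, 2.0, 0.0, 4200000.0, 0.0, -2.0)
print(world_to_pixel(gt, 500005.0, 4199997.0))  # (2, 1)
```

The resulting (col, row) pair can then be used as `xoff`/`yoff` in a 1x1 `ReadRaster` call.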

Would it be a better approach to loop through the 200 bands inside this function?  So basically, is it faster to loop the 200 bands or to loop the thousands of points one band at a time inside this function?

Thanks, your wisdom and help are always appreciated!
Nicolas

> On 12 March 2019 at 07:58, Even Rouault <[hidden email]> wrote:
>
> In theory yes, but that might depend on the driver and actual data
> organization. If the data is pixel-interleaved, then the driver might buffer
> the data for the line for all the bands of the raster, even if you request
> reading just one like you do.
>
> Even

Re: ReadRaster memory hit

Nicolas Cadieux
Hi,

I did not get an answer to my question below, so I did some testing. I am writing this to help others who find this post. (I am building a point-to-raster query tool.) It is much faster to pass a cloud of points to a function and loop the query over each band than to pass a single band to a function and loop through the points. Tested on 6 million points and a 288-band raster. This might be obvious to some, but I needed to test it.
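
One reading of the faster arrangement is sketched below: the worker function receives a batch of points and loops over the bands itself. `sample_points` and `read_value` are hypothetical names, and a small in-memory list stands in for the raster so that the sketch runs without GDAL. In a real script `read_value` would wrap a 1x1 read on the given band.

```python
def sample_points(pixels, band_count, read_value):
    """Return one list of band values per (col, row) pixel.

    read_value(band_index, col, row) is a hypothetical callable; band_index
    is 1-based, following the GDAL convention.
    """
    results = []
    for col, row in pixels:
        values = [read_value(b, col, row) for b in range(1, band_count + 1)]
        results.append(values)
    return results


# Stand-in raster: 2 bands of 2x2 pixels, indexed as fake[band][row][col].
fake = [
    [[1, 2], [3, 4]],      # band 1
    [[10, 20], [30, 40]],  # band 2
]


def read_fake(band_index, col, row):
    return fake[band_index - 1][row][col]


print(sample_points([(0, 0), (1, 1)], 2, read_fake))  # [[1, 10], [4, 40]]
```

With a pool of workers, the point cloud could be split into batches and each batch passed to a function like this one, instead of handing one band to each worker.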

Cheers
Nicolas

> On 12 March 2019 at 08:49, Nicolas Cadieux <[hidden email]> wrote:
>
> Would it be a better approach to loop through the 200 bands inside this function?  So basically, is it faster to loop the 200 bands or to loop the thousands of points one band at a time inside this function?