[pdal] Colorization using a large VRT raster is slow, but using a small VRT raster is fast ... but why?

[pdal] Colorization using a large VRT raster is slow, but using a small VRT raster is fast ... but why?

mrosen



I am using filters.colorization with a .vrt built by gdalbuildvrt.  The raster dataset contains 2200 files, which I believe are all correctly georeferenced and non-overlapping.  Of those, only 19 actually intersect the point cloud being colorized.

What I'm observing is that if I build the VRT with only the 19 files, the colorization runs in less than two minutes.  However, if I build the VRT with all of them, it takes an unacceptably long time (I've not watched it finish, but I'm keeping an eye on the open file handles: it's finding the right files but moving through them really slowly).

I would expect that the VRT would be able to immediately provide the pixels from the right raster tile, making the number of tiles in the mosaic irrelevant.  That's clearly not the case (does it not do some sort of indexing here?).  Can anyone offer an explanation / fix?  This is important because I actually have many LAS tiles to colorize and while all of them are contained in the bounds of the large mosaic, they each have a different extent.

I guess a workaround would be to build small VRTs based on the geographic extent of each LAS tile.  But how?

_______________________________________________
pdal mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/pdal

Re: [pdal] Colorization using a large VRT raster is slow, but using a small VRT raster is fast ... but why?

Michael Smith
Michael,

VRTs are not spatially indexed. I wonder whether building a tileindex and then creating a VRT from that (since PDAL wouldn't open a tileindex directly) would work a lot faster, since a tileindex is spatially indexed.

Mike

-- 
Michael Smith
Remote Sensing/GIS Center
US Army Corps of Engineers

From: pdal <[hidden email]> on behalf of Michael Rosen <[hidden email]>
Date: Wednesday, October 11, 2017 at 12:48 PM
To: pdal <[hidden email]>
Subject: [pdal] Colorization using a large VRT raster is slow, but using a small VRT raster is fast ... but why?




Re: [pdal] Colorization using a large VRT raster is slow, but using a small VRT raster is fast ... but why?

Andrew Bell
PDAL reads the entire raster into memory, assuming that it pretty much matches the extent of the point set that you're coloring.  Seems that this is a bad assumption in your case.  The code would have to be changed to limit the portion of the raster that you've created that gets read.  Feel free to create a ticket.

--
Andrew Bell
[hidden email]

Re: [pdal] Colorization using a large VRT raster is slow, but using a small VRT raster is fast ... but why?

Andrew Bell
Correcting myself here.  I'm wrong that we read the entire raster.  We let GDAL do all the work and assume it's efficient.

--
Andrew Bell
[hidden email]

Re: [pdal] Colorization using a large VRT raster is slow, but using a small VRT raster is fast ... but why?

Howard Butler-3
Another sensitivity is that PDAL reads a single pixel for each point. If your points are ordered in a way that scatters reads all over the raster, it wrecks the efficiency of GDAL's default raster cache. One way to overcome that is to increase GDAL's GDAL_CACHEMAX setting, as https://www.pdal.io/stages/filters.colorization.html#considerations discusses.

That won't do much for your huge VRT scenario, but increasing it is likely to significantly speed up your 19-file one.
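
As a rough sketch of the cache suggestion above (not something posted in the thread), the following Python prepares a colorization pipeline and an environment with GDAL_CACHEMAX raised. The file names and the 1024 MB figure are hypothetical placeholders, and the helper name colorization_job is made up for illustration. GDAL interprets small GDAL_CACHEMAX values as megabytes.

```python
import json
import os


def colorization_job(las_in, raster, las_out, cache_mb=1024):
    """Return (pipeline_json, env) for one PDAL colorization run.

    cache_mb is an illustrative value, not a tuned recommendation.
    """
    # A PDAL pipeline is a JSON array: reader file, stages, writer file.
    pipeline = [
        las_in,
        {"type": "filters.colorization", "raster": raster},
        las_out,
    ]
    # Copy the current environment and raise GDAL's raster block cache.
    env = dict(os.environ)
    env["GDAL_CACHEMAX"] = str(cache_mb)
    return json.dumps(pipeline, indent=2), env


# Hypothetical file names, for illustration only.
pipeline_json, env = colorization_job("tile.las", "small.vrt", "tile_rgb.las")
print(pipeline_json)
print("GDAL_CACHEMAX =", env["GDAL_CACHEMAX"])
```

To actually run it, one could write pipeline_json to a file and call `pdal pipeline <that file>` through subprocess.run with env=env, assuming the pdal command-line tool is installed.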

Your huge VRT scenario should be attacked with a GDAL tileindex (http://www.gdal.org/gdaltindex.html), as Mike describes.
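
One way to combine the tile index idea with the per-tile VRT workaround from the original question is sketched below in Python. It assumes the raster footprints are already available as (minx, miny, maxx, maxy) boxes, for example exported from the tile index, and that the LAS extent is known. How those bounds are obtained is left out, and the helper names and file names are hypothetical. The sketch only builds the gdaltindex and gdalbuildvrt command lines and does not run them.

```python
def intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def gdaltindex_cmd(index_path, rasters):
    """Command line that builds a tile index over all raster tiles."""
    return ["gdaltindex", index_path, *rasters]


def vrt_cmd_for_las(las_bounds, footprints, out_vrt):
    """Command line that builds a VRT from only the tiles touching one LAS tile.

    footprints maps raster path -> bounding box (assumed input).
    """
    hits = sorted(p for p, box in footprints.items() if intersects(box, las_bounds))
    if not hits:
        raise ValueError("no raster tile intersects the LAS extent")
    return ["gdalbuildvrt", out_vrt, *hits]


# Toy footprints on a 10-unit grid, for illustration only.
footprints = {
    "r0.tif": (0, 0, 10, 10),
    "r1.tif": (10, 0, 20, 10),
    "r2.tif": (0, 10, 10, 20),
}
print(gdaltindex_cmd("index.shp", sorted(footprints)))
print(vrt_cmd_for_las((8, 2, 12, 6), footprints, "tile.vrt"))
# -> ['gdalbuildvrt', 'tile.vrt', 'r0.tif', 'r1.tif']
```

Each resulting command list can be passed to subprocess.run, giving every LAS tile its own small VRT like the 19-file one that colorized quickly.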
