[gdal-dev] Outdated external overviews

[gdal-dev] Outdated external overviews

Julien Michel-2
Dear all,

I recently came across an issue in our software, which uses overviews to
speed up navigation in images: the image can change after an external
overview has been generated (for instance, it has been regenerated by an
upstream processing chain with different parameters). This can lead to
display bugs or even crashes in client code (for instance, if the number of
bands has changed). Of course, this is not a problem for a user who
knows what she is doing: simply removing the overview and generating it
again fixes the problem. But for software that offers overview
generation to the end user, this might become an issue, as the software
has no clue whether the .ovr file is outdated or not.

Possible mechanisms to prevent this would be:

- Compare the last modification times of the external overview and the
image file. If the image file is newer than the overview file, the
overview is probably outdated.

- Encode an image checksum in the .ovr file, and compare it upon loading
(might be a bit intensive).

With those checks, GDAL could detect that the .ovr file is outdated and
simply ignore it. Client code would then know that there are actually
no overviews for this image and could take action to generate new ones.
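
A minimal sketch of the timestamp check proposed above, done on the client side with the Python standard library only. The function name is hypothetical, and the "<image>.ovr" naming is an assumption about where the external overview lives; a newer image only suggests that the overview is stale, it does not prove it.

```python
import os


def overview_looks_outdated(image_path, ovr_path=None):
    """Return True if the external overview is older than the image.

    Returns False when no overview file exists.
    """
    # Assumption: the external overview sits next to the image as "<image>.ovr".
    if ovr_path is None:
        ovr_path = image_path + ".ovr"
    if not os.path.exists(ovr_path):
        return False
    # A newer image suggests, but does not prove, that the overview is stale.
    return os.path.getmtime(image_path) > os.path.getmtime(ovr_path)
```

Client code could call this before opening the dataset and, if it returns True, delete or regenerate the .ovr file.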

Any thoughts?

Regards,

Julien

PS: This of course does not apply if overviews are internal

--
Julien MICHEL
CNES - DCT/SI/AP

_______________________________________________
gdal-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: Outdated external overviews

Even Rouault-2
Hi Julien,
>
> I recently came across an issue in our software, which uses overviews to
> speed up navigation in images: the image can change after an external
> overview has been generated (for instance, it has been regenerated by an
> upstream processing chain with different parameters). This can lead to
> display bugs or even crashes in client code (for instance, if the number of
> bands has changed)

Are the crashes you mention occurring in GDAL? If so, that should be fixed. From
a quick try, RasterIO() requests that involve overviews trigger a proper error
if the .ovr has fewer bands than the full resolution dataset. Of course, if
client code directly uses GetOverview(), it must be careful to check for a NULL
pointer.
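
A small sketch of the defensive check mentioned above, written for the Python bindings, where GetOverview() returns None in the cases where the C++ API returns NULL. The helper name is hypothetical; `band` is assumed to be a raster band object exposing GetOverviewCount() and GetOverview().

```python
def usable_overviews(band):
    """Collect the overviews of a band, skipping any that come back as None."""
    overviews = []
    for i in range(band.GetOverviewCount()):
        ovr = band.GetOverview(i)
        # The Python bindings return None where the C++ API returns NULL.
        if ovr is not None:
            overviews.append(ovr)
    return overviews
```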

> . Of course, this is not a problem for a user who
> knows what she is doing: simply removing the overview and generating it
> again fixes the problem. But for software that offers overview
> generation to the end user, this might become an issue, as the software
> has no clue whether the .ovr file is outdated or not.
>
> Possible mechanisms to prevent this would be:
>
> - Compare the last modification times of the external overview and the
> image file. If the image file is newer than the overview file, the
> overview is probably outdated.

"Probably" yes :-) But I can imagine workflows where the overviews are not
necessarily a subsampling of the full resolution dataset, in which case such
behaviour wouldn't be desirable. I'm also wondering if that wouldn't cause
problems when datasets are copied. Hopefully timestamps should be preserved
most of the time, but I guess you could find situations where they are not and
where the overview could be copied before the main dataset.

>
> - Encode an image checksum in the .ovr file, and compare it upon loading
> (might be a bit intensive).

We definitely don't want to do that on gigabyte sized datasets...

>
> With those checks, GDAL could detect that the .ovr file is outdated and
> simply ignore it. Client code would then know that there are actually
> no overviews for this image and could take action to generate new ones.

The client code can also use the GDALDataset::GetFileList() API to see whether
there's a .ovr file listed in there, and then decide to apply timestamp-based
logic if it wishes.

Or one could imagine putting that logic into GDAL itself, but I think that
should be an option that is explicitly set.

There would also be a subtlety. Imagine that the external overview is
outdated, and you want to update it, but not recreate it from scratch, for
example if you know which area has been updated in the full resolution dataset.
Then you'd want the overview to still be accessible. So hiding the overview
should only apply to datasets opened in read-only mode.
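
The client-side logic discussed above could be sketched as follows. This is only an illustration: the function name is hypothetical, `file_list` is assumed to be what GetFileList() returned for the dataset, and treating its first entry as the main file is an assumption that should be checked for the driver in use.

```python
import os


def overviews_to_hide(file_list, read_only):
    """Return the .ovr files in file_list that are older than the main file.

    Nothing is hidden in update mode, so that an outdated overview can still
    be refreshed in place.
    """
    if not read_only or not file_list:
        return []
    # Assumption: the main dataset file is the first entry of the list.
    main_mtime = os.path.getmtime(file_list[0])
    return [
        path
        for path in file_list[1:]
        if path.lower().endswith(".ovr") and os.path.getmtime(path) < main_mtime
    ]
```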

>
> PS: This of course does not apply if overviews are internal

You probably meant that detecting that overviews are outdated is even less
doable than for the external case, but that situation can still happen.

Even


--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: Outdated external overviews

Julien Michel-2
Hi Even,

Thanks for your response. I understand all your points, and that GDAL
requirements usually go far beyond our limited use cases ... We can of
course do a lot in the client code (starting with comparing timestamps),
but this is not specific to our software: I bet QGIS, for instance, will
happily use outdated overviews without notice. This is why I am
wondering whether it could be handled more cleanly at the GDAL level. I
understand, though, that this is not an easy problem to fix.

Regards,

Julien

On 07/10/2016 at 16:46, Even Rouault wrote:

--
Julien MICHEL
CNES - DCT/SI/AP - BPI 1219
18, avenue Edouard Belin
31401 Toulouse Cedex 09 - France
Tel: +33 561 282 894 - Fax: +33 561 283 109


Re: Outdated external overviews

Even Rouault-2
Julien,

>
> Thanks for your response. I understand all your points, and that GDAL
> requirements usually go far beyond our limited use cases ...

I was mostly playing devil's advocate. I don't have in mind real-world use
cases that would rely on a .ovr being potentially older than the main file while
still considering it valid (does anyone have such use cases?). And I'd suspect
they are probably very niche use cases, compared to the cases where the main
image has been refreshed but an outdated overview file is hanging around.
On the other hand, my main fear with your proposal would be timestamp issues
when copying datasets around that could make GDAL think that a .ovr is
outdated when it is not. But that's perhaps an unusual case too.
(That said, we already have timestamp-based logic in the GML driver, where the
OGR-generated .gfs file is only taken into account if it is more recent than the
.gml. I can remember having had issues with that in the autotest suite, which
required adding an explicit touch of the .gfs file before running the
test. It might be just an SVN / git thing, or perhaps the content of the .gml was
indeed slightly edited after the .gfs had been created.)

Thinking out loud: would a USE_EVEN_IF_OLDER_THAN_MAIN_FILE=YES/NO metadata item
added to the .ovr make sense to determine the latter behaviour? I dunno.
If folks are sufficiently aware to set this metadata to YES (assuming the new
default would be NO), then they might as well touch the .ovr file after
modifying the main file.
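
For reference, "touching" the .ovr file after modifying the main file can be done from Python with the standard library alone. The helper name is hypothetical, and this only makes sense when the overview content is known to still be valid.

```python
import os


def touch_overview(ovr_path):
    """Set the overview's access and modification times to the current time."""
    os.utime(ovr_path, None)
```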

Anyway, if you feel strongly about changes in that area that have implications
for backward-compatible behaviour, an RFC would be appropriate.

Even


--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: Outdated external overviews

jratike80
Hi,

Hard to say if such a use case exists. Perhaps if the .ovr is actually not subsampled from the main data at all but derived from, let's say, 1:50000 scale raster maps when the main maps are 1:20000. I have sometimes created such overview stacks through .vrt. I am not sure if the automatic invalidation would have any effect in that case.

-Jukka Rahkonen-
