On Sat, Jun 16, 2018 at 10:00 PM, Even Rouault <[hidden email]> wrote:

> > I checked: results with gdalinfo -stats are wrong because existing
> > STATISTICS_* metadata are reported even if approximate statistics are not
> > allowed.
>
> No, if STATISTICS_APPROXIMATE=YES is set in .aux.xml (because initial
> computation was done with -approx_stats) and you do gdalinfo -stats after,
> then statistics will be recomputed on all samples and
> STATISTICS_APPROXIMATE=YES will be cleared.

IMHO, metadata cannot be trusted because it is often not known which software generated them, or with which method. The only safe assumption is that statistics in metadata may be approximations. Therefore statistics from metadata should only be reported if approximations are OK, whether or not STATISTICS_APPROXIMATE=YES is present.
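To make the difference concrete, here is a simplified Python model of the two policies: the current one as Even describes it above, and the one proposed here. It is not GDAL's code; the metadata dict, the function names and the `compute(approx_ok)` callback are hypothetical stand-ins for GDALRasterBand::GetStatistics() and ComputeStatistics().

```python
from typing import Callable, Dict, Optional, Tuple

# (min, max, mean, stddev)
Stats = Tuple[float, ...]
# Hypothetical callback standing in for a real computation: compute(approx_ok) -> stats
Compute = Callable[[bool], Stats]

KEYS = ("STATISTICS_MINIMUM", "STATISTICS_MAXIMUM",
        "STATISTICS_MEAN", "STATISTICS_STDDEV")


def cached_stats(md: Dict[str, str]) -> Optional[Stats]:
    """Return the STATISTICS_* items as floats, or None if any is missing."""
    if not all(k in md for k in KEYS):
        return None
    return tuple(float(md[k]) for k in KEYS)


def store(md: Dict[str, str], stats: Stats, approximate: bool) -> None:
    """Write stats into the metadata and set or clear the approximation flag."""
    for k, v in zip(KEYS, stats):
        md[k] = repr(v)
    if approximate:
        md["STATISTICS_APPROXIMATE"] = "YES"
    else:
        md.pop("STATISTICS_APPROXIMATE", None)


def current_policy(md: Dict[str, str], approx_ok: bool, compute: Compute) -> Stats:
    """Cached stats are reused unless exact stats are requested
    and the cache is explicitly flagged STATISTICS_APPROXIMATE=YES."""
    cached = cached_stats(md)
    flagged = md.get("STATISTICS_APPROXIMATE", "").upper() == "YES"
    if cached is not None and (approx_ok or not flagged):
        return cached
    stats = compute(approx_ok)
    store(md, stats, approximate=approx_ok)
    return stats


def proposed_policy(md: Dict[str, str], approx_ok: bool, compute: Compute) -> Stats:
    """Cached stats are reused only when approximations are acceptable;
    an exact request always rescans all pixels."""
    cached = cached_stats(md)
    if approx_ok and cached is not None:
        return cached
    stats = compute(approx_ok)
    store(md, stats, approximate=approx_ok)
    return stats


# Stats written by some third-party tool from a sample, without the flag.
unflagged = {"STATISTICS_MINIMUM": "0", "STATISTICS_MAXIMUM": "200",
             "STATISTICS_MEAN": "90", "STATISTICS_STDDEV": "40"}


def full_scan(approx_ok: bool) -> Stats:
    # Stand-in for an exact scan over all pixels (approx_ok is ignored here).
    return (0.0, 255.0, 100.0, 50.0)


print(current_policy(dict(unflagged), False, full_scan))   # (0.0, 200.0, 90.0, 40.0)
print(proposed_policy(dict(unflagged), False, full_scan))  # (0.0, 255.0, 100.0, 50.0)
```

With unflagged metadata, an exact request under the current policy silently returns the sampled values, while the proposed policy rescans.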

> > The problem is, STATISTICS_APPROXIMATE is not set. Other software
> > using GDAL to create raster datasets may use
> > GDALRasterBand::SetStatistics(), which does not indicate if stats are
> > approximations, i.e. stats are approximations but there is no
> > STATISTICS_APPROXIMATE=YES.
>
> The idea is that if you use GDALRasterBand::SetStatistics() then you are
> assumed to provide exact statistics. If they are only approximate, then you
> should also set STATISTICS_APPROXIMATE=YES with GDALSetMetadataItem().

What about 1) all the datasets that have already been created, and 2) all the third-party software packages that write some sort of statistics into metadata?

IMHO, the STATISTICS_APPROXIMATE=YES mechanism cannot work reliably because of 1) and 2).
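The ambiguity can be shown with a small, hypothetical model of two writers: one follows the convention Even describes (SetStatistics() plus STATISTICS_APPROXIMATE=YES), the other, like an older tool, stores sampled statistics without the flag. The function names are illustrative only and do not correspond to any GDAL API.

```python
from typing import Dict, Sequence

KEYS = ("STATISTICS_MINIMUM", "STATISTICS_MAXIMUM",
        "STATISTICS_MEAN", "STATISTICS_STDDEV")


def write_stats(stats: Sequence[float], approximate: bool,
                sets_flag: bool) -> Dict[str, str]:
    """Hypothetical writer: always stores STATISTICS_*, but only marks
    approximations when it knows about STATISTICS_APPROXIMATE."""
    md = {k: repr(float(v)) for k, v in zip(KEYS, stats)}
    if approximate and sets_flag:
        md["STATISTICS_APPROXIMATE"] = "YES"
    return md


def reader_thinks_exact(md: Dict[str, str]) -> bool:
    """What a reader relying only on the flag concludes."""
    return md.get("STATISTICS_APPROXIMATE", "").upper() != "YES"


sampled = (0, 200, 90, 40)   # computed from a subset of pixels
exact = (0, 255, 100, 50)    # computed from all pixels

conforming = write_stats(sampled, approximate=True, sets_flag=True)
legacy = write_stats(sampled, approximate=True, sets_flag=False)
truthful = write_stats(exact, approximate=False, sets_flag=True)

print(reader_thinks_exact(conforming))  # False
print(reader_thinks_exact(legacy))      # True, although the values are sampled
print(reader_thinks_exact(truthful))    # True
```

From the metadata alone, `legacy` and `truthful` look identical to a reader, which is exactly the case of existing datasets and third-party writers.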

> > GDAL assumes that STATISTICS_* metadata represent stats on all pixels; this
> > is IMHO wrong. You can only hope that STATISTICS_* metadata represent stats
> > on all pixels if a respective metadata item has been set to boolean true,
> > something like STATISTICS_ALL_PIXELS=YES.
>
> I'm really confused. Why introduce yet another item when
> STATISTICS_APPROXIMATE=YES is there for that purpose?

Simpler: statistics in metadata can be approximations, even if STATISTICS_APPROXIMATE=YES is not set. If exact statistics are requested, scan all pixels. Then no metadata item like STATISTICS_APPROXIMATE or STATISTICS_ALL_PIXELS is needed.
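A minimal sketch of why a full scan matters, assuming the band is a plain list of rows with no nodata handling. The "approximate" path here simply reads every n-th row and column; that is a hypothetical stand-in for whatever sampling or overview use a real implementation does. The standard deviation is the population form (divide by N).

```python
import math
from typing import List, Sequence, Tuple

Stats = Tuple[float, float, float, float]  # (min, max, mean, stddev)


def stats_over(values: Sequence[float]) -> Stats:
    """Min, max, mean and population standard deviation of the values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return (min(values), max(values), mean, math.sqrt(variance))


def exact_stats(band: List[List[float]]) -> Stats:
    """Scan every pixel of the band."""
    return stats_over([v for row in band for v in row])


def sampled_stats(band: List[List[float]], step: int) -> Stats:
    """Hypothetical approximation: read only every `step`-th row and column."""
    return stats_over([v for row in band[::step] for v in row[::step]])


# 4x4 band of zeros with one bright pixel that the sample grid skips.
band = [[0.0] * 4 for _ in range(4)]
band[1][1] = 255.0

print(exact_stats(band)[:3])       # (0.0, 255.0, 15.9375)
print(sampled_stats(band, 2)[:3])  # (0.0, 0.0, 0.0)
```

The sampled result is internally consistent and would look perfectly plausible if stored as STATISTICS_* metadata without a flag.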

> > Even in this case, an option to
> > force recomputing raster band stats would be very nice to have (verifying
> > metadata).
>
> ComputeStatistics() will recompute statistics. It is true that with
> gdalinfo -stats, they are not recomputed if they already exist and were not
> approximate since it calls GetStatistics() and not ComputeStatistics(). An
> easy workaround is to delete the .aux.xml to force recomputation.
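The workaround of deleting the .aux.xml can be scripted. A minimal sketch, assuming the cached statistics live only in the PAM sidecar named `<dataset>.aux.xml` next to the dataset (statistics stored inside the file itself would not be affected). Note that this also discards anything else kept in that sidecar, such as other metadata items or histograms. The helper name is hypothetical.

```python
import os


def drop_pam_sidecar(dataset_path: str) -> bool:
    """Delete '<dataset_path>.aux.xml' so cached STATISTICS_* items are gone.

    Returns True if a sidecar was removed, False if none existed.
    """
    try:
        os.remove(dataset_path + ".aux.xml")
    except FileNotFoundError:
        return False
    return True
```

After `drop_pam_sidecar("input.tif")`, a subsequent `gdalinfo -stats input.tif` has no sidecar statistics to reuse and computes them again.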

At least recomputation of min/max can already be forced with gdalinfo -mm.

It would be nice if gdalinfo -stats also forced recomputation of exact statistics. Currently there is no difference between gdalinfo with and without -stats if statistics already exist in metadata and STATISTICS_APPROXIMATE=YES is absent (the standard case for existing data): -stats has no effect here. Forcing recomputation would change the behaviour of gdalinfo -stats.

The purpose of gdalinfo -approx_stats would be (already is?) to quickly get stats for a raster band, either from metadata or by approximation.

The aim is to avoid yet another option like gdalinfo -exact-stats.

Markus