[gdal-dev] gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz-3
Many raster bands contain nodata, therefore it would be helpful if the number of non-nodata grid cells could be reported by gdalinfo in some way.

The number of non-nodata grid cells is not available through GDALGetRasterStatistics() / GDALRasterBand::GetStatistics(), probably because approximations can be stored as raster band statistics, and the number of non-nodata grid cells is unknown in case of approximations.

Therefore I suggest reporting the number of non-nodata grid cells only if all grid cells are evaluated, i.e. with

gdalinfo -mm
Force computation of the actual min/max values for each band in the dataset.

The number of non-nodata grid cells would be a small piece of quite useful information to quickly evaluate the usefulness of a raster band.
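A minimal sketch of the idea, assuming the band has already been read into a buffer of doubles (for instance with GDALRasterIO()). MinMaxCount and scan_min_max_count() are hypothetical names, not GDAL API, and treating NaN cells as invalid is an assumption. It only shows that the count of valid cells comes out of the same full scan that gives exact min/max:

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical result of one exact pass over a band (what gdalinfo -mm does
   for min/max), extended with the number of valid cells. */
typedef struct {
    double min;
    double max;
    size_t n_valid; /* cells that are neither nodata nor NaN */
} MinMaxCount;

/* Scan every cell; cells equal to the nodata value (when one is set) or NaN
   are skipped. If no cell is valid, min and max stay NaN and n_valid is 0. */
MinMaxCount scan_min_max_count(const double *cells, size_t n_cells,
                               int has_nodata, double nodata)
{
    MinMaxCount r = { NAN, NAN, 0 };
    for (size_t i = 0; i < n_cells; i++) {
        double v = cells[i];
        if (isnan(v) || (has_nodata && v == nodata))
            continue;
        if (r.n_valid == 0 || v < r.min)
            r.min = v;
        if (r.n_valid == 0 || v > r.max)
            r.max = v;
        r.n_valid++;
    }
    return r;
}
```

For the buffer {-9999, 3, 7, -9999, 1, 5} with nodata -9999, this gives min 1, max 7 and 4 valid cells.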

Asking whether the output of gdalinfo could be enhanced a bit,

Markus

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Even Rouault-2
On Wednesday 13 June 2018 22:14:07 CEST Markus Metz wrote:

> Many raster bands contain nodata, therefore it would be helpful if the
> number of non-nodata grid cells could be reported by gdalinfo in some way.
>
> The number of non-nodata grid cells is not available through
> GDALGetRasterStatistics() / GDALRasterBand::GetStatistics(), probably
> because approximations can be stored as raster band statistics, and the
> number of non-nodata grid cells is unknown in case of approximations.
>
> Therefore I suggest reporting the number of non-nodata grid cells only if
> all grid cells are evaluated, i.e. with
>
> *gdalinfo -mm* Force computation of the actual min/max values for each band
> in the dataset.
>
> The number of non-nodata grid cells would be a small piece of quite useful
> information to quickly evaluate the usefulness of a raster band.
>
> Asking whether the output of gdalinfo could be enhanced a bit,

Markus,

That would rather be something triggered by -stats, since this number of
non-nodata cells is an intermediate result of GDALRasterBand::ComputeStatistics()
in gcore/gdalrasterband.cpp. To make it meaningful even in the case of
approximate statistics, why not compute a percentage instead? That would
need to be saved in a metadata item, let's say STATISTICS_VALID_RATIO (between
0 and 1)? Wanna take a crack at implementing this? Otherwise, file an
enhancement ticket about that.
By the way, since GDAL 2.3.0, there's a STATISTICS_APPROXIMATE=YES metadata
item to indicate that the stats have been computed in an approximate way.
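To illustrate why a ratio still means something for approximate statistics while an absolute count does not, here is a sketch on an in-memory band. valid_ratio() is hypothetical, and examining every step-th cell is only a stand-in for sampling, not how GDAL actually picks pixels for approximate statistics:

```c
#include <stddef.h>

/* Hypothetical: fraction of valid (non-nodata) cells among the cells that
   were actually examined. step == 1 examines every cell (exact statistics);
   step > 1 examines every step-th cell, a crude stand-in for the sampling
   done for approximate statistics. Returns 0.0 if nothing was examined. */
double valid_ratio(const double *cells, size_t n_cells, size_t step,
                   double nodata)
{
    size_t examined = 0, valid = 0;
    if (step == 0)
        step = 1;
    for (size_t i = 0; i < n_cells; i += step) {
        examined++;
        if (cells[i] != nodata)
            valid++;
    }
    return examined ? (double)valid / (double)examined : 0.0;
}
```

With the 8-cell buffer {1, -9999, 2, 3, -9999, 4, 5, 6} and nodata -9999, the exact ratio is 0.75. Sampling every 2nd cell also gives 0.75, but every 3rd cell gives 1.0, so a sampled ratio is an estimate, whereas a sampled absolute count would not even be on the right scale.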

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz-3


On Wed, Jun 13, 2018 at 11:03 PM, Even Rouault <[hidden email]> wrote:

>
> On Wednesday 13 June 2018 22:14:07 CEST Markus Metz wrote:
> > Many raster bands contain nodata, therefore it would be helpful if the
> > number of non-nodata grid cells could be reported by gdalinfo in some way.
> >
> > The number of non-nodata grid cells is not available through
> > GDALGetRasterStatistics() / GDALRasterBand::GetStatistics(), probably
> > because approximations can be stored as raster band statistics, and the
> > number of non-nodata grid cells is unknown in case of approximations.
> >
> > Therefore I suggest reporting the number of non-nodata grid cells only if
> > all grid cells are evaluated, i.e. with
> >
> > *gdalinfo -mm* Force computation of the actual min/max values for each band
> > in the dataset.
> >
> > The number of non-nodata grid cells would be a small piece of quite useful
> > information to quickly evaluate the usefulness of a raster band.
> >
> > Asking whether the output of gdalinfo could be enhanced a bit,
>
> Markus,
>
> That would rather be something triggered by -stats, since this number of
> non-nodata cells is an intermediate result of GDALRasterBand::ComputeStatistics()
> in gcore/gdalrasterband.cpp. To make it meaningful even in the case of
> approximate statistics, why not compute a percentage instead? That would
> need to be saved in a metadata item, let's say STATISTICS_VALID_RATIO (between
> 0 and 1)?

Considering approximate statistics, something like STATISTICS_VALID_RATIO would make sense.

However, I am not interested in approximate statistics, therefore my suggestion to tie it to -mm.

> Wanna take a crack at implementing this?

Too many API changes required, starting with GDALGetRasterStatistics(), GDALComputeRasterStatistics() and all else following. There must be an easier solution. But thanks for your offer:-)

My workaround is to use GRASS r.external to get correct raster band statistics.

> Otherwise, file an
> enhancement ticket about that.
> By the way, since GDAL 2.3.0, there's a STATISTICS_APPROXIMATE=YES metadata
> item to indicate that the stats have been computed in an approximate way.

Enhancement request: set STATISTICS_APPROXIMATE=UNKNOWN if it is unknown. I checked with some externally provided sample data and GDAL 2.3.0 does not report STATISTICS_APPROXIMATE.

I understand that GDAL has to cater for approximate statistics and should not override metadata provided with raster bands created by other software. It would help if STATISTICS_APPROXIMATE were always set to NO|YES|UNKNOWN.

Markus


Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Even Rouault-2
> > Wanna take a crack at implementing this?
>
> Too many API changes required, starting with GDALGetRasterStatistics(),
> GDALComputeRasterStatistics() and all else following. There must be an
> easier solution. But thanks for your offer:-)

We don't necessarily need to change the API. The function could just set the
metadata item, and the client would read it with GDALGetMetadataItem().

> Enhancement request: set STATISTICS_APPROXIMATE=UNKNOWN if it is unknown. I
> checked with some externally provided sample data and GDAL 2.3.0 does not
> report STATISTICS_APPROXIMATE.

I didn't want to pollute statistics with STATISTICS_APPROXIMATE=NO when exact
computation has been done. So basically you have either STATISTICS_APPROXIMATE=YES,
or no STATISTICS_APPROXIMATE item at all when exact computation has been done
(which is the normal assumption).

Even

Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz-3


On Thu, Jun 14, 2018 at 10:55 PM, Even Rouault <[hidden email]> wrote:
>
> > > Wanna take a crack at implementing this?
> >
> > Too many API changes required, starting with GDALGetRasterStatistics(),
> > GDALComputeRasterStatistics() and all else following. There must be an
> > easier solution. But thanks for your offer:-)
>
> We don't necessarily need to change the API. The function could just set the
> metadata item, and the client would read it with GDALGetMetadataItem().

Thinking about it, I do not want to support approximate statistics, therefore something like STATISTICS_VALID_RATIO does not work for me, only something like STATISTICS_N_VALID which requires exact statistics. Approximate statistics are confusing for users, unless it is made clear that these statistics are approximations. E.g. a user calculates exact statistics with some software other than GDAL and then wonders why the results differ between gdalinfo and that other software, which is known to calculate exact statistics.
>
> > Enhancement request: set STATISTICS_APPROXIMATE=UNKNOWN if it is unknown. I
> > checked with some externally provided sample data and GDAL 2.3.0 does not
> > report STATISTICS_APPROXIMATE.
>
> I didn't want to pollute statistics with STATISTICS_APPROXIMATE=NO when exact
> computation has been done. So basically you have either STATISTICS_APPROXIMATE=YES,
> or no STATISTICS_APPROXIMATE item at all when exact computation has been done
> (which is the normal assumption).

Looking at random samples, the normal assumption must be STATISTICS_APPROXIMATE=YES if STATISTICS_APPROXIMATE is not set. IMHO, GDAL should set STATISTICS_APPROXIMATE=YES unless GDAL itself has computed exact statistics. If in doubt set STATISTICS_APPROXIMATE=YES.

I am aware it's a conflict between providing results quickly to users and providing exact results to users which might take longer to get. I am for exact results.

Markus


Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Even Rouault-2
> Thinking about it, I do not want to support approximate statistics,
> therefore something like STATISTICS_VALID_RATIO does not work for me, only
> something like STATISTICS_N_VALID which requires exact statistics.

STATISTICS_VALID_RATIO makes more sense to me than an absolute number of pixels.
I assume you want to know whether you have only 10% or 99.5% valid pixels to decide
whether you want to process the image, rather than knowing whether it is 10 or 1
million (similarly to the cloudiness value, which is usually given as a percentage).
For exact statistics, the relative and absolute numbers are strictly
equivalent.

The advantage of using the ratio is that it still makes sense for approximate
statistics.

For your use case, you check whether STATISTICS_APPROXIMATE=YES is present or not
to decide whether you can trust STATISTICS_VALID_RATIO.
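A sketch of that check, assuming the proposed STATISTICS_VALID_RATIO item exists (it does not in GDAL 2.3.0) and that both item values have already been read, e.g. with GDALGetMetadataItem(), which returns NULL for an absent item. The helper name and its parameters are hypothetical. With exact statistics, the absolute count is simply the ratio times the band size:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: recover the absolute number of valid cells from the
   STATISTICS_VALID_RATIO item proposed in this thread (not an existing GDAL
   item). approx_item is the STATISTICS_APPROXIMATE value, NULL if absent.
   Returns 1 and fills *count when the ratio comes from exact statistics,
   0 when STATISTICS_APPROXIMATE=YES says it is only an estimate. */
int valid_count_from_ratio(const char *approx_item, double ratio,
                           size_t x_size, size_t y_size, size_t *count)
{
    if (approx_item != NULL && strcmp(approx_item, "YES") == 0)
        return 0;
    /* Exact statistics: ratio == n_valid / (x_size * y_size), so multiply
       back; adding 0.5 before truncating rounds away the error of a ratio
       that was stored as decimal text. */
    *count = (size_t)(ratio * (double)x_size * (double)y_size + 0.5);
    return 1;
}
```

For a 4 x 2 band with a stored ratio of 0.75 and no STATISTICS_APPROXIMATE item, the helper yields 6 valid cells; with STATISTICS_APPROXIMATE=YES it declines to produce a count.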

> Approximate statistics are confusing for users, unless it is made clear
> that these statistics are approximations.

It is now, since STATISTICS_APPROXIMATE=YES is set when you compute
approximate statistics.

> Looking at random samples, the normal assumption must be
> STATISTICS_APPROXIMATE=YES if STATISTICS_APPROXIMATE is not set. IMHO, GDAL
> should set STATISTICS_APPROXIMATE=YES unless GDAL itself has computed exact
> statistics.

That's what GDAL 2.3.0 now does. Check the output of gdalinfo -stats vs
gdalinfo -approx_stats.

Even


Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

jratike80
In reply to this post by Markus Metz-3
Markus Metz-3 wrote
> My workaround is to use GRASS r.external to get correct raster band
> statistics.

Hi,

Wouldn't it be good to have an option in GDAL to force it to re-compute
statistics? After all, statistics are just metadata, and they can be all wrong
if you have touched the image with third-party software.

-Jukka Rahkonen-



--
Sent from: http://osgeo-org.1560.x6.nabble.com/GDAL-Dev-f3742093.html

Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz-3


On Sat, Jun 16, 2018 at 6:38 PM, jratike80 <[hidden email]> wrote:
>
> Markus Metz-3 wrote
> > My workaround is to use GRASS r.external to get correct raster band
> > statistics.
>
> Hi,
>
> Wouldn't it be good to have an option in GDAL to force it to re-compute
> statistics? After all, statistics are just metadata, and they can be all wrong
> if you have touched the image with third-party software.

That would be helpful.

Regarding gdalinfo,
 - gdalinfo would report what it does right now (includes any STATISTICS_* metadata items)
 - gdalinfo -stats would compute raster band stats using all grid cells, ignoring any STATISTICS_* metadata items
 - gdalinfo -approx_stats would compute raster band stats using a sample of all grid cells, ignoring any STATISTICS_* metadata items

Markus

Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz-3
In reply to this post by Even Rouault-2


On Fri, Jun 15, 2018 at 10:43 PM, Even Rouault <[hidden email]> wrote:
>
> > Thinking about it, I do not want to support approximate statistics,
> > therefore something like STATISTICS_VALID_RATIO does not work for me, only
> > something like STATISTICS_N_VALID which requires exact statistics.
>
> STATISTICS_VALID_RATIO makes more sense to me than an absolute number of pixels.

OK, considering that approximate statistics need to be supported, something like STATISTICS_VALID_RATIO is the only option.

Setting such a metadata item would be relatively easy to implement.

>

> > Approximate statistics are confusing for users, unless it is made clear
> > that these statistics are approximations.
>
> It is now, since STATISTICS_APPROXIMATE=YES is set when you compute
> approximate statistics.
>
> > Looking at random samples, the normal assumption must be
> > STATISTICS_APPROXIMATE=YES if STATISTICS_APPROXIMATE is not set. IMHO, GDAL
> > should set STATISTICS_APPROXIMATE=YES unless GDAL itself has computed exact
> > statistics.
>
> That's what GDAL 2.3.0 now does. Check the output of gdalinfo -stats vs
> gdalinfo -approx_stats.

I checked: results with gdalinfo -stats are wrong because existing STATISTICS_* metadata are reported even if approximate statistics are not allowed. The problem is, STATISTICS_APPROXIMATE is not set. Other software using GDAL to create raster datasets may use GDALRasterBand::SetStatistics(), which does not indicate whether stats are approximations, i.e. stats are approximations but there is no STATISTICS_APPROXIMATE=YES.

GDAL assumes that STATISTICS_* metadata represent stats on all pixels; this is IMHO wrong. You can only hope that STATISTICS_* metadata represent stats on all pixels if a respective metadata item has been set to boolean true, something like STATISTICS_ALL_PIXELS=YES. Even in this case, an option to force recomputing raster band stats would be very nice to have (verifying metadata).

STATISTICS_EXACT is not an option because there are different ways to calculate mean and stddev using a fixed set of values. The different methods are all correct (exact) in their own way, but results may be different.
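A small worked example of that point: the same valid values give different, equally "exact" spreads depending on whether the sum of squared deviations is divided by n (population variance) or n - 1 (sample variance). The function below is an illustration only and makes no claim about which convention GDAL or any other package uses:

```c
#include <stddef.h>

/* Two-pass variance of n values. ddof = 0 divides by n (population
   variance), ddof = 1 divides by n - 1 (sample variance). Take sqrt() of the
   result for a standard deviation. Returns -1.0 if n <= ddof. */
double variance(const double *v, size_t n, unsigned ddof)
{
    if (n <= ddof)
        return -1.0;
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += v[i];
    mean /= (double)n;

    double m2 = 0.0; /* sum of squared deviations from the mean */
    for (size_t i = 0; i < n; i++) {
        double d = v[i] - mean;
        m2 += d * d;
    }
    return m2 / (double)(n - ddof);
}
```

For the values {2, 4, 4, 4, 5, 5, 7, 9}, the population variance is 4 (standard deviation 2), while the sample variance is 32/7, about 4.571 (standard deviation about 2.138). Both are computed from all values, yet they differ.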

Markus


Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Even Rouault-2
>
> I checked: results with gdalinfo -stats are wrong because existing
> STATISTICS_* metadata are reported even if approximate statistics are not
> allowed.

No: if STATISTICS_APPROXIMATE=YES is set in the .aux.xml (because the initial
computation was done with -approx_stats) and you run gdalinfo -stats afterwards,
then the statistics will be recomputed on all pixels and
STATISTICS_APPROXIMATE=YES will be cleared.

Demo:
$ gdalinfo -approx_stats test.tif
[...]
  Metadata:
    STATISTICS_APPROXIMATE=YES
    STATISTICS_MAXIMUM=206
    STATISTICS_MEAN=126.17083333333
    STATISTICS_MINIMUM=74
    STATISTICS_STDDEV=21.548781465291

$ gdalinfo -stats  test.tif
[...]
  Metadata:
    STATISTICS_MAXIMUM=255
    STATISTICS_MEAN=126.765
    STATISTICS_MINIMUM=74
    STATISTICS_STDDEV=22.928470838676



> The problem is, STATISTICS_APPROXIMATE is not set. Other software
> using GDAL to create raster datasets may use
> GDALRasterBand::SetStatistics(), which does not indicate whether stats are
> approximations, i.e. stats are approximations but there is no
> STATISTICS_APPROXIMATE=YES.

The idea is that if you use GDALRasterBand::SetStatistics(), then you are
assumed to provide exact statistics. If they are only approximate, then you
should also set STATISTICS_APPROXIMATE=YES with GDALSetMetadataItem().

>
> GDAL assumes that STATISTICS_* metadata represent stats on all pixels; this
> is IMHO wrong. You can only hope that STATISTICS_* metadata represent stats
> on all pixels if a respective metadata item has been set to boolean true,
> something like STATISTICS_ALL_PIXELS=YES.

I'm really confused. Why introduce yet another item when
STATISTICS_APPROXIMATE=YES is there for that purpose?

> Even in this case, an option to
> force recomputing raster band stats would be very nice to have (verifying
> metadata).

ComputeStatistics() will recompute statistics. It is true that with
gdalinfo -stats, they are not recomputed if they already exist and were not
approximate, since it calls GetStatistics() and not ComputeStatistics(). An
easy workaround is to delete the .aux.xml file to force recomputation.
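The behaviour described above can be modelled as a small decision function. This is a sketch based only on the description in this message, not on the GDAL source; the function name and parameters are hypothetical:

```c
#include <string.h>

/* Hypothetical model of gdalinfo -stats as described in this thread:
   GetStatistics() reuses stored STATISTICS_* values unless they are missing
   or flagged STATISTICS_APPROXIMATE=YES; only then is a full scan done.
   approx_item is NULL when the item is absent. Returns 1 if recomputed. */
int stats_will_be_recomputed(int have_stored_stats, const char *approx_item)
{
    if (!have_stored_stats)
        return 1; /* nothing stored (e.g. .aux.xml deleted): compute */
    return approx_item != NULL && strcmp(approx_item, "YES") == 0;
}
```

Deleting the .aux.xml corresponds to have_stored_stats == 0, which is why it forces recomputation; stored statistics without STATISTICS_APPROXIMATE=YES are reused as-is.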

Even

Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz-3


On Sat, Jun 16, 2018 at 10:00 PM, Even Rouault <[hidden email]> wrote:
>
> >
> > I checked: results with gdalinfo -stats are wrong because existing
> > STATISTICS_* metadata are reported even if approximate statistics are not
> > allowed.
>
> No: if STATISTICS_APPROXIMATE=YES is set in the .aux.xml (because the initial
> computation was done with -approx_stats) and you run gdalinfo -stats afterwards,
> then the statistics will be recomputed on all pixels and
> STATISTICS_APPROXIMATE=YES will be cleared.

IMHO, metadata cannot be trusted because it is (often) not known which software generated these metadata and with which method. The only safe assumption is that statistics in metadata can be approximations. Therefore statistics from metadata should only be reported if approximations are OK, no matter whether STATISTICS_APPROXIMATE=YES exists or not.

>
> > The problem is, STATISTICS_APPROXIMATE is not set. Other software
> > using GDAL to create raster datasets may use
> > GDALRasterBand::SetStatistics(), which does not indicate whether stats are
> > approximations, i.e. stats are approximations but there is no
> > STATISTICS_APPROXIMATE=YES.
>
> The idea is that if you use GDALRasterBand::SetStatistics(), then you are
> assumed to provide exact statistics. If they are only approximate, then you
> should also set STATISTICS_APPROXIMATE=YES with GDALSetMetadataItem().

What about 1) all the datasets that have already been created, 2) all the third-party software packages providing some sort of statistics in metadata?

IMHO, the STATISTICS_APPROXIMATE=YES mechanism does not work because of 1) and 2).
>
> >
> > GDAL assumes that STATISTICS_* metadata represent stats on all pixels; this
> > is IMHO wrong. You can only hope that STATISTICS_* metadata represent stats
> > on all pixels if a respective metadata item has been set to boolean true,
> > something like STATISTICS_ALL_PIXELS=YES.
>
> I'm really confused. Why introduce yet another item when
> STATISTICS_APPROXIMATE=YES is there for that purpose?

Simpler: statistics in metadata can be approximations, also if STATISTICS_APPROXIMATE=YES is not set. If exact statistics are requested, scan all pixels. A new metadata item like STATISTICS_APPROXIMATE or STATISTICS_ALL_PIXELS is not needed.
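Sketched as a decision function, under the assumption that gdalinfo -stats maps to approx_ok == 0 and -approx_stats to approx_ok == 1. The names are hypothetical, and this encodes the policy proposed here, not current GDAL behaviour:

```c
/* Hypothetical encoding of the proposed policy: stored STATISTICS_* values
   are only acceptable when approximations are acceptable, whether or not
   STATISTICS_APPROXIMATE=YES is present. */
typedef enum { REUSE_STORED, COMPUTE_SAMPLED, COMPUTE_EXACT } StatsAction;

StatsAction choose_stats_action(int approx_ok, int have_stored_stats)
{
    if (!approx_ok)
        return COMPUTE_EXACT; /* exact requested: always scan all pixels */
    /* approximations OK: reuse whatever is stored, else sample */
    return have_stored_stats ? REUSE_STORED : COMPUTE_SAMPLED;
}
```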

>
> > Even in this case, an option to
> > force recomputing raster band stats would be very nice to have (verifying
> > metadata).
>
> ComputeStatistics() will recompute statistics. It is true that with
> gdalinfo -stats, they are not recomputed if they already exist and were not
> approximate, since it calls GetStatistics() and not ComputeStatistics(). An
> easy workaround is to delete the .aux.xml file to force recomputation.

At least recomputation of min/max can already be forced with gdalinfo -mm.

It would be nice if gdalinfo -stats would also trigger forced recomputation of exact statistics. Currently there is no difference between gdalinfo with and without -stats if statistics already exist in metadata and STATISTICS_APPROXIMATE=YES is absent (standard case for existing data): -stats has no effect here. Forced recomputation would be a change in the behaviour of gdalinfo -stats.

The purpose of gdalinfo -approx_stats would be (already is?) to quickly get stats for a raster band, either from metadata or by approximation.

Trying to avoid another option like gdalinfo -exact-stats.

Markus


Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Even Rouault-2
> It would be nice if gdalinfo -stats would also trigger forced recomputation
> of exact statistics. Currently there is no difference between gdalinfo with
> and without -stats if statistics already exist in metadata and
> STATISTICS_APPROXIMATE=YES is absent (standard case for existing data):
> -stats has no effect here. Forced recomputation would be a change in the
> behaviour of gdalinfo -stats.

gdalinfo already reports existing statistics, so indeed gdalinfo -stats could
call GDALComputeRasterStatistics() to force regeneration.

Re: gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz-3


On Sun, Jun 17, 2018 at 7:51 PM, Even Rouault <[hidden email]> wrote:

>
> > It would be nice if gdalinfo -stats would also trigger forced recomputation
> > of exact statistics. Currently there is no difference between gdalinfo with
> > and without -stats if statistics already exist in metadata and
> > STATISTICS_APPROXIMATE=YES is absent (standard case for existing data):
> > -stats has no effect here. Forced recomputation would be a change in the
> > behaviour of gdalinfo -stats.
>
> gdalinfo already reports existing statistics, so indeed gdalinfo -stats could
> call GDALComputeRasterStatistics() to force regeneration.

:-)

I will take a crack at implementing STATISTICS_VALID_RATIO (no API change, but some mechanism is needed to compute statistics even if the standard STATISTICS_* metadata items already exist).

Markus
