[gdal-dev] Raster statistics

Paul Meems
I have a drone raster file which I want to use for some calculations.
Before the calculation, I need to discard some extreme values.
I want to do something like a percentile calculation, where you collect all values, sort them and drop the top 10%.
For this, I first need to read all the values, which can be slow for a large file.

I looked at the statistics (band.GetStatistics), but they don't work well here.
I thought I could add 2 times the standard deviation to the mean to get roughly the 97th percentile.
But with these statistics:
    STATISTICS_MAXIMUM=33.186080932617
    STATISTICS_MEAN=24.840205979603
    STATISTICS_MINIMUM=1.5951598882675
    STATISTICS_STDDEV=4.7285348016053
Mean + 2*std is larger than the max.
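The numbers above show why this rule of thumb fails here. About 97.7% of a normal distribution lies below mean + 2 * stddev, but these statistics point to a long lower tail (negative skew): the minimum is almost five standard deviations below the mean, while the maximum is less than two above it. Rounded to a few digits:

```latex
\begin{aligned}
\mu + 2\sigma &= 24.8402 + 2 \times 4.7285 \approx 34.297 > 33.186 = \max \\
\frac{\max - \mu}{\sigma} &= \frac{33.186 - 24.840}{4.7285} \approx 1.77 \\
\frac{\mu - \min}{\sigma} &= \frac{24.840 - 1.595}{4.7285} \approx 4.92
\end{aligned}
```

For a distribution this asymmetric, a cutoff derived from the mean and standard deviation says little about where the top 10% starts.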

So I moved on to the histogram. It is also very fast, but I'm not sure how to interpret it.
I have this:
  256 buckets from 1.53322 to 33.248:
  410 77 66 66 65 58 56 45 42 87 57 72 61 65 68 70 73 82 93 ...
Does this mean that bucket 1 (count 410) holds 410 pixels with the value 1.53322, and that the second bucket (count 77) holds 77 pixels between 1.53322 and 1.657, where 1.657 = 1.53322 + ((33.248 - 1.53322) / 256)?
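GDAL's GetHistogram documentation describes the buckets as equal-width intervals between the requested minimum and maximum, with a bucket size of (dfMax - dfMin) / nBuckets. Under that reading, bucket i (counting from 0) covers [min + i * w, min + (i + 1) * w), so the first bucket's 410 pixels lie anywhere between 1.53322 and about 1.657 rather than exactly at 1.53322, and the second bucket's 77 pixels lie between about 1.657 and 1.781. The reported range also appears to be the statistics minimum and maximum widened by half a bucket on each side (1.59516 - 0.06194 = 1.53322 and 33.18608 + 0.06194 = 33.24802), which would centre the first bucket on the data minimum; that is an inference from the numbers, not something stated in the thread. A minimal C++ sketch of the edge arithmetic, where the helper name bucketBounds is hypothetical:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: lower and upper edge of bucket `i` (0-based) in a
// histogram of `nBuckets` equal-width buckets spanning [dfMin, dfMax).
std::pair<double, double> bucketBounds(double dfMin, double dfMax,
                                       int nBuckets, int i) {
    assert(nBuckets > 0 && i >= 0 && i < nBuckets);
    const double width = (dfMax - dfMin) / nBuckets;  // bucket size
    return {dfMin + i * width, dfMin + (i + 1) * width};
}
```

With the range from the post, bucketBounds(1.53322, 33.248, 256, 0) gives about [1.53322, 1.65711) and bucket 1 gives about [1.65711, 1.78099).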

Is this a good approach, or can/should I use a different one?
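The histogram can give an approximate percentile without sorting every pixel value: walk the buckets from the lowest, accumulating counts, until 90% of the pixels are covered; values above that point form the top 10%. A minimal C++ sketch, assuming the counts and range come from GDALRasterBand::GetHistogram or GetDefaultHistogram and have been copied into a std::vector<std::uint64_t>; the function name histogramPercentile is hypothetical. It assumes pixels are spread evenly within a bucket, so the result is only accurate to about one bucket width (roughly 0.124 with the 256 buckets above):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: approximate value below which a fraction `p`
// (0 < p <= 1) of the histogrammed pixels fall. Buckets are assumed to be
// equal-width over [dfMin, dfMax); within the crossing bucket the value is
// interpolated linearly.
double histogramPercentile(const std::vector<std::uint64_t>& counts,
                           double dfMin, double dfMax, double p) {
    assert(!counts.empty() && p > 0.0 && p <= 1.0);
    std::uint64_t total = 0;
    for (std::uint64_t c : counts) total += c;
    assert(total > 0);

    const double width = (dfMax - dfMin) / static_cast<double>(counts.size());
    const double target = p * static_cast<double>(total);  // pixels to keep
    double cumulative = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        const double next = cumulative + static_cast<double>(counts[i]);
        if (next >= target) {
            // First bucket where the running count reaches the target; it is
            // necessarily non-empty, so the division is safe.
            const double frac =
                (target - cumulative) / static_cast<double>(counts[i]);
            return dfMin + (static_cast<double>(i) + frac) * width;
        }
        cumulative = next;
    }
    return dfMax;  // not reached in exact arithmetic; guards against rounding
}
```

For the top-10% cut described above this would be called with p = 0.9 and the histogram's own minimum and maximum. Requesting more buckets narrows the error at little extra cost.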

Thanks,

Paul

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: Raster statistics

Chris Waigl
I would not use GDAL for this particular task. I presume you have the band data in a 2D numpy array. Then I'd get, for example, the 80th percentile with np.percentile() and use a boolean expression to generate a mask for the array (droneraster > perc80value).
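Since Paul mentions in the thread that he works in C++, the same recipe could be translated roughly as follows, assuming the band has already been read into a std::vector<float> (for example with RasterIO, not shown) and nodata values have been filtered out. std::nth_element finds the k-th smallest value in linear time on average, without a full sort. The nearest-rank percentile used here is a simplification and can differ slightly from np.percentile, which interpolates between neighbouring values by default; the function names are hypothetical. This still reads every pixel into memory, which is the cost Paul wanted to avoid for large files.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact percentile (nearest-rank definition) without a full sort.
// `values` is taken by value because std::nth_element reorders it.
// Assumes no NaN entries, which would break the ordering nth_element needs.
float nearestRankPercentile(std::vector<float> values, double p) {
    assert(!values.empty() && p >= 0.0 && p <= 100.0);
    // 1-based rank of the smallest value with at least p% of the data at or below it.
    const auto rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(values.size()) / 100.0));
    const std::size_t k = rank > 0 ? rank - 1 : 0;  // 0-based index
    std::nth_element(values.begin(),
                     values.begin() + static_cast<std::ptrdiff_t>(k),
                     values.end());
    return values[k];
}

// Counterpart of the numpy expression `droneraster > perc80value`:
// true marks a pixel above the threshold (a candidate for removal).
std::vector<bool> maskAbove(const std::vector<float>& values, float threshold) {
    std::vector<bool> mask(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        mask[i] = values[i] > threshold;
    return mask;
}
```

For dropping the top 10%, the threshold would be nearestRankPercentile(values, 90.0).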

Chris

-- 
Christine (Chris) Waigl - [hidden email] -  +1-907-474-5483 - Skype: cwaigl_work
Geophysical Institute, UAF, 903 Koyukuk Drive, Fairbanks, AK 99775-7320, USA

On Aug 3, 2017, at 5:43 AM, Paul Meems <[hidden email]> wrote:
[...]

Re: Raster statistics

Paul Meems
Thanks Chris for your reply.

I forgot to mention I'm not using GDAL with Python.
I use it with C++ and/or C#.




Paul

Paul Meems 
Release manager, configuration manager
and forum moderator of MapWindow GIS.
www.mapwindow.org

Owner of MapWindow.nl - Support for
Dutch speaking users.
www.mapwindow.nl


The MapWindow GIS project has moved to GitHub!


Download the latest MapWinGIS mapping engine.

Download the latest MapWindow 5 open source desktop application.


2017-08-03 20:05 GMT+02:00 Chris Waigl <[hidden email]>:
[...]

Re: Raster statistics

ewenger

Paul,

You could call gdal_calc.py and pass it the numpy formulas on the command line. Otherwise, it might be best to bring the raster into OpenCV.
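Whichever tool applies it, the per-pixel step is a simple elementwise rule once a threshold is known. With gdal_calc.py that could be an expression such as --calc="A*(A<=T)" with T replaced by the threshold, which sets clipped pixels to 0. Because gdal_calc.py evaluates the expression block by block, a percentile computed inside --calc would likely be a per-block value, so computing the threshold beforehand (from the histogram or an exact percentile) is safer. The same rule in C++, sketched for one block already read into a buffer and writing a nodata value instead of 0; the function name and the nodata choice are assumptions:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: set every value above `threshold` to `noData` in one
// block of pixels (the same per-pixel rule as a calc expression) and return
// how many pixels were replaced. NaN pixels compare false and are left as-is.
std::size_t clipToNoData(std::vector<float>& block, float threshold,
                         float noData) {
    // A NaN threshold would make every comparison false and clip nothing.
    assert(!std::isnan(threshold));
    std::size_t replaced = 0;
    for (float& v : block) {
        if (v > threshold) {
            v = noData;
            ++replaced;
        }
    }
    return replaced;
}
```

In a GDAL program the block would typically come from GDALRasterBand::ReadBlock or RasterIO and be written back with WriteBlock or RasterIO, with the chosen value recorded on the output band via SetNoDataValue.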

 

--Eric

 

From: gdal-dev [mailto:[hidden email]] On Behalf Of Paul Meems
Sent: Thursday, August 03, 2017 2:51 PM
To: Chris Waigl
Cc: [hidden email]
Subject: Re: [gdal-dev] Raster statistics

 

[...]