[QGIS-Developer] Quantile (Equal Count) broken or is data borked?

Richard Duivenvoorde
Hi,

I've been asked to see if I can add 'logarithmic' breaks to the
GraduatedSymbolRenderer.
Am I right that it is missing?
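
For readers unfamiliar with the idea, here is a minimal sketch of what 'logarithmic' breaks could look like: class bounds spaced evenly in log10 space. This is only an illustration under my own assumptions, not anything taken from the QGIS code; the function name logarithmic_breaks is hypothetical. Zeros and negative values have no logarithm, so the sketch simply rejects them.

```python
import math


def logarithmic_breaks(minimum, maximum, n_classes):
    """Return upper class bounds spaced evenly in log10 space.

    Hypothetical helper for illustration only; requires 0 < minimum < maximum.
    """
    if minimum <= 0 or maximum <= minimum:
        raise ValueError("need 0 < minimum < maximum for logarithmic breaks")
    log_min = math.log10(minimum)
    log_max = math.log10(maximum)
    step = (log_max - log_min) / n_classes
    # Upper bound of class i is 10 ** (log_min + i * step).
    return [10 ** (log_min + step * i) for i in range(1, n_classes + 1)]


if __name__ == "__main__":
    print(logarithmic_breaks(1, 1000, 3))
```

With a range of 1 to 1000 and 3 classes, each class spans one order of magnitude.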

BUT: looking into current code and testing the available break modes I
happen to look at the Quantile (Equal Count) one with a random dataset I
had around:

https://duif.net/temp/QuantileEqualCount.png

Using more classes makes it even worse:

https://duif.net/temp/QuantileEqualCount2.png

As you can see, there are a lot of 'empty' zero classes.
If I google it: "The quantile map tries to bin the same count of features
in each of the x classes."

It is as if the 'breaks' algorithm cannot handle zeros, floats or negative
values???

The spread is indeed funny, see:

https://duif.net/temp/EqualInterval.png

What do others think?
Am I misunderstanding things or is my data borked?

Regards,

Richard Duivenvoorde

ps cannot share the data yet... would be cool if there was a processing
algorithm called 'randomize points' or so :-)
_______________________________________________
QGIS-Developer mailing list
[hidden email]
List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer

Re: Quantile (Equal Count) broken or is data borked?

pcav
Hi Richard,

On 02/03/19 08:41, Richard Duivenvoorde wrote:

> Hi,
>
> I've been asked to see if I can add 'logarithmic' breaks to the
> GraduatedSymbolRenderer.
> Am I right that it is missing?
>
> BUT: looking into current code and testing the available break modes I
> happen to look at the Quantile (Equal Count) one with a random dataset I
> had around:
>
> https://duif.net/temp/QuantileEqualCount.png
>
> Using more classes makes it even worse:
>
> https://duif.net/temp/QuantileEqualCount2.png
>
> As you can see, there are a lot of 'empty' zero classes.
> If I google it: "The quantile map tries to bin the same count of features
> in each of the x classes."
>
> It is as if the 'breaks' algorithm cannot handle zeros, floats or negative
> values???
>
> The spread is indeed funny, see:
>
> https://duif.net/temp/EqualInterval.png
>
> What do others think?
> Am I misunderstanding things or is my data borked?

We also noticed that for large data sets (tens of thousands of records)
`Pretty breaks` classification behaves strangely: when clicking on
Classify repeatedly, class boundaries change by about 10% each
time, apparently at random.
Unsure whether this is the same or a different issue.
All the best.
--
Paolo Cavallini - www.faunalia.eu
QGIS.ORG Chair:
http://planet.qgis.org/planet/user/28/tag/qgis%20board/

Re: Quantile (Equal Count) broken or is data borked?

Richard Duivenvoorde
On 02/03/2019 13.09, Paolo Cavallini wrote:
> We also noticed that for large data sets (tens of thousands of records)
> `Pretty breaks` classification behaves strangely: when clicking on
> Classify repeatedly, class boundaries change by about 10% each
> time, apparently at random.
> Unsure whether this is the same or a different issue.

Hi Paolo,

I've created a random dataset of 100000 points, but cannot reproduce
this behaviour. Can you send me a dataset or provide more info?

Regards,

Richard Duivenvoorde

Re: Quantile (Equal Count) broken or is data borked?

Nyall Dawson
In reply to this post by Richard Duivenvoorde
On Sat, 2 Mar 2019 at 17:41, Richard Duivenvoorde <[hidden email]> wrote:

>
> Hi,
>
> I've been asked to see if I can add 'logarithmic' breaks to the
> GraduatedSymbolRenderer.
> Am I right that it is missing?
>
> BUT: looking into current code and testing the available break modes I
> happen to look at the Quantile (Equal Count) one with a random dataset I
> had around:
>
> https://duif.net/temp/QuantileEqualCount.png
>
> Using more classes makes it even worse:
>
> https://duif.net/temp/QuantileEqualCount2.png

I can't reproduce. Maybe data set dependent?

> ps cannot share the data yet... would be cool if there was a processing
> algorithm called 'randomize points' or so :-)

There was this: https://github.com/SpatialVision/differential_privacy,
but it's QGIS 2.x only (looks trivial to port though). Otherwise run
"Geometry by expression", and set the geometry to
make_point(randf(1,100),randf(1,100))
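
Outside QGIS, the same idea (replace every geometry with a random point in the 1..100 square) can be sketched in plain Python. This only mirrors the expression above; the function name random_points and the seed parameter are my own illustrative choices.

```python
import random


def random_points(count, low=1.0, high=100.0, seed=None):
    """Return `count` (x, y) tuples with both coordinates uniform in [low, high].

    Hypothetical helper; the seed is optional and only there for repeatability.
    """
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(count)]


if __name__ == "__main__":
    for x, y in random_points(3, seed=42):
        print(f"POINT({x:.3f} {y:.3f})")
```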

Nyall

Re: Quantile (Equal Count) broken or is data borked?

Nyall Dawson
In reply to this post by pcav
On Sat, 2 Mar 2019 at 22:09, Paolo Cavallini <[hidden email]> wrote:

> We also noticed that for large data sets (tens of thousands of records)
> `Pretty breaks` classification behaves strangely: when clicking on
> Classify repeatedly, class boundaries change by about 10% each
> time, apparently at random.
> Unsure whether this is the same or a different issue.

Do you mean pretty breaks or natural breaks (Jenks)? Jenks has been
that way forever -- by design (for performance) it takes a random
sample of the layer and generates the breaks from that sample only.
The random sample is selected anew with each click of Classify, which
results in different breaks.
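
The effect is easy to see in a small sketch. Note the assumptions: this is not the QGIS code, plain quantile cuts stand in for the Jenks optimisation to keep it short, and the sample size of 1000 is an arbitrary choice. The point is only that breaks computed from a fresh random sample differ between runs, while re-using the same seed gives identical breaks.

```python
import random
from statistics import quantiles


def sampled_breaks(values, n_classes, sample_size, rng):
    """Compute class cut points from a random sample instead of all values.

    Quantile cuts are a stand-in here; a real Jenks implementation would
    optimise the cuts on the sample instead.
    """
    sample = rng.sample(values, min(sample_size, len(values)))
    return quantiles(sample, n=n_classes, method="inclusive")


# Hypothetical skewed data set of 20000 values.
data_rng = random.Random(0)
values = [data_rng.lognormvariate(0, 1) for _ in range(20000)]

run_a = sampled_breaks(values, 5, 1000, random.Random(1))
run_b = sampled_breaks(values, 5, 1000, random.Random(2))  # new sample
run_c = sampled_breaks(values, 5, 1000, random.Random(1))  # same seed as run_a

if __name__ == "__main__":
    print("run a:", [round(b, 3) for b in run_a])
    print("run b:", [round(b, 3) for b in run_b])
    print("a and c identical:", run_a == run_c)
```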

Nyall

Re: Quantile (Equal Count) broken or is data borked?

pcav
You're right, sorry.
Any way to improve this, perhaps taking a larger sample to reduce variation?
Thanks.

On 3 March 2019 05:33:04 CET, Nyall Dawson <[hidden email]> wrote:
> On Sat, 2 Mar 2019 at 22:09, Paolo Cavallini <[hidden email]> wrote:
>
>> We also noticed that for large data sets (tens of thousands of records)
>> `Pretty breaks` classification behaves strangely: when clicking on
>> Classify repeatedly, class boundaries change by about 10% each
>> time, apparently at random.
>> Unsure whether this is the same or a different issue.
>
> Do you mean pretty breaks or natural breaks (Jenks)? Jenks has been
> that way forever -- by design (for performance) it takes a random
> sample of the layer and generates the breaks from that sample only.
> The random sample is selected anew with each click of Classify, which
> results in different breaks.
>
> Nyall

--
Sorry for being short

Re: Quantile (Equal Count) broken or is data borked?

Richard Duivenvoorde
In reply to this post by Nyall Dawson
On 03/03/2019 05.28, Nyall Dawson wrote:
> On Sat, 2 Mar 2019 at 17:41, Richard Duivenvoorde <[hidden email]> wrote:
...
>> BUT: looking into current code and testing the available break modes I
>> happen to look at the Quantile (Equal Count) one with a random dataset I
>> had around:
..
>> https://duif.net/temp/QuantileEqualCount2.png
> I can't reproduce. Maybe data set dependent?

I've created random data in a geopackage and created an issue for it

https://issues.qgis.org/issues/21451

I can reproduce it with a memory layer, a gpkg and the original shp, in
both master and 2.18...

Regards,

Richard Duivenvoorde

Re: Quantile (Equal Count) broken or is data borked?

Tobias Wendorff
In reply to this post by pcav
On Sun, 3.03.2019, 08:31, Paolo Cavallini wrote:
> You're right, sorry.
> Any way to improve this, perhaps taking a larger sample to reduce
> variation?
> Thanks.

I've started to use R for all my statistical processing. Natural breaks
is some kind of k-means clustering, but R does natural breaks (Jenks) and
others, too.

I'm doing the clustering/binning in R and joining the results back
in QGIS. Maybe that helps for your larger amount of data.

Re: Quantile (Equal Count) broken or is data borked?

Richard Duivenvoorde
In reply to this post by Richard Duivenvoorde
On 03/03/2019 11.31, Richard Duivenvoorde wrote:
>>> https://duif.net/temp/QuantileEqualCount2.png
>> I can't reproduce. Maybe data set dependent?
> I've created random data in a geopackage and created an issue for it
>
> https://issues.qgis.org/issues/21451

Mmm, diving into this a little more (googling about quantiles...), it
appears this has to do with ordering/ranking the data and then setting
the 'breaks'.

So in this dataset there seem to be A LOT of '0' values, so if you first
order the dataset and then create the breaks/buckets, it is possible to
have several 'buckets' in the 0 range? So this is then not so much an
error in creating the breaks, but more in the visualisation of the numbers
in the classes (IF it is indeed OK to have several 'classes' all having
0 as both min and max value...).
Should QGIS then divide the 0-values over the 3 buckets that all contain
zeros?
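
To make the effect concrete, here is a small sketch with made-up data (70 zeros plus 30 distinct positive values). It uses Python's statistics.quantiles with method="inclusive" as a stand-in; whether QGIS interpolates in exactly the same way is an assumption on my side. The function name quantile_breaks is hypothetical.

```python
from statistics import quantiles


def quantile_breaks(values, n_classes):
    """Upper bounds of n_classes equal-count classes; the last one is the max."""
    cuts = quantiles(values, n=n_classes, method="inclusive")
    return cuts + [max(values)]


# Hypothetical data: 70% of the values are exactly 0.
values = [0.0] * 70 + [float(v) for v in range(1, 31)]

breaks = quantile_breaks(values, 10)
# Classes whose upper bound is 0 cover only the value 0 (min == max == 0).
zero_classes = sum(1 for b in breaks if b == 0.0)

if __name__ == "__main__":
    print("upper bounds:", breaks)
    print("classes ending at 0:", zero_classes)
```

Because most of the sorted values are 0, most of the cut points land on 0 as well, which gives several classes with 0 as both lower and upper bound.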

OR am I just misusing the method? Should you not use this method on a
dataset in which the distribution is so uneven?

Anybody more into this statistical theory than I am??


Regards,

Richard Duivenvoorde

Re: Quantile (Equal Count) broken or is data borked?

Nyall Dawson
On Mon, 4 Mar 2019 at 02:21, Richard Duivenvoorde <[hidden email]> wrote:

>
> On 03/03/2019 11.31, Richard Duivenvoorde wrote:
> >>> https://duif.net/temp/QuantileEqualCount2.png
> >> I can't reproduce. Maybe data set dependent?
> > I've created random data in a geopackage and created an issue for it
> >
> > https://issues.qgis.org/issues/21451
>
> Mmm, diving into this a little more (googling about quantiles...), it
> appears this has to do with ordering/ranking the data and then setting
> the 'breaks'.
>
> So in this dataset there seem to be A LOT of '0' values, so if you first
> order the dataset and then create the breaks/buckets, it is possible to
> have several 'buckets' in the 0 range? So this is then not so much an
> error in creating the breaks, but more in the visualisation of the numbers
> in the classes (IF it is indeed OK to have several 'classes' all having
> 0 as both min and max value...).
> Should QGIS then divide the 0-values over the 3 buckets that all contain
> zeros?

I had a look - as you suspect, it's impossible to divide your data
into 10 equal-sized buckets based solely on the distribution of values
contained within it.
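
A quick way to check this for any data set, sketched under the assumption that all features sharing a value must end up in the same class: if the most frequent value occurs more often than one class's share (n / classes), exactly equal counts cannot be reached. This is a necessary condition only, not a sufficient one, and the function name is hypothetical.

```python
from collections import Counter


def equal_count_possible(values, n_classes):
    """Necessary (not sufficient) condition for exactly equal-count classes.

    If one value alone occurs more often than len(values) / n_classes,
    the class holding it is already too large.
    """
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count <= len(values) / n_classes


if __name__ == "__main__":
    skewed = [0] * 70 + list(range(1, 31))  # hypothetical: 70% zeros
    print(equal_count_possible(skewed, 10))
    print(equal_count_possible(list(range(100)), 10))
```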

>
> OR am I just misusing the method? Should you not use this method on a
> dataset in which the distribution is so uneven?

Those are my thoughts. You'll need to choose a different partitioning method.

Nyall

Re: Quantile (Equal Count) broken or is data borked?

Nyall Dawson
In reply to this post by Tobias Wendorff
On Mon, 4 Mar 2019 at 00:39, Tobias Wendorff
<[hidden email]> wrote:

>
> On Sun, 3.03.2019, 08:31, Paolo Cavallini wrote:
> > You're right, sorry.
> > Any way to improve this, perhaps taking a larger sample to reduce
> > variation?
> > Thanks.
>
> I've started to use R for all my statistical processing. Natural breaks
> is some kind of k-means clustering, but R does natural breaks (Jenks) and
> others, too.
>
> I'm doing the clustering/binning in R and joining the results back
> in QGIS. Maybe that helps for your larger amount of data.

Hi Tobias,

I think this approach works in some circumstances, but ideally the
"best practice" binning techniques are available directly for use in
QGIS. It's not exactly a "smooth" workflow to require someone to use
an external tool to achieve this.

I'd be interested to hear which R libraries/functions you think give
better results than QGIS's current algorithms -- it should be trivial
to port these algorithms back to QGIS!

Nyall


Re: Quantile (Equal Count) broken or is data borked?

Tobias Wendorff
On Sun, 3.03.2019, 23:50, Nyall Dawson wrote:
>
> I'd be interested to hear which R libraries/functions you think give
> better results than QGIS's current algorithms -- it should be trivial
> to port these algorithms back to QGIS!

Oops, I didn't want to give the impression that QGIS's algorithms were
bad! I'm only using the R workflow because it gives me the possibility
to save the bins directly in the dataset. In most of my cases, it is
interesting to know in which bin a record fell (e.g. if I export
the attribute table and work in Excel or other tools).
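
Storing the bin next to each record can be sketched in a few lines of plain Python. The breaks and rows below are made up, and class_index is a hypothetical helper; the convention assumed here is that a value equal to an upper bound belongs to that class.

```python
from bisect import bisect_left


def class_index(value, upper_bounds):
    """Index of the first class whose upper bound is >= value.

    Values above the last bound are clamped into the last class.
    """
    return min(bisect_left(upper_bounds, value), len(upper_bounds) - 1)


# Hypothetical class upper bounds and attribute rows.
upper_bounds = [10.0, 20.0, 30.0]
rows = [
    {"id": 1, "value": 4.2},
    {"id": 2, "value": 20.0},
    {"id": 3, "value": 27.5},
]

# Write the bin number into each row, like an extra attribute column.
for row in rows:
    row["bin"] = class_index(row["value"], upper_bounds)

if __name__ == "__main__":
    for row in rows:
        print(row)
```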

Re: Quantile (Equal Count) broken or is data borked?

Nyall Dawson
On Mon, 4 Mar 2019 at 09:43, Tobias Wendorff
<[hidden email]> wrote:

>
> On Sun, 3.03.2019, 23:50, Nyall Dawson wrote:
> >
> > I'd be interested to hear which R libraries/functions you think give
> > better results than QGIS's current algorithms -- it should be trivial
> > to port these algorithms back to QGIS!
>
> Oops, I didn't want to give the impression that QGIS's algorithms were
> bad! I'm only using the R workflow because it gives me the possibility
> to save the bins directly in the dataset. In most of my cases, it is
> interesting to know in which bin a record fell (e.g. if I export
> the attribute table and work in Excel or other tools).

Sure, I don't think there was any misunderstanding here. Before you
replied I'd already meant to ask if anyone knew the corresponding R
packages we could check our implementations against.

(This also makes me think - we should expose these graduated renderer
classification techniques via processing)

Nyall

Re: Quantile (Equal Count) broken or is data borked?

Tobias Wendorff
On Mon, 4.03.2019, 01:02, Nyall Dawson wrote:
>
> Sure, I don't think there was any misunderstanding here. Before you
> replied I'd already meant to ask if anyone knew the corresponding R
> packages we could check our implementations against.

Most of the time, I'm either using standard classifications like
the ones provided in "classInt" and "BAMMtools", or the standard quartile
and k-means stuff. Hierarchical clustering etc. needs some more thought
anyway (e.g. different similarity matrices etc.).

But those might not be needed for cartography. By integrating stuff
through processing, this might open a brand new world for some
geostatistical guys of course.

Re: Quantile (Equal Count) broken or is data borked?

Pedro Venâncio-2
R's quantile gives the same result as QGIS with this dataset.

Please see the comments here: https://issues.qgis.org/issues/21451

Best regards,
Pedro Venâncio


Tobias Wendorff <[hidden email]> wrote on Monday, 4/03/2019 at 00:17:
> On Mon, 4.03.2019, 01:02, Nyall Dawson wrote:
>>
>> Sure, I don't think there was any misunderstanding here. Before you
>> replied I'd already meant to ask if anyone knew the corresponding R
>> packages we could check our implementations against.
>
> Most of the time, I'm either using standard classifications like
> the ones provided in "classInt" and "BAMMtools", or the standard quartile
> and k-means stuff. Hierarchical clustering etc. needs some more thought
> anyway (e.g. different similarity matrices etc.).
>
> But those might not be needed for cartography. By integrating stuff
> through processing, this might open a brand new world for some
> geostatistical guys of course.
