whitepaper: Computing a shapefile by Belgian zip codes

Alexandre Detiste
Hi,

I attended a one-week course and felt inspired ("once you pop, you can't stop"),
so I finally documented this procedure, as I felt
it might be interesting to other mappers.

As this is Belgium-specific material,
this list felt like the best place to share it.

PS: I'm confident one could do the same with OpenStreetMap data;
it's only a matter of how much your developer's time costs versus
the one-off €100 cost of buying the base data from the National Geographic Institute.


Greets,

Alexandre Detiste

--------

http://users.skynet.be/bs366950/whitepaper/

Abstract:

In most data warehouses, the lowest level of detail for geographical analysis is the (zip code, sub-city) pair;
but since in practice the sub-city is generally poorly encoded, only the zip code is usable.

For €100, the National Geographic Institute provides shapefile data at the 'statistical sector' level
that can be processed to get data at zip code level: http://www.ngi.be/FR/FR1-5-2.shtm
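Once every statistical sector carries a zip code, sector-level figures can simply be summed per zip code. Below is a minimal sketch of that roll-up. It assumes a hypothetical sector-to-zip lookup, which is the hard part the rest of this thread discusses, and the sector codes and values are made up for illustration rather than taken from the NGI data:

```python
from collections import defaultdict

def aggregate_by_zip(sector_values, sector_to_zip):
    """Sum a sector-level measure (e.g. population) per zip code.

    sector_values: dict mapping a statistical sector code -> numeric value
    sector_to_zip: dict mapping a statistical sector code -> zip code
    Sectors without a known zip code are collected under None, so they
    can be inspected instead of being silently dropped.
    """
    totals = defaultdict(int)
    for sector, value in sector_values.items():
        totals[sector_to_zip.get(sector)] += value
    return dict(totals)

# Hypothetical sector codes and values, for illustration only.
values = {"11002A00-": 1200, "11002A01-": 800, "11002B00-": 500, "11002Z99-": 30}
lookup = {"11002A00-": "2000", "11002A01-": "2000", "11002B00-": "2018"}
print(aggregate_by_zip(values, lookup))
# {'2000': 2000, '2018': 500, None: 30}
```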


_______________________________________________
Belgium mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/belgium

Re: whitepaper: Computing a shapefile by Belgian zip codes

joost schouppe
Hi Alexandre,

A few thoughts:

* Why pay 100 euros for open data? [1] (Note: there are a few errors in this file, which make it crash during analysis in QGIS. In my own work, I used ArcGIS Fix Geometry, as I don't know the right tools in QGIS.)

* Zip code is a terrible way to handle geographical data, as it often has completely illogical borders. From a practical point of view you need it, of course, as a lot of data is collected at this level. If at all possible, the lowest geographical level of a data warehouse in Belgium should always be the statistical sector.

* Careful: aggregating statistical sectors into postal codes is not entirely correct: statistical sectors do have logical borders, but postal codes often don't follow them. See this example where buildings are coloured by postal code and overlaid with statistical sectors (black lines) [2]. In the website I co-manage [3], we chose to name these merged statistical sectors "postal codes", even if that's not strictly true. You can see and download both "our" postal codes (with imperfect and strange geometry) [4] and our merged-statsec-to-postalcode [5] from the Antwerp open data portal.

* It's hard to decide which sectors belong to which postal codes. Yes, the letters in the sector codes often do give an indication, but in this example [6], some postal codes consist of sectors with different letters, and some letters belong to different postal codes. I think a more correct way would be to do a spatial join of address points with statistical sectors, then count the most prevalent address postal code within each sector. Join the resulting table to the sector shapefile (join by attribute niscode), and you can then just do a dissolve by attribute postal code to get the needed dataset.
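The procedure in the last point (spatial join of address points with sectors, majority postal code per sector, then grouping sectors by postal code) can be sketched in plain Python. This is a minimal sketch under stated assumptions: sectors are simple polygons without holes, stored as coordinate lists keyed by a niscode-like identifier; addresses are (x, y, postcode) tuples in the same coordinate system; and all function names and sample data are hypothetical. The brute-force loop stands in for what a GIS does with a spatial index, for example QGIS's "Join attributes by location" followed by "Dissolve":

```python
from collections import Counter, defaultdict

def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the simple polygon `ring`?

    ring: list of (x, y) vertices, not closed. Points exactly on an edge
    may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def majority_postcode(sectors, addresses):
    """Give each sector the most frequent postcode of the addresses inside it.

    sectors: dict niscode -> ring; addresses: iterable of (x, y, postcode).
    Sectors containing no address are absent from the result.
    """
    counts = defaultdict(Counter)
    for x, y, postcode in addresses:
        for niscode, ring in sectors.items():
            if point_in_polygon(x, y, ring):
                counts[niscode][postcode] += 1
                break  # sectors do not overlap: stop at the first match
    return {nis: c.most_common(1)[0][0] for nis, c in counts.items()}

def sectors_by_postcode(assignment):
    """The grouping that a 'dissolve by postal code' is based on."""
    groups = defaultdict(list)
    for niscode, postcode in assignment.items():
        groups[postcode].append(niscode)
    return dict(groups)

# Two hypothetical adjacent sectors and four made-up address points.
sectors = {
    "S1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "S2": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
addresses = [(0.2, 0.5, "2000"), (0.7, 0.3, "2000"), (0.5, 0.8, "2018"), (1.5, 0.5, "2018")]
assignment = majority_postcode(sectors, addresses)
print(assignment)                       # {'S1': '2000', 'S2': '2018'}
print(sectors_by_postcode(assignment))  # {'2000': ['S1'], '2018': ['S2']}
```

Sector S1 gets "2000" even though one of its addresses has "2018". This is the majority rule in action, and it is also why merged sectors only approximate real postal codes.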



Re: whitepaper: Computing a shapefile by Belgian zip codes

Alexandre Detiste
On Monday 30 May 2016, 12:26:35, joost schouppe wrote:
> Hi Alexandre,
>
> A few thoughts:
>
> * why pay 100 euros for open data? [1] (note: there are a few errors in this
> file, which make it crash on analysis using QGIS. In my own work, I used
> ArcGIS Fix Geometry as I don't know the right tools in QGIS)

Because this file [1] doesn't contain any zip code information,
and the IGN file has a good reputation.
Anyway, I was called in to rescue this after the file had already been bought.

The zip code in AD_1_MunicipalSection_WSG84 is already about 90% correct,
with some big defects for large municipalities like Brussels, Antwerpen and Liège
that have a complex zip code layout; with a bit of extra effort, users were happy
to get close to 99% correct.
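A figure like "about 90% correct" can be measured by comparing, for each address, the zip code assigned to its area with the postcode in the address itself. Breaking the rate down by municipality then shows where the layout is complex. Below is a minimal sketch. It assumes hypothetical (municipality, assigned_zip, address_zip) records have already been produced, for instance by a spatial join, and the sample records are made up:

```python
from collections import defaultdict

def match_rates(records):
    """Share of addresses whose postcode equals the zip code assigned to their area.

    records: iterable of (municipality, assigned_zip, address_zip) tuples.
    Returns (overall_rate, {municipality: rate}).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for municipality, assigned_zip, address_zip in records:
        totals[municipality] += 1
        if assigned_zip == address_zip:
            hits[municipality] += 1
    if not totals:
        raise ValueError("no records to evaluate")
    overall = sum(hits.values()) / sum(totals.values())
    per_municipality = {m: hits[m] / totals[m] for m in totals}
    return overall, per_municipality

# Made-up records, for illustration only.
records = [
    ("municipality_1", "1000", "1000"),
    ("municipality_1", "1000", "1000"),
    ("municipality_1", "1000", "1020"),
    ("municipality_2", "5000", "5000"),
]
overall, per_municipality = match_rates(records)
print(overall)  # 0.75
# Lowest rates first: these are the areas worth a closer look.
print(sorted(per_municipality.items(), key=lambda kv: kv[1]))
```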

The further I tried to improve quality, the more it felt like forcing
a square peg into a round hole... but the end users just didn't care.

> * Zipcode is a terrible way to handle geographical data, as it often has
> completely illogical borders. From a practical point of view you need it of
> course, as a lot of data is collected at this level. If at all possible,
> the lowest geographical level of a data warehouse in Belgium should always
> be the statistical sector.

There's always huge resistance to change, and after paying for a software package
costing over €10,000 a year, users expected it to automagically answer
all questions, so using extra (even free) software is generally frowned upon.

They were terribly afraid of using the provided geocoding tool:
an .exe that reads one text file and writes another, without
a shiny Visual Basic 6 GUI.

I do think geocoding + using stat sectors is the correct way to do it, though;
but in this case the solution had to remain pragmatic and reproducible
by someone else.


In this case, too, the visualisation tool would slow down to a crawl
if too many shape borders were defined, so zip codes were
an easy way to merge stat sectors on a map.
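Merging ("dissolving") the sectors of one zip code removes their shared internal borders, which is what makes the map lighter to draw. Below is a minimal sketch of the idea. It assumes the sector polygons are topologically clean, meaning neighbouring sectors share exactly the same vertices, have no holes and never touch at a single point. Under that assumption, every edge that occurs in two sectors of the same group is internal and is dropped, and the remaining edges are chained into the outer ring. Real shapefiles rarely meet this assumption, which is why GIS tools use full geometry unions instead. All names and coordinates here are hypothetical:

```python
from collections import Counter, defaultdict

def boundary_edges(rings):
    """Edges on the outer boundary of a group of polygons.

    rings: list of polygon rings, each a list of (x, y) vertices, not closed
    (the last vertex does not repeat the first). An undirected edge that
    appears in two polygons of the group is internal and is dropped.
    """
    counts = Counter()
    for ring in rings:
        for i in range(len(ring)):
            a, b = ring[i], ring[(i + 1) % len(ring)]
            counts[frozenset((a, b))] += 1
    return {edge for edge, n in counts.items() if n == 1}

def edges_to_rings(edges):
    """Chain boundary edges into closed rings.

    Assumes every boundary vertex has exactly two boundary edges.
    """
    neighbours = defaultdict(list)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].append(b)
        neighbours[b].append(a)
    rings, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        ring, prev, cur = [start], None, start
        seen.add(start)
        while True:
            first, second = neighbours[cur]
            nxt = second if first == prev else first
            if nxt == start:
                break
            ring.append(nxt)
            seen.add(nxt)
            prev, cur = cur, nxt
        rings.append(ring)
    return rings

def dissolve_edges(sector_rings, sector_to_zip):
    """Group sectors by zip code and keep only each group's outer edges."""
    groups = defaultdict(list)
    for sector, ring in sector_rings.items():
        groups[sector_to_zip[sector]].append(ring)
    return {z: boundary_edges(rs) for z, rs in groups.items()}

sq1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
sq2 = [(1, 0), (2, 0), (2, 1), (1, 1)]
merged = dissolve_edges({"s1": sq1, "s2": sq2}, {"s1": "2000", "s2": "2000"})
print(len(merged["2000"]))  # 6 edges instead of 8
```

The merged ring still contains the now-collinear vertices where the internal border used to meet the outline. A simplification pass would remove those if vertex count matters for the viewer.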

> * Careful: aggregating statistical sectors into postal codes is not
> entirely correct, as statistical sectors do have logical borders. See this
> example where buildings are coloured by postal code and overlaid with
> statistical sectors (black lines) [2]. In the website I co-manage [3], we
> chose to name these merged statistical sectors "postal codes", even if
> that's not strictly true. You can see and download both "our" postal codes
> (with imperfect and strange geometry) [4] and our
> merged-statsec-to-postalcode [5] from the Antwerp open data portal.
>
> * It's hard to define which sectors belong to which postal codes. Yes, the letters
> often do give an indication, but in this example [6], some postal codes
> consist of different letters, and some letters belong to different postal
> codes. I think a more correct way would be to do a spatial join of address
> points with statistical sectors, then count the most prevalent address
> postal code within a certain sector. Join the resulting table to the sector
> shapefile (join by attribute niscode), and you can just do a dissolve by
> attribute postal code to get the needed dataset.

It would be nice to share all these insights on a wiki or something
(or as a premade file).

I'll look again at how to do that with OSM someday.


Greets,

Alexandre


Re: whitepaper: Computing a shapefile by Belgian zip codes

JorisMapMen
Hello 

A couple of months ago, as an exercise, I made a map of postal codes (.shp). It combines data from cadastral maps (which contain the old pre-fusion commune borders, the “deelgemeenten”)
with address lists of primary schools (VL-BXL-W, i.e. Flanders, Brussels and Wallonia): every village has one, and each address mentions a postcode.

Spot checks showed very satisfying results.

Ready to share;
contact me.

Joris Hintjens
Mapmen
Hofstraat 21 
1982 Elewijt
tel 0472 473 178

On 31 May 2016, at 06:27, Alexandre Detiste <[hidden email]> wrote:


1: http://statbel.fgov.be/nl/statistieken/opendata/datasets/tools/big/SH_STAT_SECTORS.jsp
2: http://i.imgur.com/El8b4I4.jpg
3: https://stadincijfers.antwerpen.be/dashboard/
4: http://opendata.antwerpen.be/datasets/postzones
5: http://opendata.antwerpen.be/datasets/stadsdeel
6: https://stadincijfers.antwerpen.be/databank/?sel_guid=bc4433ff-d734-4a54-b1ba-410e8a8cc975


Re: whitepaper: Computing a shapefile by Belgian zip codes

Johan Van de Wauw
In Flanders, you could use the CRAB "adressenlijst" (address list) to work out the borders.
https://download.agiv.be/Producten/Detail?id=447&title=CRAB_Adressenlijst
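One crude way to turn an address list into approximate postal-code areas is to take, for each postal code, the convex hull of its address points. Below is a minimal sketch using Andrew's monotone chain algorithm. It assumes the list has been exported to CSV with the hypothetical columns `postcode`, `x` and `y`; the actual CRAB column names may differ. Because postal codes interlock, neighbouring hulls will overlap, so this is only a rough first look. The majority-per-sector approach discussed earlier in the thread gives cleaner borders:

```python
import csv
import io
from collections import defaultdict

def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    Duplicate and collinear boundary points are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

def hulls_by_postcode(csv_file):
    """Group address points by postcode and return one hull per postcode.

    csv_file: file-like object with the hypothetical columns postcode, x, y.
    """
    points = defaultdict(list)
    for row in csv.DictReader(csv_file):
        points[row["postcode"]].append((float(row["x"]), float(row["y"])))
    return {pc: convex_hull(pts) for pc, pts in points.items()}

# Made-up coordinates, for illustration only.
sample = io.StringIO(
    "postcode,x,y\n"
    "9000,0,0\n9000,4,0\n9000,4,3\n9000,0,3\n9000,2,1\n"
    "9050,5,0\n9050,7,0\n9050,6,2\n"
)
print(hulls_by_postcode(sample))
# {'9000': [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)], '9050': [(5.0, 0.0), (7.0, 0.0), (6.0, 2.0)]}
```

For a real export, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.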


Re: whitepaper: Computing a shapefile by Belgian zip codes

joost schouppe
In reply to this post by Alexandre Detiste
Hey Alexandre,

Thanks for the clarification. Could you tell me a bit more about this geocoding .exe? We still use some old software here in Antwerp, which is pretty intelligent but a pain to update, and it only returns the statistical sector (statsec), not x-y coordinates. It would be nice to try an alternative.

I just set up stats4belgium to share methods I develop at Stad Antwerpen (I know, maybe that's not the best name :). Maybe that could be a place to share methods like this? There's also a nice R package with open Belgian data floating around.

Joris, that file does sound interesting. Have you overlaid it with the open postal codes dataset for Antwerp?

Joost


Re: whitepaper: Computing a shapefile by Belgian zip codes

Alexandre Detiste
In reply to this post by JorisMapMen
On Tuesday 31 May 2016, 21:54:14, JorisMapMen wrote:

> Hello
>
> I made a map of postal codes (.shp), a couple of months ago by way of exercise.
> Made from combining data from cadastre maps (which contain old commune borders pre-fusion) “deelgemeenten”
> and address lists of primary schools (VL-BXL-W) (in every village there is one, and they all have a postcode mentioned)
>
> sample controlling showed very satisfying result.
>
> ready to share
> contact me.

Hi,

I was more interested in sharing ideas on how to do it than in the result;
I don't even need the result any more, as it was needed by my previous
employer.

>
> Joris Hintjens
> Mapmen
> Hofstraat 21
> 1982 Elewijt
> [hidden email]
> www.mapmen.be
> tel 0472 473 178


Re: whitepaper: Computing a shapefile by Belgian zip codes

Alexandre Detiste
In reply to this post by joost schouppe
On Wednesday 01 June 2016, 09:21:38, joost schouppe wrote:
> Hey Alexandre,
>
> Thanks for the clarification. Could you tell me a bit more about this
> geocoding exe. We still use some old software here in Antwerp, which is
> pretty intelligent but a pain to update and only returns statsec, not x-y.
> Would be nice to try an alternative.

It was a huge .exe file to be run from the command line;
I guess the provider appended the data in a container (zip?)
to the actual compiled program.

It was a kind of black box reading one .csv
and outputting another; I don't like working this way.
