[gdal-dev] ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[gdal-dev] ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

Stefan Keller
Hi,

The Swiss Statistical Office curates well-known datasets informally
called "Hectare Raster" (STATPOP and STATENT) which obviously are
ASCII Gridded XYZ, like this:

"RELI" "X" "Y" "C1" "C2" "C3"
66192542 661900 254200 1 2 3 ...
66192543 661900 254300 1 2 3 ...
66192599 661900 259900 1 2 3 ...
66192600 661900 260000 1 2 3 ...
...

This obviously is a regular gridded X and Y, sorted by X,Y but
obviously also has gaps and more than one  "Z" (C1, C2, C3). Typically
this is converted to a raster format like GeoTIFF and analysed through
map algebra.

Question 1: The GDAL docs (https://www.gdal.org/frmt_xyz.html ) says
"...no missing value is supported.". In fact, I think it supports them
by omitting lines by setting missing "lines" to NODATA.
=> Correct?

Question 2: I think, there is a way (which could be mentioned in the
docs above), to determine the "Z" value to be converted by gdal_grid
explained in "Reading comma separated values"
(https://www.gdal.org/gdal_grid.html#gdal_grid_csv ).
So XYZ in fact is XYZx meaning that potentially there can be many "Z"
in the input file, but only one can be included.
=> What do you think?

Question 3 (the fundamental one): Although XYZ format requires to
contain regular gridded coordinates, it's technically handled the same
as irregular coordinates (without interpolation) - at least regarding
the main task to convert an sparse array of values to a raster format.
=> Correct?

:Stefan
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

Even Rouault-2
Stefan,

>
> The Swiss Statistical Office curates well-known datasets informally
> called "Hectare Raster" (STATPOP and STATENT) which obviously are
> ASCII Gridded XYZ, like this:
>
> "RELI" "X" "Y" "C1" "C2" "C3"
> 66192542 661900 254200 1 2 3 ...
> 66192543 661900 254300 1 2 3 ...
> 66192599 661900 259900 1 2 3 ...
> 66192600 661900 260000 1 2 3 ...
> ...
>
> This obviously is a regular gridded X and Y, sorted by X,Y but
> obviously also has gaps and more than one  "Z" (C1, C2, C3). Typically
> this is converted to a raster format like GeoTIFF and analysed through
> map algebra.
>
> Question 1: The GDAL docs (https://www.gdal.org/frmt_xyz.html ) says
> "...no missing value is supported.". In fact, I think it supports them
> by omitting lines by setting missing "lines" to NODATA.
> => Correct?

The driver is quite picky. I don't think it will support missing values in the
middle of a line of same Y value. It supports missing values at the beginning
or end of a line of same Y value. And points must be sorted by increasing or
decreasing Y, and increasing X value for a line of same Y value.

>
> Question 2: I think, there is a way (which could be mentioned in the
> docs above), to determine the "Z" value to be converted by gdal_grid
> explained in "Reading comma separated values"
> (https://www.gdal.org/gdal_grid.html#gdal_grid_csv ).
> So XYZ in fact is XYZx meaning that potentially there can be many "Z"
> in the input file, but only one can be included.
> => What do you think?

The best way might probably to read the file with the CSV driver indeed. With
CSV:the_filename if the_filename has not a .csv extension
Convert it to something with a spatial index like a shapefile or GPKG if the
file is big
And then use gdal_grid to create a raster from it.

>
> Question 3 (the fundamental one): Although XYZ format requires to
> contain regular gridded coordinates, it's technically handled the same
> as irregular coordinates (without interpolation) - at least regarding
> the main task to convert an sparse array of values to a raster format.
> => Correct?

The XYZ driver requires constant spacing.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

Stefan Keller
Hi Even

Thx very much!
I'll try that way and will report.

:Stefan

Am So., 21. Apr. 2019 um 11:31 Uhr schrieb Even Rouault
<[hidden email]>:

>
> Stefan,
>
> >
> > The Swiss Statistical Office curates well-known datasets informally
> > called "Hectare Raster" (STATPOP and STATENT) which obviously are
> > ASCII Gridded XYZ, like this:
> >
> > "RELI" "X" "Y" "C1" "C2" "C3"
> > 66192542 661900 254200 1 2 3 ...
> > 66192543 661900 254300 1 2 3 ...
> > 66192599 661900 259900 1 2 3 ...
> > 66192600 661900 260000 1 2 3 ...
> > ...
> >
> > This obviously is a regular gridded X and Y, sorted by X,Y but
> > obviously also has gaps and more than one  "Z" (C1, C2, C3). Typically
> > this is converted to a raster format like GeoTIFF and analysed through
> > map algebra.
> >
> > Question 1: The GDAL docs (https://www.gdal.org/frmt_xyz.html ) says
> > "...no missing value is supported.". In fact, I think it supports them
> > by omitting lines by setting missing "lines" to NODATA.
> > => Correct?
>
> The driver is quite picky. I don't think it will support missing values in the
> middle of a line of same Y value. It supports missing values at the beginning
> or end of a line of same Y value. And points must be sorted by increasing or
> decreasing Y, and increasing X value for a line of same Y value.
>
> >
> > Question 2: I think, there is a way (which could be mentioned in the
> > docs above), to determine the "Z" value to be converted by gdal_grid
> > explained in "Reading comma separated values"
> > (https://www.gdal.org/gdal_grid.html#gdal_grid_csv ).
> > So XYZ in fact is XYZx meaning that potentially there can be many "Z"
> > in the input file, but only one can be included.
> > => What do you think?
>
> The best way might probably to read the file with the CSV driver indeed. With
> CSV:the_filename if the_filename has not a .csv extension
> Convert it to something with a spatial index like a shapefile or GPKG if the
> file is big
> And then use gdal_grid to create a raster from it.
>
> >
> > Question 3 (the fundamental one): Although XYZ format requires to
> > contain regular gridded coordinates, it's technically handled the same
> > as irregular coordinates (without interpolation) - at least regarding
> > the main task to convert an sparse array of values to a raster format.
> > => Correct?
>
> The XYZ driver requires constant spacing.
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

Stefan Keller
Even,

As said my point input data already has regular, constant spaces,
namely 100 meters in X and Y, with e.g. extent (2661900, 1253800) -
(2668400, 1261300), map units are meters (SRS EPSG:2056).

I managed to convert my data using following gdal_grid command
> gdal_grid -ot UInt16 -of GTiff -zfield "C1" in.gpkg out.tif

Now I still have an issue with the output pixel size which was
27.41341571800429122,-27.41341571800429122 meters
But what I want is a pixel size with exactly same spacing as input
points, namely 100,-100 (meters).
I know there's the gdal_grid options -txe/-tye/-outsize, but I'd like
to avoid calculating these by hand.
In addition IMHO there's no need interpolation at all (or nearest
neighbor at most), since input point is already in place, ready for
output.

Is there a solution for just setting output pixel size in map units
(or am I missing something)?

:Stefan

Am So., 21. Apr. 2019 um 14:48 Uhr schrieb Stefan Keller <[hidden email]>:

>
> Hi Even
>
> Thx very much!
> I'll try that way and will report.
>
> :Stefan
>
> Am So., 21. Apr. 2019 um 11:31 Uhr schrieb Even Rouault
> <[hidden email]>:
> >
> > Stefan,
> >
> > >
> > > The Swiss Statistical Office curates well-known datasets informally
> > > called "Hectare Raster" (STATPOP and STATENT) which obviously are
> > > ASCII Gridded XYZ, like this:
> > >
> > > "RELI" "X" "Y" "C1" "C2" "C3"
> > > 66192542 661900 254200 1 2 3 ...
> > > 66192543 661900 254300 1 2 3 ...
> > > 66192599 661900 259900 1 2 3 ...
> > > 66192600 661900 260000 1 2 3 ...
> > > ...
> > >
> > > This obviously is a regular gridded X and Y, sorted by X,Y but
> > > obviously also has gaps and more than one  "Z" (C1, C2, C3). Typically
> > > this is converted to a raster format like GeoTIFF and analysed through
> > > map algebra.
> > >
> > > Question 1: The GDAL docs (https://www.gdal.org/frmt_xyz.html ) says
> > > "...no missing value is supported.". In fact, I think it supports them
> > > by omitting lines by setting missing "lines" to NODATA.
> > > => Correct?
> >
> > The driver is quite picky. I don't think it will support missing values in the
> > middle of a line of same Y value. It supports missing values at the beginning
> > or end of a line of same Y value. And points must be sorted by increasing or
> > decreasing Y, and increasing X value for a line of same Y value.
> >
> > >
> > > Question 2: I think, there is a way (which could be mentioned in the
> > > docs above), to determine the "Z" value to be converted by gdal_grid
> > > explained in "Reading comma separated values"
> > > (https://www.gdal.org/gdal_grid.html#gdal_grid_csv ).
> > > So XYZ in fact is XYZx meaning that potentially there can be many "Z"
> > > in the input file, but only one can be included.
> > > => What do you think?
> >
> > The best way might probably to read the file with the CSV driver indeed. With
> > CSV:the_filename if the_filename has not a .csv extension
> > Convert it to something with a spatial index like a shapefile or GPKG if the
> > file is big
> > And then use gdal_grid to create a raster from it.
> >
> > >
> > > Question 3 (the fundamental one): Although XYZ format requires to
> > > contain regular gridded coordinates, it's technically handled the same
> > > as irregular coordinates (without interpolation) - at least regarding
> > > the main task to convert an sparse array of values to a raster format.
> > > => Correct?
> >
> > The XYZ driver requires constant spacing.
> >
> > Even
> >
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

Stefan Keller
Hi,

Am 22. Apr. 2019 23:49 wrote regarding gdal_grid:
> Is there a solution for just setting output pixel size in map units

By guessing the "-outsize 65 75" (no. of X and Y lines) I finally got
a result which contained a pixel size of 100, -100 meters plus
sensible values:

# gdal_grid -ot UInt16 -of GTiff -zfield "C3" -a
nearest:radius1=71:radius2=71 -outsize 65 75 in.gpkg out.tif

=> But, I'm still looking for a hint to set pixel sizes without
guessing or calculating the "outsize" from the extents of the input
file.

Then, I observed another problem: One obviously has no choice in
gdal_grid than to choose an interpolation method. As one can see from
the command above, I've chosen "nearest neighbour" as interpolation
method (option "-a nearest:radius1=71:radius2=71").

Now, this worked somehow - but not exactly: In almost all cases a
single input grid point (which "sits" at the junction of four pixels)
affected two or three neighboring pixels in the raster output.
=> But what I'm after is, that one input grid point only sets the
value of one pixel (say lower left pixel) in the raster output!

Which "neighboring" pixel exactly is of secondary importance here.
Main point is that one input value maps to only _one_ (not many) pixel
in the raster output.

:Stefan


Am Mo., 22. Apr. 2019 um 23:49 Uhr schrieb Stefan Keller <[hidden email]>:

>
> Even,
>
> As said my point input data already has regular, constant spaces,
> namely 100 meters in X and Y, with e.g. extent (2661900, 1253800) -
> (2668400, 1261300), map units are meters (SRS EPSG:2056).
>
> I managed to convert my data using following gdal_grid command
> > gdal_grid -ot UInt16 -of GTiff -zfield "C1" in.gpkg out.tif
>
> Now I still have an issue with the output pixel size which was
> 27.41341571800429122,-27.41341571800429122 meters
> But what I want is a pixel size with exactly same spacing as input
> points, namely 100,-100 (meters).
> I know there's the gdal_grid options -txe/-tye/-outsize, but I'd like
> to avoid calculating these by hand.
> In addition IMHO there's no need interpolation at all (or nearest
> neighbor at most), since input point is already in place, ready for
> output.
>
> Is there a solution for just setting output pixel size in map units
> (or am I missing something)?
>
> :Stefan
>
> Am So., 21. Apr. 2019 um 14:48 Uhr schrieb Stefan Keller <[hidden email]>:
> >
> > Hi Even
> >
> > Thx very much!
> > I'll try that way and will report.
> >
> > :Stefan
> >
> > Am So., 21. Apr. 2019 um 11:31 Uhr schrieb Even Rouault
> > <[hidden email]>:
> > >
> > > Stefan,
> > >
> > > >
> > > > The Swiss Statistical Office curates well-known datasets informally
> > > > called "Hectare Raster" (STATPOP and STATENT) which obviously are
> > > > ASCII Gridded XYZ, like this:
> > > >
> > > > "RELI" "X" "Y" "C1" "C2" "C3"
> > > > 66192542 661900 254200 1 2 3 ...
> > > > 66192543 661900 254300 1 2 3 ...
> > > > 66192599 661900 259900 1 2 3 ...
> > > > 66192600 661900 260000 1 2 3 ...
> > > > ...
> > > >
> > > > This obviously is a regular gridded X and Y, sorted by X,Y but
> > > > obviously also has gaps and more than one  "Z" (C1, C2, C3). Typically
> > > > this is converted to a raster format like GeoTIFF and analysed through
> > > > map algebra.
> > > >
> > > > Question 1: The GDAL docs (https://www.gdal.org/frmt_xyz.html ) says
> > > > "...no missing value is supported.". In fact, I think it supports them
> > > > by omitting lines by setting missing "lines" to NODATA.
> > > > => Correct?
> > >
> > > The driver is quite picky. I don't think it will support missing values in the
> > > middle of a line of same Y value. It supports missing values at the beginning
> > > or end of a line of same Y value. And points must be sorted by increasing or
> > > decreasing Y, and increasing X value for a line of same Y value.
> > >
> > > >
> > > > Question 2: I think, there is a way (which could be mentioned in the
> > > > docs above), to determine the "Z" value to be converted by gdal_grid
> > > > explained in "Reading comma separated values"
> > > > (https://www.gdal.org/gdal_grid.html#gdal_grid_csv ).
> > > > So XYZ in fact is XYZx meaning that potentially there can be many "Z"
> > > > in the input file, but only one can be included.
> > > > => What do you think?
> > >
> > > The best way might probably to read the file with the CSV driver indeed. With
> > > CSV:the_filename if the_filename has not a .csv extension
> > > Convert it to something with a spatial index like a shapefile or GPKG if the
> > > file is big
> > > And then use gdal_grid to create a raster from it.
> > >
> > > >
> > > > Question 3 (the fundamental one): Although XYZ format requires to
> > > > contain regular gridded coordinates, it's technically handled the same
> > > > as irregular coordinates (without interpolation) - at least regarding
> > > > the main task to convert an sparse array of values to a raster format.
> > > > => Correct?
> > >
> > > The XYZ driver requires constant spacing.
> > >
> > > Even
> > >
> > > --
> > > Spatialys - Geospatial professional services
> > > http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

Even Rouault-2
On jeudi 25 avril 2019 00:26:35 CEST Stefan Keller wrote:

> Hi,
>
> Am 22. Apr. 2019 23:49 wrote regarding gdal_grid:
> > Is there a solution for just setting output pixel size in map units
>
> By guessing the "-outsize 65 75" (no. of X and Y lines) I finally got
> a result which contained a pixel size of 100, -100 meters plus
> sensible values:
>
> # gdal_grid -ot UInt16 -of GTiff -zfield "C3" -a
> nearest:radius1=71:radius2=71 -outsize 65 75 in.gpkg out.tif
>
> => But, I'm still looking for a hint to set pixel sizes without
> guessing or calculating the "outsize" from the extents of the input
> file.

This is left as an exercice to the contributor to implement a -tr option
similar to gdalwarp / gdal_translate.

>
> Then, I observed another problem: One obviously has no choice in
> gdal_grid than to choose an interpolation method. As one can see from
> the command above, I've chosen "nearest neighbour" as interpolation
> method (option "-a nearest:radius1=71:radius2=71").

Why 71 ? If you're grid size is 100 meters, then less than 50 meters would be
a good radius
You would also need to grow the raster extent by a half-pixel size, so that
the middle of each raster point matches exactly an exact x,y coordinate of
your CSV (GDAL convention for georeferencing is the coordinate of the top-left
corner of a pixel area, not its center)
By default, gdal_grid takes the extent of the vector file to set it as the
extent of the raster file, and doesn't do this half-pixel extension, which is
the likely cause for the effect you observed.

Example:

given grid.csv with

x,y,z
100,500,1
300,500,3
100,600,4
200,600,5
300,600,6

# note the missing value at 200,500

$ ogr2ogr grid.gpkg grid.csv -oo X_POSSIBLE_NAMES=x -oo Y_POSSIBLE_NAMES=y

# Note the extra small radius since the center of a raster grid matches a
# input vector point, or none
$ gdal_grid -a nearest:radius1=1:radius2=1:nodata=-99  grid.gpkg  grid.tif \
                -txe 50 350 -tye 650 450 -outsize 3 2 -zfield z

$ gdal_translate grid.tif /vsistdout/ -of aaigrid
ncols        3
nrows        2
xllcorner    50.000000000000
yllcorner    450.000000000000
cellsize     100.000000000000
NODATA_value  -99
 4.0 5 6
 1 -99 3


Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: ASCII Gridded XYZ and "Hectare Raster" data (STATPOP and STATENT)

Stefan Keller
25. Apr. 2019 01:04 Uhr Even Rouault wrote:
> > => But, I'm still looking for a hint to set pixel sizes without
> > guessing or calculating the "outsize" from the extents of the input
> > file.
>
> This is left as an exercice to the contributor to implement a -tr option
> similar to gdalwarp / gdal_translate.

Got it :-)
In the meantime I try to sort out a calculation by hand.

> You would also need to grow the raster extent by a half-pixel size, so that
> the middle of each raster point matches exactly an exact x,y coordinate of
> your CSV ...

Now everything makes sense to me!
I had to read the above sentence twice, though.
And I'm now realizing the meaning of -txe/-tye.

Merci beaucoups! I'll try that.
:Stefan


Am Do., 25. Apr. 2019 um 01:04 Uhr schrieb Even Rouault
<[hidden email]>:

>
> On jeudi 25 avril 2019 00:26:35 CEST Stefan Keller wrote:
> > Hi,
> >
> > Am 22. Apr. 2019 23:49 wrote regarding gdal_grid:
> > > Is there a solution for just setting output pixel size in map units
> >
> > By guessing the "-outsize 65 75" (no. of X and Y lines) I finally got
> > a result which contained a pixel size of 100, -100 meters plus
> > sensible values:
> >
> > # gdal_grid -ot UInt16 -of GTiff -zfield "C3" -a
> > nearest:radius1=71:radius2=71 -outsize 65 75 in.gpkg out.tif
> >
> > => But, I'm still looking for a hint to set pixel sizes without
> > guessing or calculating the "outsize" from the extents of the input
> > file.
>
> This is left as an exercice to the contributor to implement a -tr option
> similar to gdalwarp / gdal_translate.
>
> >
> > Then, I observed another problem: One obviously has no choice in
> > gdal_grid than to choose an interpolation method. As one can see from
> > the command above, I've chosen "nearest neighbour" as interpolation
> > method (option "-a nearest:radius1=71:radius2=71").
>
> Why 71 ? If you're grid size is 100 meters, then less than 50 meters would be
> a good radius
> You would also need to grow the raster extent by a half-pixel size, so that
> the middle of each raster point matches exactly an exact x,y coordinate of
> your CSV (GDAL convention for georeferencing is the coordinate of the top-left
> corner of a pixel area, not its center)
> By default, gdal_grid takes the extent of the vector file to set it as the
> extent of the raster file, and doesn't do this half-pixel extension, which is
> the likely cause for the effect you observed.
>
> Example:
>
> given grid.csv with
>
> x,y,z
> 100,500,1
> 300,500,3
> 100,600,4
> 200,600,5
> 300,600,6
>
> # note the missing value at 200,500
>
> $ ogr2ogr grid.gpkg grid.csv -oo X_POSSIBLE_NAMES=x -oo Y_POSSIBLE_NAMES=y
>
> # Note the extra small radius since the center of a raster grid matches a
> # input vector point, or none
> $ gdal_grid -a nearest:radius1=1:radius2=1:nodata=-99  grid.gpkg  grid.tif \
>                 -txe 50 350 -tye 650 450 -outsize 3 2 -zfield z
>
> $ gdal_translate grid.tif /vsistdout/ -of aaigrid
> ncols        3
> nrows        2
> xllcorner    50.000000000000
> yllcorner    450.000000000000
> cellsize     100.000000000000
> NODATA_value  -99
>  4.0 5 6
>  1 -99 3
>
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev