[PROJ] PROJ grid files CDN

[PROJ] PROJ grid files CDN

Howard Butler-3
All,

A few days ago, I tweeted [1] looking for a CDN partner to help us incrementally distribute the shift files that PROJ uses. After that generated some connections from organizations willing to help, I'm now asking the community if there is support for such an idea.

The current software approach PROJ uses to apply grid shifts works just fine, but it has one significant deficiency that can cause PROJ to generate "incorrect" results – the need for proper grid shift files. Incorrect results can happen due to missing shift data that users didn't know they needed to download and put in a magical directory. Some distributions do not ship full copies of the shift files due to their size, and new versions of the grid shift files are released at a different release cadence than the releases of distributions, the EPSG database, or PROJ itself.

The current management of the grid shift files would be improved for many users by providing an optional online web service alternative for obtaining shift information. Some benefits of a web service approach include:

* users no longer have to manually fetch grid files and place them in PROJ_LIB
* full and accurate capability of the software would no longer require GBs of grid shift files
* the web service can manage and provide proper versioning for the shift files
* cache build up of the grid files could happen lazily so users end up locally mirroring what they actually use
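
To make the last bullet concrete, lazy cache build-up could look roughly like the sketch below. It is an illustration under stated assumptions, not PROJ's implementation: the CDN base URL, the cache location, and the function names are all hypothetical, and the downloader is passed in as a parameter so the caching logic can be exercised without a network.

```python
import os
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical values for illustration only; no endpoint is named in this thread.
CDN_BASE_URL = "https://cdn.example.org/proj-grids"
CACHE_DIR = Path(os.path.expanduser("~/.cache/proj-grids"))


def http_download(url: str) -> bytes:
    """Fetch a URL and return its body (only called on a cache miss)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_grid(name: str,
             cache_dir: Path = CACHE_DIR,
             download: Callable[[str], bytes] = http_download) -> Path:
    """Return a local path for grid `name`, fetching it on first use only."""
    local = cache_dir / name
    if local.exists():
        return local                      # cache hit: no network access
    data = download(f"{CDN_BASE_URL}/{name}")
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = local.with_name(local.name + ".part")
    tmp.write_bytes(data)                 # write under a temporary name first,
    tmp.replace(local)                    # then rename so no partial file is visible
    return local
```

With this shape, the local directory ends up mirroring only the grids a user has actually requested.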

I recognize that many do not want PROJ reaching out to a web service, and I would propose that the machinery to do this would be optionally compiled and optionally activated via environment variable or some similar mechanism. However, a significant portion of don't-even-know-they're-using-PROJ users could benefit from PROJ having the optional ability to do its best in application of grid shifts.  
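
The double opt-in described above (compiled in, then activated at run time) can be sketched as follows. The variable name `PROJ_GRID_DOWNLOAD` and the `COMPILED_WITH_NETWORK` flag are assumptions made for this example; the proposal itself only says "environment variable or some similar mechanism".

```python
import os
from typing import Mapping

# Hypothetical variable name, chosen only for this sketch.
OPT_IN_VARIABLE = "PROJ_GRID_DOWNLOAD"

# Stands in for a build-time switch; False models a build without networking code.
COMPILED_WITH_NETWORK = True


def downloads_enabled(environ: Mapping[str, str] = os.environ,
                      compiled_in: bool = COMPILED_WITH_NETWORK) -> bool:
    """Return True only if networking is compiled in AND the user opted in."""
    if not compiled_in:
        return False
    value = environ.get(OPT_IN_VARIABLE, "")
    return value.strip().lower() in {"1", "on", "yes", "true"}
```

An unset variable means "off", so nothing reaches the network unless someone asks for it.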

Does the PROJ community support such an idea? How does management of grid shift data impact your ability to use PROJ, and what ideas do you have to help us improve it?  If the feedback on this proposal is generally positive, I will work with Even on writing an RFC of a proposed implementation.

Howard

[1] https://twitter.com/howardbutler/status/1171778886646022145?s=20
_______________________________________________
PROJ mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/proj

Re: PROJ grid files CDN

Nikolaos Ves
Willing to help vs. committed to help: that is a big difference.

I support such an idea, but not on a voluntary basis; maybe through OSGeo somehow?

On Fri, 13 Sep 2019 at 15:03, Howard Butler <[hidden email]> wrote:
> A few days ago, I tweeted [1] looking for a CDN partner to help us incrementally distribute the shift files that PROJ uses. After that generated some connections from organizations willing to help, I'm now asking the community if there is support for such an idea.

Re: PROJ grid files CDN

Howard Butler-3


On Sep 13, 2019, at 9:41 AM, Nikolaos Ves <[hidden email]> wrote:

> Willing to help vs. committed to help: that is a big difference.
>
> I support such an idea, but not on a voluntary basis; maybe through OSGeo somehow?

My email may not have been so clear. I can make the technical implementation of this proposal happen, but without community support of the idea, it would be wasted effort. It would also be impolite to simply dump code implementing this into the codebase without first having some community discussion about what it hopes to achieve. My email looks for feedback on the idea and whether it brings up any technical or cultural challenges.



Re: PROJ grid files CDN

Nikolaos Ves
I understand that it is a different conversation, but I cannot help wondering whether it would also be wasted effort if, down the road, the CDN providers ask for a reasonable amount of money to sustain their services once all PROJ users rely on their resources to fetch grid files.

Technically, I am still trying to cope with the build changes PROJ brought to some frameworks/libraries, so as long as it's **optional**, hey, it's good to have ;)

Out of curiosity, is there any other project with a similar approach to the one suggested?

Nikos


Re: PROJ grid files CDN

Howard Butler-3


On Sep 13, 2019, at 10:17 AM, Nikolaos Ves <[hidden email]> wrote:

> I understand that it is a different conversation, but I cannot help wondering whether it would also be wasted effort if, down the road, the CDN providers ask for a reasonable amount of money to sustain their services once all PROJ users rely on their resources to fetch grid files.

Fair point. I would think an organization like OSGeo could step forward (especially after we complete OSGeo project incubation) to provide the resources to cover the costs in that situation.

> Technically, I am still trying to cope with the build changes PROJ brought to some frameworks/libraries, so as long as it's **optional**, hey, it's good to have ;)
>
> Out of curiosity, is there any other project with a similar approach to the one suggested?

Many, if not most, of the popular JavaScript libraries use CDNs as their primary distribution mechanism with the network and storage resources contributed by the CDNs themselves. The Python project uses Fastly, for example. I don't think we would be treading any new ground with the approach.



Re: PROJ grid files CDN

Sean Gillies-3
In reply to this post by Howard Butler-3
Hi Howard 

> However, a significant portion of don't-even-know-they're-using-PROJ users could benefit from PROJ having the optional ability to do its best in application of grid shifts.

If the remote update feature will require extra build time configuration or environment configuration, how will these users benefit? If you don't know you're using PROJ, how do you know to enable this? Don't these users need it on by default?

I think there are also security considerations. Is PROJ hardened against malicious grid files the way our browsers are against malicious JavaScript?

Somewhat off topic: should PROJ be returning incorrect results in the absence of grid files? Should it not raise an exception instead?


--
Sean Gillies


Re: PROJ grid files CDN

Greg Troxel-2
In reply to this post by Howard Butler-3
Howard Butler <[hidden email]> writes:

> A few days ago, I tweeted [1] looking for a CDN partner to help us
> incrementally distribute the shift files that PROJ uses. After that
> generated some connections from organizations willing to help, I'm now
> asking the community if there is support for such an idea.
>
> The current software approach PROJ uses to apply grid shifts works
> just fine, but it has one significant deficiency that can cause PROJ
> to generate "incorrect" results – the need for proper grid shift
> files. Incorrect results can happen due to missing shift data that
> users didn't know they needed to download and put in a magical
> directory. Some distributions do not ship full copies of the shift
> files due to their size, and new versions of the grid shift files are
> released at a different release cadence than the releases of
> distributions, the EPSG database, or PROJ itself.

That raises several issues:

  I really don't understand the notion of incorrect results from missing
  shift files as other than a bug.  I know we've discussed this before,
  but it seems that a transform that needs grid files should use grid
  files, and fail if they are not there.  I realize there is some notion
  of lower and higher accuracy transforms, but it seems dangerous to
  have different outcomes based on grids being installed (vs flags
  asking to not use them).  I acknowledge that this is a preference to
  repeatable results over convenience.

  One issue with grids seems to be licensing.  I have not investigated
  deeply, but it seems some have terms that make it difficult to
  distribute, and some are non-Free (being an issue for Debian, perhaps,
  and in pkgsrc requiring a non-Free license tag, sort of the same
  thing).  Are all of the grids you are talking about able to be
  distributed (verbatim) via no-cost internet downloads?  As part of a
  paid-for CDROM that aggregates many things?

  My impression is that grids are not included in proj packages because
  they are large, in addition to the above.

  In general, within the context of a packaging system, the right thing
  for additional files is to have packages for them, rather than
  inventing a sort of packaging system managed by some particular
  program.

  I don't follow the release cadence point at all.  Yes, proj has
  releases, grid shifts have releases, and these all make their way into
  packaging systems, which then have releases.  Generally people want to
  run more recent versions, except for some people that choose to run
  old software (which they call LTS).

  Your point about packaging systems not having grid shift packages due
  to size is fair; it's not clear how to deal with that.

> The current management of the grid shift files would be improved for
> many users by providing an optional online web service alternative for
> obtaining shift information. Some benefits of a web service approach
> include:
>
> * users no longer have to manually fetch grid files and place them in PROJ_LIB
> * full and accurate capability of the software would no longer require GBs of grid shift files
> * the web service can manage and provide proper versioning for the shift files
> * cache build up of the grid files could happen lazily so users end up locally mirroring what they actually use

Further issues:

  Generally, packaging systems install proj in a system directory, such
  that regular users cannot write the directories and files.  So a user
  running proj would not be able to write to the system proj directory.
  Do you envision the files going into some per-user directory in their
  homedir?

  Generally, packaging systems consider it a bug when programs in
  packages do automatic downloading.  Partly this is because one should
  be able to install a package to an offline computer.

  If what you're really talking about first is a CDN URL to download
  grids, mirroring the current download area, that sounds fine, but it
  seems somewhat separable from the autodownload notion.

> I recognize that many do not want PROJ reaching out to a web service,
> and I would propose that the machinery to do this would be optionally
> compiled and optionally activated via environment variable or some
> similar mechanism. However, a significant portion of
> don't-even-know-they're-using-PROJ users could benefit from PROJ
> having the optional ability to do its best in application of grid
> shifts.

Agreed that if it happens it should be opt in.

> Does the PROJ community support such an idea? How does management of
> grid shift data impact your ability to use PROJ, and what ideas do you
> have to help us improve it?  If the feedback on this proposal is
> generally positive, I will work with Even on writing an RFC of a
> proposed implementation.


Re: PROJ grid files CDN

Greg Troxel-2
In reply to this post by Sean Gillies-3
> I think there are also security considerations. Is PROJ proofed against
> malicious grid files like our browsers are against malicious javascript?

Good point.  It seems that proj releases then need hashes of grids;
perhaps they should be fetched by hash.
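
One way to read "hashes of grids" and "fetched by hash" is sketched below. The manifest, the file name, and the URL layout are hypothetical; the only library used is Python's standard `hashlib`. The manifest digest is computed from a placeholder payload so that the example is self-consistent; a real release would ship fixed digests.

```python
import hashlib

# Hypothetical manifest shipped with a release: grid name -> expected SHA-256.
# The digest is derived from a placeholder payload purely for this example.
EXPECTED_SHA256 = {
    "example_grid.gsb": hashlib.sha256(b"example grid payload").hexdigest(),
}


class GridIntegrityError(Exception):
    """Raised when a downloaded grid does not match its expected digest."""


def verify_grid(name: str, data: bytes, manifest: dict = EXPECTED_SHA256) -> bytes:
    """Return `data` unchanged if its SHA-256 matches the manifest, else raise."""
    expected = manifest.get(name)
    if expected is None:
        raise GridIntegrityError(f"no known hash for {name!r}")
    if hashlib.sha256(data).hexdigest() != expected:
        raise GridIntegrityError(f"hash mismatch for {name!r}")
    return data


def url_for(name: str, manifest: dict = EXPECTED_SHA256) -> str:
    """Content-addressed URL: the digest is part of the path."""
    return f"https://cdn.example.org/by-sha256/{manifest[name]}/{name}"
```

A checksum of this kind only guards against receiving the wrong bytes; it does not by itself make the code that parses the file safe.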

Re: PROJ grid files CDN

Even Rouault-2
In reply to this post by Sean Gillies-3
> If the remote update feature will require extra build time configuration or
> environment configuration, how will these users benefit? If you don't know
> you're using PROJ, how do you know to enable this? Don't these users need
> it on by default?

Good point. I can imagine that an application with a GUI such as QGIS could
ask the user "QGIS might download resource files needed for
coordinate transformation. Do you agree?", and if they approve, set the
appropriate environment variable.

A realistic use case for this download-on-demand capability is that you know
that PROJ exists, that you are going to use it, but you don't know in advance
which part(s) of the world you're going to work on, and don't want to / cannot
download data for the whole world in advance.

>
> I think there are also security considerations. Is PROJ proofed against
> malicious grid files like our browsers are against malicious javascript?

In that context, one should indeed have a deeper look at how it opens them and
try to secure that (with the current raw formats, which are quite simple, that
shouldn't be too challenging). That said, the resources it would fetch would
not be random, so unless a hostile party manages to upload a corrupted file to
the CDN storage (or changes entries in the local proj.db), the set of what is
accessed should be rather well defined.

Note: those concerns about security are already valid currently. For example
if using a PROJ string with a geoidgrids/nadgrids parameter that points to a
local file that would be hostile.

> Somewhat off topic: should PROJ be returning incorrect results in the
> absence of grid files? Should it not raise an exception instead?

There's no such thing as a "correct result" for coordinate
transformations that involve changes of datum. There are results that are more
or less accurate given the possibilities offered in the database, which are
themselves somewhat arbitrarily available according to what national geodetic
agencies have provided to EPSG, which transformation methods are actually
implemented in PROJ, etc. So depending on what is available and what your
needs are, you could get a result that is correct with an accuracy of 100m,
10m, 1m, 1cm...
In reality, the difference of accuracy between using a 7-parameter Helmert
transformation vs using a grid is quite often 1m vs 10cm, so raising an
exception when the grid is not there could be over-zealous if the user hasn't
required a particular level of accuracy and the result given by the 7-parameter
Helmert is just fine for them.
The lower level services of PROJ can give you the available possibilities,
with their accuracy and if some resources are missing or not, and where they
can be downloaded.
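
The trade-off described above can be illustrated with a small selection routine. This is a sketch with made-up types, not PROJ's API: `Candidate`, `pick`, and the grid name are hypothetical, and the 1 m and 0.1 m figures simply echo the "1m vs 10cm" example in this message.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy_m: float            # nominal accuracy in metres (smaller is better)
    grid: Optional[str] = None   # grid file required, if any


def pick(candidates: List[Candidate], installed: Set[str],
         required_accuracy_m: Optional[float] = None) -> Candidate:
    """Pick the most accurate usable candidate; fail only when the caller
    stated an accuracy requirement that cannot be met."""
    usable = [c for c in candidates if c.grid is None or c.grid in installed]
    if not usable:
        raise LookupError("no usable operation")
    best = min(usable, key=lambda c: c.accuracy_m)
    if required_accuracy_m is not None and best.accuracy_m > required_accuracy_m:
        missing = sorted(c.grid for c in candidates
                         if c.grid and c.grid not in installed
                         and c.accuracy_m <= required_accuracy_m)
        raise LookupError(
            f"{required_accuracy_m} m accuracy needs missing grids: {missing}")
    return best


candidates = [
    Candidate("7-parameter Helmert", accuracy_m=1.0),
    Candidate("grid shift", accuracy_m=0.1, grid="region.gsb"),
]
```

Without an accuracy requirement the Helmert result is returned when the grid is absent; with `required_accuracy_m=0.5` the same call raises and names the missing file.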


--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: PROJ grid files CDN

Howard Butler-3
In reply to this post by Greg Troxel-2


> On Sep 13, 2019, at 12:45 PM, Greg Troxel <[hidden email]> wrote:
>
>  I really don't understand the notion of incorrect results from missing  shift files as other than a bug.

That's why I scare quoted "incorrect", and Even's reply covers it thoroughly. If we were to error in the face of missing grids, the only thing a user without system access can do is to fully replicate PROJ_LIB somewhere locally and then copy in their own grids. It is massively inconvenient and easy to screw up.

>  One issue with grids seems to be licensing.  I have not investigated  deeply, but it seems some have terms that make it difficult to
>  distribute, and some are non-Free (being an issue for Debian, perhaps,  and in pkgsrc requiring a non-Free license tag, sort of the same thing).  Are all of the grids you are talking about able to be  distributed (verbatim) via no-cost internet downloads?  

I propose we wouldn't distribute anything via CDN that wouldn't meet Debian's notion of "free", and I would think that a distribution approach like this, if especially convenient, would encourage some of the licensing laggards of grids to follow along.

> As part of a paid-for CDROM that aggregates many things?

If you were to carefully inspect some of the paid-for CDROMs being distributed today, you're likely going to find these grid files regardless of the specific licensing language on some specific grids.

>  My impression is that grids are not included in proj packages because  they are large, in addition to the above.

Yes that's correct. They can be huge for the full set. That's why I'm proposing allowing a lazily-fetched CDN approach.

>  I don't follow the release cadence point at all.  Yes, proj has  releases, grid shifts have releases, and these all make their way into
>  packaging systems, which then have releases.  Generally people want to  run more recent versions, except for some people that choose to run  old software (which they call LTS).

You practically never want old grid data, unless you're trying to replicate something that happened in the past. This also brings up the problem of versioning of the grids, which is not handled currently.

>  Your point about packaging systems not having grid shift packages due  to size is fair; it's not clear how to deal with that.

Let the packagers continue to bulk copy the redistributable grids into packages and place them in PROJ_LIB. Fully unzipped, it's ~650 MB and growing. They can snap their packages and versions from the CDN.

> So a user  running proj would not be able to write to the system proj directory.  Do you envision the files going into some per-user directory in their  homedir?

Implementation detail, but yes, something like that.
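
As a rough picture of what "something like that" could mean, the sketch below searches a read-only system directory first and a per-user directory second, and only ever writes to the latter. Both directory arguments and the function names are hypothetical; the thread does not settle on a location.

```python
from pathlib import Path
from typing import Optional


def find_grid(name: str, system_dir: Path, user_dir: Path) -> Optional[Path]:
    """Look in the read-only system directory first, then the per-user one."""
    for directory in (system_dir, user_dir):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def store_grid(name: str, data: bytes, user_dir: Path) -> Path:
    """Downloaded grids are only ever written to the per-user directory."""
    user_dir.mkdir(parents=True, exist_ok=True)
    target = user_dir / name
    target.write_bytes(data)
    return target
```

Grids installed by a package are found first, so a per-user copy is only needed for files the system does not provide.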

>  Generally, packaging systems consider it a bug when programs in  packages do automatic downloading.  Partly this is because one should  be able to install a package to an offline computer.

That's why I propose it to be compile-time and runtime off-by-default. Specific packaging systems will have to determine whether it is worth it to their users to do this. For front-end user-oriented applications where the users are not typically versed in the minutiae of geodetic transforms, I suspect the answer will be yes.

Howard



Re: PROJ grid files CDN

Greg Troxel-2
Howard Butler <[hidden email]> writes:

>> On Sep 13, 2019, at 12:45 PM, Greg Troxel <[hidden email]> wrote:
>>
>>  I really don't understand the notion of incorrect results from
>>  missing shift files as other than a bug.
>
> That's why I scare quoted "incorrect", and Even's reply covers it
> thoroughly. If we were to error in the face of missing grids, the only
> thing a user without system access can do is to fully replicate
> PROJ_LIB somewhere locally and then copy in their own grids. It is
> massively inconvenient and easy to screw up.

I would think people could enable some PROJ_APPROXIMATE_OK environment
variable to proceed without grids.  Perhaps with a GUI helper.
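
A minimal sketch of that suggestion, assuming a strict-by-default policy: `PROJ_APPROXIMATE_OK` is the name floated in this message and is not an existing PROJ setting, and `choose_method` and `MissingGridError` are invented for the example.

```python
import os
from typing import Mapping


class MissingGridError(RuntimeError):
    """Raised when a required grid is absent and approximation was not allowed."""


def approximate_ok(environ: Mapping[str, str] = os.environ) -> bool:
    # Variable name taken from the message above; treated here as a simple on/off flag.
    value = environ.get("PROJ_APPROXIMATE_OK", "")
    return value.strip().lower() in {"1", "on", "yes", "true"}


def choose_method(grid_present: bool,
                  environ: Mapping[str, str] = os.environ) -> str:
    """Return 'grid' when the grid is installed, 'approximate' only when the
    user explicitly allowed it, and raise otherwise."""
    if grid_present:
        return "grid"
    if approximate_ok(environ):
        return "approximate"
    raise MissingGridError(
        "grid file missing; set PROJ_APPROXIMATE_OK to proceed without it")
```

Under this policy the same input and conversion string cannot silently produce different results depending on what happens to be installed.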

>>  One issue with grids seems to be licensing.  I have not investigated
>> deeply, but it seems some have terms that make it difficult to
>> distribute, and some are non-Free (being an issue for Debian,
>> perhaps, and in pkgsrc requiring a non-Free license tag, sort of the
>> same thing).  Are all of the grids you are talking about able to be
>> distributed (verbatim) via no-cost internet downloads?
>
> I propose we wouldn't distribute anything via CDN that wouldn't meet
> Debian's notion of "free", and I would think that a distribution
> approach like this, if especially convenient, would encourage some of
> the licensing laggards of grids to follow along.

So what are you going to do about grids that don't allow redistribution?
Some scheme to help people download from the original place?

What about grids that do not allow redistribution of modified copies?  I
would think there's a lot of that.  (The EPSG database seems non-Free,
but we aren't really dealing with that either.)

>> As part of a paid-for CDROM that aggregates many things?
>
> If you were to carefully inspect some of the paid-for CDROMs being
> distributed today, you're likely going to find these grid files
> regardless of the specific licensing language on some specific grids.

What people do contrary to terms is not really relevant.  I meant to
inquire about the actual terms, not whether other people routinely
violate them.  But if you are limiting to data that meets DFSG, that's
not relevant.  And, the CDROM thing has the same issue regardless of any
over-the-net fetching scheme.

>>  I don't follow the release cadence point at all.  Yes, proj has
>>  releases, grid shifts have releases, and these all make their way
>>  into packaging systems, which then have releases.  Generally people
>>  want to run more recent versions, except for some people that choose
>>  to run old software (which they call LTS).
>
> You practically never want old grid data, unless you're trying to
> replicate something that happened in the past. This also brings up the
> problem of versioning of the grids, which is not handled currently.

That's also true about old buggy software; one should not want that
either but people do.

But, I'd say that there is a larger problem, which is versions of grid
data.  It seems blindingly obvious that any data that is released should
be versioned, for all the usual reasons of knowing what you have,
knowing if you have to get new data, and recording an identifier for
what you used, for repeating calculations, understanding what happened,
etc.

I would suggest fixing the lack of versioning bugs first, before getting
into any autofetch stuff.  Surely with a CDN you want files with unique
names that never change anyway.
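
One possible shape for "unique names that never change" is sketched below. The naming scheme (stem, version label, truncated SHA-256, original suffix) is an assumption made for illustration and is not something the project has agreed on.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(original: str, version: str, data: bytes) -> str:
    """Build an immutable name: <stem>.<version>.<first 12 hex of SHA-256><suffix>."""
    path = PurePosixPath(original)
    digest = hashlib.sha256(data).hexdigest()[:12]
    return f"{path.stem}.{version}.{digest}{path.suffix}"


def record(original: str, version: str, data: bytes) -> dict:
    """Metadata a user could log to make a computation repeatable."""
    return {
        "grid": original,
        "version": version,
        "sha256": hashlib.sha256(data).hexdigest(),
        "file": versioned_name(original, version, data),
    }
```

Because the name changes whenever the content does, a cache or CDN never has to guess whether a file with a given name is stale, and the logged record identifies exactly which data was used.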

>>  Your point about packaging systems not having grid shift packages
>>  due to size is fair; it's not clear how to deal with that.
>
> Let the packagers continue to bulk copy the redistributable grids into
> packages and place them in PROJ_LIB. Fully unzipped, it's ~650 MB and
> growing. They can snap their packages and versions from the CDN.

I really don't understand the emphasis on CDN.  The big issues here are
having versions and a naming scheme, as well as a standard for how files
unpack and portable representations (not varying based on CPU type, word
size, and endianness), and having a central place to get files from.

Then, a CDN solves the problem of not being able to handle or pay for
the bandwidth to that one place by caching.  But it's just an
optimization for fetching and does not bear on the actual hard problems,
as I see it.

>> So a user running proj would not be able to write to the system proj
>> directory.  Do you envision the files going into some per-user
>> directory in their homedir?
>
> Implementation detail, but yes, something like that.

OK - so multiple users would have multiple copies.   I'm not saying
that's terrible, but it bears discussing/thinking about.

Re: PROJ grid files CDN

Greg Troxel-2
In reply to this post by Even Rouault-2
Even Rouault <[hidden email]> writes:

>> I think there are also security considerations. Is PROJ proofed against
>> malicious grid files like our browsers are against malicious javascript?
>
> In that context, one should indeed have a deeper look at how it opens them and
> try to secure that (with the current raw formats, which are quite simple, that
> shouldn't be too challenging). That said, the resources it would fetch would
> not be random, so unless a hostile party manages to upload a corrupted file to
> the CDN storage (or changes entries in the local proj.db), the set of what is
> accessed should be rather well defined.
>
> Note: those concerns about security are already valid currently. For example
> if using a PROJ string with a geoidgrids/nadgrids parameter that points to a
> local file that would be hostile.

Two separate issues:

  a file with an exploit

  just getting the wrong data.  packaging systems usually have crypto
  checksums, and hence I suggested having hashes.

> There's no such thing as a "correct result" for coordinate
> transformations that involve changes of datum. There are results that are more
> or less accurate given the possibilities offered in the database, which are
> themselves somewhat arbitrarily available according to what national geodetic
> agencies have provided to EPSG, which transformation methods are actually
> implemented in PROJ, etc. So depending on what is available and what your
> needs are, you could get a result that is correct with an accuracy of 100m,
> 10m, 1m, 1cm...

But still, there is a notion that:

  With this input, and this conversion string, I ran proj, and I
  (silently) got different results.

So it's not that the output is wrong in an egregious sense; it is
that it's different in a way which perhaps should be controlled and
isn't.

Re: PROJ grid files CDN

Martin Desruisseaux-3
On 13/09/2019 at 20:38, Greg Troxel wrote:

> With this input, and this conversion string, I ran proj, and I
> (silently) got different results.
>
The fact that, since version 6, PROJ has the capability to describe
the operation that it performs (thanks to WKT 2) may change the
perspective. This description includes whether the operation uses grid
files or other methods, together with accuracy information. Users need
some education here, to encourage them to look at this WKT
string when they have doubts about what is going on. Is a grid file
used? Is the axis order the expected one? Is the transformation time-dependent?
Am I inside the operation's domain of validity? I think it should almost
become a reflex to ask users "What is your coordinate operation WKT?"
when an issue is suspected. In GeoTools, it was hugely helpful.

    Martin



Re: PROJ grid files CDN

Sean Gillies-3
In reply to this post by Even Rouault-2

On Fri, Sep 13, 2019 at 12:07 PM Even Rouault <[hidden email]> wrote:
> > If the remote update feature will require extra build time configuration or
> > environment configuration, how will these users benefit? If you don't know
> > you're using PROJ, how do you know to enable this? Don't these users need
> > it on by default?
>
> Good point. I can imagine that an application with a GUI such as QGIS could
> ask the user "QGIS might download resource files needed for
> coordinate transformation. Do you agree?", and if they approve, set the
> appropriate environment variable.
>
> A realistic use case for this download-on-demand capability is that you know
> that PROJ exists, that you are going to use it, but you don't know in advance
> which part(s) of the world you're going to work on, and don't want to / cannot
> download data for the whole world in advance.
>
> > I think there are also security considerations. Is PROJ proofed against
> > malicious grid files like our browsers are against malicious javascript?
>
> In that context, one should indeed have a deeper look at how it opens them and
> try to secure that (with the current raw formats, which are quite simple, that
> shouldn't be too challenging). That said, the resources it would fetch would
> not be random, so unless a hostile party manages to upload a corrupted file to
> the CDN storage (or changes entries in the local proj.db), the set of what is
> accessed should be rather well defined.
>
> Note: those concerns about security are already valid currently. For example
> if using a PROJ string with a geoidgrids/nadgrids parameter that points to a
> local file that would be hostile.
>
> > Somewhat off topic: should PROJ be returning incorrect results in the
> > absence of grid files? Should it not raise an exception instead?
>
> There's no such thing as a "correct result" regarding coordinate
> transformations that involve changes of datum. There are results that are more
> or less accurate given the possibilities offered in the database which are
> themselves somewhat arbitrarily available according to what national geodetic
> agencies have provided to EPSG, which transformation methods are actually
> implemented in PROJ, etc. So depending on what is available and what your
> needs are, you could get a result that is correct with an accuracy of 100m,
> 10m, 1m, 1cm...
> In reality, the difference of accuracy between using a 7-parameter Helmert
> transformation vs using a grid is quite often 1m vs 10cm, so raising an
> exception when the grid is not there could be over-zealous if the user hasn't
> required a particular level of accuracy and the result given by the 7-parameter
> Helmert is just fine for them.
> The lower level services of PROJ can give you the available possibilities,
> with their accuracy and if some resources are missing or not, and where they
> can be downloaded.

Thanks for the explanation, Even!

I think I'll be in roughly the same situation as QGIS developers. Have you and Howard considered grids-on-demand as a PROJ API that developers could use in QGIS or Rasterio or gdalwarp? Or as a service like DNS? Something about having it built into the library feels not right to me. I'm trying to think of a precedent for this and am drawing a blank.

-- 
Sean Gillies


Re: PROJ grid files CDN

Even Rouault-2
> I think I'll be in roughly the same situation as QGIS developers. Have you
> and Howard considered grids-on-demand as a PROJ API that developers could
> use in QGIS or Rasterio or gdalwarp? Or as a service like DNS? Something
> about having it built into the library feels not right to me.

I have imagined that the network layer could be an interface, with a default
libcurl implementation, that applications/libraries could substitute with their
own.
As some grids can be pretty big and we don't necessarily want to download them
in their entirety, that could be something like
DownloadRange(url, start_offset, length)
One use case where this solution is kind of compulsory is if you have to do
coordinate reprojection in a context like AWS Lambda with a very small
footprint for the application.
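
A minimal sketch of that idea in Python, to show the shape of such an
interface. All class and method names are hypothetical, `urllib` merely stands
in for libcurl, and the URL is a placeholder. The substitute implementation is
an in-memory one so the example runs without network access.

```python
import urllib.request
from abc import ABC, abstractmethod


class NetworkInterface(ABC):
    """Hypothetical pluggable network layer (names are illustrative)."""

    @abstractmethod
    def download_range(self, url: str, start_offset: int, length: int) -> bytes:
        """Return `length` bytes of `url` starting at `start_offset`."""


class UrllibNetwork(NetworkInterface):
    """Default implementation using an HTTP Range request."""

    def download_range(self, url, start_offset, length):
        end = start_offset + length - 1  # Range bounds are inclusive
        req = urllib.request.Request(
            url, headers={"Range": f"bytes={start_offset}-{end}"})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            if resp.status == 200:
                # Server ignored the Range header and sent the whole file.
                return body[start_offset:start_offset + length]
            return body  # 206 Partial Content


class InMemoryNetwork(NetworkInterface):
    """Substitute an application could supply; backed by a dict for the demo."""

    def __init__(self, files):
        self.files = files

    def download_range(self, url, start_offset, length):
        return self.files[url][start_offset:start_offset + length]


def read_header(net: NetworkInterface, url: str, size: int = 16) -> bytes:
    """Library code only sees the interface, not the implementation."""
    return net.download_range(url, 0, size)


url = "https://cdn.example/grid"  # placeholder URL
net = InMemoryNetwork({url: bytes(range(64))})
print(list(read_header(net, url, 4)))          # [0, 1, 2, 3]
print(list(net.download_range(url, 10, 3)))    # [10, 11, 12]
```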

> I'm trying to
> think of a precedent for this and am drawing a blank.

On your phone, you have the Maps app, but this is just the app: the map
content is stored remotely, and you have to connect if the area of interest is
not in the cache.
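
Applied to grid data, the same connect-only-on-a-cache-miss behaviour could
look like the sketch below. It is an illustration under assumptions: the
`ChunkCache` class, the chunk size and the `fake_fetch` stand-in for a network
call are all hypothetical.

```python
class ChunkCache:
    """Fetch fixed-size chunks of a remote file on first use and keep them."""

    def __init__(self, fetch, chunk_size=1024):
        self.fetch = fetch            # callable(url, start_offset, length) -> bytes
        self.chunk_size = chunk_size
        self.chunks = {}              # (url, chunk index) -> bytes
        self.network_calls = 0

    def _chunk(self, url, index):
        key = (url, index)
        if key not in self.chunks:    # cache miss: go to the network once
            self.network_calls += 1
            self.chunks[key] = self.fetch(
                url, index * self.chunk_size, self.chunk_size)
        return self.chunks[key]

    def read(self, url, offset, length):
        if length <= 0:
            return b""
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        data = b"".join(self._chunk(url, i) for i in range(first, last + 1))
        skip = offset - first * self.chunk_size
        return data[skip:skip + length]


# Stand-in for a remote grid file and for the network call.
remote = bytes(i % 256 for i in range(10_000))


def fake_fetch(url, start, length):
    return remote[start:start + length]


cache = ChunkCache(fake_fetch, chunk_size=1024)
a = cache.read("grid", 1000, 100)  # spans chunks 0 and 1: two fetches
b = cache.read("grid", 100, 20)    # chunk 0 only, already cached: no fetch
print(a == remote[1000:1100], b == remote[100:120], cache.network_calls)
# True True 2
```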

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: PROJ grid files CDN

Clifford J Mugnier
Speaking of grids ... NGS has officially released GEOID18 this week for CONUS.

Clifford J. Mugnier, c.p., c.m.s.
Chief of Geodesy
LSU Center for GeoInformatics (ERAD 266)
Dept. of Civil Engineering (P.F. Taylor 3531)
LOUISIANA STATE UNIVERSITY
Baton Rouge, LA  70803
Academic: (225) 578-8536
Research: (225) 578-4578
Cell: (225) 328-8975
honorary lifetime member, lsps
fellow emeritus, asprs
member, apsg



From: PROJ <[hidden email]> on behalf of Even Rouault <[hidden email]>
Sent: Friday, September 13, 2019 4:24 PM
To: [hidden email] <[hidden email]>
Subject: Re: [PROJ] PROJ grid files CDN
 
> I think I'll be in roughly the same situation as QGIS developers. Have you
> and Howard considered grids-on-demand as a PROJ API that developers could
> use in QGIS or Rasterio or gdalwarp? Or as a service like DNS? Something
> about having it built into the library feels not right to me.

I have imagined that the network layer could be an interface, with a default
libcurl implementation, that applications/libraries could substitute with their
own.
As some grids can be pretty big and we don't necessarily want to download them
in their entirety, that could be something like
DownloadRange(url, start_offset, length)
One use case where this solution is kind of compulsory is if you have to do
coordinate reprojection in a context like AWS Lambda with a very small
footprint for the application.

> I'm trying to
> think of a precedent for this and am drawing a blank.

On your phone, you have the Maps app, but this is just the app: the map
content is stored remotely, and you have to connect if the area of interest is
not in the cache.

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: PROJ grid files CDN

Nyall Dawson
In reply to this post by Howard Butler-3
On Sat, 14 Sep 2019 at 00:03, Howard Butler <[hidden email]> wrote:

> Does the PROJ community support such an idea? How does management of grid shift data impact your ability to use PROJ, and what ideas do you have to help us improve it?  If the feedback on this proposal is generally positive, I will work with Even on writing an RFC of a proposed implementation.

Bit late to this thread, sorry. While I think the proposal will be
very useful in some circumstances, it's not a fix-all and it's still
going to be very important for client applications to warn their users
whenever a grid is desirable but not available and provide some
user-friendly method to do this.

I'm thinking specifically of offline users here, or those with
unreliable/expensive internet connections. For these users it's
critical that they are alerted about missing grids instead of just
silently falling back to an inferior operation.
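
A minimal sketch of what "alert instead of silently falling back" could look
like in a client application. The `Candidate` type, the operation names and
the grid file name are hypothetical. The accuracies reuse the 1m vs 10cm
figures quoted earlier in the thread for a 7-parameter Helmert transformation
versus a grid.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    accuracy_m: float            # stated accuracy in metres (smaller is better)
    grid: Optional[str] = None   # grid file the operation needs, if any


def pick_operation(candidates, available_grids):
    """Pick the most accurate usable candidate.

    Emits a warning when a more accurate candidate was skipped only
    because its grid file is missing.
    """
    usable = [c for c in candidates
              if c.grid is None or c.grid in available_grids]
    if not usable:
        raise LookupError("no usable operation")
    best = min(usable, key=lambda c: c.accuracy_m)
    for c in candidates:
        missing = c.grid is not None and c.grid not in available_grids
        if missing and c.accuracy_m < best.accuracy_m:
            warnings.warn(
                f"grid {c.grid!r} is missing: using {best.name!r} "
                f"({best.accuracy_m} m) instead of {c.name!r} "
                f"({c.accuracy_m} m)")
    return best


candidates = [
    Candidate("helmert", 1.0),
    Candidate("grid", 0.1, grid="example_grid.gsb"),
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = pick_operation(candidates, available_grids=set())

print(chosen.name, len(caught))  # helmert 1
```

An application could surface the warning text in its UI, or offer to fetch the
missing grid, rather than only logging it.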

And we should also consider releasing all-in-one packages of end user
applications which include ALL the datum grid packages, so that users
who require an offline installer will be guaranteed to have these
available... (again, thinking of someone who goes out to the field in
a remote location without regular internet connections).

Nyall

Re: PROJ grid files CDN

jmckenna
Administrator

> Bit late to this thread, sorry. While I think the proposal will be
> very useful in some circumstances, it's not a fix-all and it's still
> going to be very important for client applications to warn their users
> whenever a grid is desirable but not available and provide some
> user-friendly method to do this.
>
> I'm thinking specifically of offline users here, or those with
> unreliable/expensive internet connections. For these users it's
> critical that they are alerted about missing grids instead of just
> silently falling back to an inferior operation.
>
> And we should also consider releasing all-in-one packages of end user
> applications which include ALL the datum grid packages, so that users
> who require an offline installer will be guaranteed to have these
> available... (again, thinking of someone who goes out to the field in
> a remote location without regular internet connections).
>
> Nyall
>
I feel the same as Nyall, that users of PROJ need to be alerted when a
grid file is missing.

In the case of MS4W, I package all of the grids for Windows users, since
there is no notification for Windows users who lack those files.  I
think once the warning/notice is implemented, this will mean packagers
like me won't have to package all of the grid files anymore, since users
will be notified and can then download the files themselves.

-jeff
