[gdal-dev] vsicurl configuration design decisions


Sean Gillies-3
Hi all,

It's written in http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsicurl:

> Starting with GDAL 2.3, options can be passed in the filename with the following syntax: /vsicurl/option1=val1[,optionN=valN]*,url=http://...

I'd like to discuss the design decisions that are being made here before this gets out into the world.

I'm uncomfortable with the way configuration is spread between environment variables, config options that surface in the API, and also identifiers. I don't think it's a great idea to expand the amount of configuration in dataset identifiers. It's redundant, the syntax is complicated, and it dilutes the network effects of reusing identifiers in our applications.

Are there specific advantages to this 

  ogrinfo -so /vsicurl/max_retry=10,url=https://example.com/poly.shp

that we can't also have with a curl-style

  ogrinfo -so --max-retry=10 /vsicurl/https://example.com/poly.shp

or, better yet, in my opinion

  ogrinfo -so --max-retry=10 https://example.com/poly.shp

on the command line?

--
Sean Gillies

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: vsicurl configuration design decisions

Frank Warmerdam
Sean,

The obvious answer is that interfaces that are organized around the
dataset name do not make it easy to transport other parameters in a
way that is specific to one dataset.

While I worry a bit about complex dataset syntaxes I do think there is
a power to embedding these options in the dataset name.  I sometimes
wish we had a more generalized way of wrapping options into dataset
names with clear escaping rules, and a way to avoid interference
between different levels of wrapping and virtualization.

I would add that I requested a mechanism to control /vsicurl/ retry
strategies, as we use this machinery widely and the failure to do "normal
retries" in /vsicurl/ was a problem for us.

Best regards,
Frank



--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, [hidden email]
light and sound - activate the windows |
and watch the world go round - Rush    | Geospatial Software Developer

Re: vsicurl configuration design decisions

Even Rouault-2
In reply to this post by Sean Gillies-3

Hi Sean,

 

> It's written in
> http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsicurl
> > Starting with GDAL 2.3, options can be passed in the filename with the
> > following syntax: /vsicurl/option1=val1[,optionN=valN]*,url=http://...
>
> I'd like to discuss the design decisions that are being made here before
> this gets out into the world.
>
> I'm uncomfortable with the way configuration is spread between environment
> variables, config options that surface in the API,

Just a clarification: GDAL only reads configuration options with CPLGetConfigOption(key). Those can be implicitly set through environment variables of the same name or with CPLSetConfigOption(key, value).
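The lookup described here can be modelled in a few lines of Python. This is only a toy sketch, not GDAL code: the helper names are hypothetical analogues of CPLSetConfigOption() and CPLGetConfigOption(), the key GDAL_HTTP_MAX_RETRY is used purely for illustration, and it assumes that an explicitly set value is consulted before the environment.

```python
import os

# Toy stand-in for GDAL's table of explicitly set options (not GDAL code).
_config_options = {}


def set_config_option(key, value):
    """Hypothetical analogue of CPLSetConfigOption(); None clears the key."""
    if value is None:
        _config_options.pop(key, None)
    else:
        _config_options[key] = value


def get_config_option(key, default=None):
    """Hypothetical analogue of CPLGetConfigOption()."""
    # Assumption: an explicitly set option is consulted first ...
    if key in _config_options:
        return _config_options[key]
    # ... otherwise an environment variable of the same name is used.
    return os.environ.get(key, default)


os.environ["GDAL_HTTP_MAX_RETRY"] = "3"
from_env = get_config_option("GDAL_HTTP_MAX_RETRY")

set_config_option("GDAL_HTTP_MAX_RETRY", "10")
from_api = get_config_option("GDAL_HTTP_MAX_RETRY")
```

In this model there is a single read path, so the environment and the API are two ways of filling the same option table rather than two separate configuration mechanisms.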

 

> and also in identifiers.
> I don't think it's a great idea to expand the amount of configuration
> in dataset identifiers. It's redundant, the syntax is complicated,

Frank answered on the main motivations.

> and it
> dilutes the network effects of reusing identifiers in our applications.

I didn't understand what you meant by the above sentence.

 

> Are there specific advantages to this
>
>   ogrinfo -so /vsicurl/max_retry=10,url=https://example.com/poly.shp
>
> that we can't also have with a curl-style
>
>   ogrinfo -so --max-retry=10 /vsicurl/https://example.com/poly.shp
>
> or, better yet, in my opinion
>
>   ogrinfo -so --max-retry=10 https://example.com/poly.shp
>
> on the command line?

One issue with your proposal is that it would require ogrinfo (or any utility) to reach from the highest-level abstraction layers of GDAL down to the lowest ones.

 

When ogrinfo is given "/vsicurl/max_retry=10,url=https://example.com/poly.shp", this is just a string used as a dataset name.

It happily feeds it into GDALOpenEx(), which in turn offers it sequentially to all drivers.

The shapefile driver tries this string with VSIFOpenL(), which in turn iterates over all virtual file systems. The /vsicurl/ VFS happens to recognize it and manages to open the file. The shapefile driver can read the first few bytes from it and recognizes that they are the header of a shapefile, etc.

So in the current design neither the utility, nor GDALOpenEx(), nor the drivers themselves really make sense of that string. This is quite a strength at the architectural level. It also makes it possible to pass such a string in a VRT file, for example.
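The layering described above can be sketched as a toy model in Python. Nothing here is GDAL code: open_dataset() and vsi_open() are hypothetical stand-ins for GDALOpenEx() and VSIFOpenL(), the two "drivers" only sniff a header, and the remote file is faked with an in-memory buffer. The point of the sketch is that only the /vsicurl/ handler ever looks inside the option syntax.

```python
import io
import struct

# Fake remote storage: 100 bytes starting with the shapefile file code 9994
# stored as a big-endian 32-bit integer.
FAKE_REMOTE = {
    "https://example.com/poly.shp": struct.pack(">i", 9994) + bytes(96),
}
OPENED = []  # records (url, options) each time the handler opens something


def vsicurl_open(rest):
    """Hypothetical /vsicurl/ handler: the only code that parses options."""
    options, url = {}, rest
    head, sep, tail = rest.partition("url=")
    if sep and not rest.startswith(("http://", "https://", "ftp://")):
        url = tail
        for pair in head.split(","):
            if pair:
                key, _, value = pair.partition("=")
                options[key] = value
    data = FAKE_REMOTE.get(url)
    if data is None:
        return None
    OPENED.append((url, options))
    return io.BytesIO(data)


VIRTUAL_FILE_SYSTEMS = {"/vsicurl/": vsicurl_open}


def vsi_open(name):
    """Stand-in for VSIFOpenL(): offer the name to each virtual file system."""
    for prefix, handler in VIRTUAL_FILE_SYSTEMS.items():
        if name.startswith(prefix):
            return handler(name[len(prefix):])
    return None  # a real implementation would try the local file system


def geotiff_driver(name):
    handle = vsi_open(name)
    if handle is None:
        return None
    return "GTiff" if handle.read(2) in (b"II", b"MM") else None


def shapefile_driver(name):
    handle = vsi_open(name)
    if handle is None:
        return None
    (file_code,) = struct.unpack(">i", handle.read(4))
    return "ESRI Shapefile" if file_code == 9994 else None


DRIVERS = [geotiff_driver, shapefile_driver]


def open_dataset(name):
    """Stand-in for GDALOpenEx(): pass the opaque name to each driver."""
    for driver in DRIVERS:
        result = driver(name)
        if result is not None:
            return result
    return None


opened_as = open_dataset("/vsicurl/max_retry=10,url=https://example.com/poly.shp")
```

Neither open_dataset() nor the drivers split the string on "," or "="; they only forward it, which is the architectural property being defended in this message.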

 

Regarding the direct use of http:// and https:// names, I also find it a bit unfortunate that we can't use them directly, with the vsicurl machinery being used implicitly. It turns out that historically we have had the HTTP driver, which triggers on such dataset names (ingesting the whole file into /vsimem/ and offering it in turn to other drivers). There are also a few other drivers (DODS, etc.) that trigger on such names.

 

Even

 

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: vsicurl configuration design decisions

Sean Gillies-3
Hi Even,

On Tue, Oct 10, 2017 at 4:02 AM, Even Rouault <[hidden email]> wrote:


> > and also in identifiers.
> > I don't think it's a great idea to expand the amount of configuration
> > in dataset identifiers. It's redundant, the syntax is complicated,
>
> Frank answered on the main motivations.


Yes, I understand that adding syntax tied to new core GDAL functionality can turn already-deployed software into full-fledged cloud data consumers. For cloud data providers and customers this is a big win.
 

 

> > and it
> > dilutes the network effects of reusing identifiers in our applications.
>
> Didn't understand what you meant with the above sentence.

I mean that having multiple names for datasets in our domain (https://example.com/foo.tif vs /vsicurl/https://example.com/foo.tif vs /vsicurl/option1=val,url=https://example.com/foo.tif) dilutes the power of the names and potentially reduces the network effects we could get by using fewer names. This is an abstract concern, however, and I don't want it to distract from talking about the design decisions.

 


> So in the current design neither the utility, nor GDALOpenEx(), or the drivers themselves really make a sense of that string. This is quite a strength at the architectural level. This also enables to pass such a string in a VRT file for example.


Is the future of open and creation options? Do you imagine this extended to, say, block size, compression, number of threads? An RFC that discussed the scope of this and at what level of abstraction it is implemented at might be warranted? I'd be happy to participate.
 

 

> Regarding the direct use of http:// and https:// names, I also find it a bit unfortunate that we can't use them directly, with the vsicurl machinery being used implicitly. It turns out that historically we have had the HTTP driver, which triggers on such dataset names (ingesting the whole file into /vsimem/ and offering it in turn to other drivers). There are also a few other drivers (DODS, etc.) that trigger on such names.
>
> Even

On the other hand, https://example.com/foo.tif identifies only a single resource, whereas /vsicurl/url=https://example.com/foo.tif can identify a GeoTIFF along with all of its sidecars. I presume that the new GDAL cloud utilities like gdal_cp.py take care of the auxiliary files, yes?

My final concern about the virtual file opening options is the syntax. These /vsicurl/option1=val1[,optionN=valN]*,url=http://example.com/foo.tif identifiers (or filenames or whatever we call them) may spread from GDAL into the wider geospatial programming domain. Speaking from my experience with Rasterio, open source Python GIS developers expect the /vsi* filenames to "just work" in all software. Can we consider using a more standard syntax? One that has parsers already deployed everywhere?

For example, /vsicurl?option1=foo&option2=bar&url=https://example.com/foo.tif can be parsed by standard URL parsers such as Python's.

>>> from urllib.parse import urlparse, parse_qs
>>> urlparse('/vsicurl?option1=foo&option2=bar&url=https://example.com/foo.tif')
ParseResult(scheme='', netloc='', path='/vsicurl', params='', query='option1=foo&option2=bar&url=https://example.com/foo.tif', fragment='')
>>> parse_qs(_.query)
{'option1': ['foo'], 'option2': ['bar'], 'url': ['https://example.com/foo.tif']}

That syntax gives the /vsi* filenames the form of a "reflector" URL such as we see in Google searches (for example: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwjC6e7hvevWAhXmjFQKHWsHDyMQFggmMAA&url=http%3A%2F%2Fwww.gdal.org%2F&usg=AOvVaw3fbRv5TusYwkXgz2Acf2kt) and there are abundant tools and a body of knowledge about how to parse and work with these.

--
Sean Gillies


Re: vsicurl configuration design decisions

Even Rouault-2

Hi Sean,

 

> Is the future of open and creation options?

I don't understand your above sentence.

> Do you imagine this extended
> to, say, block size, compression, number of threads? An RFC that discussed
> the scope of this and at what level of abstraction it is implemented at
> might be warranted? I'd be happy to participate.

It's not clear what you have in mind. Are you thinking about some formalism to define and specify options for VSI filenames?

 

> On the other hand, https://example.com/foo.tif identifies only a single
> resource, whereas /vsicurl/url=https://example.com/foo.tif can identify a
> GeoTIFF along with all of its sidecars. I presume that the new GDAL cloud
> utilities like gdal_cp.py take care of the auxiliary files, yes?

No. They should perhaps be named cpl_xxx since they really operate at the VSI/file level. Auxiliary/sidecar files are concepts that exist only at the driver level.

Datasets and their sidecar files can be copied with "gdalmanage copy".

 

> My final concern about the virtual file opening options is the syntax.
> These /vsicurl/option1=val1[,optionN=valN]*,url=http://example.com/foo.tif
> identifiers (or filenames or whatever we call them) may spread from GDAL
> into the wider geospatial programming domain. Speaking from my experience
> with Rasterio, open source Python GIS developers expect the /vsi* filenames
> to "just work" in all software. Can we consider using a more standard
> syntax? One that has parsers already deployed everywhere?

I don't really see a use for parsing those /vsi names in user code. User code has to compose them, not parse them. But I can see your point about something more standardized.

 

> That syntax gives the /vsi* filenames the form of a "reflector" URL such as
> we see in Google searches (for example:
> https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwjC6e7hvevWAhXmjFQKHWsHDyMQFggmMAA&url=http%3A%2F%2Fwww.gdal.org%2F&usg=AOvVaw3fbRv5TusYwkXgz2Acf2kt)
> and there are abundant tools and a body of knowledge
> about how to parse and work with these.

One downside is that you need to URL-encode the URL, which can make it painful to compose by hand.

 

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: vsicurl configuration design decisions

Craig de Stigter-2
Hi folks

We're slightly invested in this because we use VSI paths reasonably heavily, though not so much for cloud services yet.
 
> One downside is that you need to URLEncode the URL, which can make it painful when composing it at hand.

True, but that does eliminate ambiguity in the URL, and does so in a well-known way.

Does the current scheme use any encoding? How would you escape text in option values that might use `=` and `,` etc? Or are there guaranteed to be no freeform-text options in these paths?
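To make the escaping question concrete, here is a minimal Python sketch of a naive parser for the comma-separated form. The parser is hypothetical and is not how GDAL parses these names, and the "note" option is invented for illustration. It shows how a free-form value containing `,` and `=` would be silently split into two options.

```python
def parse_vsicurl_options(rest):
    """Naively split 'k1=v1,k2=v2,url=...' (hypothetical, not GDAL's parser)."""
    head, _, url = rest.partition("url=")
    options = {}
    for pair in head.split(","):
        if pair:
            key, _, value = pair.partition("=")
            options[key] = value
    return options, url


# A value with no special characters parses as expected.
plain = parse_vsicurl_options("max_retry=10,url=https://example.com/poly.shp")

# An invented free-form option whose intended value is "a,b=c".
tricky = parse_vsicurl_options("note=a,b=c,url=https://example.com/poly.shp")
```

Here `tricky` comes back with two options, `note` and `b`, instead of one, so without an escaping rule the intended value cannot be recovered.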



Tangential, but related: I've also just discovered the 2.2+ curly-brace syntax for vsizip/vsitar paths, which allows nested archives:

/vsizip/{/vsizip/{/path/to/outer.zip}/path/to/inner.zip}/file.shp

The curly braces are a definite improvement on the ambiguous older syntax for these paths. However, we noted the nesting order looks inside-out, and thought it would have been more intuitive to put the path inside the archive in the braces. i.e. nesting would look like:
/vsizip//path/to/outer.zip/{/vsizip//path/to/inner.zip/{file.shp}}

Of course, this latter syntax was added in 2.2, so perhaps that ship has already sailed.

From our experiences with vsicurl and vsizip urls, it feels like eliminating ambiguity in these paths is pretty important, more so than trivial composability. Just my 2¢ :)





--
Regards,
Craig

Developer
Koordinates

+64 21 256 9488 / koordinates.com / @koordinates


Re: vsicurl configuration design decisions

Even Rouault-2

Craig,

 

> True, but that does eliminate ambiguity in the URL, and does so in a
> well-known way.
>
> Does the current scheme use any encoding?

No

> How would you escape text in
> option values that might use `=` and `,` etc? Or are there guaranteed to be
> no freeform-text options in these paths?

Currently, given the supported set of options and values, yes. In case of future ambiguity, yes, we'd need to define some escaping rules.

 

> Tangential, but related: I've also just discovered the 2.2+ curly-brace
> syntax for vsizip/vsitar paths, which allows nested archives:

That's a side effect. The main motivation was that there are .tar or .zip files in the wild that, for good or bad reasons, have non-standard extensions.

 

> /vsizip/{/vsizip/{/path/to/outer.zip}/path/to/inner.zip}/file.shp
>
> The curly braces are a definite improvement on the ambiguous older syntax
> for these paths. However, we noted the nesting order looks inside-out, and
> thought it would have been more intuitive to put the path *inside* the
> archive in the braces. i.e. nesting would look like:
> /vsizip//path/to/outer.zip/{/vsizip//path/to/inner.zip/{file.shp}}

Hmm, that doesn't really seem better to me, and I can't see how it could be implemented. /vsizip//path/to/inner.zip/{file.shp} couldn't be successfully resolved by the vsizip VFS since /path/to/inner.zip isn't a regular file itself.

/vsizip/{/vsizip/{/path/to/outer.zip}/path/to/inner.zip}/file.shp is really closer to how it works internally, and is close to a function-like syntax f(g(h(x))) where you start by evaluating the innermost member.

So you have /vsizip/{something}/file.shp, which means we'll expose a file handle for a file.shp in a zip archive accessed through "something".

"something" happens to expand to /vsizip/{/path/to/outer.zip}/path/to/inner.zip, which means we will return a file (that happens to be itself a zip, but at that stage only /vsizip/{something}/file.shp cares that it is a zip) that is in a zip file /path/to/outer.zip, which happens to be a regular file.

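The inside-out evaluation can be illustrated with a small Python sketch built on the standard zipfile module. It handles only a simplified form of the curly-brace syntax (balanced braces, one member path after the closing brace), and read_vsizip() is a hypothetical helper, not part of GDAL.

```python
import io
import os
import tempfile
import zipfile


def read_vsizip(path):
    """Return the bytes named by a simplified /vsizip/{archive}/member path."""
    prefix = "/vsizip/{"
    if not path.startswith(prefix):
        # A regular file ends the recursion.
        with open(path, "rb") as f:
            return f.read()
    # Find the brace that closes the one opened by the prefix.
    depth, i = 1, len(prefix)
    while depth:
        depth += {"{": 1, "}": -1}.get(path[i], 0)
        i += 1
    archive = path[len(prefix):i - 1]
    member = path[i:].lstrip("/")
    # Evaluate the innermost expression first, like f(g(x)).
    with zipfile.ZipFile(io.BytesIO(read_vsizip(archive))) as zf:
        return zf.read(member)


# Build inner.zip in memory, then store it inside outer.zip on disk.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("file.shp", b"shapefile bytes")

with tempfile.TemporaryDirectory() as tmp:
    outer_path = os.path.join(tmp, "outer.zip")
    with zipfile.ZipFile(outer_path, "w") as zf:
        zf.writestr("path/to/inner.zip", inner.getvalue())
    nested = "/vsizip/{/vsizip/{" + outer_path + "}/path/to/inner.zip}/file.shp"
    content = read_vsizip(nested)
```

The outer call cannot open file.shp until the braces have been resolved to the bytes of inner.zip, which in turn requires opening outer.zip, the only regular file in the chain.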
 

 

> Of course, this latter syntax was added in 2.2, so perhaps that ship has
> already sailed.

Yes

 

 

Even

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: vsicurl configuration design decisions

Sean Gillies-3
Even, Craig,

On Thu, Oct 12, 2017 at 4:15 PM, Even Rouault <[hidden email]> wrote:

> Craig,
>
> > True, but that does eliminate ambiguity in the URL, and does so in a
> > well-known way.


I'd like to point out that very often URLs in a query string do not need to be encoded. Both the Python (see my earlier example) and Node standard parsers will handle the string

/vsicurl?option1=foo&option2=bar&url=https://example.com/foo.tif

without any URL encoding.

> url.parse('/vsicurl?option1=foo&option2=bar&url=https://example.com/foo.tif', true)
Url {
  protocol: null,
  slashes: null,
  auth: null,
  host: null,
  port: null,
  hostname: null,
  hash: null,
  search: '?option1=foo&option2=bar&url=https://example.com/foo.tif',
  query:
   { option1: 'foo',
     option2: 'bar',
     url: 'https://example.com/foo.tif' },
  pathname: '/vsicurl',
  path: '/vsicurl?option1=foo&option2=bar&url=https://example.com/foo.tif',
  href: '/vsicurl?option1=foo&option2=bar&url=https://example.com/foo.tif' }

If the web resource has a query string of its own, it will certainly have to be encoded.
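A short Python sketch of that last case, using only urllib.parse. The compose() and parse() helpers and the target URL are hypothetical, and the "/vsicurl?..." form is the syntax proposed in this thread, not something GDAL is known to accept.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def compose(options, url):
    """Build a '/vsicurl?...' name; urlencode() percent-encodes the values."""
    return "/vsicurl?" + urlencode({**options, "url": url})


def parse(name):
    """Split such a name back into (options, url) with the standard parser."""
    query = {k: v[0] for k, v in parse_qs(urlsplit(name).query).items()}
    url = query.pop("url")
    return query, url


# A target resource that carries a query string of its own.
target = "https://example.com/wms?layer=roads&format=image/tiff"

# Without encoding, the inner '&' is read as a separator of the outer query.
raw_options, raw_url = parse("/vsicurl?max_retry=10&url=" + target)

# With encoding, the name round-trips intact.
encoded = compose({"max_retry": "10"}, target)
encoded_options, encoded_url = parse(encoded)
```

In the unencoded case `format=image/tiff` ends up as an option of the outer name and the inner URL is truncated after `layer=roads`; in the encoded case both the options and the URL come back unchanged.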

Typing URLs by hand into the console, URL-encoded or not, is always going to invite mistakes. In my experience, we're more likely to be selecting and copying from a UI element or the output of another program, and it would be unfortunate to trade away the benefits of URL standards only to make typing easier.

> > Does the current scheme use any encoding?
>
> No
>
> > How would you escape text in
> > option values that might use `=` and `,` etc? Or are there guaranteed to be
> > no freeform-text options in these paths?
>
> Currently, given the supported set of options and values, yes. In case of future ambiguity, yes, we'd need to define some escaping rules.


The web already has escaping rules built in, one of the benefits I alluded to above.

--
Sean Gillies


Re: vsicurl configuration design decisions

Even Rouault-2
Sean,

> I'd like to point out that very often URLs in a query string do not need to
> be encoded. Both the Python (see my earlier example) and Node standard
> parsers will handle the string
>

While researching that issue, and from my past memories, I found that there is
a bit of confusion about when to escape and when not to.

> /vsicurl?option1=foo&option2=bar&url=https://example.com/foo.tif
>
> without any URL encoding.

https://tools.ietf.org/html/rfc3986 says in "3.4.  Query"

"""
      query       = *( pchar / "/" / "?" )

   The characters slash ("/") and question mark ("?") may represent data
   within the query component.  Beware that some older, erroneous
   implementations may not handle such data correctly when it is used as
   the base URI for relative references (Section 5.1), apparently
   because they fail to distinguish query data from path data when
   looking for hierarchical separators.  However, as query components
   are often used to carry identifying information in the form of
   "key=value" pairs and one frequently used value is a reference to
   another URI, it is sometimes better for usability to avoid percent-
   encoding those characters.
"""

But urllib.urlencode() encodes slashes in the values of query arguments, so they
probably decided to avoid issues with the above-mentioned older, erroneous
implementations.

>>> urllib.urlencode({'foo':'bar', 'url': 'http://example.com'})
'url=http%3A%2F%2Fexample.com&foo=bar'
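For readers on Python 3, where the function lives in urllib.parse, the same comparison can be sketched as follows. The `safe` argument is used here to leave ":" and "/" unencoded, which is the relaxed form that RFC 3986 permits in a query component.

```python
from urllib.parse import parse_qs, urlencode

params = {"foo": "bar", "url": "http://example.com"}

# Default behaviour: ':' and '/' in values are percent-encoded.
strict = urlencode(params)

# Marking ':' and '/' as safe keeps the embedded URL readable.
relaxed = urlencode(params, safe=":/")

# Both forms decode to the same key/value pairs.
same = parse_qs(strict) == parse_qs(relaxed)
```

A standard parser accepts either form, so the stricter encoding costs readability but not interoperability.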

> The web already has escaping rules built in, one of the benefits I alluded
> to above.

OK, let's follow your suggestion of using URL query string formatting, while
this hasn't yet gone into an official release. Could you create a ticket about that?

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: vsicurl configuration design decisions

Sean Gillies-3




--
Sean Gillies
