[gdal-dev] Call for discussion on RFC 70: Guessing output format from output file name extension

Even Rouault-2

Hi,

 

This is a call for discussion on:

https://trac.osgeo.org/gdal/wiki/rfc70_output_format_guess

 

Summary:

This proposal is to add syntactic sugar to the GDAL and OGR command line utilities, so that they take into account the extension of the output filename to guess which output driver to use when it is not explicitly specified.
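
As a rough illustration of the idea (a sketch only, not the RFC's implementation), the lookup could look like the C code below. The table contents, the struct and the function names are hypothetical; only the driver short names (GTiff, GPKG, ESRI Shapefile, GeoJSON) are real GDAL/OGR driver names. A format given explicitly always takes precedence over the guess.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension table, for illustration only. */
struct ext_driver { const char *ext; const char *driver; };

static const struct ext_driver ext_table[] = {
    { "tif",     "GTiff" },
    { "gpkg",    "GPKG" },
    { "shp",     "ESRI Shapefile" },
    { "json",    "GeoJSON" },
    { "geojson", "GeoJSON" },
};

/* Case-insensitive comparison of two extensions. */
static int ext_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns the explicitly requested format if there is one, otherwise a
 * guess based on the extension of the output filename, or NULL when the
 * extension is missing or unknown. */
const char *guess_driver_from_filename(const char *filename,
                                       const char *explicit_format)
{
    assert(filename != NULL);
    if (explicit_format != NULL)
        return explicit_format;              /* an explicit format wins */

    const char *dot = strrchr(filename, '.');
    if (dot == NULL || dot[1] == '\0')
        return NULL;                         /* no extension to look at */

    for (size_t i = 0; i < sizeof ext_table / sizeof ext_table[0]; i++) {
        if (ext_equal(dot + 1, ext_table[i].ext))
            return ext_table[i].driver;
    }
    return NULL;
}
```

With this sketch, `guess_driver_from_filename("output.gpkg", NULL)` yields `"GPKG"`, while passing an explicit format returns that format unchanged.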

 

Even

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com


_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: Call for discussion on RFC 70: Guessing output format from output file name extension

Sean Gillies-3
Hi Even,

This seems reasonable to me.

What will be done about the GeoJSON driver case, where we have one driver covering both GeoJSON and EsriJSON (and other JSONs)? If a user specifies foo.json which of the GeoJSON driver modes do they activate?


--
Sean Gillies


Re: Call for discussion on RFC 70: Guessing output format from output file name extension

Even Rouault-2

On Monday 21 August 2017 14:30:15 CEST Sean Gillies wrote:
> Hi Even,
>
> This seems reasonable to me.
>
> What will be done about the GeoJSON driver case, where we have one driver
> covering both GeoJSON and EsriJSON (and other JSONs)? If a user specifies
> foo.json which of the GeoJSON driver modes do they activate?

 

Hi Sean,

 

Exactly as when you specify -f GeoJSON, i.e. "true" GeoJSON (the driver only has write support for GeoJSON; EsriJSON and TopoJSON are currently read-only). If you need explicit RFC 7946 compliance, you'll still need to add -lco RFC7946=YES.

 

Even

 

Re: Call for discussion on RFC 70: Guessing output format from output file name extension

Even Rouault-2
In reply to this post by Even Rouault-2

On Saturday 19 August 2017 22:49:19 CET Even Rouault wrote:
> Hi,
>
> This is a call for discussion on:
> https://trac.osgeo.org/gdal/wiki/rfc70_output_format_guess
>
> Summary:
> This proposal is to add syntactic sugar to the GDAL and OGR command line
> utilities, so that they take into account the extension of the output
> filename to guess which output driver to use when it is not explicitly
> specified.

 

I've found some time to do the initial implementation (for the C++ utilities only, for now):

https://github.com/rouault/gdal2/tree/rfc70

 

It mostly follows my initial ideas.

 

I've also updated the RFC text to reflect a change in behaviour w.r.t my initial ideas:

"""

When several drivers declare this extension (for example KML and LIBKML for .kml), the utility will select the first registered driver (except that netCDF is used instead of GMT for .nc files), and a warning is emitted specifying which driver is used.

"""
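
The rule quoted above can be sketched in C as follows. This is only an illustration under stated assumptions: the `registered` table, its ordering and the function name `pick_driver_for_extension` are hypothetical and do not reflect GDAL's real registration order or code. The sketch picks the first driver declaring the extension, prefers netCDF over GMT for .nc, and prints a warning when more than one driver matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct driver { const char *name; const char *ext; };

/* Hypothetical registration order, invented for this example. */
static const struct driver registered[] = {
    { "GMT",    "nc"   },
    { "netCDF", "nc"   },
    { "KML",    "kml"  },
    { "LIBKML", "kml"  },
    { "GPKG",   "gpkg" },
};

/* Returns the driver chosen for an extension (without the dot),
 * or NULL when no registered driver declares it. */
const char *pick_driver_for_extension(const char *ext)
{
    assert(ext != NULL);
    const char *chosen = NULL;
    int matches = 0;

    for (size_t i = 0; i < sizeof registered / sizeof registered[0]; i++) {
        if (strcmp(registered[i].ext, ext) != 0)
            continue;
        matches++;
        if (chosen == NULL)
            chosen = registered[i].name;     /* first registered driver */
        else if (strcmp(ext, "nc") == 0 &&
                 strcmp(registered[i].name, "netCDF") == 0)
            chosen = registered[i].name;     /* exception: netCDF over GMT */
    }

    if (matches > 1)
        fprintf(stderr, "Warning: several drivers match .%s, using %s\n",
                ext, chosen);
    return chosen;
}
```

With the table above, "kml" resolves to KML and "nc" resolves to netCDF, each with a warning on stderr, while "gpkg" resolves to GPKG silently.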

 

If there are no further comments, I'll call for a vote next week.

 

Even

 


Re: Call for discussion on RFC 70: Guessing output format from output file name extension

mj10777
Reading the Magic-Header signatures first and then guessing from the file extension may be a better approach.

In SpatiaLite, the function gaiaGuessBlobType in gaia_exif.c has a collection of signatures that could serve as a starting point, combined with guess_mime_type in spatialite.c, which returns the MIME type as a string.

Mark Johnson, Berlin Germany


Re: Call for discussion on RFC 70: Guessing output format from output file name extension

jratike80
mj10777 wrote
> Reading the Magic-Header signatures first and then guessing from the file
> extension may be a better approach.

Hi,

The title says "Guessing output format from output file name extension",
like in

ogr2ogr output.gpkg input.shp

Where would you read the Magic-Headers in this case?

-Jukka Rahkonen-






Re: Call for discussion on RFC 70: Guessing output format from output file name extension

mj10777
In reply to this post by Even Rouault-2
> ogr2ogr output.gpkg input.shp
> Where would you read the Magic-Headers in this case?

Before the input file is opened.
Read the first 100 bytes and compare them with known signatures:


// TIFF: either byte order is a valid signature
if ((memcmp (blob, tiff_signature_big, 4) == 0) ||
    (memcmp (blob, tiff_signature_little, 4) == 0))
{
  return type_tiff;
}
else if (strncmp ((char *) blob, "%PDF-", 5) == 0)
{
  return type_pdf;
}
else if (strncmp ((char *) blob, "SQLite format 3", 15) == 0)
{
  return type_sqlite3; // even if the extension is called '.db' or '.blombo'
}
else if (strncmp ((char *) blob, "** This file contains an SQLite 2", 33) == 0)
{
  return type_sqlite2; // even if the extension is called '.db'
}
else if ((memcmp (blob, jp2_big, 12) == 0) ||
    (memcmp (blob, jp2_little, 12) == 0))
{
  return type_jp2;
}
// JPEG: start marker in the first 2 bytes and end marker in the last 2 bytes
else if ((memcmp (blob, jpeg1_signature, 2) == 0) &&
    (memcmp (blob + size - 2, jpeg2_signature, 2) == 0))
{
  return type_jpeg;
}
// ... checks for other signatures go here ...
else
{
  // no signature matched: only now fall back to the file extension
  // (e.g. '.kmz' would give type_kmz), otherwise:
  return type_other;
}

An SQLite 3 database can have any extension.
Sometimes even '.sdb' is used, which is normally reserved for a Sybase database, which is also a file-based database with its own magic number.

To determine the format reliably you use 'magic numbers', so that you know rather than guess.

'Magic numbers' were introduced to avoid this problem, since a user can rename a file to anything they want.

The above code is based on the SpatiaLite code, which reads a BLOB in which the file is stored and which therefore has no file name or extension to read.

Reading the magic-header signatures first and then guessing from the file extension may be a better approach.
Even if the title uses the word 'Guessing', that is no reason not to suggest a more practical approach that is less error-prone.

Mark



Re: Call for discussion on RFC 70: Guessing output format from output file name extension

Even Rouault-2

On Friday 10 November 2017 13:40:32 CET Mark Johnson wrote:
> > ogr2ogr output.gpkg input.shp Where would you read the Magic-Headers in
> > this case?
>
> Before the input file is opened.

 

Mark, this RFC is about output files that *do not exist yet* (I can't think how to make its title clearer :-)), so there's nothing to read. The only thing that can be used to guess the appropriate output driver is the filename.

 

What you describe is what has been implemented in GDAL for 19 years to figure out which input driver matches an existing file ;-)
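
To make the distinction concrete, here is a minimal C sketch of signature sniffing on an existing file. It is not GDAL's driver identification code; the enum and the functions `sniff_buffer` and `sniff_file` are hypothetical names. Only the byte signatures themselves (classic TIFF in both byte orders, PDF, and the 16-byte SQLite 3 header string) are real. For an output file that has not been created yet, there is simply nothing to open or read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum sig_type { SIG_NONE, SIG_TIFF, SIG_PDF, SIG_SQLITE3, SIG_UNREADABLE };

/* Classify a header buffer by its leading bytes. */
enum sig_type sniff_buffer(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    /* Classic TIFF: "II*\0" (little-endian) or "MM\0*" (big-endian). */
    if (len >= 4 && (memcmp(buf, "II*\0", 4) == 0 ||
                     memcmp(buf, "MM\0*", 4) == 0))
        return SIG_TIFF;
    if (len >= 5 && memcmp(buf, "%PDF-", 5) == 0)
        return SIG_PDF;
    /* SQLite 3 header string, including its terminating NUL byte. */
    if (len >= 16 && memcmp(buf, "SQLite format 3\0", 16) == 0)
        return SIG_SQLITE3;
    return SIG_NONE;
}

/* Read the first bytes of an existing file and classify them. */
enum sig_type sniff_file(const char *path)
{
    assert(path != NULL);
    unsigned char header[16];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return SIG_UNREADABLE;   /* e.g. an output file not created yet */
    size_t n = fread(header, 1, sizeof header, fp);
    fclose(fp);
    return sniff_buffer(header, n);
}
```

Calling `sniff_file` on the name of an output file that does not exist returns `SIG_UNREADABLE`, which is why only the filename is available for guessing the output driver.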

 


Re: Call for discussion on RFC 70: Guessing output format from output file name extension

mj10777
In reply to this post by Even Rouault-2
Sorry, I overlooked the 'output filename to guess', while reading the PR and title.

Mark

