[gdal-dev] vsipreload: enabling VSI Virtual File API for regular I/O

Even Rouault
Hi,

I've just committed to trunk a new file, port/vsipreload.cpp.

This file is the source code for a standalone shared library that can be
LD_PRELOAD'ed as an overload of libc, so that the VSI Virtual File API can be
used with binaries that use the regular libc I/O API.

WARNING: Linux glibc ONLY. Might work with some adaptations (mainly around
64-bit symbols) on other Unix systems.

Compile:
g++ -Wall -fPIC port/vsipreload.cpp -shared -o vsipreload.so -Iport \
        -L. -L.libs -lgdal

Examples:

LD_PRELOAD=vsipreload.so gdalinfo \
        /vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw

LD_PRELOAD=vsipreload.so gdalinfo \
 'HDF4_EOS:EOS_GRID:"/vsicurl/http://download.osgeo.org/gdal/data/hdf4/MOD09Q1G_EVI.A2006233.h07v03.005.2008338190308.hdf":MODIS_NACP_EVI:MODIS_EVI'

LD_PRELOAD=vsipreload.so ogrinfo \
         /vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/testavc -ro

LD_PRELOAD=vsipreload.so ogrinfo -ro \
        /vsizip//vsicurl/http://download.osgeo.org/gdal/1.10.0/gdalautotest-1.10.0.zip/gdalautotest-1.10.0/ogr/data/testavc

It can even work with non-GDAL binaries:

LD_PRELOAD=vsipreload.so h5dump -d /x \
        /vsicurl/http://download.osgeo.org/gdal/data/netcdf/utm-big-chunks.nc

LD_PRELOAD=vsipreload.so sqlite3 \
         /vsicurl/http://download.osgeo.org/gdal/data/sqlite3/polygon.db \
        "select * from polygon limit 10"

This can work with all VSI Large File API filesystems: /vsizip/, /vsitar/,
/vsisubfile/, etc.

This is still a bit experimental in the sense that only the most common glibc
I/O APIs have been overloaded. If exotic ones are used with /vsi files, crashes
are likely.
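
Note on the nested /vsizip//vsicurl/ example above: the path is the two
prefixes, then the URL of the archive, then the path inside the archive. A
small sketch of a shell helper that builds such a path. The function name
vsizip_over_curl is made up for this note and is not part of GDAL; it only
concatenates strings in the same pattern as the ogrinfo example.

```shell
# Hypothetical helper (not part of GDAL): build a /vsizip//vsicurl/ path
# from the URL of a zip archive and a path inside that archive.
vsizip_over_curl() {
    zip_url=$1      # URL of the archive, e.g. http://host/archive.zip
    inner_path=$2   # path of the dataset inside the archive
    printf '/vsizip//vsicurl/%s/%s\n' "$zip_url" "$inner_path"
}

# Usage, mirroring the ogrinfo example:
# LD_PRELOAD=vsipreload.so ogrinfo -ro "$(vsizip_over_curl \
#     http://download.osgeo.org/gdal/1.10.0/gdalautotest-1.10.0.zip \
#     gdalautotest-1.10.0/ogr/data/testavc)"
```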

--
Geospatial professional services
http://even.rouault.free.fr/services.html
_______________________________________________
gdal-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: vsipreload: enabling VSI Virtual File API for regular I/O

Howard Butler

On May 26, 2013, at 9:46 AM, Even Rouault <[hidden email]> wrote:
> This file is the source code for a standalone shared library that can be
> LD_PRELOAD'ed as an overload of libc to enable VSI Virtual FILE API to be used
> with binaries using regular libc I/O API.

> This can work with all VSI Large File API filesystems : /vsizip/ , /vsitar/,
> /vsisubfile/ , etc...

Cool!

Next question, if you'll pardon my ignorance, why would we ever want to do this? What does the VSI preload provide?

Howard

Re: vsipreload: enabling VSI Virtual File API for regular I/O

Etienne Tourigny-3


On Tue, May 28, 2013 at 11:24 AM, Howard Butler <[hidden email]> wrote:
> On May 26, 2013, at 9:46 AM, Even Rouault <[hidden email]> wrote:
> > This file is the source code for a standalone shared library that can be
> > LD_PRELOAD'ed as an overload of libc to enable VSI Virtual FILE API to be used
> > with binaries using regular libc I/O API.
> >
> > This can work with all VSI Large File API filesystems : /vsizip/ , /vsitar/,
> > /vsisubfile/ , etc...
>
> Cool!

Way cool, Even! I'll test it when I have the time.

> Next question, if you'll pardon my ignorance, why would we ever want to do this? What does the VSI preload provide?

You can open files over ftp/http connections and files that are compressed inside zip/tgz/gz archives. This was not previously possible with certain drivers (hdf4, netcdf, etc.)

> Howard

Re: vsipreload: enabling VSI Virtual File API for regular I/O

Even Rouault
On Tuesday 28 May 2013 16:27:21, Etienne Tourigny wrote:

> On Tue, May 28, 2013 at 11:24 AM, Howard Butler <[hidden email]> wrote:
> > On May 26, 2013, at 9:46 AM, Even Rouault <[hidden email]> wrote:
> > > This file is the source code for a standalone shared library that can
> > > be LD_PRELOAD'ed as an overload of libc to enable VSI Virtual FILE API
> > > to be used with binaries using regular libc I/O API.
> > >
> > > This can work with all VSI Large File API filesystems : /vsizip/ ,
> > > /vsitar/, /vsisubfile/ , etc...
> >
> > Cool!
>
> Way cool, Even! I'll test it when I have the time.
>
> > Next question, if you'll pardon my ignorance, why would we ever want to
> > do this? What does the VSI preload provide?
>
> You can open files over ftp/http connections and files that are compressed
> inside zip/tgz/gz archives. This was not previously possible with certain
> drivers (hdf4, netcdf, etc.)

Yes exactly, my examples were supposed to illustrate the use cases where it
was needed. A few drivers do not support the /vsi file systems because they use
the standard I/O functions directly, and their underlying library offers no way
of redirecting them through the VSI virtual file API.

An alternative way of offering the same capability would be to write a FUSE
module. That would probably be less hackish, since the API of FUSE modules is
much smaller than that of glibc (there is only one read function, not both
read() and fread())!


Re: vsipreload: enabling VSI Virtual File API for regular I/O

Howard Butler

On May 28, 2013, at 12:59 PM, Even Rouault <[hidden email]> wrote:
> Yes exactly, my examples were supposed to illustrate the use cases where it
> was needed. A few drivers do not support the /vsi file systems because they use
> directly the standard IO functions and provide no way of redirecting them
> through the VSI virtual file API, because their underlying library doesn't offer
> this capability.

Sorry I didn't read far enough down to the examples. I saw "preload" and ".so" and immediately thought of symbol chicanery in the context of tracking memory allocations and the like, not twisting curl'able datasources into FILE*'s.

Now that I have your attention, I brought up with Frank at FOSS4GNA that there could sometimes be a need for both MEM drivers to spool off to disk through some sort of out-of-core mmap'd allocation. Does this currently exist in VSI, and if not, do you see any use for such a thing? My thought was that there might be scenarios where someone is working with multi-GB (or worse) raster data sources and would like to control the paging (i.e., you have an SSD, or you want to create intermediates for some weird processing chain). Maybe not all that useful in exchange for the added complexity...

Howard

Re: vsipreload: enabling VSI Virtual File API for regular I/O

Etienne Tourigny-3
In reply to this post by Even Rouault


On Sun, May 26, 2013 at 11:46 AM, Even Rouault <[hidden email]> wrote:
> Hi,
>
> I've just committed in trunk a new file port/vsipreload.cpp.
>
> This file is the source code for a standalone shared library that can be
> LD_PRELOAD'ed as an overload of libc to enable VSI Virtual FILE API to be used
> with binaries using regular libc I/O API.
>
> WARNING: Linux glibc ONLY. Might work with some adaptations (mainly around
> 64bit symbols) on other Unix systems
>
> Compile:
> g++ -Wall -fPIC port/vsipreload.cpp -shared -o vsipreload.so -Iport \
>         -L. -L.libs -lgdal
>
> Examples :
>
> LD_PRELOAD=vsipreload.so gdalinfo \
>         /vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw

Even,

I have built vsipreload with a fresh build of svn trunk, but I get the following error:

$ g++ -Wall -fPIC port/vsipreload.cpp -shared -o vsipreload.so -Iport -L. -L.libs -lgdal
$ LD_PRELOAD=vsipreload.so gdalinfo         /vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw
ERROR: ld.so: object 'vsipreload.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR 4: `/vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw' not recognised as a supported file format.

gdalinfo failed - unable to open '/vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw'.

Is there a way to find out why it cannot be preloaded? Perhaps the compilation options should be tweaked?

Thanks,
Etienne

My system is Linux Mint 13 ubuntu (quantal), gcc version 4.7.2, 64bits, 3.5.0-30-generic kernel
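
Note on diagnosing this kind of failure, as a sketch only. According to the
ld.so manual page, an LD_PRELOAD entry without a slash is looked up through
the normal library search path (LD_LIBRARY_PATH, the ld.so cache, the default
directories) and not in the current directory, so a bare file name can fail
where ./name works. The helper below checks for that and for a missing file.
The function name check_preload is invented for this note.

```shell
# Hypothetical diagnostic helper; the name check_preload is made up here.
check_preload() {
    lib=$1
    case $lib in
        */*) ;;  # contains a slash: the loader treats it as a path
        *)  echo "no slash in '$lib': it is looked up via the library search path"
            return 1 ;;
    esac
    if [ ! -f "$lib" ]; then
        echo "'$lib' does not exist"
        return 1
    fi
    echo "'$lib' exists and is given as a path"
}

# For more detail, glibc's loader can trace where it searches for libraries:
# LD_DEBUG=libs LD_PRELOAD=./vsipreload.so gdalinfo --version
```

check_preload only looks at the name and the file; it does not verify that the
file is a loadable shared object for the current architecture.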




Re: vsipreload: enabling VSI Virtual File API for regular I/O

Etienne Tourigny-3


On Tue, May 28, 2013 at 6:57 PM, Etienne Tourigny <[hidden email]> wrote:


> Even,
>
> I have built vsipreload with a fresh build of svn trunk, but I get the following error:
>
> $ g++ -Wall -fPIC port/vsipreload.cpp -shared -o vsipreload.so -Iport -L. -L.libs -lgdal
> $ LD_PRELOAD=vsipreload.so gdalinfo         /vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw
> ERROR: ld.so: object 'vsipreload.so' from LD_PRELOAD cannot be preloaded: ignored.
> ERROR 4: `/vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw' not recognised as a supported file format.
>
> gdalinfo failed - unable to open '/vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw'.
>
> Is there a way to find out why it cannot be preloaded? Perhaps the compilation options should be tweaked?

I have managed to get it working by using LD_PRELOAD=./vsipreload.so

e.g.
LD_PRELOAD=./vsipreload.so gdalinfo /vsicurl/http://download.osgeo.org/gdal/data/netcdf/utm-big-chunks.nc

It seems that setting LD_PRELOAD outside of the command also works, so this mechanism could be used transparently.
But does it have a significant overhead, and/or is there a chance it might do damage if always set?

e.g.
export LD_PRELOAD=./vsipreload.so


Is there any way this could be integrated into gdal to transparently support vsifile for all drivers, without setting LD_PRELOAD?
That way this feature would be available to all gdal commands, or even to other apps that use gdal.

For example, /vsizip/ is used in QGIS, but it does not work with certain files (e.g. netcdf ) inside archives. But if I use LD_PRELOAD it works for netcdf files! Neat stuff!
e.g. LD_PRELOAD=./vsipreload.so qgis

great work
Etienne




Re: vsipreload: enabling VSI Virtual File API for regular I/O

Even Rouault
In reply to this post by Howard Butler
> Now that I have your attention, I brought up with Frank at FOSS4GNA that
> there could sometimes be a need for both MEM drivers to spool off to disk
> through some sort of out-of-core mmap'd allocation.

> Does this currently exist in VSI,

No

> and if not, do you see any use for such a thing?

I didn't yet, but apparently you do :-)

> My thought
> was there might be scenarios where someone working with multi-gb (or
> worse) raster data sources where you'd like to control the paging (ie, you
> have SSD, or you want to create intermediates for some weird processing
> chain).

I'm not sure I understand the mention of paging in your sentence. In the
context of mmap(), paging makes me think of the page size you get with
sysconf(_SC_PAGE_SIZE), but that's probably not what you meant.

> Maybe not all that useful in exchange for the added complexity...

I'm not entirely clear on the advantages of using mmap() over a backing file
rather than just doing a very large malloc(). In both cases I guess you would
get swap thrashing when you use more virtual memory than the physical memory
actually available (although I see that madvise() could be used to hint that
pages should not stay in RAM for too long). Or perhaps you're thinking of an
mmap() of a limited portion of the backing file, with the application changing
the mapping according to where the user requests data? But getting good
performance with mmap'ing can be tricky (
http://stackoverflow.com/questions/6055861/why-is-sequentially-reading-a-large-file-row-by-row-with-mmap-and-madvise-sequen
), so perhaps the backing to disk could also be done with traditional I/O?

Anyway this wouldn't be a "usual" VSI file system, since the semantics of a VSI
file system must be similar to traditional POSIX file I/O (fread(), fwrite(),
etc.), and mmap'ing is a different beast. This would be more like a portability
API over the Unix and Windows APIs for establishing a memory mapping.

(This discussion makes me think of the latest release of sqlite, which can
optionally use mmap(): http://www.sqlite.org/mmap.html . It is limited, though,
to situations where the mmap() size you give is big enough to fit the file
size. If that condition is met, they have observed a performance boost by a
factor of 2 in some situations w.r.t. traditional I/O methods.)

Even


Re: vsipreload: enabling VSI Virtual File API for regular I/O

Even Rouault
In reply to this post by Etienne Tourigny-3

> I have managed to get it working by using LD_PRELOAD=./vsipreload.so

Yes, for some strange reason some systems require the ./ and some do not. My
old Ubuntu 10.04 does not, but Travis on Ubuntu 12.04 does. Perhaps it is some
security measure that has been added to the LD_PRELOAD mechanism.

I've just adjusted the examples in the .cpp header to add ./
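
Note: one way to avoid depending on the current directory at all is to resolve
the library to an absolute path before launching the command. A sketch,
assuming a POSIX shell. The function name run_with_preload is made up for this
note, and the library location is whatever you pass in.

```shell
# Hypothetical wrapper: run a command with LD_PRELOAD set to an absolute path.
# usage: run_with_preload path/to/vsipreload.so command [args...]
run_with_preload() {
    lib=$1
    shift
    # Resolve the directory so that the value always contains a slash.
    lib_dir=$(cd "$(dirname "$lib")" && pwd) || return 1
    lib_abs=$lib_dir/$(basename "$lib")
    [ -f "$lib_abs" ] || { echo "not found: $lib_abs" >&2; return 1; }
    # The variable is set only for this one command.
    LD_PRELOAD=$lib_abs "$@"
}

# Example:
# run_with_preload ./vsipreload.so gdalinfo \
#     /vsicurl/http://download.osgeo.org/gdal/data/ecw/spif83.ecw
```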

>
> e.g.
> LD_PRELOAD=./vsipreload.so gdalinfo /vsicurl/http://download.osgeo.org/gdal/data/netcdf/utm-big-chunks.nc
>
> It seems that setting LD_PRELOAD outside of the command works also, so this
> mechanism would be used transparently.
> But does it have a significant overhead and/or chances it might do damage
> if always set?

This should have no impact for I/O on regular files, except for a hopefully
small performance hit to check if the FILE* is a regular one (and redirect to
the glibc function) or one that is a VSILFILE* instead (and redirect to VSI
implementation). On /vsiXXXX files, crashes are theoretically possible if a
glibc function using the fake FILE* is used, but not overloaded in the .so.

>
> e.g.
> export LD_PRELOAD=./vsipreload.so
> gdalinfo /vsicurl/http://download.osgeo.org/gdal/data/netcdf/utm-big-chunks.nc
>
>
> Is there any way this could be integrated into gdal to transparently
> support vsifile for all drivers, without setting LD_PRELOAD?

None that I'm aware of. LD_PRELOAD is really something that must be specified
before the process is launched. By the time GDAL code is reached, it is too
late (and those limitations are really welcome for obvious security reasons).
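
Note: since the variable has to be in place before the program starts, a small
launcher script is one option. A sketch of a helper that adds the library to
LD_PRELOAD without discarding entries that are already there. The function
name ensure_preload and the install path in the usage comment are assumptions
made for this note, and the check assumes a colon-separated list.

```shell
# Hypothetical helper: add a library to LD_PRELOAD unless it is already listed.
# Processes started from this shell afterwards inherit the exported value.
ensure_preload() {
    lib=$1
    case ":${LD_PRELOAD-}:" in
        *":$lib:"*) ;;  # already listed, nothing to do
        *) LD_PRELOAD=${LD_PRELOAD:+$LD_PRELOAD:}$lib ;;
    esac
    export LD_PRELOAD
}

# Typical launcher use: set the variable, then replace the shell with the program.
# ensure_preload /usr/local/lib/vsipreload.so
# exec qgis "$@"
```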

> In this way this feature would be available to all gdal commands or even
> other apps that use gdal.
>
> For example, /vsizip/ is used in QGIS, but it does not work with certain
> files (e.g. netcdf ) inside archives. But if I use LD_PRELOAD it works for
> netcdf files! Neat stuff!
> e.g. LD_PRELOAD=./vsipreload.so qgis

As I mentioned in another post in this thread, using a FUSE filesystem could
make it less hackish, although you would need permission to mount on /vsizip,
/vsicurl, etc., which might be problematic if you are not good friends with the
system administrator. You don't need to be root to mount a FUSE filesystem, but
I believe you must have write permission on the mount point.


Re: vsipreload: enabling VSI Virtual File API for regular I/O

Etienne Tourigny-3


On Tue, May 28, 2013 at 7:35 PM, Even Rouault <[hidden email]> wrote:

> > Is there any way this could be integrated into gdal to transparently
> > support vsifile for all drivers, without setting LD_PRELOAD?
>
> None that I'm aware of. LD_PRELOAD is really something that must be specified
> before process launching. When GDAL code is reached, it is too late (and those
> limitations are really welcome for obvious security reasons).

It would probably be best (and safer for the time being) to use aliases for commonly used apps that would benefit from this (gdal*, qgis, etc.):

alias gdalinfo='LD_PRELOAD=/usr/local/lib/vsipreload.so gdalinfo'
alias gdalwarp='LD_PRELOAD=/usr/local/lib/vsipreload.so gdalwarp'
alias qgis='LD_PRELOAD=/usr/local/lib/vsipreload.so qgis'
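
Note: a variant of the same idea, as a sketch. Shell functions generated in a
loop can stand in for the aliases; unlike aliases they are also available in
non-interactive bash scripts, where alias expansion is off by default. The
library path is the one used in the aliases above. The function name vsi_wrap
and the variable VSIPRELOAD_LIB are invented for this note.

```shell
# Assumed install location, as in the aliases above; override via the environment.
VSIPRELOAD_LIB=${VSIPRELOAD_LIB:-/usr/local/lib/vsipreload.so}

# Hypothetical helper: define a wrapper function for each named command.
# "command" bypasses the function itself, so the wrapper does not recurse.
vsi_wrap() {
    for cmd in "$@"; do
        eval "$cmd() { LD_PRELOAD=\"\$VSIPRELOAD_LIB\" command $cmd \"\$@\"; }"
    done
}

vsi_wrap gdalinfo gdalwarp qgis
```

After this, calling gdalinfo in the same shell runs the real gdalinfo with
LD_PRELOAD set for that one process only.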


> > In this way this feature would be available to all gdal commands or even
> > other apps that use gdal.
> >
> > For example, /vsizip/ is used in QGIS, but it does not work with certain
> > files (e.g. netcdf ) inside archives. But if I use LD_PRELOAD it works for
> > netcdf files! Neat stuff!
> > e.g. LD_PRELOAD=./vsipreload.so qgis
>
> As I mentioned in another post in that thread, the use of a fuse filesystem
> could make it less hackish, although you would need permissions to mount on
> /vsizip , /vsicurl etc which might be problematic if you are not a good friend
> with the system administrator. You don't need to be root to mount a fuse
> filesystem, but you must have write permissions on the mount point I believe.

It seems a little more involved and less flexible.
 
