[gdal-dev] ogr2ogr : limit memory usage

[gdal-dev] ogr2ogr : limit memory usage

Matthieu Lefort-2
Hi list,
I am using ogr2ogr to load large GeoJSON files (about 1 GB each) into a PostgreSQL database.
On my MacBook (GDAL 1.11.3, released 2015/09/16) this works fine, although it takes about 20 minutes per file.
I tried the same thing on a VPS (Debian Stretch 9.2, 8 GB RAM), but the process consumes all the RAM within 20 seconds and then stops (out of memory).
GDAL was installed with apt-get install gdal-bin (GDAL 2.1.2, released 2016/10/24).

Is there a way to control memory usage, or at least to avoid using more memory than is available?

Thanks for your help!

- Matthieu Lefort -

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: ogr2ogr : limit memory usage

Even Rouault-2

On Friday 13 October 2017 10:06:56 CEST Matthieu Lefort wrote:
> Hi list,
> I am using ogr2ogr to load large GeoJSON files (about 1 GB each) into a
> PostgreSQL database. On my MacBook (GDAL 1.11.3, released 2015/09/16) this
> works fine, although it takes about 20 minutes per file. I tried the same
> thing on a VPS (Debian Stretch 9.2, 8 GB RAM), but the process consumes
> all the RAM within 20 seconds and then stops (out of memory). GDAL was
> installed with apt-get install gdal-bin (GDAL 2.1.2, released 2016/10/24).
>
> Is there a way to control memory usage, or at least to avoid using more
> memory than is available?

Matthieu,

no, this is a known issue (see https://trac.osgeo.org/gdal/ticket/6540) which would require substantial changes in the GeoJSON driver to be addressed. Currently the driver needs to ingest the whole file into memory, parse it completely, and store the resulting features in RAM. So typically you need 10 to 20 times more RAM than the file size. (I wouldn't expect the memory usage to be substantially different between GDAL 1.11 and 2.1, although there has indeed been a small change to use the MEM driver underneath.)
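
As a rough illustration of the rule of thumb above, the figures from the original report can be plugged in as follows. The 10x and 20x factors are the ones quoted in this message; the function name and everything else are illustrative assumptions, not something GDAL provides.

```python
def estimated_ram_gb(file_size_gb, low_factor=10, high_factor=20):
    """Return a (low, high) RAM estimate in GB for loading a GeoJSON file.

    The default factors are the "10 to 20 times the file size" rule of thumb
    from the message above; they are an estimate, not a measured value.
    """
    return file_size_gb * low_factor, file_size_gb * high_factor


# Figures from the original report: files of about 1 GB, a VPS with 8 GB of RAM.
low, high = estimated_ram_gb(1.0)
available_gb = 8
print(f"estimated need: {low:.0f} to {high:.0f} GB, available: {available_gb} GB")
if low > available_gb:
    print("even the low estimate exceeds the available RAM")
```

With a 1 GB file the estimate is 10 to 20 GB, which is already above the 8 GB available on the VPS.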

 

So for now, your best chance is to try splitting the features[] array into smaller files.

https://www.google.fr/search?q=splitting+geojson+files points to various hints on how to split GeoJSON files.
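
One possible way to do the split is sketched below using only the Python standard library. It is a minimal sketch, not a recommended tool: the function names, the chunk size, and the output file naming are all assumptions made for illustration. Note that json.load still reads the whole source file into memory, so for files too large for that, a streaming JSON parser would be needed instead.

```python
import json
from pathlib import Path


def split_feature_collection(collection, chunk_size):
    """Yield FeatureCollection dicts holding at most chunk_size features each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    features = collection.get("features", [])
    # Copy every top-level member except "features" (e.g. "type", "crs")
    # into each chunk so that each output is a complete FeatureCollection.
    header = {k: v for k, v in collection.items() if k != "features"}
    for start in range(0, len(features), chunk_size):
        yield {**header, "features": features[start:start + chunk_size]}


def split_geojson_file(src, out_dir, chunk_size=50000):
    """Write numbered chunk files into out_dir and return their paths.

    chunk_size=50000 is an arbitrary default chosen for this sketch.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with src.open(encoding="utf-8") as f:
        collection = json.load(f)  # loads the whole source file into memory
    paths = []
    for i, chunk in enumerate(split_feature_collection(collection, chunk_size)):
        path = out_dir / f"{src.stem}_{i:04d}.geojson"
        with path.open("w", encoding="utf-8") as f:
            json.dump(chunk, f)
        paths.append(path)
    return paths
```

Each resulting file could then be loaded in turn, for example with ogr2ogr's -append option (and -nln so that all chunks go to the same table).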

 

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: ogr2ogr : limit memory usage

Matthieu Lefort-2
OK, I'll try it. Thanks for this information.
Matthieu

Re: ogr2ogr : limit memory usage

Even Rouault-2
In reply to this post by Even Rouault-2
On Friday 13 October 2017 12:43:29 CEST Even Rouault wrote:
> no, this is a known issue ( see https://trac.osgeo.org/gdal/ticket/6540 )

Hi,

I've modified the GeoJSON driver in trunk to be able to ingest arbitrarily
large files with negligible RAM consumption. Testing is appreciated, as this
is a non-trivial change.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: ogr2ogr : limit memory usage

Kurt Schwehr-2
I added some small fuzzer corpus test cases that cause an OOM for me on a 2 GB instance:

https://trac.osgeo.org/gdal/ticket/6540#comment:5

On Tue, Oct 17, 2017 at 3:05 AM, Even Rouault <[hidden email]> wrote:
> I've modified the GeoJSON driver in trunk to be able to ingest arbitrarily
> large files with negligible RAM consumption. Testing is appreciated, as
> this is a non-trivial change.

Re: ogr2ogr : limit memory usage

Even Rouault-2
On Tuesday 17 October 2017 06:34:07 CEST Kurt Schwehr wrote:
> Added some small fuzzer corpus test cases that cause me OOM on a 2GB
> instance:
>
> https://trac.osgeo.org/gdal/ticket/6540#comment:5

I doubt this is an OOM issue. But there was indeed a regression with empty
string keys in dictionaries that triggered an assertion (or caused a wrong
branch to be taken in non-debug mode). I have just fixed it.

After the fix, the following works (with my debug build, which links many
libraries, you need about 700 MB just to load GDAL):

for i in json/*; do (ulimit -v 700000; ogrinfo "$i" -al -so); done
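
For anyone who prefers to script the same check, here is a minimal Python sketch of the idea behind the loop above: run each command in a child process whose virtual address space is capped. The function names are made up for illustration. It assumes a Unix-like system where RLIMIT_AS is enforced, and unlike `ulimit -v` in bash it lowers only the soft limit.

```python
import resource
import subprocess
import sys
from pathlib import Path


def run_with_memory_limit(cmd, limit_kb):
    """Run cmd with its virtual address space capped at limit_kb kilobytes."""
    limit_bytes = limit_kb * 1024

    def apply_limit():
        # Runs in the child process just before exec, similar in spirit to
        # calling `ulimit -v` inside a subshell. Only the soft limit is lowered.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit, capture_output=True, text=True)


def ogrinfo_all(directory, limit_kb=700000):
    """Run `ogrinfo <file> -al -so` on every file in directory under the limit.

    Returns a list of (path, return code) pairs. Requires ogrinfo on the PATH.
    """
    results = []
    for path in sorted(Path(directory).iterdir()):
        completed = run_with_memory_limit(["ogrinfo", str(path), "-al", "-so"], limit_kb)
        results.append((path, completed.returncode))
    return results
```

Calling ogrinfo_all("json") would mirror the shell loop; a non-zero return code for a file suggests that it could not be processed within the limit.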

--
Spatialys - Geospatial professional services
http://www.spatialys.com