Re: GDAL, vsis3 and vsisubfile

Mike Pfaffenberger
I turned on some debug options that shed some light on what's going on. It appears that the NITF driver internally opens a JPEG 2000 driver on a virtual subfile. In my case, that virtual subfile starts at offset 4038 and continues to the end of the file, at offset 901949970.

While this is a nice way of providing a JPEG2000 decompression routine to the NITF driver, when accessing a remote dataset it causes the entire file to be downloaded even when reading a small window.

I used gdal_translate locally to convert my NITF file to a JP2 file, uploaded that file to S3, and ran gdal_translate -srcwin 000 000 1000 1000 /vsis3/mybucket/jp2file.JP2 local_file.tiff; it ran instantly. Is there a way to bypass the NITF driver completely and simply open the NITF file with the JP2 driver on top of vsis3?

Thank you very much for your time. Also thank you for writing the vsicurl and vsis3 code -- it's been very useful!

On Mon, Jul 24, 2017 at 10:41 AM, Even Rouault <[hidden email]> wrote:


Do you mind reposting your findings above to the gdal-dev thread so we can continue the exchange there?

 

Even

 

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Even Rouault-2

Mike,

 

(note to other readers: this is the continuation of the thread "[gdal-dev] VSIS3 on digital globe multiview-stereo (NITF)")

 

> I turned on some debug options that shed some light on to what's going on.
> It appears that the NITF driver must internally open a JPEG 2000 Driver on
> a virtual subfile. In my case, that virtual subfile starts at offset 4038
> and continues to the end of the file, offset 901949970.
>
> While this is a nice way of providing a JPEG2000 decompression routine to
> the NITF driver, when accessing a remote dataset, it causes the entire file
> to be downloaded even when reading a small window.
>
> I used gdal_translate locally on my NITF file and turned it into a JP2
> file, then I uploaded this file to S3 and ran my gdal_translate -srcwin 000
> 000 1000 1000 /vsis3/mybucket/jp2file.JP2 local_file.tiff and it ran
> instantly. Is there a way to completely bypass using the NITF driver and
> simply open the NITF file with the JP2 driver wrapped up with vsis3?

 

Yes, you should be able to open the following filename directly, although this is actually what the NITF driver already does:

/vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf

(You may need to change the second value, '901949970', to 901949970-4038, since it is supposed to be a length and not an offset.)

This should be recognized by one of the JPEG2000 drivers, and you will likely get the same performance characteristics as when going through the NITF driver (unless the NITF driver does something that requires reading the whole file, but I don't think so).
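The offset-versus-length point can be checked with a few lines of Python. This is only a sketch: the helper name vsisubfile_path is made up for illustration and is not part of GDAL, and it assumes the /vsisubfile/<offset>_<size>,<filename> form in which the second number is a size.

```python
def vsisubfile_path(offset, end_offset, filename):
    """Build a /vsisubfile/ name from a start offset and an end offset.

    The second number in the name is a size, so it is computed as
    end_offset - offset. (Hypothetical helper, not part of GDAL.)
    """
    if end_offset < offset:
        raise ValueError("end_offset must not be smaller than offset")
    size = end_offset - offset
    return f"/vsisubfile/{offset}_{size},{filename}"


# Offsets and filename taken from this thread.
path = vsisubfile_path(4038, 901949970, "/vsis3/glitch253/test2.ntf")
print(path)
```

The resulting string is what would be passed to dump_jp2.py or gdal_translate in place of the unadjusted name.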

 

My hypothesis is that the root cause of the performance issue is the progression order of the JPEG2000 codestream in this NITF file, which causes most of the file to be read through. Likely only X % of the bytes are really needed, but as they are scattered throughout the whole file, and given the chunk-by-chunk downloading logic of /vsis3, you end up reading the whole file in practice.

For example, I'd expect LRCP (Layer-Resolution-Component-Precinct), RLCP and RPCL to cause issues, whereas PCRL and CPRL should perform better for windowed requests.
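To illustrate why the loop order matters, here is a deliberately simplified Python model. It treats a progression order as four nested loops (outermost letter first) and reports how far apart the packets of a single precinct end up in the codestream. The sizes are illustrative assumptions, and the model ignores tiles, per-resolution precinct counts and packet sizes, so take it as a sketch of the idea rather than a description of any real file.

```python
from itertools import product


def packet_positions(progression, sizes, precinct):
    """Positions (in codestream order) of the packets of one precinct.

    progression: four letters, outermost loop first, e.g. "LRCP".
    sizes: number of layers (L), resolutions (R), components (C)
           and precincts (P).
    """
    loops = [range(sizes[axis]) for axis in progression]
    p_axis = progression.index("P")
    return [
        position
        for position, packet in enumerate(product(*loops))
        if packet[p_axis] == precinct
    ]


def spread(progression, sizes, precinct):
    """Return (number of packets, span of positions they occupy)."""
    positions = packet_positions(progression, sizes, precinct)
    return len(positions), positions[-1] - positions[0] + 1


# Illustrative sizes only: 19 layers, 6 resolutions, 1 component, 100 precincts.
sizes = {"L": 19, "R": 6, "C": 1, "P": 100}
for order in ("LRCP", "RLCP", "RPCL", "PCRL", "CPRL"):
    print(order, spread(order, sizes, precinct=42))
```

In this toy model the packets of one precinct form a single contiguous run when P is one of the outer loops, and are spread over almost the whole sequence when P is the innermost loop.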

 

http://www.gwg.nga.mil/ntb/baseline/docs/bpj2k01/ISOJ2K_profile.pdf recommends using LRCP with 19-20 quality layers, so that would indeed cause a lot of seeking through the file. You can check the progression order in the output of the following command (look for "SGcod_Progress"):

 

python dump_jp2.py /vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf

 

where dump_jp2.py is

https://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/dump_jp2.py

 

It is likely that translating into JP2 turned the original codestream into one with a progression order that is more seek-friendly (the default progression order may differ between drivers).

 

Even

 


Mike Pfaffenberger
Hi Even,

I ran the script you linked, and your hypothesis is absolutely correct.

<JP2KCodeStream filename="/vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf">
.
.
.
<Field name="SGcod_Progress" type="uint8" description="RLCP">1</Field>
<Field name="SGcod_NumLayers" type="uint16">19</Field>
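For anyone who wants to pull these two values out programmatically, a minimal Python sketch follows. The SAMPLE string is an abbreviated stand-in built from the two fields shown here, with simplified nesting (an assumption, not the full dump_jp2.py output), and progression_and_layers is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the dump_jp2.py output: only the two fields
# quoted in this thread are kept, and the nesting is simplified.
SAMPLE = """
<JP2KCodeStream filename="/vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf">
  <Field name="SGcod_Progress" type="uint8" description="RLCP">1</Field>
  <Field name="SGcod_NumLayers" type="uint16">19</Field>
</JP2KCodeStream>
"""


def progression_and_layers(xml_text):
    """Return (progression order name, number of quality layers)."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the depth of the Field elements
    # does not matter.
    fields = {f.get("name"): f for f in root.iter("Field")}
    progression = fields["SGcod_Progress"].get("description")
    layers = int(fields["SGcod_NumLayers"].text)
    return progression, layers


result = progression_and_layers(SAMPLE)
print(result)
```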

I also added a quick printf to the vsisubfile read function that prints the nSize and nCount variables. Running the Python script you linked triggered the vsisubfile read function 75,868 times, mostly with small sizes and nCount=1.

Doing the same with my gdal_translate -srcwin 000 000 1000 1000 run triggered the vsisubfile read function 9,024 times, almost all with nSize=1 and nCount=1024. If the vsisubfile object is wrapping a vsis3 dataset, does each vsisubfile read turn into an HTTP request? That would certainly explain the extremely long time to crop my window.

Just out of curiosity I ran the python script you linked on my JP2 file (same image as the NITF, I just ran gdal_translate on it).

This one appears to have the codestream progression order LRCP with only one layer...?:
 <Field name="SGcod_Progress" type="uint8" description="LRCP">0</Field>
 <Field name="SGcod_NumLayers" type="uint16">1</Field>

I'm guessing the fact that my JP2 file only has one layer is the reason vsis3 works well with it, regardless of it being LRCP (not optimal for windowed reads).

Anyway, thanks. I learned some more about JPEG2K here. Unfortunately I think I'm pretty out of luck on the prospect of doing remote windowed reads quickly on this data. However, I'm very open to suggestions if anyone has any ideas on how it might work.

Cheers.


Even Rouault-2

On Monday 24 July 2017 22:41:05 CEST Mike Pfaffenberger wrote:

> Hi Even,
>
> I ran the script you linked, and your hypothesis is absolutely correct.
>
> <JP2KCodeStream filename="/vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf">
> .
> .
> .
> <Field name="SGcod_Progress" type="uint8" description="RLCP">1</Field>
> <Field name="SGcod_NumLayers" type="uint16">19</Field>
>
> I also added a quick printf in the vsi subfile read function which prints
> the nSize and nCount variables. Running the python script you linked me
> triggered the vsi subfile read function 75,868 times, mostly with small
> sizes, and nCount=1.
>
> Doing the same thing on my gdal_translate -srcwin 000 000 1000 1000
> triggered vsi subfile read 9,024 times, almost all with nSize=1 and
> nCount=1024. If the vsisubfile object is wrapping a vsis3 dataset, then
> does each vsi subfile read turn into an HTTP request?

 

Not exactly. /vsis3/ reads in chunks of at least 16 KB (with logic to grow the chunk size when it detects that the requested chunks are consecutive) and keeps a cache, to avoid issuing too many small HTTP range requests. The issue with the original NITF file is that those small reads must be scattered through the whole file, causing a lot of 16 KB chunks to be fetched. I doubt that reducing the chunk size would really help performance, because the bottleneck is more likely the latency of each HTTP request than the number of bytes transferred.
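A rough Python sketch of the effect: it counts how many distinct 16 KB chunks a set of reads touches. The read count (9,024) and read size (1,024 bytes) come from this thread, but the evenly scattered offsets are purely an assumption, and the model ignores the chunk-growing logic and any cache eviction in /vsis3/.

```python
CHUNK_SIZE = 16384  # minimum /vsis3/ chunk size mentioned in this thread


def chunks_touched(reads, chunk_size=CHUNK_SIZE):
    """Count the distinct chunks covered by (offset, size) reads."""
    touched = set()
    for offset, size in reads:
        if size <= 0:
            continue
        first = offset // chunk_size
        last = (offset + size - 1) // chunk_size
        touched.update(range(first, last + 1))
    return len(touched)


n_reads, read_size = 9024, 1024
subfile_size = 901949970 - 4038

# Assumed worst case: reads spread evenly over the whole subfile.
stride = subfile_size // n_reads
scattered = [(i * stride, read_size) for i in range(n_reads)]

# Best case for comparison: the same reads placed back to back.
contiguous = [(i * read_size, read_size) for i in range(n_reads)]

print("scattered :", chunks_touched(scattered), "chunks")
print("contiguous:", chunks_touched(contiguous), "chunks")
```

With scattered offsets every read lands in a chunk of its own (or straddles two), so the number of chunks to fetch is at least the number of reads, while the same reads placed back to back share a much smaller set of chunks.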

 

That said, it is not difficult to try. You can edit the port/cpl_vsil_curl.cpp file and change the value of the DOWNLOAD_CHUNK_SIZE constant to something smaller than the current 16384 (e.g. try 1024).
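A back-of-the-envelope estimate in Python shows why a smaller chunk size is unlikely to change much when latency dominates. The latency and throughput figures are hypothetical placeholders, not measurements, and the model assumes one sequential HTTP request per scattered read.

```python
def estimated_seconds(n_requests, chunk_size, latency_s, bytes_per_s):
    """Time for n sequential range requests of chunk_size bytes each."""
    return n_requests * (latency_s + chunk_size / bytes_per_s)


N_REQUESTS = 9024          # one request per scattered read (assumption)
LATENCY_S = 0.05           # hypothetical 50 ms round trip
BYTES_PER_S = 10_000_000   # hypothetical 10 MB/s throughput

t_16k = estimated_seconds(N_REQUESTS, 16384, LATENCY_S, BYTES_PER_S)
t_1k = estimated_seconds(N_REQUESTS, 1024, LATENCY_S, BYTES_PER_S)
print(f"16384-byte chunks: {t_16k:.0f} s, 1024-byte chunks: {t_1k:.0f} s")
```

Under these assumed numbers the per-request latency term is far larger than the transfer term, so shrinking the chunk only trims a few percent off the total.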

 

LRCP with many quality layers is ideal when you want to be able to do a progressive rendering of the whole image. Remember the old times of super slow Internet where your browser would display a progressive JPEG or PNG with growing quality.

 

> That would certainly
> explain the extremely long time to crop my window.
>
> Just out of curiosity I ran the python script you linked on my JP2 file
> (same image as the NITF, I just ran gdal_translate on it).
>
> This one appears to have the codestream progression order LRCP with only
> one layer...?:
> <Field name="SGcod_Progress" type="uint8" description="LRCP">0</Field>
> <Field name="SGcod_NumLayers" type="uint16">1</Field>
>
> I'm guessing the fact that my JP2 file only has one layer is the reason
> vsis3 works well with it, regardless of it being LRCP (not optimal for
> windowed reads).

 

When one of the L, R, C, P "dimensions" has size 1, it doesn't really count. So with a single quality layer, this is in fact an RCP layout, which means that the resolution levels are scattered through the file. It is still not the ideal layout for windowed reads at full resolution, unless there is just one resolution level. You can check that with the value of <SPcod_NumDecompositions>.
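The "size 1 doesn't count" rule is easy to express in Python. In this sketch effective_order is a hypothetical helper, and apart from the single quality layer the sizes are made-up illustrative values.

```python
def effective_order(progression, sizes):
    """Drop the loops that have a single iteration from a progression order."""
    return "".join(axis for axis in progression if sizes[axis] > 1)


# One quality layer as in the JP2 file from this thread; the other sizes
# are illustrative assumptions.
sizes = {"L": 1, "R": 6, "C": 3, "P": 100}
print(effective_order("LRCP", sizes))
```

With a single resolution level as well (R set to 1), the same call would leave only the C and P loops.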

 

> Anyway, thanks. I learned some more about JPEG2K here. Unfortunately I
> think I'm pretty out of luck on the prospect of doing remote windowed reads
> quickly on this data.

 

Yes, I don't think it is really possible to improve that with those datasets unmodified.

I believe that there are some JPEG2000 toolkits that can change the progression order of an existing JPEG2000 file without adding new loss (even if it uses lossy compression).

 

Even

 
