VRT functionality

VRT functionality

Zoltan Szecsei
Hi,
I just want to clear up my understanding of how a VRT is implemented in QGIS.

I'd like to understand when QGIS opens a file, when it reads the contents, and when it writes (if need be) and closes a file.
In this context, I am thinking about SHP files - especially the NGI dataset, which comes "cut up" into degree squares.
Let's just deal with one feature type: river lines. My VRT looks like:
<OGRVRTDataSource>
  <OGRVRTUnionLayer name="Rivers">
    <OGRVRTLayer name="2730_RIVER_LINE_2006_06">
      <SrcDataSource relativeToVRT="1">2730/2730_RIVER_LINE_2006_06.shp</SrcDataSource>
    </OGRVRTLayer>
    <OGRVRTLayer name="2731_RIVER_LINE_2006_04">
      <SrcDataSource relativeToVRT="1">2731/2731_RIVER_LINE_2006_04.shp</SrcDataSource>
    </OGRVRTLayer>
    <OGRVRTLayer name="2732_RIVER_LINE_2006_04">
      <SrcDataSource relativeToVRT="1">2732/2732_RIVER_LINE_2006_04.shp</SrcDataSource>
    </OGRVRTLayer>
  </OGRVRTUnionLayer>
</OGRVRTDataSource>

  • When I open the VRT in QGIS, does QGIS open ALL the files listed in the VRT and look up the extent of each one?
    • If my VRT included the extents of each of the files, would this stop QGIS from (at this stage) opening the files and reading the extents?

  • Before rendering the VRT, does QGIS look at the extent of my viewport and physically open and render only the files that fall inside it?
    In other words, if I first zoom into a known area and then open my VRT, will QGIS still open all the subfiles at this stage, instead of waiting until a specific subfile needs opening?

  • As I pan around my map, does QGIS open and close VRT subfiles as they come into and go out of my current viewing region?

  • Presumably if any of my VRT subfiles touch or overlap my current viewport, they would be "processed" depending on what I am doing?

  • Is there a way to structure a VRT file so that you can have access to the underlying files that make up the VRT? (Even edit access?)


Or is the VRT just an easy way to bunch a whole lot of maps under one name, with no processing benefit depending on the area you are viewing or working in?

TIA,
Zoltan






-- 

===========================================
Zoltan Szecsei PrGISc [PGP0031]
Geograph (Pty) Ltd.
GIS and Photogrammetric Services

P.O. Box 7, Muizenberg 7950, South Africa.

Mobile: +27-83-6004028
Fax:    +27-86-6115323     www.geograph.co.za
===========================================

_______________________________________________
Qgis-developer mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/qgis-developer
Re: VRT functionality

Even Rouault-2
On Sunday 26 October 2014 16:44:37, Zoltan Szecsei wrote:
> Hi,
> I just want to clear up my mindset as to how a VRT is implemented in QGIS.

Zoltan,

In fact those are more OGR questions than QGIS questions. QGIS makes no
distinction between reading a plain shapefile (through OGR) and reading a VRT.

>
> I'd like to understand when QGIS opens a file, when it reads the
> contents, and when it writes (if need be) and closes a file.
> In this context, I am thinking about SHP files - especially the NGI
> dataset which comes out "cut up" into degree squares.
> Let's just deal with 1 feature type: Rivers lines. My VRT looks like:
> <OGRVRTDataSource>
>    <OGRVRTUnionLayer name="Rivers">
>    <OGRVRTLayer name="2730_RIVER_LINE_2006_06">
>      <SrcDataSource
> relativeToVRT="1">2730/2730_RIVER_LINE_2006_06.shp</SrcDataSource>
>    </OGRVRTLayer>
>    <OGRVRTLayer name="2731_RIVER_LINE_2006_04">
>      <SrcDataSource
> relativeToVRT="1">2731/2731_RIVER_LINE_2006_04.shp</SrcDataSource>
>    </OGRVRTLayer>
>    <OGRVRTLayer name="2732_RIVER_LINE_2006_04">
>      <SrcDataSource
> relativeToVRT="1">2732/2732_RIVER_LINE_2006_04.shp</SrcDataSource>
>    </OGRVRTLayer>
>    </OGRVRTUnionLayer>
> </OGRVRTDataSource>
>
>   * When I open the VRT in QGIS, does QGIS open ALL the VRT files and
>     look for the extent of each of the files?

If QGIS issues a GetExtent() on the VRT, then with the above definition, it
will query the 3 shapefiles to find the extent of each. But on shapefiles this is
a fast operation.
You could define <ExtentXMin>, etc. directly under OGRVRTUnionLayer if you
really want a fast GetExtent().
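
One reason the extent lookup is cheap on shapefiles: the format stores the bounding box of the whole file in the fixed 100-byte header of the .shp, so no feature has to be read to get it. The sketch below reads that header with the Python standard library only. It is an illustration of the file layout, not OGR's own code, and the function name read_shp_extent is hypothetical.

```python
import struct


def read_shp_extent(path):
    """Return (xmin, ymin, xmax, ymax) from the 100-byte .shp main header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100:
        raise ValueError("file is too short to be a shapefile")
    # Bytes 0-3 hold the file code 9994 as a big-endian integer.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("unexpected file code, not a .shp file")
    # Bytes 36-67 hold Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    return struct.unpack("<4d", header[36:68])
```

The four values returned are the ones that would go into the ExtentXMin, ExtentYMin, ExtentXMax and ExtentYMax elements if you decide to declare the extent in the VRT.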

>       o If my VRT had the extents included for each of the files, would
>         this stop QGIS from (at this stage) opening the files and
>         reading the extents?

Yes, but QGIS probably calls GetFeatureCount(), so it will need to open each
shapefile, unless you define <FeatureCount> as well.
QGIS will also ask for the field definitions, and will need to open each ....,
unless you define <Field>.
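
To make the declarations above concrete, here is a minimal sketch that writes a union-layer VRT with the extent, feature count and fields declared up front. It uses only Python's xml.etree module. The element names (Field, GeometryType, ExtentXMin and friends, FeatureCount) are taken from the OGR VRT driver documentation as I understand it, so check them against your GDAL version. The helper name build_union_vrt, the field name "NAME" and the extent numbers in the usage lines are placeholders, not values from the real dataset.

```python
import xml.etree.ElementTree as ET


def build_union_vrt(name, sources, extent, fields, geometry_type,
                    feature_count=None):
    """Return the text of a union-layer VRT with declared metadata.

    sources: list of (layer_name, path_relative_to_vrt)
    extent:  (xmin, ymin, xmax, ymax) of the whole union layer
    fields:  list of (field_name, ogr_type), e.g. ("NAME", "String")
    """
    root = ET.Element("OGRVRTDataSource")
    union = ET.SubElement(root, "OGRVRTUnionLayer", name=name)
    for layer_name, rel_path in sources:
        layer = ET.SubElement(union, "OGRVRTLayer", name=layer_name)
        src = ET.SubElement(layer, "SrcDataSource", relativeToVRT="1")
        src.text = rel_path
    # Declared fields: the driver should not need to scan sources for them.
    for field_name, field_type in fields:
        ET.SubElement(union, "Field", name=field_name, type=field_type)
    ET.SubElement(union, "GeometryType").text = geometry_type
    tags = ("ExtentXMin", "ExtentYMin", "ExtentXMax", "ExtentYMax")
    for tag, value in zip(tags, extent):
        ET.SubElement(union, tag).text = repr(float(value))
    if feature_count is not None:
        ET.SubElement(union, "FeatureCount").text = str(feature_count)
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
vrt_text = build_union_vrt(
    "Rivers",
    [("2730_RIVER_LINE_2006_06", "2730/2730_RIVER_LINE_2006_06.shp"),
     ("2731_RIVER_LINE_2006_04", "2731/2731_RIVER_LINE_2006_04.shp")],
    extent=(0.0, 0.0, 1.0, 1.0),
    fields=[("NAME", "String")],
    geometry_type="wkbLineString",
    feature_count=-1,
)
```

Whether a given QGIS version then avoids opening the source files entirely is a separate question; this only removes the reasons the VRT driver itself has for scanning them.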

>
>   * Before rendering the VRT, does QGIS look at the extents of my
>     viewport and only physically open my files and render its contents?

QGIS will call SetSpatialFilter() on the layer with the extent, so that the
layer can use a spatial index if it has one. Reviewing my code in the VRT union
layer, I can see that the spatial filter is forwarded to each source
layer. So it will need to open them, but the shapefile driver won't scan any
features when the spatial filter does not intersect the extent of the
shapefile, so that should be fast. A possible optimization in the VRT union
layer would be to take the extent of each source layer into account and skip
iterating over it when the spatial filter on the union layer doesn't intersect
that extent.

To be efficient, you likely need to build a .qix spatial index on each shapefile.
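
The possible optimization described above comes down to a rectangle-intersection test done before a source layer is touched. The sketch below shows the idea in plain Python; it is not the union layer's actual code, and the names extents_intersect and layers_to_scan as well as the extents in the usage lines are made up for illustration.

```python
def extents_intersect(a, b):
    """True if rectangles a and b, each (xmin, ymin, xmax, ymax), touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def layers_to_scan(declared_extents, spatial_filter):
    """Return the names of the source layers whose declared extent meets the filter.

    declared_extents: dict mapping layer name -> (xmin, ymin, xmax, ymax)
    spatial_filter:   (xmin, ymin, xmax, ymax) of the current area of interest
    """
    return [name for name, extent in declared_extents.items()
            if extents_intersect(extent, spatial_filter)]


# Hypothetical one-degree tiles laid out side by side.
tiles = {
    "tile_a": (0.0, 0.0, 1.0, 1.0),
    "tile_b": (1.0, 0.0, 2.0, 1.0),
    "tile_c": (2.0, 0.0, 3.0, 1.0),
}
visible = layers_to_scan(tiles, (0.2, 0.2, 0.8, 0.8))
```

With the filter rectangle above, only tile_a would be opened. Note that the test counts a shared edge as an intersection, so a viewport that ends exactly on a tile boundary still selects the neighbouring tile.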


>     In other words, if I first zoom into a known area, then open my VRT,
>     will QGIS at this stage still open all the subfiles, instead of
>     waiting until a specific subfile needs opening?
>
>   * As I pan around my map, does QGIS open and close the VRT subfiles
>     that are out of my current viewing region?

The VRT driver maintains a pool of at most 100 open source layers by
default (that number can be changed by setting the OGR_VRT_MAX_OPENED
configuration option) and will transparently close the older ones.
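
The pooling behaviour can be pictured with a small bounded cache. The sketch below is an illustration in Python, not OGR's implementation: it assumes that "older" means least recently used, and the class name LayerPool and its open/close callbacks are hypothetical.

```python
from collections import OrderedDict


class LayerPool:
    """Keep at most max_opened handles open, closing the least recently used."""

    def __init__(self, open_func, close_func, max_opened=100):
        self._open = open_func
        self._close = close_func
        self._max_opened = max_opened
        self._opened = OrderedDict()  # key -> handle, oldest use first

    def get(self, key):
        if key in self._opened:
            # Already open: mark it as the most recently used.
            self._opened.move_to_end(key)
            return self._opened[key]
        if len(self._opened) >= self._max_opened:
            # Pool is full: close the least recently used handle first.
            _, old_handle = self._opened.popitem(last=False)
            self._close(old_handle)
        handle = self._open(key)
        self._opened[key] = handle
        return handle


# Tiny demonstration with strings standing in for open layers.
closed = []
pool = LayerPool(open_func=lambda key: "handle:" + key,
                 close_func=closed.append, max_opened=2)
pool.get("a")
pool.get("b")
pool.get("a")   # "a" becomes the most recently used
pool.get("c")   # pool is full, so "b" is closed
```

With a few thousand source files and the default limit of 100, a fully zoomed-out render would cycle every file through such a pool, which is why avoiding the open in the first place matters more than raising the limit.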

>
>   * Presumably if any of my VRT subfiles touch or overlap my current
>     viewport, they would be "processed" depending on what I am doing?
>
>   * Is there a way to structure a VRT file so that you can have access
>     to the underlying files that make up the VRT? (Even edit access?)

Not sure what you mean by "have access to". But a union VRT can be opened in
update mode, and the update mode will be forwarded to the source layers
(provided they support it). You can delete or modify features. To create
new features, you need to specify <SourceLayerFieldName> as documented in
http://gdal.org/drv_vrt.html
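
As a rough sketch of that last point, the snippet below adds a <SourceLayerFieldName> element to every union layer of an existing VRT text, using only Python's xml.etree module. My reading of the driver documentation is that this element adds a field holding the name of the source layer each feature comes from; treat that, and the placement of the element, as assumptions to verify. The function name add_source_layer_field and the field name "source_layer" are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_source_layer_field(vrt_xml, field_name="source_layer"):
    """Return vrt_xml with <SourceLayerFieldName> added to each union layer."""
    root = ET.fromstring(vrt_xml)
    for union in root.iter("OGRVRTUnionLayer"):
        # Leave union layers that already declare the element untouched.
        if union.find("SourceLayerFieldName") is None:
            elem = ET.SubElement(union, "SourceLayerFieldName")
            elem.text = field_name
    return ET.tostring(root, encoding="unicode")


original = (
    '<OGRVRTDataSource>'
    '<OGRVRTUnionLayer name="Rivers">'
    '<OGRVRTLayer name="2730_RIVER_LINE_2006_06">'
    '<SrcDataSource relativeToVRT="1">2730/2730_RIVER_LINE_2006_06.shp'
    '</SrcDataSource>'
    '</OGRVRTLayer>'
    '</OGRVRTUnionLayer>'
    '</OGRVRTDataSource>'
)
updated = add_source_layer_field(original)
```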

>
>
> Or, is the VRT just an easy way to bunch a whole lot of maps under one
> name, and there is no processing benefit depending on the area you are
> viewing or working in?

Your above VRT should work reasonably fast, unless you have several hundred
or several thousand source layers. In that case, you may need to define more
optional elements in the VRT to avoid the scans, and some enhancements in the
OGRUnionLayer class would perhaps be needed.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT functionality

Zoltan Szecsei
Hi Even,
Thanks for the detailed thought, and for the effort of reviewing your code.
I'm fiddling with setting up quite a big dataset - likely to have over
1000 shapefiles in the VRT - maybe even up to 3000 - but I will
experiment and see what is both logical and practical.
My goal with the above questions is to try to avoid opening all the
shapefiles at the time the VRT is opened, so that there won't be a
"million and one" physical disk IOs.
If the user then loads my VRT with rendering off, it should load very
quickly (if I can supply all the details needed, in the VRT file).
Once the user has zoomed into his/her area of interest, and turns
rendering on for the VRT, then (hopefully) only the underlying
shapefiles in that AOI need to be physically accessed.

So, how close is the current code, when opening a VRT, to eliminating
the need to open any of the underlying files before the user does any
rendering or other operations and, if the user is "zoomed in", to
limiting file access to only those underlying files affected by the
current zoom level?

ogrinfo -al -so gives a lot of info that could be added to the static
VRT file, but is it enough to stop QGIS's implementation of VRT from
physically querying the underlying files until absolutely necessary?

Also, when a VRT is opened, do you really need all the knowledge (like
featurecount) at this stage?
One negative of me putting the featurecount into the VRT xml is that
someone could change that shapefile, and the actual feature count would
then differ from that in the xml file.

So, probably contrary to the direction I am hoping to go in (putting
details into the VRT file so that opening the VRT causes minimal
disk io), the correct way would be to optimise the QGIS code so that the
information about the underlying files is only read by QGIS when
absolutely necessary.


Regards & thanks again for your interest.
Zoltan




Re: VRT functionality

Even Rouault-2
On Monday 27 October 2014 08:04:21, Zoltan Szecsei wrote:

> So, how compatible is the current code when opening a VRT, to zeroing
> the need to open any underlying VRT files before any rendering or other
> operations are done by the user, and if the user is "zoomed in", to
> limiting the underlying VRT file-actions only to those affected by the
> current zoom level?

Zoltan,

You definitely need to define all fields (otherwise the VRT driver will open each
file to compute the union of fields) or declare
<FieldStrategy>FirstLayer</FieldStrategy>, and also declare the geometry type
and the global extent. Ultimately, you would also need to declare the extent of
each source layer, once an extra optimization has been added to the union layer
to avoid opening files whose declared extent doesn't intersect the area of
interest set by SetSpatialFilter().
An enhanced version of ogrbuildvrt, written as a Python script that retrieves
all the needed information from the source layers, could be useful too.
I can also imagine an interesting improvement: you could declare some option
in the VRT saying that by default GetNextFeature() on the union layer should
return nothing unless the area of interest covers no more than X
source layers, so that when the VRT is zoomed out, one doesn't try to open
thousands of files.
If you're interested in some of those improvements, you can contact me.
Perhaps some spatial indexing of the bounding boxes of the shapefiles would
also help, instead of sequential iteration, but for a few thousand of them
that probably isn't necessary yet (spatial indices generally become worthwhile
only from tens of thousands of geometries).
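
As a starting point for the kind of script mentioned above, the field list and record count of a shapefile can be read from the header of its .dbf file without touching any feature. The sketch below uses only the Python standard library and follows the dBASE header layout; it is not part of GDAL, the function name read_dbf_schema is hypothetical, and mapping the dBASE type letters (C, N, F, D, L) to OGR field types is deliberately left out.

```python
import struct


def read_dbf_schema(path):
    """Return (record_count, fields) read from a .dbf header.

    fields is a list of (name, type_letter, length, decimals) tuples.
    """
    with open(path, "rb") as f:
        header = f.read(32)
        if len(header) < 32:
            raise ValueError("file is too short to be a .dbf file")
        # Bytes 4-7: number of records, little-endian unsigned integer.
        (record_count,) = struct.unpack("<I", header[4:8])
        fields = []
        while True:
            desc = f.read(32)
            # The descriptor array ends with a single 0x0D byte.
            if len(desc) < 32 or desc[0] == 0x0D:
                break
            name = desc[:11].split(b"\x00", 1)[0].decode("ascii", "replace")
            type_letter = chr(desc[11])
            length, decimals = desc[16], desc[17]
            fields.append((name, type_letter, length, decimals))
    return record_count, fields
```

The record count includes rows flagged as deleted, so it is an upper bound rather than a guaranteed feature count. Combined with the bounding box stored in the .shp header, this covers most of what a generated VRT would need to declare for each source file.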

>
> ogrinfo -al -so gives a lot of info that could be added to the static
> VRT file, but is it enough to stop QGIS's implementation of VRT from
> physically querying the underlying files until absolutely necessary?
>
> Also, when a VRT is opened, do you really need all the knowledge (like
> featurecount) at this stage?

I know QGIS asks for that in some situations, for example when displaying
information about layers when opening a multi-layer dataset. You could likely
put a dummy value, like -1, since I don't think QGIS uses it except for
informative purposes.

> One negative of me putting the featurecount into the VRT xml, is that
> someone could change that shape file, and the actual feature count would
> then differ from that in the xml file.
>
> So, probably to negate the direction I am hoping to go in (like putting
> details into the VRT file so that opening the VRT would cause minimal
> disk io), the correct way would be to optimise the QGIS code so that the
> information about the underlying files is only read by QGIS when
> absolutely necessary.

Yes, there are perhaps optimizations possible on the QGIS side as well. If you use
GDAL trunk, compiled as a debug build, you can use the OGR C API spy
mechanism. I added it recently to help debug my improvements in the
MapInfo driver when I spotted bugs in it while using QGIS. See
http://www.gdal.org/ograpispy_8h.html

Even


Re: VRT functionality

Zoltan Szecsei
OK - some good thoughts there, thanks.

At the moment it is knowledge gaining and experimenting for me, but if I
ever need to deploy what I am trying to set up, I'll most certainly try
to influence any possible optimisation - by whatever method makes
everyone happy to get involved.

Regards & keep well,
Zoltan
