R: R: [Geoserver-devel] Ingestion Engine proposal


P.Rizzi Ag.Mobilità Ambiente
Hi Paolo,
ok I understand what you're saying ... I agree with you, but actually I cannot figure out how we can persist the configuration somewhere. We cannot exclude the Web interface ... let me give an example. Suppose we have different ways to populate the GeoServer memory image, and at a certain point we need to manually modify, through the web interface, the metadata associated with a certain DataStore or FeatureType. My question is, how can we persist the changes at the end? We would have to preserve even information about the config plugin or something like that ... Am I missing something?
[P.Rizzi Ag.Mobilità Ambiente]  Yes, in fact what I have done so far makes the Web interface non-functional.
You can go there, see all the automatically configured DataStores and FeatureTypes, modify them
and save them to the catalog.xml and fellow files, but they're ignored.
If you do a GetCapabilities, you'll see only the automatically configured stuff,
because the Tomcat Valve is replacing the memory image of the config data
with the one coming from the automatism.
Sure this is not good, because the Web interface should be functional, or be removed
altogether if not (although it's still useful even if it's read-only).
The fact is that, apart from the connection params of each DataStore, all the
other metadata, the part related to FeatureTypes, comes from MetaStores,
which are data models describing all aspects of a FeatureType (structure, validation,
security, etc.). And MetaStores are, indeed, DataStores.
So to change the metadata associated with a particular FeatureType, you do that
by modifying it inside its MetaStore. Since a MetaStore is a DataStore, you can do that
with whatever user interface is capable of talking to a DataStore, even through WFS.
We currently have no good user interface and there are other pieces not yet properly done,
that's why I haven't published it yet.
Therefore I think we now have to choose between two ways:
1) The fastest: building a plugin integrated with the current GeoServer implementation
2) The slowest: start to think about a better GeoServer configuration system
By the way, for the WCS we absolutely need an Ingestion Engine capable of recognizing file system changes, so in any case we need a plugin that writes DTOs by performing a periodic file system scan or something like this.
I'd like to hear something as soon as possible; if you can, send me your code so I can take a look and try to find a common solution to the problem.
Dave, I'd like to hear something from you too before starting to implement something useless.
[P.Rizzi Ag.Mobilità Ambiente] Yes, in fact I wasn't suggesting that you follow the
slowest path, because I would not do it myself. When I created our auto-config system
I did it following only our needs, and you should do the same.
My message was meant to state our current vision and also as a memorandum.
The only thing I can recommend is to keep a good separation between the config data
you need, the way you persist it and the servlet that uses both.
This way your mechanism will surely be useful now and tomorrow;
if you talk to DTOs there should be no problem.
I'd like to send you our code, but I have to find the time to revisit it and see
if there's anything that may make it useless for you or, at least, write
a good instruction document, but I'm really busy with other matters.
In fact when I saw your first message about the ingestion engine I hoped
it was about something else, because at this very moment I doubt I can be
of much help...
Paolo Rizzi
On 7/19/05, P. Rizzi Ag.Mobilità Ambiente <[hidden email]> wrote:
Hi Alessio,
we here have, like many others I think, the same need to automatically
configure GeoServer, without passing through its Web Admin interface.
We are, until now, only working with DataStores, but I believe there
should not be much difference for the Coverages part.

I read what you are thinking of doing to solve this, and I must say
I personally wouldn't go in that direction, but you surely have
your needs and your best bet is surely to follow them.

Anyway, here follows my vision of this matter.

I think the configuration part in GeoServer is not its best one,
because it intermixes three concerns that should instead be separated:
  - the config data itself
  - the config data persistence
  - the servlet

The configuration data, for me, is the MEMORY IMAGE a given running instance
of GeoServer has at any given instant in time, and that has nothing to do
with how that information was generated, be it from an XML file, from a
DB, etc.
So each configuration mechanism will persist its own data in the form it
likes best; there's no need to save and load it from a single source.

What I'm saying is that the configuration data must be separated from its
persistence. Now GeoServer uses several files and directories to store its
configuration, and it's only able to load the configuration from those files.
Instead the catalog.xml and all others should be only one of the many
persistence forms for GeoServer's configuration data. That is, the
file and the pieces of Java code able to manage it, should be no different
from the ingestion engine or from whatever other mechanism one can invent.
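
A minimal sketch of that separation, in Java. All names here (ConfigSource, MemoryImage) are hypothetical and are not GeoServer classes; the point is only that the memory image does not care which persistence form produced its entries.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: one persistence form (XML files, a DB, an ingestion engine...). */
interface ConfigSource {
    /** Returns the entries this source contributes, keyed by an id. */
    Map<String, String> load();
}

/** Hypothetical: the memory image, independent of where its entries came from. */
final class MemoryImage {
    private final Map<String, String> entries = new LinkedHashMap<>();

    /** Merges every source in order; later sources override earlier ones. */
    static MemoryImage from(List<ConfigSource> sources) {
        MemoryImage image = new MemoryImage();
        for (ConfigSource source : sources) {
            image.entries.putAll(source.load());
        }
        return image;
    }

    /** Read-only view of the merged entries. */
    Map<String, String> entries() {
        return Collections.unmodifiableMap(entries);
    }
}
```

With this shape, a catalog.xml reader and an ingestion engine would simply be two ConfigSource implementations passed to MemoryImage.from.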

The memory image of the configuration data is kept by GeoServer inside
the ServletContext, so that it's globally accessible from any servlet
running inside a given GeoServer instance and from any HTTP request/response
being serviced by that servlet. So you can modify that memory image
using a servlet filter, without any need to store it using the same
persistence mechanism that GeoServer uses now.
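
To illustrate the idea without a servlet container, here is a rough Java sketch. FakeContext is only a stand-in for the ServletContext attribute map (it is not the servlet API), and the attribute key is made up; a real filter would do the same swap inside its doFilter method before passing the request on.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Stand-in for ServletContext attributes; NOT the servlet API. */
final class FakeContext {
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
    }
}

/** Hypothetical filter step: replace the shared memory image before each request. */
final class ConfigRefresher {
    static final String KEY = "example.config.image"; // made-up attribute name

    private final Supplier<Object> imageSource;

    ConfigRefresher(Supplier<Object> imageSource) {
        this.imageSource = imageSource;
    }

    /** Would run where a real filter's doFilter runs, before the request proceeds. */
    void beforeRequest(FakeContext context) {
        context.setAttribute(KEY, imageSource.get());
    }
}
```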

Anyway, if one preferred to persist all the config info inside the
same store, I wouldn't use files and directories for that, I'd use DataStores.
GeoServer uses GeoTools to access data, and GeoTools uses DataStores
to that purpose. Configuration data are no different from other data,
so I believe the right place to store it is inside DataStores.
There's a GMLDataStore (although it's read-only at the moment), so one
can use it if he likes to store config info inside XML files.

I think that the main concern is not how one persists the config info,
but the config info itself, that is the Catalog.
Now GeoServer uses FeatureTypeInfo and related classes to construct
its memory image of the configuration. Are those classes good enough
to support coverages too??? Are they flexible/extensible enough
to support future services??? Are they "compatible" with classes
used by other systems (like uDig for example)???
I feel this is what we should think about; the means by which these
catalog classes are persisted should be irrelevant.

One distinction must be made here. I think that there're two different
levels of config info. One is about DataStores, that is how to connect
to each source of data. The other is about FeatureTypes, that is which
data is available, how it is structured, how it must be validated,
who has the rights to read or modify it, etc.
DataStore config info can be in simple XML files or whatever, because
you only have to say how to connect to that source of data, so the
connection params are basically all that's needed.
FeatureType config info is a different story, or rather it's not
config info at all: it is metadata and it belongs to the Catalog.
In our vision, and in what we have implemented so far, to add new data
to GeoServer, you simply have to add an XML file with the connection params
of the DataStore. The metadata for each FeatureType contained in that
DataStore is read from the DataStore itself. That is, for each DataStore you
want to add to GeoServer you MUST also add what we call a MetaStore, that is a
persisted form of the metadata for the FeatureTypes contained in the DataStore itself.
Together with the MetaStore you MUST specify a Loader able to load that
persisted metadata into a memory image, that is a Catalog. We have a few
Loaders implemented: one is able to load metadata from FeatureSources (that is,
a DB with a specific structure), another one is able to infer a minimum set
of metadata from the DataStore itself, so that you can also add a DataStore
for which you actually don't have a proper MetaStore.
So we have this catalog in memory and we can use it to configure GeoServer
on-the-fly using a servlet filter (actually it is a Tomcat Valve) that
"talks" to the GeoServer's catalog building DTOs on the fly.
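
The MetaStore/Loader arrangement could be sketched roughly as follows. This is not our actual code: TypeMetadata, MetadataLoader, InferringLoader and MemoryCatalog are hypothetical names, and the DataStore is reduced to a plain list of type names so the example runs without GeoTools.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical metadata for one FeatureType. */
final class TypeMetadata {
    final String typeName;
    final String title;

    TypeMetadata(String typeName, String title) {
        this.typeName = typeName;
        this.title = title;
    }
}

/** Hypothetical Loader: turns persisted (or inferred) metadata into memory entries. */
interface MetadataLoader {
    List<TypeMetadata> load(List<String> typeNames);
}

/** Fallback Loader: infers a minimum set of metadata from the type names alone. */
final class InferringLoader implements MetadataLoader {
    @Override
    public List<TypeMetadata> load(List<String> typeNames) {
        List<TypeMetadata> result = new ArrayList<>();
        for (String name : typeNames) {
            result.add(new TypeMetadata(name, name)); // title defaults to the type name
        }
        return result;
    }
}

/** Hypothetical in-memory Catalog, filled by running one Loader per DataStore. */
final class MemoryCatalog {
    private final Map<String, TypeMetadata> byName = new LinkedHashMap<>();

    void register(List<String> typeNames, MetadataLoader loader) {
        for (TypeMetadata metadata : loader.load(typeNames)) {
            byName.put(metadata.typeName, metadata);
        }
    }

    /** Returns the metadata for a type, or null if it was never registered. */
    TypeMetadata get(String typeName) {
        return byName.get(typeName);
    }
}
```

A Loader backed by a real MetaStore would implement the same interface but read titles, validation rules and so on from the store instead of inferring them.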

But there's more to it...

We're also more and more convinced that some of the things GeoServer is able
to do now should be moved to GeoTools. Validation and Transactionality
should be in GeoTools, probably even the GetFeature operation should be in
GeoTools. The central point of it all is basically about being able to operate
(querying, reading, writing, validating, etc.) against a set of DataStores
instead of against each one separately, and this capability should be in
GeoTools, not in GeoServer.

If things were like that, much of the config info now used by GeoServer,
the part regarding FeatureTypes metadata, would go into the GeoTools catalog.

GeoServer will then have only the config info relevant to each OGC service
it exposes (WFS, WMS, WCS, etc.) and it only needs to have references to the
metadata configured inside the GeoTools catalog.

So each service plugged into GeoServer will have its own configuration
for its own configuration info. GeoServer will only need a system to
configure the plugins;
data will instead be configured inside the GeoTools catalog.

...that was a very long dissertation, I'm sorry...
And implementing it has a much bigger impact than what you're proposing,
so it may take a while to do (even if we have already implemented a certain part
of it).
Also at this very moment we have other more urgent matters to look after,
so I'm afraid I won't be able to work heavily on this for a while.
I'm very sorry about this, because I'd like to see others using what
we've done so far, but I'd have to find time to make it general enough
and to publish it...

Paolo Rizzi

> -----Original Message-----
> From: Alessio Fabiani [mailto:[hidden email]]
> Sent: Tuesday, July 19, 2005 11:30
> To: [hidden email];
> [hidden email]
> Subject: [Geoserver-devel] Ingestion Engine proposal
> Hi all,
> I will explain in this email our proposal for a GeoServer
> Ingestion Engine.
> The Ingestion Engine we would like to implement for GeoServer should
> be configured as a PlugIn that an Administrator can plug into
> GeoServer and use as an alternative to the web interface to manage the
> configuration files, i.e. the "catalog.xml", which is where NameSpaces,
> DataStores and CoverageFormats parameters are stored, and the different
> "info.xml" files associated with each GeoServer feature and coverage, which
> are where all the information relating to the FeatureType or
> GridCoverage is stored.
> In order to achieve this objective, we do not want to modify the
> current GeoServer configuration concept. At the moment, every time an
> Administrator wants to add a new FeatureType or Coverage to GeoServer
> he has to follow several steps:
> Step 1: Defining the parameters and the ID for a new DataStore or
> Format. In the new release of the GeoServer-WCS experiment we have renamed
> DataStore as FeatureStore and Format as CoverageStore because they are
> theoretically the same thing for vector and gridded
> data respectively. GeoServer stores all that information in catalog.xml.
> Step 2: Creating a new FeatureType or GridCoverage starting from the
> Store created in Step 1. GeoServer creates a new directory named
> after the Store ID and Feature/Coverage name and stores
> inside it an info.xml file containing all the metadata associated with the
> latter.
> Notice that GeoServer makes the configuration files persistent only
> after the Administrator performs a Save action by clicking the associated
> button.
> The Ingestion Engine we have in mind should be able to perform Step 1,
> Step 2 and Save configuration automatically.
> We have two main objectives to achieve:
> 1.    Building something that can be plugged into and unplugged from
> GeoServer
> 2.    Building something that allows GeoServer to automatically modify
> the configuration performing the above steps without removing the
> current GeoServer configuration management system
> To achieve the first objective we are thinking of building a Servlet with
> its own classes that the administrator can add/remove, configure and
> enable/disable through the GeoServer web.xml. This Servlet will work on a
> time-based schedule by simply checking the file system structure
> for changes.
> To achieve the second objective the Servlet will simply automate the
> Administrator steps for each change.
> How the servlet works:
> First of all we do not want to force users to maintain a predefined
> file system structure. We are thinking of a system that mainly leaves
> unaltered the file system manually created by the Administrator using
> the web interface but creates and maintains its own structure for the
> automatically managed subdirectories, compatible with the first one.
> Suppose that the Administrator wants to create a new subdirectory
> automatically managed by the GeoServer Ingestion Engine for a set of
> files belonging to a particular Store.
> What he has to do is create this subdirectory and place inside it
> a particular xml file which describes the Store type, the common
> parameters and the metadata that the Ingestion Engine will use. An
> external tool, that we want to create too, can be used to create this
> configuration file. The Ingestion Servlet will scan the directory and
> every time it encounters a new compatible file it will create a
> new subdirectory where this file (and all related files) will be moved and
> the corresponding info.xml will be created. For FeatureStores like postgis
> those files can be just xml files containing the parameters, named like
> the final FeatureType. If the Administrator deletes one or more of
> those subdirectories, the Ingestion Servlet will remove the
> Features/Coverages (and the associated Store) from the GeoServer
> configuration. Notice that the Administrator can even manually remove
> those features/coverages by using the web interface.
> Attached there are two images that show how the Ingestion Engine
> should work on an "auto-managed" subdirectory.
> Moreover notice that by adding a little more metadata we can
> even handle WMS nested layers. We don't need to reflect the exact File
> System tree structure, we can even build a virtual WMS layer tree
> structure by handling some metadata. I will explain in detail in the
> next email.
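
The periodic file system check described in the proposal could be sketched roughly like this in Java. DirectoryScanner and Changes are hypothetical names, not part of GeoServer or of the proposed engine; the sketch only detects subdirectories that appeared or disappeared since the previous scan and leaves the creation of DTOs and info.xml files to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Hypothetical scanner: reports subdirectories added or removed between scans. */
final class DirectoryScanner {
    private final Path root;
    private Set<String> known = new TreeSet<>();

    DirectoryScanner(Path root) {
        this.root = root;
    }

    /** Result of one scan: names that appeared and names that disappeared. */
    static final class Changes {
        final Set<String> added;
        final Set<String> removed;

        Changes(Set<String> added, Set<String> removed) {
            this.added = added;
            this.removed = removed;
        }
    }

    /** Lists the current subdirectories and diffs them against the previous scan. */
    Changes scan() throws IOException {
        Set<String> current = new TreeSet<>();
        try (Stream<Path> children = Files.list(root)) {
            children.filter(Files::isDirectory)
                    .forEach(p -> current.add(p.getFileName().toString()));
        }
        Set<String> added = new TreeSet<>(current);
        added.removeAll(known);
        Set<String> removed = new TreeSet<>(known);
        removed.removeAll(current);
        known = current; // remember this snapshot for the next scan
        return new Changes(added, removed);
    }
}
```

The scan could then be run on a time-based schedule, for example from a ScheduledExecutorService, with each added name triggering Step 1, Step 2 and Save, and each removed name triggering the removal of the corresponding Features/Coverages.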