Validation of XML against ISO 19139 XSDs and other ISO 19115 rules

John.Hockaday
Hi all,

I am new to GeoNetwork so I'm not sure if I should send this email to the
developers list or the users list.  This email certainly describes essential
user requirements to correctly implement ISO 19115 and relevant profiles.

I have been following the development of the ISO 19139 XSDs and noticed that
the "final" XSDs are available from:
http://eden.ign.fr/xsd/isotc211/index_html?set_language=en&cl=en
I have downloaded these XSDs and validated them using Xerces-J with
full-schema checking.  I have also noticed that GeoNetwork uses a very old
format for the ISO 19115 metadata XML that it generates.

On 2006-05-02 Jeroen Ticheler wrote:

"With the ISO19139 now almost ready, we are working on a migration to
ISO19115:2003 validated using the 19139 schemas, but as said, this is not
[expected to be] ready within the coming 6 months except for some beta
versions".

This places the release of the ISO 19139 work at around 2006-11.

There are three issues that do not seem to be covered by the proposed release
mentioned above:

1. GeoNetwork needs to allow the user to identify all the validation rules
necessary for proof of compliance with ISO 19115 that are not provided by the
ISO 19139 XSDs;

2. GeoNetwork needs to somehow support all the ISO 19115 profiles that
different countries and communities of practice are expected to create in
order to implement ISO 19115 and ISO 19139; and

3. GeoNetwork needs to allow inheritance between metadata records.

Discussion item 1:
==================

To allow flexibility, the ISO 19139 XSDs do not provide code lists.  For
example, each of the 24 code lists in section B.5 of ISO 19115 can be
extended and supplied in different languages.  Therefore, it is expected that
the appropriate code lists will be defined in the profiles that implement
ISO 19139.

Many of the conditional statements in ISO 19115 (shown in comment boxes in
the UML diagrams) are not validated by the ISO 19139 XSDs.  Examples of such
conditions are: "hierarchyLevel" and "hierarchyLevelName" are mandatory if
the hierarchy level is not "dataset"; "topicCategory" is only mandatory if
hierarchyLevel equals "dataset" or "series"; "GeographicBoundingBox" and/or
"GeographicDescription" are mandatory if hierarchyLevel equals "dataset".
These conditions will need to be validated using Schematron or XSL.
GeoNetwork does not appear to offer this.
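
As a rough illustration of the kind of check meant here, the following Python
sketch tests the three conditions above against a parsed record.  It is not
Schematron and not anything GeoNetwork provides.  The function names are
hypothetical, the element paths follow the ISO 19139 "gmd" namespace, and
treating a missing hierarchyLevel as "dataset" is an assumption made for the
example.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd"}
IDENT = "gmd:identificationInfo/gmd:MD_DataIdentification/"
GEO = IDENT + "gmd:extent/gmd:EX_Extent/gmd:geographicElement/"


def hierarchy_levels(root):
    """Return the hierarchyLevel code values (assumed 'dataset' if absent)."""
    codes = root.findall("gmd:hierarchyLevel/gmd:MD_ScopeCode", NS)
    return [c.get("codeListValue", "") for c in codes] or ["dataset"]


def check_conditions(root):
    """Check a gmd:MD_Metadata element; return a list of error messages."""
    errors = []
    levels = hierarchy_levels(root)
    # hierarchyLevelName is needed once the level is something other than dataset.
    if any(level != "dataset" for level in levels):
        if root.find("gmd:hierarchyLevelName", NS) is None:
            errors.append("hierarchyLevelName missing for non-dataset level")
    # topicCategory is mandatory for dataset and series records.
    if any(level in ("dataset", "series") for level in levels):
        if root.find(IDENT + "gmd:topicCategory", NS) is None:
            errors.append("topicCategory missing for dataset/series")
    # A dataset needs a bounding box and/or a geographic description.
    if "dataset" in levels:
        bbox = root.find(GEO + "gmd:EX_GeographicBoundingBox", NS)
        desc = root.find(GEO + "gmd:EX_GeographicDescription", NS)
        if bbox is None and desc is None:
            errors.append("no GeographicBoundingBox or GeographicDescription")
    return errors


# Hypothetical record: a series with no level name and no topic category.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeList="#MD_ScopeCode" codeListValue="series"/>
  </gmd:hierarchyLevel>
</gmd:MD_Metadata>"""

for problem in check_conditions(ET.fromstring(SAMPLE)):
    print(problem)
```

The same rules would more naturally be written as Schematron assertions; the
point is only that they are simple tests the XSDs cannot express.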

GeoNetwork will need to provide a two-pass process to apply these rules.  The
first pass validates against the ISO 19139 XSDs to prove compliance with that
specification.  The second pass uses Schematron or XSLs to prove compliance
with the conditional statements, code lists and profile extensions for
ISO 19115.  There will also be a need to convert from a profile's format to
the ISO 19139 format using some form of XSL.
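
The two-pass idea can be sketched in Python as below.  This is only an outline
under stated assumptions: the first pass is a stand-in that checks
well-formedness (the Python standard library has no XSD validator, so a real
implementation would call a schema validator such as Xerces-J here), and the
second pass runs one example code-list rule.  All function names and the
partial SCOPE_CODES set are hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"

# Illustrative subset of MD_ScopeCode values; a profile would supply its own list.
SCOPE_CODES = {
    "dataset", "series", "feature", "featureType", "attribute",
    "attributeType", "tile", "collectionSession", "fieldSession",
}


def well_formed(xml_text):
    """Stand-in for pass 1; a real system would validate against the XSDs."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    return []


def scope_code_check(xml_text):
    """Example pass 2 rule: every MD_ScopeCode value must be in the code list."""
    errors = []
    for code in ET.fromstring(xml_text).iter(f"{{{GMD}}}MD_ScopeCode"):
        value = code.get("codeListValue")
        if value not in SCOPE_CODES:
            errors.append(f"unknown MD_ScopeCode value: {value!r}")
    return errors


def validate_two_pass(xml_text, schema_check, rule_checks):
    """Run the schema pass first; run the rule pass only if it succeeds."""
    schema_errors = schema_check(xml_text)
    if schema_errors:
        return {"stage": "schema", "errors": schema_errors}
    rule_errors = []
    for check in rule_checks:
        rule_errors.extend(check(xml_text))
    return {"stage": "rules", "errors": rule_errors}


# Hypothetical record with a misspelt code value.
RECORD = (
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}"><gmd:hierarchyLevel>'
    '<gmd:MD_ScopeCode codeListValue="datset"/>'
    "</gmd:hierarchyLevel></gmd:MD_Metadata>"
)
print(validate_two_pass(RECORD, well_formed, [scope_code_check]))
```

Stopping after a failed first pass keeps the rule checks simple, because they
can assume a structurally valid document.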

Discussion item 2:
==================

GeoNetwork will need to be configurable to support the profiles that meet the
needs of different countries, organisations or communities of practice.  An
ISO 19115 profile should consist of: XSDs that clone, or import and extend,
the ISO 19139 XSDs; code lists that implement those in ISO 19115; XSLs that
translate the profile's XML format into the ISO 19139 XML format, so that
compliance with ISO 19139 can be proven by validation; namespaces that allow
others to access the profile's XSDs and validate XML document instances
against them; and registration with an ISO-approved registrar to prove
acceptance of the profile by ISO.

Can GeoNetwork provide this flexibility to implement profiles using Jeeves,
or is there a need to use some other technology, such as XForms, to meet
these user requirements?

Discussion item 3:
==================

One of the most powerful features of XML is the ability to inherit
information from other XML records.  ISO 19115 also identifies the need to
allow inheritance.  Annexes "G" and "H" explain how this is expected to be
made available.  GeoNetwork does not appear to make this option available.
There is no option to "inherit" information from another metadata record.
This is essential for the implementation of child hierarchyLevel values such
as "featureType", "feature", "attributeType", "attribute", "tile",
"collectionSession", "fieldSession" etc.  It is also useful for the
maintenance of frequently used content, such as "organisationName" and its
child elements, in the same way that normalisation is used in an RDBMS.  For
example, a metadata record could contain the contact details of an
organisation.  Other metadata records can obtain this information through
inheritance.  If that organisation changes its name, address, phone numbers,
email address or any other contact details, then these changes only need to
be made in the parent metadata record rather than in every single metadata
record.  The child metadata records would inherit these changes without any
direct editing of the child's content.
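
A minimal Python sketch of this behaviour follows, assuming records are held
as plain dictionaries keyed by identifier and linked through
"parentIdentifier".  The record identifiers, the resolve function and the
replacement organisation name are hypothetical; this is not how GeoNetwork
stores metadata.

```python
def resolve(record_id, records, _seen=None):
    """Return a record merged with everything inherited from its parents.

    Values set on the child override values inherited from the parent.
    """
    seen = _seen or set()
    if record_id in seen:
        raise ValueError(f"parentIdentifier cycle at {record_id!r}")
    seen = seen | {record_id}
    record = records[record_id]
    parent_id = record.get("parentIdentifier")
    if parent_id is None:
        return dict(record)
    merged = resolve(parent_id, records, seen)
    merged.update(record)
    return merged


# Hypothetical records: the child stores only what differs from its parent.
records = {
    "org-contact": {"organisationName": "Geoscience Australia"},
    "tile-0001": {
        "parentIdentifier": "org-contact",
        "hierarchyLevel": "tile",
        "title": "Example tile",
    },
}

print(resolve("tile-0001", records)["organisationName"])

# Editing the parent once is enough; the child picks up the change.
records["org-contact"]["organisationName"] = "Example Renamed Organisation"
print(resolve("tile-0001", records)["organisationName"])
```

Resolving at read time, rather than copying values into each child, is what
makes a single edit to the parent sufficient.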

Will the next release of GeoNetwork provide inheritance?


I have also found some other, smaller areas where GeoNetwork does not comply
with ISO 19115.  Here is a short, but not comprehensive, list:

1. The domain of the content of the "language" element is ISO 639-2.  "-2"
specifies the three letter codes used for different languages.  "-1"
specifies the two letter codes.  Therefore, the pick list for the "language"
element should contain three letter values like "eng" for "English", not two
letter values like "en".

2. The domain for the "dateStamp" element is "Date".  "Date" is an ISO 8601
date format, not an ISO 8601 DateTime format.  GeoNetwork prompts for a
DateTime value in the content of the "dateStamp" element.  It should be a
"Date" value.  For example, yyyy-mm-dd, yyyymmdd, yyyy-mm, yyyy or yy (for
the century, although this is not available in the W3C XML Schema
implementation), not yyyy-mm-ddThh:mm:ss.  DateTime formats should also
include the time zone, e.g. yyyy-mm-ddThh:mm:ss+hh:mm (or a trailing "Z" for
UTC), so that local times can be used rather than GMT.

3. The "Default view" for creating metadata shows "identificationInfo" as the
first list of items to be filled out.  However, whether many other elements
are mandatory depends on the type of metadata being described, that is, on
"hierarchyLevel".  It is important to set certain other metadata elements
early in the creation of metadata, such as "language" (what language to use
for the metadata), "parentIdentifier" (from which metadata record content
should be inherited) and "characterSet" (what characters are needed to enter
the metadata).  I would expect these elements to be the very first options
that a user selects.  Of course, default values should be offered for these
elements, but the user should be able to change them so that the content of
other metadata elements can be determined.

4. GeoNetwork doesn't seem to contain some of the corrections made in the ISO
19115 Corrigendum.  The Corrigendum has been approved for publishing by ISO.
When will GeoNetwork reflect these corrections?
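
Items 1 and 2 above could be enforced with checks along these lines.  This is
a Python sketch only: the function names are hypothetical, the language table
is a deliberately tiny subset of ISO 639, and the date check covers only the
yyyy-mm-dd, yyyymmdd, yyyy-mm and yyyy forms (not the two digit century).

```python
import re
from datetime import datetime

# Partial, illustrative ISO 639-1 to ISO 639-2 mapping; not a complete table.
ISO_639_1_TO_2 = {"en": "eng", "es": "spa", "it": "ita", "pt": "por", "ru": "rus"}
KNOWN_639_2 = set(ISO_639_1_TO_2.values())

# Each accepted date shape is paired with the strptime format used to verify it.
DATE_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{8}"), "%Y%m%d"),
    (re.compile(r"\d{4}-\d{2}"), "%Y-%m"),
    (re.compile(r"\d{4}"), "%Y"),
]


def normalise_language(code):
    """Return the three letter code, converting a two letter code if known."""
    code = code.strip().lower()
    if code in KNOWN_639_2:
        return code
    if code in ISO_639_1_TO_2:
        return ISO_639_1_TO_2[code]
    raise ValueError(f"unrecognised language code: {code!r}")


def is_date_only(value):
    """True for a date without a time part, e.g. 2006-11-02 or 2006-11."""
    for pattern, fmt in DATE_FORMATS:
        if pattern.fullmatch(value):
            try:
                datetime.strptime(value, fmt)
            except ValueError:
                return False  # right shape, impossible date (e.g. month 13)
            return True
    return False


print(normalise_language("en"))
print(is_date_only("2006-11-02"), is_date_only("2006-11-02T10:30:00"))
```

An editor could run such checks when a record is saved and offer the
corrected value, instead of silently accepting "en" or a DateTime.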

I hope that this information helps and thank you to anyone who replies to
this email.


John Hockaday
Geoscience Australia
GPO Box 378
Canberra ACT 2601
(02) 6249 9735
http://www.ga.gov.au/ 
john.hockaday@ga.gov.au