Optimizing lookup of init files

Daniel Morissette
Hi,

Users of applications such as MapServer that make use of the
"+init=epsg:4326" type of syntax to initialize projection definitions
have reported a performance issue when their maps contain multiple
projections (see http://trac.osgeo.org/mapserver/ticket/1976).

After discussing the issue with Frank it seems that the best fix would
be to implement caching in the get_init(), get_defaults(), and get_opt()
functions which manage access to initialization files for +init
statements and for defaults.

Unfortunately implementing caching in a static data structure could
introduce some multi-threading issues for some applications. There are
two possible ways to handle this in a thread-safe way and I would like
to get your opinion on which one you think is best:


Option 1:
---------

Implement multi-threaded locking support to protect the blocks of code
that deal with the cache. We would likely do this using a copy of the
code from MapServer's mapthread.c/.h which supports Linux (Pthreads) and
Windows and has been tested and is known to work on several platforms.
For more details on this option see the documentation starting at line
30 in mapthread.c:
http://trac.osgeo.org/mapserver/browser/trunk/mapserver/mapthread.c
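
As a rough illustration of Option 1, the sketch below guards a static cache with a single Pthreads mutex. It is not the mapthread.c code: the names init_cache_lookup(), init_cache_store() and cache_entry are hypothetical, the cache is a plain linked list, and a Windows build would need an equivalent lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: maps an init key such as "epsg:4326"
 * to the definition text read from the init file. */
typedef struct cache_entry {
    char *key;
    char *definition;
    struct cache_entry *next;
} cache_entry;

static cache_entry *cache_head = NULL;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Duplicate a string with malloc; returns NULL on allocation failure. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Return a malloc'ed copy of the cached definition, or NULL on a miss.
 * A copy is returned so no caller holds a pointer into the shared list. */
char *init_cache_lookup(const char *key)
{
    char *result = NULL;
    assert(key != NULL);
    pthread_mutex_lock(&cache_lock);
    for (cache_entry *e = cache_head; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            result = copy_string(e->definition);
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return result;
}

/* Add a definition to the cache; returns 0 on success, -1 on failure. */
int init_cache_store(const char *key, const char *definition)
{
    cache_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = copy_string(key);
    e->definition = copy_string(definition);
    if (e->key == NULL || e->definition == NULL) {
        free(e->key);
        free(e->definition);
        free(e);
        return -1;
    }
    /* Only the list manipulation needs to happen under the lock. */
    pthread_mutex_lock(&cache_lock);
    e->next = cache_head;
    cache_head = e;
    pthread_mutex_unlock(&cache_lock);
    return 0;
}
```

The caller frees whatever init_cache_lookup() returns. Allocation is done outside the critical section so the lock is held only while the list is read or modified.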


Option 2:
---------

Create the concept of a pj_session data structure in which we maintain
the cache and which can be easily extended later on to contain any
relevant information that needs to persist between various calls to the
PROJ API.

Then we could create a pj_init_ex.c with
   pj_init_ex(pj_session *, int argc, char **argv)
which calls get_init_ex(), etc...

To maintain backwards compatibility, the existing pj_init() and family
would simply be calls to the pj_..._ex() equivalent with a NULL session
handle. If the session handle is NULL then no caching happens.

The pj_session object would be created and freed using
pj_session_create() and pj_session_destroy() by the calling application
if it cares to take advantage of caching and other persistent features.
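
A minimal sketch of how Option 2 might look, assuming the cache is a linked list owned by the session. Only pj_session, pj_session_create() and pj_session_destroy() are names from the proposal; get_init_sketch() stands in for the proposed get_init_ex(), and load_definition() simulates reading an init file so that the number of reads can be counted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct session_entry {
    char *key;
    char *definition;
    struct session_entry *next;
} session_entry;

/* The session owns its cache, so no locking is needed as long as a
 * session is not used by two threads at the same time. */
typedef struct pj_session {
    session_entry *head;
} pj_session;

static int file_reads = 0;   /* illustration only: counts simulated reads */

int simulated_file_reads(void) { return file_reads; }

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Hypothetical stand-in for scanning an init file for one key. */
static char *load_definition(const char *key)
{
    static const char prefix[] = "definition-for:";
    char *p = malloc(sizeof prefix + strlen(key)); /* sizeof counts the NUL */
    file_reads++;
    if (p != NULL) {
        memcpy(p, prefix, sizeof prefix - 1);
        memcpy(p + sizeof prefix - 1, key, strlen(key) + 1);
    }
    return p;
}

pj_session *pj_session_create(void)
{
    return calloc(1, sizeof(pj_session));
}

void pj_session_destroy(pj_session *session)
{
    if (session == NULL)
        return;
    while (session->head != NULL) {
        session_entry *e = session->head;
        session->head = e->next;
        free(e->key);
        free(e->definition);
        free(e);
    }
    free(session);
}

/* Returns a malloc'ed definition that the caller frees.
 * A NULL session means no caching, matching the backwards-compatible path. */
char *get_init_sketch(pj_session *session, const char *key)
{
    assert(key != NULL);
    if (session == NULL)
        return load_definition(key);
    for (session_entry *e = session->head; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return copy_string(e->definition);

    char *def = load_definition(key);
    if (def == NULL)
        return NULL;
    session_entry *e = malloc(sizeof *e);
    char *key_copy = copy_string(key);
    char *result = copy_string(def);
    if (e == NULL || key_copy == NULL || result == NULL) {
        free(e);
        free(key_copy);
        free(result);
        return def;          /* could not cache; return the fresh copy */
    }
    e->key = key_copy;
    e->definition = def;     /* the session now owns this string */
    e->next = session->head;
    session->head = e;
    return result;
}
```

With a session, a second request for the same key is served from the cache; with a NULL handle, every call goes back to the simulated file, which is the behaviour existing pj_init() callers would keep.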


There are pros and cons to both approaches. Can anyone think of a 3rd
option that would be even less disruptive?

Comments and suggestions welcome. I'm especially interested in feedback
from developers of applications using PROJ.

Thanks in advance

Daniel
--
Daniel Morissette
http://www.mapgears.com/
_______________________________________________
Proj mailing list
[hidden email]
http://lists.maptools.org/mailman/listinfo/proj
Re: Optimizing lookup of init files

Gerald I. Evenden
On Wednesday 01 August 2007 11:49 am, Daniel Morissette wrote:
> Hi,
>
> Users of applications such as MapServer that make use of the
> "+init=epsg:4326" type of syntax to initialize projection definitions
> have reported a performance issue when their maps contain multiple
> projections (see http://trac.osgeo.org/mapserver/ticket/1976).
        ...
> There are pros and cons to both approaches. Can anyone think of a 3rd
> option that would be even less disruptive?

Yes.  Remove the +init from proj.  That was the first thing I did when
upgrading to libproj4 and lproj.  Seems like something more appropriate for
mysql or other DBMS.

> Comments and suggestions welcome. I'm especially interested in feedback
> from developers of applications using PROJ.

Sorry.  I am sure that suggestion was not appreciated but it follows my
approach of keeping operations separated into appropriate categories.
Just as datum shifting is not appropriate within the projection software element,
extracting initialization information from a database is also not
appropriate.

--
The whole religious complexion of the modern world is due
to the absence from Jerusalem of a lunatic asylum.
-- Havelock Ellis (1859-1939)  British psychologist
Re: Optimizing lookup of init files

Eric Miller-4
>>> On 8/1/2007 at 9:46 AM, "Gerald I. Evenden" <[hidden email]> wrote:
> On Wednesday 01 August 2007 11:49 am, Daniel Morissette wrote:
>> Hi,
>>
>> Users of applications such as MapServer that make use of the
>> "+init=epsg:4326" type of syntax to initialize projection definitions
>> have reported a performance issue when their maps contain multiple
>> projections (see http://trac.osgeo.org/mapserver/ticket/1976).
> ...
>> There are pros and cons to both approaches. Can anyone think of a 3rd
>> option that would be even less disruptive?
>
> Yes.  Remove the +init from proj.  That was the first thing I did when
> upgrading to libproj4 and lproj.  Seems like something more appropriate for
> mysql or other DBMS.
>
>> Comments and suggestions welcome. I'm especially interested in feedback
>> from developers of applications using PROJ.
>
> Sorry.  I am sure that suggestion was not appreciated but it follows my
> approach of keeping operations separated into appropriate categories.
> Like datum shifting is not appropriate within the projection software
> element,
> extracting initialization information from a database is also not
> appropriate.

I tend to agree with Gerald on this issue.  I think maintenance of predefined coordinate systems
should be separated from the library.  It seems the coordinate systems need to be updated more
often than the actual library.

However, if you're going to add caching support and make it thread safe, you might as well fix the non-thread safe caching of grid shift files.  I recently had to abandon PROJ.4 from a project because of that.

Because of the threading problems from above, I started to look into using Gerald's new library.  After I patched that library for Windows (no atanh in math.h), I started looking at how the interface should be designed.  My initial idea is a bit different than the cs2cs model.  Basically, I decided the library should supply a set of coordinate transformations.  The transformations then can be chained to get from some system A to another system B.  It'd be up to the client to specify the set of transforms that need to be applied.

For instance:

typedef struct _transform_t transform_t;

int transform_create(const char *init, transform_t **transform);
int transform_apply(const transform_t *transform, size_t dims, double data[dims]);
void transform_destroy(transform_t **transform);

Initialization strings might be something like:
"+transform=pj_fwd +proj= ... "
"+transform=gridshift +grid=conus +direction=forward"

But, that's about as far as I got.  It's hard for me to devote any time to work on this...

The main idea was that there are many ways to transform coordinates, so the library should provide more flexibility on how to get from A to B.  A higher level interface could provide some defaults (invproj -> geodetic2geocentric -> bursa_wolf -> geocentric2geodetic -> proj).
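
To make the chaining idea concrete, here is a small sketch in which each step is a function pointer plus its parameters and the chain is applied in order. All names (step_fn, chain_step, chain_apply, offset_step, scale_step) are hypothetical, and the two steps do trivial arithmetic as stand-ins for real operations such as invproj or bursa_wolf.

```c
#include <assert.h>
#include <stddef.h>

/* One step transforms `dims` values in place; returns 0 on success. */
typedef int (*step_fn)(const void *params, size_t dims, double *data);

typedef struct {
    step_fn apply;
    const void *params;
} chain_step;

/* Stand-in step: adds params[i] to data[i].
 * params must point to at least `dims` doubles. */
int offset_step(const void *params, size_t dims, double *data)
{
    const double *offset = params;
    for (size_t i = 0; i < dims; i++)
        data[i] += offset[i];
    return 0;
}

/* Stand-in step: multiplies every value by a single factor. */
int scale_step(const void *params, size_t dims, double *data)
{
    const double *factor = params;
    for (size_t i = 0; i < dims; i++)
        data[i] *= *factor;
    return 0;
}

/* Apply the steps in order, stopping at the first one that fails. */
int chain_apply(const chain_step *steps, size_t n_steps,
                size_t dims, double *data)
{
    assert(steps != NULL && data != NULL);
    for (size_t i = 0; i < n_steps; i++) {
        int rc = steps[i].apply(steps[i].params, dims, data);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

A client would build an array of chain_step values for the route it wants, while a higher-level interface could assemble a default array on the caller's behalf.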



Re: Optimizing lookup of init files

Frank Warmerdam
Eric Miller wrote:
> However, if you're going to add caching support and make it thread safe, you
> might as well fix the non-thread safe caching of grid shift files.  I
> recently had to abandon PROJ.4 from a project because of that.

Eric,

I agree.  If we introduce either a session object or thread locks,
this should also be used to protect the grid shift file stuff.

> Because of the threading problems from above, I started to look into using
> Gerald's new library.  After I patched that library for Windows (no atanh in
> math.h), I started looking at how the interface should be designed.  My
> initial idea is a bit different than the cs2cs model.  Basically, I decided
> the library should supply a set of coordinate transformations.  The
> transformations then can be chained to get from some system A to another
> system B.  It'd be up to the client to specify the set of transforms that
> need to be applied.
...
> The main idea was that there are many ways to transform coordinates, so the
> library should provide more flexibility on how to get from A to B.  A higher
> level interface could provide some defaults (invproj -> geodetic2geocentric
> -> bursa_wolf -> geocentric2geodetic -> proj).

I'm not averse to having the library expose individual methods so folks
can have detailed control when they want it.  But from a practical point of
view I believe that applications like MapServer, GRASS, and GDAL mostly
want to avoid micromanaging this stuff.  They want a string representation
of two coordinate systems and a function to convert coordinates from one
to the other.

This discussion is beside the point as far as Daniel's question goes though.

With regard to his two approaches, I think introducing thread locks in
PROJ.4 would not be terribly complicated. We have good models for this in
MapServer and GDAL.  But I do hate having to maintain platform stuff like
a thread locking abstraction in so many different places.

The session idea provides a nice lock-free approach with the understanding
that sessions cannot be shared between threads - at least concurrently.
*But* each session will end up with duplication of cached grids and
initialized projections.  It puts a non-trivial weight on calling
applications to keep track of sessions when there was previously no such
thing.

So, I'm personally inclined towards the internal thread locking approach,
though not strongly.

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, [hidden email]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | President OSGeo, http://osgeo.org

RE: Optimizing lookup of init files

Dean Mikkelsen
In reply to this post by Eric Miller-4
I like the route shown below.

invproj -> geodetic2geocentric -> bursa_wolf -> geocentric2geodetic -> proj

I'm currently looking at developing a tool based on PROJ.4 to do something
similar (web-based possibly).

I've always thought of projections and datum transforms as two different
beasts.

In place of bursa_wolf in the above example, though, you could incorporate
the grid methods (used in South Africa, Canada, the U.S., etc.), depending
on the accuracy you are after, whether it be for mapping or more detailed
surveying.

Ideally, also, for bursa_wolf, you'd have to specify which parameter set
applies to a given region, as even in the EPSG codes, there are various ones
for various parts of countries or states and even offshore. The accuracies
range from being very poor (very few survey points to be adjusted) to very
good for a small area (many survey points used to determine the rotations,
scale factor, translations).

Cheers,
Dean

