OS Spatial environment 'sizing'

OS Spatial environment 'sizing'

Bruce.Bannerman

IMO:


Hello everyone,

I'm trying to get a feel for server 'sizing' for a **hypothetical** Corporate environment to support OS Spatial apps.



Assume that:

- this is a dedicated environment to allow the use of OS Spatial applications to serve Corporate OGC Services.

- the applications of interest are GeoServer, Deegree, GeoNetwork, MapServer, MapGuide and Postgres/PostGIS.

- the environment may need to scale relatively quickly.

- it will be required to serve in the vicinity of 5 to 10 TB of data initially (WMS, WFS, WCS).



Can anyone shed some light on the following questions please?

- I'm assuming a Linux installation (SLES, Red Hat or Debian) or possibly Intel Solaris. Has anyone experienced any issues in these (or other) environments that they'd like to share?

- Are there any recommendations as to dedicated network bandwidth that should be allocated?

- Has anyone done any work with load balancing and would like to share their experiences?

- Of the above OS Spatial products, which ones could co-exist on the same server (excluding Postgres/PostGIS)?


Any thoughts are appreciated.


Bruce Bannerman

Australia

_______________________________________________
Discuss mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/discuss

Re: OS Spatial environment 'sizing'

Paul Ramsey
Bruce,

On 2/18/08, [hidden email]
<[hidden email]> wrote
> - the applications of interest are GeoServer, Deegree, GeoNetwork,
> MapServer, MapGuide and Postgres/PostGIS.
>
> - the environment may need to scale relatively quickly.
>
> - it will be required to serve in the vicinity of 5 to 10 TB of data
> initially (WMS, WFS, WCS).
> - Of the above OS Spatial products, which ones could co-exist on the same
> server (excluding Postgres/PostGIS)?

Putting the Java applications into the same application server would
save a fair amount of memory. Running Java applications takes a
surprising amount of memory, so having them share a runtime would add
efficiency.

I think the best thing folks could do to make a "corporate open source
spatial" strategy work would be to give folks a means of easily
creating apps and moving them through the devel/test/production chain:
access to scripting languages with database access (PHP, Python,
whatever) and a standard application packaging format that allows
folks to "deploy from a tag". Basically, once an app is "done" in
development, tag it and push "deploy" and it's pulled into test
without human hands touching it. Once it's passed test, again, mash a
button and boom, it's live on production.
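
The "deploy from a tag" flow could be sketched roughly as below. This is only an illustration: the `Release` class, the `promote` function, and the rule that a tag must pass test before reaching production are assumptions for the example, not an existing tool.

```python
from dataclasses import dataclass, field

# Stage names taken from the devel/test/production chain described above.
STAGES = ["devel", "test", "production"]


@dataclass
class Release:
    """A tagged build and the stages it has been deployed to so far."""
    tag: str
    deployed_to: list = field(default_factory=lambda: ["devel"])

    def current_stage(self) -> str:
        return self.deployed_to[-1]


def promote(release: Release, tests_passed: bool) -> str:
    """Push the tagged release one stage forward with no manual steps.

    A real version would check out `release.tag` from version control and
    install it on the target servers; here we only record the promotion.
    """
    current = STAGES.index(release.current_stage())
    if current == len(STAGES) - 1:
        raise ValueError(f"{release.tag} is already on production")
    # Leaving test for production requires a passing test run.
    if STAGES[current] == "test" and not tests_passed:
        raise ValueError(f"{release.tag} has not passed test")
    next_stage = STAGES[current + 1]
    release.deployed_to.append(next_stage)
    return next_stage


if __name__ == "__main__":
    rel = Release(tag="wms-viewer-1.0")      # hypothetical tag name
    print(promote(rel, tests_passed=False))  # devel -> test
    print(promote(rel, tests_passed=True))   # test -> production
```

The point of the sketch is that the tag is the only input: nothing is edited by hand between stages.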

Fun fun fun!

P

RE: OS Spatial environment 'sizing'

Randy George
In reply to this post by Bruce.Bannerman

Hi Bruce,

 

On the “scale relatively quickly” front, you should look at Amazon’s EC2/S3 services. I’ve recently worked with them and find them an attractive platform for scaling: http://www.cadmaps.com/gisblog

The stack I like is Ubuntu + Java + PostgreSQL/PostGIS + Apache2 mod_jk Tomcat + GeoServer + custom SVG or XAML clients run out of Tomcat.

If you use the larger instances the cost is higher, but it sounds like you plan on some heavy raster services (WMS, WCS), and lots of memory will help.

Small EC2 instance, $0.10/hr:

1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform

Large EC2 instance, $0.40/hr:

7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform

Extra large EC2 instance, $0.80/hr:

15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform
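
To put those hourly rates in monthly terms, here is a small worked calculation. The prices and memory sizes are the ones quoted above; the 30-day month and the helper names are assumptions made for illustration.

```python
# Hourly prices as quoted above (USD per instance-hour).
HOURLY_RATE = {"small": 0.10, "large": 0.40, "extra_large": 0.80}

# Memory per instance type in GB, also from the list above.
MEMORY_GB = {"small": 1.7, "large": 7.5, "extra_large": 15.0}

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost(instance_type: str, count: int = 1) -> float:
    """Cost of running `count` instances around the clock for one month."""
    return round(HOURLY_RATE[instance_type] * HOURS_PER_MONTH * count, 2)


def cost_per_gb_hour(instance_type: str) -> float:
    """Hourly price divided by memory: a rough guide for memory-hungry raster work."""
    return round(HOURLY_RATE[instance_type] / MEMORY_GB[instance_type], 4)


if __name__ == "__main__":
    for name in HOURLY_RATE:
        print(name, monthly_cost(name), cost_per_gb_hour(name))
```

On these figures a small instance left running all month comes to about $72, a large one to about $288, and an extra large one to about $576, with the two larger types slightly cheaper per GB of memory.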

 

Note that the instances do not need to be permanent. Some people (WeoGeo) have been using a couple of failover small instances and then starting new large instances for specific requirements. The idea is to start and stop instances as required rather than having ongoing infrastructure costs. It only takes a minute or so to start an EC2 instance. If you are running a corporate service, there may be parts of the day with very little use, so you just schedule your heavy-duty instances for peak times. If you can connect your raster to S3 buckets rather than instance storage, you have built-in replicated backup.
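
The saving from scheduling can be estimated with simple arithmetic. The sketch below compares large instances left running around the clock with large instances run only at peak times on top of a small always-on baseline. The instance counts, the 10-hour peak window, and the 22 working days are made-up inputs; only the hourly prices come from the figures quoted in this post.

```python
SMALL_RATE = 0.10  # USD/hr for a small instance
LARGE_RATE = 0.40  # USD/hr for a large instance


def always_on_cost(large_count: int, days: int = 30) -> float:
    """Large instances left running 24 hours a day."""
    return round(large_count * LARGE_RATE * 24 * days, 2)


def scheduled_cost(small_count: int, large_count: int,
                   peak_hours_per_day: int, peak_days: int,
                   days: int = 30) -> float:
    """Small failover instances run all month; large ones only at peak times."""
    baseline = small_count * SMALL_RATE * 24 * days
    peak = large_count * LARGE_RATE * peak_hours_per_day * peak_days
    return round(baseline + peak, 2)


if __name__ == "__main__":
    # Hypothetical workload: 2 small failover instances, 3 large at peak,
    # 10 peak hours on each of 22 working days.
    print(always_on_cost(3))             # 864.0
    print(scheduled_cost(2, 3, 10, 22))  # 408.0
```

With these assumed inputs the scheduled setup costs a bit under half of the always-on one; the real ratio depends entirely on how long the peak window is.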

 

I know that Java JAI can easily eat up memory and is core to GeoServer WMS/WCS, so you probably want to look at a large memory footprint for any platform with lots of raster service. I’m partial to GeoServer because of its Java foundation. I think I would try to keep the Apache2 mod_jk Tomcat GeoServer stack on a separate server instance from PostGIS. This might avoid problems at instance startup, since your database would need to be loaded separately. The instance AMI resides in a 10 GB partition; the balance of the data will probably reside on a /mnt partition, separate from ec2-run-instances. You may be able to avoid datadir problems by adding something like Elastra to the mix. Elastra beta is a wrapper for PostgreSQL that puts the datadir on S3 rather than local to an instance. I suppose they still keep indices (GiST et al.) on the local instance.

(I still think it an interesting exercise to see what could be done connecting PostGIS to AWS SimpleDB services.)

 

So, thinking out loud, here is a possible architecture:

Basic permanent setup

- put raster in S3 (this may require some customization of GeoServer)

- build a datadir in PostGIS and back it up to S3

- create a private AMI for PostgreSQL/PostGIS

- create a private AMI for the load balancer instance

- create a private AMI with your service stack for both a small and a large instance, for flexibility

Startup services

- start a balancer instance

- point your DNS CNAME to this balancer instance

- start a PostGIS instance (you could have more than one if necessary, but it would be easier to just scale to a larger instance type if the load demands it)

- have a scripted download from an S3 BU to your PostGIS datadir (I’m assuming a relatively static data resource)

Variable services

- start a service stack instance and connect it to PostGIS

- update the balancer to see the new instance (this could be tricky)

- repeat the previous two steps as needed

- at night, scale back: cron scaling for a known cycle, or use a controller like weoceo to detect and respond to load fluctuation
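
The "variable services" steps can be sketched as a small controller loop. Everything here is a stand-in: `Balancer`, `start_instance`, and the 08:00 to 18:00 peak schedule are hypothetical names and values that only simulate the flow in memory. A real version would call the EC2 tools and rewrite the balancer configuration instead.

```python
import itertools

_ids = itertools.count(1)


def start_instance(kind: str) -> str:
    """Stand-in for launching a service-stack instance from a private AMI."""
    return f"{kind}-{next(_ids)}"


class Balancer:
    """Keeps the list of service-stack instances that receive traffic."""

    def __init__(self):
        self.pool = []

    def add(self, instance_id: str) -> None:
        self.pool.append(instance_id)

    def remove_last(self) -> str:
        return self.pool.pop()


def desired_count(hour: int, peak=range(8, 18), peak_n=4, off_peak_n=1) -> int:
    """Cron-style schedule: more instances in business hours, fewer at night."""
    return peak_n if hour in peak else off_peak_n


def reconcile(balancer: Balancer, hour: int) -> None:
    """Start or stop instances until the pool matches the schedule."""
    target = desired_count(hour)
    while len(balancer.pool) < target:
        balancer.add(start_instance("large"))  # start stack, then register it
    while len(balancer.pool) > target:
        balancer.remove_last()                 # deregister; a real one would also terminate


if __name__ == "__main__":
    lb = Balancer()
    for hour in (3, 9, 22):
        reconcile(lb, hour)
        print(hour, lb.pool)
```

A load-driven controller would replace `desired_count` with something that looks at request rates rather than the clock; the reconcile loop would stay the same.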

 

By the way, the public AWS AMI with the best resources that I have found is Ubuntu 7.10 Gutsy. The Debian dependency tools are much easier to use and the resources are plentiful.

I’ve been toying with using an AWS stack adapted for serving some larger PostGIS vector sets, such as fully connected census demographic data and block polygons here in the US. The idea would be to populate the data directly from the census SF* and TIGER with a background Java bot. There are some potentially novel 3D viewing approaches possible with XAML. Anyway, lots of fun to have access to virtual systems like this.

 

As you can see I’m excited anyway.

 

randy

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of [hidden email]
Sent: Monday, February 18, 2008 6:35 PM
To: OSGeo Discussions
Subject: [OSGeo-Discuss] OS Spatial environment 'sizing'

 



Re: OS Spatial environment 'sizing'

Cameron Shorter
Randy, what an informative email.
It is almost a "Howto for OSGeo hardware and performance tuning". I'm
not aware of anyone who has written something similar (although I admit
I have not looked).

I'd love to see it incorporated into an easily referenced resource -
maybe a chapter in
http://wiki.osgeo.org/index.php/Educational_Content_Inventory

Also, a link from http://wiki.osgeo.org/index.php/Case_Studies .

What do you think?



--
Cameron Shorter
Geospatial Systems Architect
Tel: +61 (0)2 8570 5050
Mob: +61 (0)419 142 254

Think Globally, Fix Locally
Commercial Support for Geospatial Open Source Solutions
http://www.lisasoft.com/LISAsoft/SupportedProducts.html


Re: OS Spatial environment 'sizing'

Ivan Lucena
In reply to this post by Randy George
Hi Randy, Bruce,

That is a nice piece of advice, Randy. I am sorry to intrude on the
conversation, but I would like to ask how that "heavy raster"
manipulation would be treated by PostgreSQL/PostGIS: managed or unmanaged?

Best regards,

Ivan


RE: OS Spatial environment 'sizing'

Randy George
Hi Ivan,

The most common advice I've seen says to leave raster out of the DB.
Of course footprints and metadata could be there, but you would want to
point the GeoServer coverage to the image/image pyramid URL somewhere in the
directory hierarchy.

Brent has a nice writeup here:
http://docs.codehaus.org/display/GEOSDOC/Load+NASA+Blue+Marble+Data

In an AWS sense, my idea is to Java proxy the GeoServer Coverage Data URL to
S3 buckets and park the imagery over on the S3 side to take advantage of
stability and replication. Performance, though, might not be as good as a
local directory. Maybe a one-time cache to a local directory would work
better.
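
The one-time local cache idea could look something like the read-through cache below. It is only a sketch: `fetch_from_s3` is a placeholder for whatever S3 client is actually used, and the cache directory and file naming are assumptions.

```python
import os
import tempfile
from typing import Callable


class LocalImageCache:
    """Read-through cache: pull an image from remote storage once, then serve it locally."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch  # placeholder for a real S3 download call
        os.makedirs(cache_dir, exist_ok=True)

    def local_path(self, key: str) -> str:
        # Flatten the object key so each image maps to one file in the cache dir.
        return os.path.join(self.cache_dir, key.replace("/", "_"))

    def get(self, key: str) -> str:
        """Return a local file path for `key`, downloading it on first use only."""
        path = self.local_path(key)
        if not os.path.exists(path):
            data = self.fetch(key)
            with open(path, "wb") as f:
                f.write(data)
        return path


if __name__ == "__main__":
    calls = []

    def fetch_from_s3(key: str) -> bytes:
        # Fake download so the example runs without network access.
        calls.append(key)
        return b"fake tiff bytes"

    cache = LocalImageCache(tempfile.mkdtemp(), fetch_from_s3)
    cache.get("bmng/world.tif")
    cache.get("bmng/world.tif")  # second call is served from the local copy
    print(len(calls))            # 1
```

The coverage URL would then point at the path returned by `get`, so only the first request for an image pays the S3 transfer time.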

Note: Amazon doesn't charge for inside AWS data transfers.

So in theory:
  - PostGIS holds the footprint geometry + metadata
  - EC2 GeoServer WFS handles footprint queries into an SVG/XAML client; just
    stick it on top of something like JPL BMNG. Once a user picks a coverage,
    switch to the GeoServer WMS/WCS service for zooming around in the selected
    image pyramid
  - S3 buckets contain the TIFFs, pyramids ...
  - EC2 GeoServer handles WMS/WCS service
  - EC2 proxy pulls the imagery from the S3 side as needed
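
The two-step client flow above (WFS for footprints, then WMS for the chosen coverage) comes down to two kinds of OGC requests. A sketch of building them follows. The host name and layer names are invented for illustration; the query parameters are the standard WFS 1.0.0 GetFeature and WMS 1.1.1 GetMap key-value pairs.

```python
from urllib.parse import urlencode

# Hypothetical GeoServer endpoint; replace with a real host.
BASE = "http://example.com/geoserver"


def footprint_query_url(type_name: str, bbox: tuple) -> str:
    """WFS GetFeature request for image footprints inside a bounding box."""
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typeName": type_name,
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
    }
    return f"{BASE}/wfs?{urlencode(params)}"


def coverage_map_url(layer: str, bbox: tuple, width: int, height: int) -> str:
    """WMS GetMap request for the coverage the user picked."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
    }
    return f"{BASE}/wms?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical layer names and extents.
    print(footprint_query_url("footprints", (-110, 35, -100, 45)))
    print(coverage_map_url("scene_0001", (-106, 38, -104, 40), 512, 512))
```

The client would issue the first request while the user browses footprints, then switch to requests of the second kind once a coverage is selected.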

Sorry, I haven't had time to try this, so it is just theoretical. Of course
you can go traditional and just keep the coverage imagery files on the local
instance, avoiding the S3 proxy idea. The reason I don't like that idea is
that the imagery has to be loaded with every instance creation, while an S3
approach would need only one copy.


randy

-----Original Message-----
From: Lucena, Ivan [mailto:[hidden email]]
Sent: Tuesday, February 19, 2008 2:59 PM
To: [hidden email]; OSGeo Discussions
Subject: Re: [OSGeo-Discuss] OS Spatial environment 'sizing'

>
>  
>
>  
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Discuss mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/discuss


RE: OS Spatial environment 'sizing' + Image Management

Bruce.Bannerman

IMO:


Hi Randy,

Thank you for your informative post. It has given me a lot to follow up on and think about.

I can see an immediate need that this type of solution could well be used for. I like it.

I suspect that in many larger corporate types of environments, it could well be used effectively for 'pilot' and 'pre-production' type tasks.

For 'production' type environments, there would be issues of integrating an external service hosting spatial data with internal services hosting corporate aspatial data sources and applications.



With regard to storing imagery in a database:

<rant>       (and not directed at you)

I've also seen a lot of reports suggesting that image management should be file based.

My personal preference is to use a database if possible, so that I can take advantage of corporate data management facilities, backups, point in time restores etc.

I've managed 70 GB orthophoto mosaics in ArcSDE / Oracle before with minimal problems. I found performance and response times to be comparable with other image web server options on the market that use file based solutions for storing data.

Ideally, I'm looking to manage statewide mosaics with a consistent look and feel that can be treated as a single 'layer' by client GIS / Remote Sensing applications (data integrity issues allowing).

One potential use is 'best available' data mosaics, which could undergo regular updates as more imagery is flown or captured. A database makes it easier to manage and deliver such data.

My definition of 'imagery' goes beyond aerial photographs and includes multi or hyper-spectral imagery; various geophysics data sources such as aeromagnetics, gravity, radiometrics; radar data etc.

Typically this data is required for digital image analysis purposes using a remote sensing application, so the integrity of 'the numbers' that make up the image is very important.

Many of today's image based solutions use a (lossy) wavelet compression that can corrupt the integrity of 'the numbers' describing the radiometric data in the image.

When we consider the big picture issues facing us today, such as Climate Change, I think that it is important to protect our definitive image libraries from such corruption as they will be invaluable sources of data for future multi-temporal analysis.

That said, if the end use is just for a picture, then a wavelet compression is a good option. Just protect the source data for future use.

</rant>      
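
To make the point about 'the numbers' concrete, here is a minimal Python sketch. The sample values are made up, and the simple quantization step is only a stand-in for a lossy codec (it is not a wavelet compressor); the lossless path uses zlib from the standard library.

```python
import struct
import zlib

# Hypothetical 16-bit radiometric samples (illustrative values only).
samples = [1023, 1024, 1031, 2047, 2050, 4095]


def lossless_round_trip(values):
    """Pack as unsigned 16-bit integers, compress with zlib, then restore."""
    raw = struct.pack(f"<{len(values)}H", *values)
    restored = zlib.decompress(zlib.compress(raw))
    return list(struct.unpack(f"<{len(values)}H", restored))


def lossy_round_trip(values, step=16):
    """Stand-in for a lossy codec: quantize each value to a multiple of step."""
    return [(v // step) * step for v in values]


lossless = lossless_round_trip(samples)
lossy = lossy_round_trip(samples)

# Largest difference between an original sample and its lossy counterpart.
max_error = max(abs(a - b) for a, b in zip(samples, lossy))
```

The lossless path returns exactly the input values, while the quantized path does not. That is fine for a picture but matters when the values feed digital image analysis.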


So, does anyone know of a good open source spatial solution for storing and accessing (multi and hyperspectral) imagery in a database?    ;-)

WMS 1.3 and WCS are showing promise for serving imagery, including multi and hyperspectral data.



Bruce Bannerman







Re: OS Spatial environment 'sizing' + Image Management

Ivan Lucena
Hi Bruce,

Here I am again...

Randy's suggestions are valuable and well founded, but I have a special
interest in storing raster in databases, which is why I asked about it.

Yes, raster is chunky and not very fluid, but I love to hear about
successful experiences like Bruce's. And, as Bruce also mentioned,
analytical processes often need to query on cell space rather than bands.

Remember that decades ago some of us would have been discussing the
disadvantages of storing "vector" data in databases; now it is the norm
for client/server applications.

Bruce mentioned SDE and Oracle, but what are the *open source* options
for doing *image management* on open source databases, and who is using
them?

I can only think of two, the PostGIS CHIP datatype and the TerraLib
schemas (MySQL, PostgreSQL, and commercial RDBMS), but I don't know of any
*sizable* project that is using them.

Does anybody know and would like to share?

Best regards,

Ivan



RE: OS Spatial environment 'sizing' + Image Management

Randy George
Hi Ivan and Bruce,

        Interesting. Other than using JAI a bit on Space Imaging data (a
while back), I have mostly been working with vectors.

        I am curious to know what advantage an ArcSDE/Oracle stack would
provide for image storage. I had understood that imagery was simply stored
as large blob fields and streamed in and out of the DB to wherever it is
processed, viewed, etc. I had understood that the original state was
unchanged (lossy, wavelet, pk or otherwise happening outside the DB), just
residing in the DB directory rather than the disk hierarchy. Other than
possible table corruption issues, I imagined that the overhead of streaming
a blob into an image object was the only real concern with DB storage.

But I'm getting the idea that something more is going on. Does the image
actually get retiled (celled) and then stored in multiple fields? Is a
multispectral image broken into bands before being stored in separate
fields? CHIP sounds more like an additional database function to optimize
chipping inside a DB table, so that an entire image doesn't have to be read
just to grab a small viewbox. Does ArcSDE add similar functions to the base
DB, or does it just pull out an image and chip, threshold, convolve,
histogram, etc. after the fact?
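
The general idea behind chipping can be sketched independently of any database. The Python below is not a description of how CHIP or ArcSDE work internally. It assumes an image cut into fixed 256x256 tiles and a viewbox given in pixel coordinates; the function name and the numbers are illustrative only.

```python
def tiles_for_viewbox(x_min, y_min, x_max, y_max, tile=256):
    """Return (col, row) indices of the tiles that intersect a pixel viewbox.

    x_max and y_max are exclusive, so a viewbox of (0, 0, 256, 256)
    touches exactly one tile.
    """
    first_col, last_col = x_min // tile, (x_max - 1) // tile
    first_row, last_row = y_min // tile, (y_max - 1) // tile
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 600 x 400 pixel viewbox somewhere inside a much larger image.
needed = tiles_for_viewbox(300, 200, 900, 600)
```

Only the tiles in `needed` have to be read to fill the viewbox, regardless of how large the full image is.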

I'm just curious since I've been fascinated with the prospects of
hyperspectral imagery.

From an AWS perspective, very large imagery would need some type of tiling
since there is a 5 GB limit on S3 objects. Larger objects are typically tar
gzipped and split before storage. It is hard to imagine a tiling scheme
with tiles that large anyway. For example, Google's DigitalGlobe tiling
pyramid uses minuscule tiles at 256x256, compressed to approx. 18 KB/tile:
http://kh.google.com/kh?v=3&t=trtsqtqsqqqt
http://www.cadmaps.com/gisblog/?p=7
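
A rough sketch of the arithmetic behind those figures, in Python. The 100,000 x 100,000 pixel image is a made-up example, the 18 KB average is the approximate tile size quoted above, and the pyramid is assumed to halve in resolution at each level until it fits in a single tile.

```python
import math

TILE_SIZE = 256                  # pixels per tile edge
AVG_TILE_BYTES = 18 * 1024       # approx. 18 KB per compressed tile
S3_OBJECT_LIMIT = 5 * 1024 ** 3  # 5 GB per-object limit


def pyramid_levels(width, height, tile=TILE_SIZE):
    """Return (width, height, tile_count) per level, full resolution first."""
    levels = []
    while True:
        cols = math.ceil(width / tile)
        rows = math.ceil(height / tile)
        levels.append((width, height, cols * rows))
        if cols == 1 and rows == 1:
            break
        # Each coarser level halves the resolution of the previous one.
        width = max(1, math.ceil(width / 2))
        height = max(1, math.ceil(height / 2))
    return levels


levels = pyramid_levels(100_000, 100_000)
total_tiles = sum(count for _, _, count in levels)
estimated_bytes = total_tiles * AVG_TILE_BYTES

# Without tiling, a single 70 GB file would have to be split into this many
# parts to stay under the per-object limit.
parts_for_70gb = math.ceil(70 * 1024 ** 3 / S3_OBJECT_LIMIT)
```

Under these assumptions every tile is orders of magnitude below the object limit, so the limit only matters for untiled source files.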

From a web perspective, analysis could proceed along a highly tiled approach.
So the original 70 GB image becomes a tiled pyramid, with the browser view
changing position inside the image pyramid. Small patches flow in and out of
the view with each zoom and pan. Analysis (WPS) adds some complexity, since
things like convolution algorithms need to be rewritten to take tile
boundaries into account. Alternatively, the viewbox is re-mosaicked before
running a server-side convolution that is subsequently streamed back to the
browser view, which is not extremely fast. Hyperspectral bands would reside
in separate tile pyramids so that Boolean layer operations could proceed
server side for viewing at the browser. Analysis really can't take advantage
of predigested read-only schemes like Google's, since the whole point is to
create new images from combinations of image bands. Consequently, WPS seems
to be moving slower than WMS, WCS and WFS.
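
One common way to handle the tile boundary problem is to read each tile with a small overlap ("halo") from its neighbours, filter the padded window, and keep only the interior. The sketch below is an illustration under stated assumptions, not how any particular WPS implementation does it: it uses a plain 3x3 mean filter on a list-of-lists image, and the names `box_blur` and `blur_tiled` are hypothetical.

```python
def box_blur(img):
    """3x3 mean filter; neighbours outside the image are simply left out."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def blur_tiled(img, tile=4, halo=1):
    """Filter tile by tile, reading `halo` extra pixels around each tile."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            # Padded window around the tile, clipped to the image extent.
            y0, y1 = max(0, ty - halo), min(h, ty + tile + halo)
            x0, x1 = max(0, tx - halo), min(w, tx + tile + halo)
            window = [row[x0:x1] for row in img[y0:y1]]
            blurred = box_blur(window)
            # Keep only the tile interior; results for halo pixels are dropped.
            for y in range(ty, min(h, ty + tile)):
                for x in range(tx, min(w, tx + tile)):
                    out[y][x] = blurred[y - y0][x - x0]
    return out


if __name__ == "__main__":
    image = [[(x * 7 + y * 13) % 11 for x in range(10)] for y in range(9)]
    print("halo matches whole image:", blur_tiled(image) == box_blur(image))
    print("no halo matches whole image:", blur_tiled(image, halo=0) == box_blur(image))
```

The halo has to be at least as wide as the kernel radius (1 pixel for a 3x3 kernel); with `halo=0` the pixels along tile edges come out different from the whole-image result, which is the seam problem described above.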

Thanks
Randy


-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Lucena, Ivan
Sent: Tuesday, February 19, 2008 10:16 PM
To: OSGeo Discussions
Subject: Re: [OSGeo-Discuss] OS Spatial environment 'sizing' + Image
Management

Hi Bruce,

Here I am again...

Randy's suggestions are valuable and well founded, but I have a
special interest in storing raster in databases, which is why I asked
about it.

Yes, raster is chunky and not very fluid, but I love to hear about
successful experiences like Bruce's. And as Bruce also mentioned,
analytical processes often need to query on cell space rather than bands.

Remember that decades ago some of us would have been discussing the
disadvantages of storing "vector" in databases; now it is the norm for
client/server applications.

Bruce mentioned SDE and Oracle, but what are the *open source* options
to do *image management* on open source databases and who is using it?

I can only think of two, the PostGIS CHIP datatype and TerraLib schemas
(MySQL, PostgreSQL, and commercial RDBMS), but I don't know of any
*sizable* project that is using them.

Does anybody know and would like to share?

Best regards,

Ivan


[hidden email] wrote:

>
> IMO:
>
>
> Hi Randy,
>
> Thank you for your informative post. It has given me a lot to follow up
> on and think about.
>
> I can see an immediate need that this type of solution could well be
> used for. I like it.
>
> I suspect that in many larger corporate types of environments, it could
> well be used effectively for 'pilot' and 'pre-production' type tasks.
>
> For 'production' type environments, there would be issues of integrating
> an external service hosting spatial data with internal services hosting
> corporate aspatial data sources and applications.
>
>
>
> with regards to storing imagery in a database:
>
> <rant>       (and not directed at you)
>
> I've also seen a lot of reports suggesting that image management should
> be file based.
>
> My personal preference is to use a database if possible, so that I can
> take advantage of corporate data management facilities, backups, point
> in time restores etc.
>
> I've managed 70 GB orthophoto mosaics in ArcSDE / Oracle before with
> minimal problems. I found performance and response times to be
> comparable with other image web server options on the market that use
> file based solutions for storing data.
>
> Ideally, I'm looking to manage state wide mosaics with a consistent look
> and feel that can be treated as a single 'layer' by client GIS / Remote
> Sensing applications (data integrity issues allowing).
>
> One potential use is 'best available' data mosaics could undergo regular
> updates as more imagery is flown or captured. A database makes it easier
> to manage and deliver such data.
>
> My definition of 'imagery' goes beyond aerial photographs and includes
> multi or hyper-spectral imagery; various geophysics data sources such as
> aeromagnetics, gravity, radiometrics; radar data etc.
>
> Typically this data is required for digital image analysis purposes
> using a remote sensing application, so the integrity of 'the numbers'
> that make up the image is very important.
>
> Many of today's image based solutions use a (lossy) wavelet compression
> that can corrupt the integrity of 'the numbers' describing the
> radiometric data in the image.
>
> When we consider the big picture issues facing us today, such as Climate
> Change, I think that it is important to protect our definitive image
> libraries from such corruption as they will be invaluable sources of
> data for future multi-temporal analysis.
>
> That said, if the end use is just for a picture, then a wavelet
> compression is a good option. Just protect the source data for future use.
>
> </rant>      
>
>
> So, does anyone know of a good open source spatial solution for storing
> and accessing (multi and hyperspectral) imagery in a database?    ;-)
>
> WMS 1.3 and WCS are showing promise for serving imagery, including multi
> and hyperspectral data.
>
>
>
> Bruce Bannerman
>
>
>
>
>
> [hidden email] wrote on 20/02/2008 10:09:28 AM:
>
>  > Hi Ivan,
>  >
>  >    The most common advice I've seen says to leave raster out of the DB.
>  > Of course footprints and metadata could be there, but you would want to
>  > point Geoserver coverage to the image/image pyramid URL somewhere in the
>  > directory hierarchy.
>  >
>  > Brent has a nice writeup here:
>  > http://docs.codehaus.org/display/GEOSDOC/Load+NASA+Blue+Marble+Data
>  >
>  > In an AWS sense my idea is to Java proxy the Geoserver Coverage Data URL
>  > to S3 buckets and park the imagery over on the S3 side to take advantage
>  > of stability and replication. Performance, though, might not be as good
>  > as a local directory. Maybe a one-time cache to a local directory would
>  > work better.
>  >
>  > Note: Amazon doesn't charge for inside AWS data transfers.
>  >
>  > So in theory:
>  >   PostGIS holds the footprint geometry + metadata
>  >   EC2 Geoserver WFS handles footprint queries into an SVG/XAML client,
>  >     just stick it on top of something like JPL BMNG. Once a user picks a
>  >     coverage, switch to the Geoserver WMS/WCS service for zooming around
>  >     in the selected image pyramid
>  >   S3 buckets contain the tiffs, pyramids ...
>  >   EC2 Geoserver handles WMS/WCS service
>  >   EC2 proxy pulls the imagery from the S3 side as needed
>  >
>  > Sorry I haven't had time to try this so it is just theoretical. Of course
>  > you can go traditional and just keep the coverage imagery files on the
>  > local instance, avoiding the S3 proxy idea. The reason I don't like that
>  > idea is that the imagery has to be loaded with every instance creation,
>  > while an S3 approach would need only one copy.
>  >
>  >
>  > randy
>  >
>  > -----Original Message-----
>  > From: Lucena, Ivan [mailto:[hidden email]]
>  > Sent: Tuesday, February 19, 2008 2:59 PM
>  > To: [hidden email]; OSGeo Discussions
>  > Subject: Re: [OSGeo-Discuss] OS Spatial environment 'sizing'
>  >
>  > Hi Randy, Bruce,
>  >
>  > That is a nice piece of advice, Randy. I am sorry to intrude on the
>  > conversation, but I would like to ask how that "heavy raster"
>  > manipulation would be treated by PostgreSQL/PostGIS, managed or unmanaged?
>  >
>  > Best regards,
>  >
>  > Ivan
>  >
>  > Randy George wrote:
>  > > Hi Bruce,
>  > >
>  > >  
>  > >
>  > > On the "scale relatively quickly" front, you should look at Amazon's
>  > > EC2/S3 services. I've recently worked with it and find it an
>  > > attractive platform for scaling http://www.cadmaps.com/gisblog
>  > >
>  > >  
>  > >
>  > > The stack I like is Ubuntu+Java+ Postgresql/PostGIS + Apache2 mod_jk
>  > > Tomcat + Geoserver + custom SVG or XAML clients run out of Tomcat
>  > >
>  > >  
>  > >
>  > > If you use the larger instances the cost is higher, but it sounds like
>  > > you plan on some heavy raster services (WMS, WCS) and lots of memory
>  > > will help.
>  > >
>  > > Small EC2 instance, $0.10/hr:
>  > >
>  > > 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2
>  > > Compute Unit), 160 GB of instance storage, 32-bit platform
>  > >
>  > > Large EC2 instance, $0.40/hr:
>  > >
>  > > 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2
>  > > Compute Units each), 850 GB of instance storage, 64-bit platform
>  > >
>  > > Extra large EC2 instance, $0.80/hr:
>  > >
>  > > 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2
>  > > Compute Units each), 1690 GB of instance storage, 64-bit platform
>  > >
>  > >  
>  > >
>  > > Note that the instances do not need to be permanent. Some people
>  > > (WeoGeo) have been using a couple of failover small instances and then
>  > > starting new large instances for specific requirements. The idea is to
>  > > start and stop instances as required rather than having ongoing
>  > > infrastructure costs. It only takes a minute or so to start an EC2
>  > > instance. If you are running a corporate service there may be parts of
>  > > the day with very little use, so you just schedule your heavy-duty
>  > > instances for peak times. If you can connect your raster to S3 buckets
>  > > rather than instance storage, you have built-in replicated backup.
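
As a rough illustration of the scheduling argument, the sketch below compares an always-on fleet with a peak-hours-only fleet using the hourly rates quoted in this message (2008 prices). The fleet sizes, the 10-hour peak window and the 30-day month are assumptions picked for the example, and the function name `monthly_cost` is hypothetical.

```python
# Hourly rates from the message above, in US cents to keep the arithmetic exact.
HOURLY_RATE_CENTS = {"small": 10, "large": 40, "xlarge": 80}


def monthly_cost(schedule, days=30):
    """Return the monthly cost in USD.

    schedule: list of (instance_type, instance_count, hours_per_day) tuples.
    Storage and data transfer charges are ignored in this sketch.
    """
    cents = sum(HOURLY_RATE_CENTS[kind] * count * hours * days
                for kind, count, hours in schedule)
    return cents / 100


# Assumed fleets, for illustration only.
always_on = [("large", 4, 24)]
peak_only = [("small", 2, 24), ("large", 4, 10)]

if __name__ == "__main__":
    print("always on:", monthly_cost(always_on))
    print("peak only:", monthly_cost(peak_only))
```

Under these assumptions the peak-only schedule costs a little over half of the always-on one, while two small failover instances stay up around the clock.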
>  > >
>  > >  
>  > >
>  > > I know that Java JAI can easily eat up memory and is core to Geoserver
>  > > WMS/WCS, so you probably want to look at a large memory footprint for
>  > > any platform with lots of raster service. I'm partial to Geoserver
>  > > because of its Java foundation. I think I would try to keep the Apache2
>  > > mod_jk Tomcat Geoserver on a separate server instance from PostGIS.
>  > > This might avoid problems for instance startup, since your database
>  > > would need to be loaded separately. The instance AMI resides in a 10 GB
>  > > partition; the balance of data will probably reside on a /mnt partition
>  > > separate from ec2-run-instances. You may be able to avoid datadir
>  > > problems by adding something like Elastra to the mix. Elastra beta is a
>  > > wrapper for PostgreSQL that puts the datadir on S3 rather than local to
>  > > an instance. I suppose they still keep indices (GiST et al.) on the
>  > > local instance.
>  > >
>  > > (I still think it an interesting exercise to see what could be done
>  > > connecting PostGIS to AWS SimpleDB services.)
>  > >
>  > >  
>  > >
>  > > So thinking out loud, here is a possible architecture:
>  > >
>  > >     Basic permanent setup
>  > >
>  > > put raster in S3 - this may require some customization of Geoserver
>  > >
>  > > build a datadir in PostGIS and back it up to S3
>  > >
>  > > create a private AMI for PostgreSQL/PostGIS
>  > >
>  > > create a private AMI for the load balancer instance
>  > >
>  > > create a private AMI with your service stack for both a small and a
>  > > large instance for flexibility
>  > >
>  > >    Startup services
>  > >
>  > > start a balancer instance
>  > >
>  > > point your DNS CNAME to this balancer instance
>  > >
>  > > start a PostGIS instance (you could have more than one if necessary,
>  > > but it would be easier to just scale to a larger instance type if the
>  > > load demands it)
>  > >
>  > > have a scripted download from an S3 backup to your PostGIS datadir (I'm
>  > > assuming a relatively static data resource)
>  > >
>  > >    Variable services
>  > >
>  > > start service stack instance and connect to PostGIS
>  > >
>  > > update balancer to see new instance - this could be tricky
>  > >
>  > > repeat previous two steps as needed
>  > >
>  > > at night scale back - cron scaling for a known cycle, or use a
>  > > controller like weoceo to detect and respond to load fluctuation
>  > >
>  > >  
>  > >
>  > > By the way, the public AWS AMI with the best resources that I have
>  > > found is Ubuntu 7.10 Gutsy. The Debian dependency tools are much easier
>  > > to use and the resources are plentiful.
>  > >
>  > > I've been toying with using an AWS stack adapted for serving some
>  > > larger PostGIS vector sets, such as fully connected census demographic
>  > > data and block polygons here in the US. The idea would be to populate
>  > > the data directly from the census SF* and TIGER with a background Java
>  > > bot. There are some potentially novel 3D viewing approaches possible
>  > > with XAML. Anyway, lots of fun to have access to virtual systems like
>  > > this.
>  > >
>  > >  
>  > >
>  > > As you can see I'm excited anyway.
>  > >
>  > >  
>  > >
>  > > randy

Re: OS Spatial environment 'sizing' + Image Management

Arnulf Christl (OSGeo)
In reply to this post by Ivan Lucena
Lucena, Ivan wrote:

> Hi Bruce,
>
> Here I am again...
>
> Randy's suggestions are valuable and well founded, but I have a
> special interest in storing raster in databases, which is why I asked
> about it.
>
> Yes, raster is chunky and not very fluid, but I love to hear about
> successful experiences like Bruce's. And as Bruce also mentioned,
> analytical processes often need to query on cell space rather than bands.
>
> Remember that decades ago some of us would have been discussing the
> disadvantages of storing "vector" in databases; now it is the norm for
> client/server applications.

Hehe, good move, that one caught my attention. Nonetheless I will stick with suggesting to keep rasters out of databases, especially if you want to keep them for generations to come. You never know whether you will be able to get the stuff back out of SDE and Oracle in time to back it up and use it elsewhere[1]. So if you want to keep imagery for generations to come, save it as uncompressed pixels on some large hard disks. Better still, publish it openly so that people can take it home and store it there for further reference. Distributing it and making it widely available will ensure that it survives (if you really mean it).

> Bruce mentioned SDE and Oracle, but what are the *open source* options
> to do *image management* on open source databases and who is using it?

There are none because nobody is using them. That's an old and very useful Open Source deadlock: if there is no use for it, don't implement it. Fun part aside, afaik to this day no need has grown for any real-world Open Source implementations.

Regards,


[1] The brutal truth is this: when your key business processes are executed by opaque blocks of bits that you can't even see inside (let alone modify) you have lost control of your business. You need your supplier more than your supplier needs you--and you will pay, and pay, and pay again for that power imbalance. You'll pay in higher prices, you'll pay in lost opportunities, and you'll pay in lock-in that grows worse over time as the supplier (who has refined its game on a lot of previous victims) tightens its hold.

http://www.oreilly.com/catalog/cathbazpaper/chapter/ch05.html#AUTOID-1787

> I can only think of two, the PostGIS CHIP datatype and TerraLib schemas
> (MySQL, PostgreSQL, and commercial RDBMS), but I don't know of any
> *sizable* project that is using them.
>
> Does anybody know and would like to share?
>
> Best regards,
>
> Ivan

_______________________________________________
Discuss mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/discuss
Reply | Threaded
Open this post in threaded view
|

RE: Image Management in an RDBMS...(was OS Spatial environment 'sizing')

Bruce.Bannerman
In reply to this post by Randy George

IMO:

Hi Randy, Ivan and Arnulf,


I seem to have spawned another thread, moving away from my original post and Randy's excellent response.

Sorry.

I've renamed this thread accordingly.

more below...


[hidden email] wrote on 21/02/2008 02:38:13 AM:

> Hi Ivan and Bruce,
>
>
>    I am curious to know what advantage an arcSDE/Oracle stack would
> provide on image storage. I had understood imagery was simply stored as
> large blob fields and streamed in and out of the DB where it is
> processed/viewed etc. The original state I had understood was unchanged
> (lossy, wavelet, pk or otherwise happening outside the DB), just residing in
> the DB directory rather than the disk hierarchy. Other than possible table
> corruption issues I imagined that the overhead for streaming a blob into an
> image object was the only real concern on DB storage.




The ArcSDE imagery storage solution that I described in my earlier post was at a previous place of employment. They still use it effectively.

While the storage of imagery using ArcSDE can technically utilise multiple bands of radiometric data, it is mainly using a set of blob records as Randy identified. This limits the usefulness of the product when you want a flexible tool to manage multi or hyper spectral data. This is also one of the reasons that I'm looking for alternate RDBMS based solutions.

Having said that, I found ArcSDE to be quite effective for orthophoto mosaics of aerial photography, as I described earlier.

The data that we used was:

aerial photography:

- approx 500 individual images from around fifteen runs of photography
- approx 140 panelled ground control points and airborne GPS
- photography was scanned and aerotriangulated
- imagery was then mosaiced, orthorectified and colour balanced
- imagery then diced into around 70 RGB TIFF6 files, each around 1 GB, ~6 cm ground resolution.
- imagery loaded into Oracle/ArcSDE
- positional accuracy determined (~0.1m) using stats, and spread of error viewed using kriging techniques.


In short ESRI's approach with ArcSDE (as I understand it) is:

- images broken down into small blobs (we used 128k x 128k tiles, LZ77 compressed TIFF) and loaded with one 128k blob per database record.

- statistics calculated on imagery

- 7 pyramid layers created


This gave us the ability to:

- store a relatively large amount of imagery and utilise it as a single entity (e.g. a layer).

- only retrieve the records (tiles) required for the geographic area being viewed. That is
  we did not need to load the entire mosaic into memory, just stream the records required.

- only utilise an appropriate image sample for the viewing scale utilised via the pyramid
  layers (a common technique used by RS products).

- if required, add additional data to the mosaic.

- take advantage of corporate data management techniques as discussed previously.
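
A rough sketch of the tile-and-pyramid retrieval idea in the list above, in Python. This is not ArcSDE's actual storage logic: the tile edge, the factor-of-two pyramid, and the names `pick_level` and `tiles_for_view` are all assumptions made for illustration. It only shows how a viewer could work out which pyramid level and which tile records it needs for a given view, rather than reading the whole mosaic.

```python
import math

TILE = 128    # assumed tile edge in pixels (illustrative only)
LEVELS = 7    # assumed number of reduced-resolution layers above the base


def pick_level(base_res, requested_res, levels=LEVELS):
    """Coarsest pyramid level whose pixels are no larger than requested.

    Level 0 is full resolution; each higher level doubles the pixel size.
    Resolutions are ground units per pixel.
    """
    if requested_res <= base_res:
        return 0
    level = int(math.floor(math.log2(requested_res / base_res)))
    return min(level, levels)


def tiles_for_view(bbox, origin, base_res, level, tile=TILE):
    """Return (col, row) indices of the tiles that intersect bbox.

    bbox is (minx, miny, maxx, maxy) in ground units; origin is the
    upper-left corner (x, y) of the mosaic. No clipping to the mosaic
    extent is done here.
    """
    res = base_res * (2 ** level)   # ground units per pixel at this level
    span = res * tile               # ground units covered by one tile edge
    minx, miny, maxx, maxy = bbox
    ox, oy = origin
    col0 = int(math.floor((minx - ox) / span))
    col1 = int(math.floor((maxx - ox) / span))
    row0 = int(math.floor((oy - maxy) / span))
    row1 = int(math.floor((oy - miny) / span))
    return [(c, r) for r in range(row0, row1 + 1) for c in range(col0, col1 + 1)]


# A 100 m x 100 m view of a 6 cm mosaic, displayed at 0.5 m per screen pixel.
level = pick_level(0.06, 0.5)
print(level, tiles_for_view((0, -100, 100, 0), (0, 0), 0.06, level))
```

Whether the tile records live in database rows or in files, the lookup is the same arithmetic; only the fetch step differs.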


As Arnulf correctly identified, there is a black box behind the data storage. But this is equally true for the majority of spatial data that is under active management around the world. Ideally we would utilise an open format for storage and an open format for delivery.

Also for Arnulf:

- I think that the user requirement is there for storing raster data in a DB. We have had two uses identified by myself and by Ivan.

- When you consider the complexities that Google must be facing with GE in trying to manage 256x256k tiles of imagery over the entire world, at multiple pyramid layers and with constant revision of imagery, you can soon see that a file based approach would lead to a major headache.

- I personally think that the case for raster in a DB has been made.



Now what I'd ideally like to find is a good solution for managing multi and hyper spectral data in an RDB with the ability to serve whatever band combination that a **user** requires via an appropriate standard (possibly via WMS 1.3+ or WCS).

Does anyone know of any solutions, preferably OS?


I do recall a product that came out of a German Uni around 2003. There was some talk on GIS-L at the time, however it has slipped off my radar. Does anyone know what became of it? I do recall that they'd had discussions with Oracle. It was around this time that Oracle announced their GeoRaster format.



Bruce Bannerman









Re: Image Management in an RDBMS...(was OS Spatial environment 'sizing')

Chris Holmes-2

> - When you consider the complexities that Google must be facing with GE
> in trying to manage 256x256k tiles of imagery over the entire world, at
> multiple pyramid layers and with constant revision of imagery, you can
> soon see that a file based approach would lead to a major headache.

He he, I think I'd write that same sentence but substitute 'database
approach' for 'file based approach'.  I'd be pretty shocked if Google
were using any kind of database for their tiles.  They certainly aren't
paying Oracle or ArcSDE license fees.  They could have a custom MySQL
solution, but I'd guess it's all on the Google File System:
http://labs.google.com/papers/gfs.html

Also, I think it's still in pretty beta development, but Geomatys has
been working on PostGRID -
http://seagis.sourceforge.net/postgrid/index.html and
http://www.foss4g2007.org/presentations/view.php?abstract_id=225 have
some information.  I believe it is pretty attached to Java, but I think it
does some of what you want, managing the metadata in the database.
Though I could be wrong about if it's close to what you're thinking of,
my understanding of the raster side of the fence has never been that strong.

best regards,

Chris


RE: Image Management in an RDBMS...(was OS Spatial environment 'sizing')

Ed McNierney-4
Well, I'm just one opinion, but I USED to store 17 million rasters in a database, and got tired of the hassles, so I switched to a file-based storage system.  I try to manage lots of tiles of imagery over the United States and Canada, with multiple pyramid layers and constant revision of imagery, and it's not that big a deal.  It's been a very part-time job for one guy for several years now.

It's very helpful to store the metadata in a database (PostgreSQL/PostGIS) but I don't see the benefit of storing the raster data there, too - and I don't like having the mechanics of my raster access be a mystery to me.  I like to know where exactly the data is and how it's accessed.
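
A minimal sketch of that split, with metadata in a database and the rasters left on disk. The actual TopoZone schema is not described in this thread, so everything here is an assumption: the table and column names are made up, and SQLite from the Python standard library stands in for PostgreSQL/PostGIS (where the bounding-box test would normally be a spatial index lookup instead of four comparisons).

```python
import sqlite3


def build_index(conn):
    # One row per raster file: where it lives, its pyramid level, its extent.
    conn.execute(
        "CREATE TABLE tile_index ("
        " path TEXT PRIMARY KEY, level INTEGER,"
        " minx REAL, miny REAL, maxx REAL, maxy REAL)"
    )


def add_tile(conn, path, level, bbox):
    conn.execute(
        "INSERT INTO tile_index VALUES (?, ?, ?, ?, ?, ?)",
        (path, level, *bbox),
    )


def find_tiles(conn, level, bbox):
    """Paths of the files at `level` whose extent overlaps bbox."""
    minx, miny, maxx, maxy = bbox
    rows = conn.execute(
        "SELECT path FROM tile_index"
        " WHERE level = ? AND minx < ? AND maxx > ? AND miny < ? AND maxy > ?"
        " ORDER BY path",
        (level, maxx, minx, maxy, miny),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
build_index(conn)
# Hypothetical file paths; the rasters themselves never enter the database.
add_tile(conn, "/data/l0/a.tif", 0, (0, 0, 10, 10))
add_tile(conn, "/data/l0/b.tif", 0, (10, 0, 20, 10))
add_tile(conn, "/data/l1/ab.tif", 1, (0, 0, 20, 10))
print(find_tiles(conn, 0, (8, 2, 12, 4)))
```

The query returns plain file paths, so it stays obvious where each raster is and how it is read.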

     - Ed

Ed McNierney
Chief Mapmaker
Demand Media / TopoZone.com
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
Phone: 978-251-4242, Fax: 978-251-1396
[hidden email]


Re: Image Management in an RDBMS...(was OS Spatial environment 'sizing')

Tyler Mitchell (OSGeo)
In reply to this post by Bruce.Bannerman
On 20-Feb-08, at 4:29 PM, [hidden email] wrote:

> - When you consider the complexities that Google must be facing  
> with GE in trying to manage 256x256k tiles of imagery over the  
> entire world, at multiple pyramid layers and with constant revision  
> of imagery, you can soon see that a file based approach would lead  
> to a major headache.

Hi Bruce,
Am I correct in believing that the two things people desire with
images in an RDB are 1) an abstract storage framework
(tables) and 2) a common access language (SQL) for managing the
framework?   You could have the most complex storage set up behind
the scenes, but as long as the access interface plays well, the  
complexity could be minimised by good UI design.   At least I think  
so, but haven't done it before.

However, I did manage massive (at the time) amounts of vector files  
in the file system, and was dreaming about using a db.  All the while  
I watched some others make the wholesale shift to vectors in a  
transactional db.  I grimaced when asking them for data, only to wait  
while they batched up an SQL request to extract back into files --  
what used to be a simple copy command to move a zip file into an FTP  
folder.  I admired their commitment, but was frustrated by usability  
in the end (as were they).  It was good food for thought nonetheless  
as I planned my own vector-in-db direction :)

Some more recent raster in db discussion here:
http://spatialgalaxy.net/2008/02/15/rasters-in-the-database-why-bother/

Tyler


RE: OS Spatial environment 'sizing' + Image Management

Michael P. Gerlek
In reply to this post by Randy George
Interesting thread...  A couple points from the sidelines:

My company sells a store-your-images-in-a-database product, for storing
JPEG 2000 and MrSID imagery; there are indeed people who see value in
using a DB to manage their raster assets.

Our product is not open source, but when using it with JPEG 2000 images
it *is* designed around the appropriate JP2 standards for storing the
data.  Very loosely speaking, it uses JP2's internal tiling scheme and
each such tile is stored as a blob.  Each band can be stored separately,
for the sort of workflows you describe.  (Also, note that "wavelet" does
not necessarily imply "lossy" anymore, as many assume.  Story of my
life.)

The whole
how-do-I-do-image-processing-workflows-across-a-chain-of-servers
thing keeps me up at night, esp. the points you bring up below (eek,
tile boundaries!).  I think WPS is "moving slower" for a few reasons:
OGC specs proceed at a deliberate pace, there are (relatively) few
people involved in WPS, and -- most importantly to me -- the workflows
are still not well understood enough to have a critical mass of people
pushing for a baseline functionality set.
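
To make the tile-boundary worry concrete, here is a small Python sketch. It is an illustration only: a 1-D mean filter stands in for a real 2-D convolution, and the function names and the "halo" parameter are assumptions, not part of WPS or of any product mentioned in this thread. Filtering each tile on its own gives wrong values at the seams; reading a few extra samples from the neighbouring tiles (a halo at least as wide as the filter radius) reproduces the whole-image result.

```python
def box_filter(row, radius):
    """Mean filter; the window is clipped at the ends of the given row."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def filter_tiled(row, tile, radius, halo):
    """Filter tile by tile, reading `halo` extra samples on each side."""
    out = []
    for start in range(0, len(row), tile):
        lo = max(0, start - halo)
        hi = min(len(row), start + tile + halo)
        padded = box_filter(row[lo:hi], radius)
        keep = min(tile, len(row) - start)
        # Drop the halo again so each output sample is written exactly once.
        out.extend(padded[start - lo: start - lo + keep])
    return out


row = [0, 0, 0, 0, 9, 9, 9, 9]
whole = box_filter(row, 1)
print(filter_tiled(row, 4, 1, 0) == whole)  # no halo: seam values differ
print(filter_tiled(row, 4, 1, 1) == whole)  # halo >= radius: identical
```

The cost is that every tile request also touches its neighbours, which is part of why server-side analysis on a tiled store is harder than plain tile serving.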

-mpg

 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Randy George
> Sent: Wednesday, February 20, 2008 7:38 AM
> To: 'OSGeo Discussions'
> Subject: RE: [OSGeo-Discuss] OS Spatial environment 'sizing'
> + Image Management
>
> Hi Ivan and Bruce,
>
> Interesting, other than using JAI a bit on Space Imaging data (this
> was awhile back) I have been mostly using vectors.
>
> I am curious to know what advantage an arcSDE/Oracle stack would
> provide on image storage. I had understood imagery was simply stored as
> large blob fields and streamed in and out of the DB where it is
> processed/viewed etc. The original state I had understood was unchanged
> (lossy, wavelet, pk or otherwise happening outside the DB), just residing
> in the DB directory rather than the disk hierarchy. Other than possible
> table corruption issues I imagined that the overhead for streaming a blob
> into an image object was the only real concern on DB storage.
>
> But I'm getting the idea that something a bit more is going on. Does the
> image actually get retiled (celled) and then stored in multiple fields?
> Is a multispectral image broken into bands first before storing in
> separate fields? CHIP sounds more like an additional database function to
> optimize chipping inside a DBTable so that an entire image doesn't have
> to be read just to grab a small viewbox. Does arcSDE add similar
> functions to the base DB or does it just grab out an image and chip,
> threshold, convolute, histogram, etc after the fact?
>
> I'm just curious since I've been fascinated with the prospects of
> hyperspectral imagery.
>
> From an AWS perspective very large imagery would need some type of
> tiling since there is a 5 GB limit on S3 objects. Larger objects are
> typically tar gzipped and split before storage. It is hard to imagine a
> tiling scheme that large anyway. For example Google's Digital Globe
> tiling pyramid uses minuscule tiles at 256x256 compressed to approx
> 18 KB/tile
> http://kh.google.com/kh?v=3&t=trtsqtqsqqqt
> http://www.cadmaps.com/gisblog/?p=7
>
> From a web perspective analysis could proceed along a highly tiled
> approach. So the original 70 GB image becomes a tiled pyramid with the
> browser view changing position inside the image pyramid. Small patches
> flow in and out of the view with each zoom and pan. Analysis, WPS, adds
> some complexity since things like convolution algorithms need to be
> rewritten to take into account tile boundaries. Or, alternatively the
> viewbox is re-mosaiced before running a server side convolution that is
> subsequently streamed back to the browser view, not extremely fast.
> Hyper-spectral bands would reside in separate tile pyramids so that
> Boolean layer operations could proceed server side for viewing at the
> browser. Analysis really can't take advantage of predigested read only
> schemes like Google's since the whole point is to create new images from
> combinations of image bands. Consequently WPS seems to be moving slower
> than WMS, WCS, WFS.
>
> Thanks
> Randy


[snip]

RE: OS Spatial environment 'sizing' + Image Management

Bruce.Bannerman

IMO:

Thanks for the comments Michael,

I was wondering if you'd contribute    ;-)



> (Also, note that "wavelet" does
> not necessarily imply "lossy" anymore, as many assume.  Story of my
> life.)


Can you point me to any studies to support the claim that JPEG2000 can indeed be non-lossy?


I've seen the claims over the years, but nothing to support it (not that I've actively gone looking for the info, as I haven't had the need).


Bruce








Re: Image Management in an RDBMS...(was OS Spatial environment 'sizing')

Bruce.Bannerman
In reply to this post by Tyler Mitchell (OSGeo)

IMO:



Hi Tyler,

Thanks for the reply.


> Am I correct in believing that the two things people desire with  
> images in an RDB,  is having an abstract 1) storage framework  
> (tables) and

> 2) a common access language (SQL) for managing the  
> framework.   You could have the most complex storage set up behind  
> the scenes, but as long as the access interface plays well, the  
> complexity could be minimised by good UI design.   At least I think  
> so, but haven't done it before.




I can't speak for other people's needs, but only give my own opinion.

My experience with storing spatial data in a database is mainly limited to ESRI's solutions based on ArcSDE/Oracle (~7 years). I have had a cursory look at Oracle Spatial and PostgreSQL/PostGIS and intend to look a lot closer at both. I've also used a number of spatial tools over the years from a number of vendors.

I've implemented and managed a number of ArcSDE instances over the years. As skeptical as I am about the decision to rename the product as ArcGIS Server basic (or whatever it's called now), ESRI have done a great job with the product, particularly with its integration with ArcGIS Desktop as the primary user interface for adding, maintaining and managing spatial data from a GUI. You don't need to use SQL, but it also has its advantages (when appropriate).

I've found a number of benefits with managing spatial data in a corporate database environment. These comments apply to both vector and image  data. I'm sure that these comments are equally pertinent to most RDBMS maintained spatial data. Some examples are:

- Within a large organisation, it is possible to get rid of most of the islands of data that are hidden in a wide variety of departments. If implemented right, people come to see the database as the authoritative source of their data and respect it as such.

- This can remove the situation where you get multiple copies of the same dataset around your organisation, with different people making their own independent edits to the data and expecting someone to reconcile the edits with the authoritative data set at a later time (if you're lucky).

- It can also remove the situation where someone takes a copy of a critical data set and does not update it for several years, leaving business people making critical decisions on suspect data.

- You can start managing your data for a given geographic phenomenon as a single entity covering a large geographic region, without having to resort to tiles and all the related edge matching problems that we had in the past (e.g. mismatching pixels, lines, polygons that just end at the tile boundary or have an incorrect attribute on the matching sheet etc).

- Some of the biggest advantages though, come from the corporate IT support that you come to rely on, e.g. regular backups, large disk capacity on fast SAN devices, secure access to data by authorised custodians, redundant databases for disaster recovery, point in time restoration of data etc.


To date, I have not found a suitable solution for managing imagery that includes multi and hyper spectral data in a database. But I'm looking.



Ideally the solution will use open data formats for storage and delivery. I'm getting sick of having to redevelop corporate applications with the same functionality because a vendor has decided to change their technology and data formats. This results in a lot of wasted time and money that would be better used implementing effective decision support tools that allow businesses to better understand and exploit their data.



>
> Some more recent raster in db discussion here:
> http://spatialgalaxy.net/2008/02/15/rasters-in-the-database-why-bother/
>

Thanks for the heads-up Tyler. I obviously don't know what I'm talking about ;-) (eh Tim?).


Bruce




RE: OS Spatial environment 'sizing' + Image Management

Traian Stanev
In reply to this post by Bruce.Bannerman

 

JPEG2K supports lossless via a reversible wavelet transform with integral coefficients (which make it reversible, and so lossless). Here is a reference:

 

http://www.ece.uvic.ca/~mdadams/publications/pacrim2001.pdf
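
A small Python sketch of why integer coefficients make the transform reversible. It follows the lifting form of the reversible 5/3 wavelet filter that JPEG 2000 uses for lossless coding, but it is only an illustration under simplifying assumptions: one decomposition level, a 1-D signal of even length, mirrored samples at the ends, and none of the entropy coding that a real codec adds. The function names are made up for this example.

```python
def forward_53(x):
    """One level of an integer 5/3 lifting transform on an even-length list."""
    if len(x) % 2:
        raise ValueError("this sketch only handles even-length input")
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: detail = odd sample minus the floored mean of its
    # two even neighbours (the last neighbour is mirrored at the edge).
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update step: smooth = even sample plus a rounded quarter of the
    # neighbouring details (the first detail is mirrored at the edge).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 by running the same integer steps in reverse order."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2] = even
    x[1::2] = odd
    return x


pixels = [12, 15, 200, 198, 0, 255, 37, 36]
s, d = forward_53(pixels)
print(inverse_53(s, d) == pixels)
```

Each step adds or subtracts a value computed only from samples the inverse also has, so the rounding is repeated exactly and cancels. That is the whole trick; no floating-point coefficients are ever stored.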

 

 

 

Traian

 

 

 


Re: Image Management in an RDBMS...(was OS Spatial environment 'sizing')

Arnulf Christl (OSGeo)
In reply to this post by Bruce.Bannerman
[hidden email] wrote:

> IMO:
>
>
>
> Hi Tyler,
>
> Thanks for the reply.
>
>
>> Am I correct in believing that the two things people desire with
>> images in an RDB,  is having an abstract 1) storage framework
>> (tables) and
>> 2) a common access language (SQL) for managing the
>> framework.   You could have the most complex storage set up behind
>> the scenes, but as long as the access interface plays well, the
>> complexity could be minimised by good UI design.   At least I think
>> so, but haven't done it before.
>
>
>
> I can't speak for other people's needs, but only give my own opinion.

Hi,
sorry to bother again but I still see the need to clarify that there are two issues here. One is pragmatic experience with raster storage in a proprietary set of software. The other is whether there is a *need* to store rasters in a database at all.

We have a customer who loves her SDE/Oracle and would never switch back to file based access because she feels it would be falling back into the stone age. I tell her that she has been brainwashed by certain creative minds who "sell things" (instead of developing software) into thinking that file based systems are stone age. She hates me for it and curses her vendors at the same time now...

There are arguments for file based approaches; one of them is that hardware is still developing really fast. Even disk read head caches are part of the spatial data infrastructure. It is just a question of whether you make use of them. Whether they are accessed by an operating system directly or by a database that sits on top of the operating system will surely make a difference in performance.

> My experience with storing spatial data in a database is mainly limited to
> ESRI's solutions based on ArcSDE/Oracle (~7 years). I have had a cursory
> look at Oracle Spatial and PostGres/PostGIS and intend looking a lot
> closer at both. I've also used a number of spatial tools over the years
> from a number of vendors.
>
> I've implemented and managed a number of ArcSDE instances over the years.
> As skeptical as I am about the decision to rename the product as ArcGIS
> Server basic (or whatever its called now), ESRI have done a great job with
> the product. Particularly with its integration with ArcGIS Desktop as the
> primary user interface for adding, maintaining and managing spatial data
> from a GUI. You don't need to use SQL, but it also has its advantages
> (when appropriate).

This should probably rather read "with ArcGIS Desktop as the *only* user interface" - which makes you depend on that vendor. I would amend the second part to read "Unfortunately you can't use SQL,..." but that's also just a personal opinion.

> I've found a number of benefits with managing spatial data in a corporate
> database environment. These comments apply to both vector and image  data.
> I'm sure that these comments are equally pertinent to most RDBMS
> maintained spatial data. Some examples are:

Just to make sure: "corporate database environment" refers to any database software, not just Oracle? SDE is not a corporate database environment or do you see it as a part of it? I can follow and underline the arguments related to holding vector data in a database but still fail to understand the need for rasters (which is probably mainly due to my ignorance).

> - Within a large organisation, it is possible to get rid of most of the
> islands of data that are hidden in a wide variety of departments. If
> implemented right, people come to see the database as the authoritative
> source of their data and respect it as such.

Yes. But you would not want to have people talk to the database (ugh - SQL) but rather to a service. Give people a link to a service instead of a file to store on their own machine. There is no explicit need for a database, just encapsulate whatever you have behind a service. Users don't care what is behind the service, it can be a database or a lump of files on a SAN.
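
As an illustration of "a link to a service", here is a short Python sketch that builds a WMS 1.3.0 GetMap request. The parameter names are the standard GetMap ones; the server address and the layer name are made up for the example, and a real deployment would take both from the service's GetCapabilities response.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width, height,
               crs="CRS:84", fmt="image/png"):
    """Build a WMS 1.3.0 GetMap URL for one layer.

    bbox is (minx, miny, maxx, maxy); with CRS:84 that means
    longitude first, then latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = getmap_url("http://example.org/wms", "orthophoto_mosaic",
                 (144.9, -37.9, 145.0, -37.8), 512, 512)
print(url)
```

The person holding this link cannot tell, and does not need to know, whether the pixels come out of a database or a directory of files.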

> - This can remove the situation where you get multiple copies of the same
> dataset around your organisation, with different people making their own
> independent edits to the data and expecting someone to reconcile the edits
> with the authoritative data set at a later time (if you're lucky).

Absolutely. But I fail to see the need to store rasters in a database arise from this argument.

> - It can also remove the situation where someone takes a copy of a
> critical data set and does not update it for several years, leaving
> business people making critical decisions on suspect data.
>
> - You can start managing your data for a given geographic phenomenon as a
> single entity covering a large geographic region, without having to resort
> to tiles and all the related edge matching problems that we had in the
> past (e.g. mismatching pixels, lines, polygons that just end at the tile
> boundary or have an incorrect attribute on the matching sheet etc).

I am probably too vector oriented to understand the problems involved here but my experience is that there should be no issue if you have your services configured all right.

> - Some of the biggest advantages though, come from the corporate IT
> support that you come to rely on, e.g. regular backups, large disk
> capacity on fast SAN devices, secure access to data by authorised
> custodians, redundant databases for disaster recovery, point in time
> restoration of data etc.

But there is no difference whether you back up files or a database. Well, actually there is a difference. Backing up files is a lot easier, more robust and scales much better than cramming everything into one or even several databases and trying to replicate that once it needs to be scaled up.

> To date, I have not found a suitable solution for managing imagery that
> includes multi and hyper spectral data in a database. But I'm looking.
>
>
>
> Ideally the solution will use open data formats for storage and delivery.
> I'm getting sick of having to redevelop corporate applications with the
> same functionality because a vendor has decided to change their technology
> and data formats. This results in a lot of wasted time and money that
> would be better used implementing effective decision support tools that
> allow businesses to better understand and exploit their data.

I can hear you loud and clear.

Best regards,
Arnulf


RE: Image Management in an RDBMS...(was OS Spatial environment 'sizing')

Fawcett, David (MNIT)


>> - Some of the biggest advantages though, come from the corporate IT
>> support that you come to rely on, e.g. regular backups, large disk
>> capacity on fast SAN devices, secure access to data by authorised
>> custodians, redundant databases for disaster recovery, point in time
>> restoration of data etc.

> But there is no difference whether you back up files or a database.
> Well, actually there is a difference. Backing up files is a lot
> easier, more robust and much much better scalable than cramming
> everything into one or even several databases and trying to replicate
> that once it needs to be scaled up.

It is a little ironic that their Secret Sauce Du Jour, Map Caching, is
just pre-rendering the images and storing them on the file system.  ; /