Guidelines for netCDF file and OPeNDAP access within PyWPS


David Huard
Hi all, 

I'd like to contribute a pull request to better handle netCDF files in PyWPS, but I don't know where to start.

We have a number of processes that take netCDF files as inputs. For those less familiar with the format, netCDF (whose netCDF-4 flavour is built on HDF5) together with a set of metadata conventions is the standard data format in oceanography and climatology. netCDF files are usually stored on servers that support OPeNDAP, so users can either download a file and open it locally, or open it remotely through the OPeNDAP protocol. In practice, you can write:

import netCDF4 as nc

ds1 = nc.Dataset("<path to local file>")        # file already on disk
ds2 = nc.Dataset("<link to opendap address>")   # OPeNDAP URL, read remotely

Both ds1 and ds2 then behave identically; the difference is that ds2 is not downloaded but read remotely, on demand. If a file contains a 3D array (time, lat, lon), you can read a single slice of it without downloading the whole array.
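To make "read one slice without downloading it all" concrete: with OPeNDAP (DAP2), a client asks for a subset by appending a constraint expression to the dataset URL, using hyperslab notation `var[start:stride:stop]` with an inclusive stop index. The sketch below only builds such a URL with the standard library. The dataset URL and the variable name `tas` are made-up placeholders, and netCDF4 (through the netCDF-C DAP client) builds comparable requests internally, so this is an illustration of the protocol rather than something a process would normally write by hand.

```python
from urllib.parse import quote


def dap2_subset_url(base_url, var, slices, response=".dods"):
    """Build a DAP2 URL requesting a hyperslab of `var`.

    `slices` holds one (start, stop) pair per dimension; `stop` is
    inclusive, as in DAP2 hyperslab syntax, and the stride is fixed at 1.
    """
    hyperslab = "".join(f"[{start}:1:{stop}]" for start, stop in slices)
    # Brackets and colons are the constraint syntax itself, so keep them.
    constraint = quote(var + hyperslab, safe="[]:")
    return f"{base_url}{response}?{constraint}"


# One time step and a 3x4 lat/lon window of a (time, lat, lon) variable.
url = dap2_subset_url("https://example.org/opendap/tas.nc", "tas",
                      [(0, 0), (10, 12), (30, 33)])
print(url)  # https://example.org/opendap/tas.nc.dods?tas[0:1:0][10:1:12][30:1:33]
```

Only the requested 1 x 3 x 4 values would travel over the network, which is what makes the remote `ds2` practical for large files.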

Some of our pywps.Process subclasses support both netCDF file and OPeNDAP access. For these we define a ComplexInput for the address of an actual netCDF file and a LiteralInput for the OPeNDAP address.

Is there a clean way for PyWPS to support both modes with a single ComplexInput? Internally, PyWPS would check whether the address supports OPeNDAP (simply by checking whether nc.Dataset(url) succeeds) and, if not, download the file to the server.

In both cases, a process could then simply do

ds = nc.Dataset(request.inputs['resource'][0].file)
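The proposed behaviour (try OPeNDAP first, otherwise download) could look roughly like the following. This is a sketch, not PyWPS code: `resolve_netcdf_source`, `can_open` and `download` are hypothetical names, and the two callables are injected so the logic can be exercised without netCDF4 or network access. In a real deployment, `can_open` might wrap `netCDF4.Dataset(url)` and treat any exception as "not an OPeNDAP endpoint".

```python
import os
import tempfile
from typing import Callable


def resolve_netcdf_source(url: str,
                          can_open: Callable[[str], bool],
                          download: Callable[[str, str], None],
                          workdir: str) -> str:
    """Return something nc.Dataset() could open: the URL itself if it is a
    working OPeNDAP endpoint, otherwise the path of a downloaded copy."""
    if can_open(url):
        # Remote access: nothing is copied, data is read on demand.
        return url
    # Fallback: fetch the whole file into the working directory.
    name = os.path.basename(url.split("?", 1)[0]) or "input.nc"
    path = os.path.join(workdir, name)
    download(url, path)
    return path


# Usage with stand-in callables (no network, no netCDF4 needed).
workdir = tempfile.mkdtemp()
fetched = []


def fake_download(url, path):
    fetched.append(url)
    with open(path, "wb") as f:
        f.write(b"CDF\x01")


remote = resolve_netcdf_source("https://example.org/dodsC/tas.nc",
                               lambda u: True, fake_download, workdir)
local = resolve_netcdf_source("https://example.org/files/pr.nc",
                              lambda u: False, fake_download, workdir)
```

Either return value can be handed to `nc.Dataset()`, which is what would let a single `.file` attribute serve both modes.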

I'm willing to put in the time to do it; I just don't know where to start.

Thanks,

David








_______________________________________________
pywps-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/pywps-dev

Re: Guidelines for netCDF file and OPeNDAP access within PyWPS

Jachym Cepicky
Hi,

Yes, ComplexInput should work for you: you can pass the URL of the data using a "<Reference ... />" element; see [1] for an example.

Any Format can have (and by default has) a `validator` function, which returns whether the input data are valid or not [3]. You can also use the `get_format` function [4] and set the validator there.

For examples of what a validation function can look like, see the shapefile or GML validators [5].

You should probably extend the formats [2] with a netCDF mimetype.

Note, however, that this checks the file only after it has been downloaded to PyWPS, not the URL itself. Is that sufficient?
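A netCDF validator in this spirit could start with a cheap signature check on the downloaded file. The sketch mirrors the `(data_input, mode)` shape the existing validators appear to use, but to stay runnable without PyWPS, `MODE` and the `data_input` object are local stand-ins, and `validate_netcdf_header` is a hypothetical name. The magic bytes are the classic netCDF signatures (`CDF` followed by 1, 2 or 5) and the HDF5 signature used by netCDF-4 files; `application/x-netcdf` is assumed as the mimetype.

```python
import os
import tempfile
from types import SimpleNamespace


class MODE:
    """Stand-in for PyWPS' validation levels."""
    NONE = 0
    SIMPLE = 1
    STRICT = 2


NETCDF_MIME = "application/x-netcdf"
CLASSIC_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")  # classic, 64-bit offset, CDF-5
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"                     # netCDF-4 files are HDF5 files


def validate_netcdf_header(data_input, mode):
    """Cheap signature check on an already-downloaded file.

    A STRICT mode could additionally try opening the file with netCDF4;
    that is not done here.
    """
    if mode == MODE.NONE:
        return True
    if data_input.data_format.mime_type != NETCDF_MIME:
        return False
    with open(data_input.file, "rb") as f:
        header = f.read(8)
    # Only offset 0 is checked, so HDF5 files with a user block are rejected.
    return header[:4] in CLASSIC_MAGIC or header == HDF5_MAGIC


def _sample(name, payload):
    """Write `payload` to a temp file and wrap it like an input object."""
    path = os.path.join(tempfile.mkdtemp(), name)
    with open(path, "wb") as f:
        f.write(payload)
    return SimpleNamespace(file=path,
                           data_format=SimpleNamespace(mime_type=NETCDF_MIME))


classic = _sample("classic.nc", b"CDF\x01" + bytes(28))
nc4 = _sample("nc4.nc", HDF5_MAGIC + bytes(8))
html = _sample("error.nc", b"<html>404 Not Found</html>")
```

The `html` case is the typical failure this catches: a server error page saved under a `.nc` name.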

Jachym




On Thu, Jun 21, 2018 at 17:15, David Huard wrote:


--
Jachym Cepicky
e-mail: jachym.cepicky gmail com
URL: http://les-ejk.cz
GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp


Re: Guidelines for netCDF file and OPeNDAP access within PyWPS

David Huard
Hi Jachym, 

Thanks for the pointers; I've started writing validators for netCDF. I'm still wondering where the decision to download a file is made. Can I short-circuit that decision and avoid the download if the href is a valid OPeNDAP link, i.e. if it passes the validatenetcdf checks?


On Fri, Jun 22, 2018 at 4:53 AM Jachym Cepicky wrote:

Re: Guidelines for netCDF file and OPeNDAP access within PyWPS

Jachym Cepicky
I believe,


On Tue, Jun 26, 2018 at 14:57, David Huard wrote:

Re: Guidelines for netCDF file and OPeNDAP access within PyWPS

David Huard
Thanks!

I'll look at it and come back with a PR. 

On Tue, Jun 26, 2018 at 9:16 AM Jachym Cepicky wrote:

Re: Guidelines for netCDF file and OPeNDAP access within PyWPS

David Huard
I've got something working, but it's not pretty. If the input mimetype is application/x-ogc-dods, the href handler skips the download and assigns the link to the `data` attribute. If I ask for the `file` attribute instead, PyWPS downloads the file locally.

Now, if I want my process to support both netCDF files and OPeNDAP links, there is a problem: for a file input, it is the file_handler that sets the `file` attribute, but then the `data` attribute holds the file's actual content, not its path. I guess I could special-case the netCDF mimetype in the file_handler so that `data` holds the file path, but that feels clunky.
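Stripped down, the workaround described above is a branch on the mimetype, and it shows why `data` ends up meaning different things depending on how the input arrived. This is a simplified model with hypothetical names (`InputState`, `handle_href`, `handle_file`, `fetch`), not PyWPS' actual handler code.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Optional, Union

DODS_MIME = "application/x-ogc-dods"


@dataclass
class InputState:
    data: Optional[Union[str, bytes]] = None  # OPeNDAP link *or* file content
    file: Optional[str] = None                # local path, once something is on disk


def handle_href(href: str, mime_type: str,
                fetch: Callable[[str], str]) -> InputState:
    if mime_type == DODS_MIME:
        # OPeNDAP: skip the download and keep the link in `data`.
        # (Asking for `file` later would trigger a download; not modeled here.)
        return InputState(data=href)
    return InputState(file=fetch(href))       # download, keep the local path


def handle_file(path: str) -> InputState:
    # Local file: `data` holds the bytes, not the path. This is the clash:
    # a process cannot pass `data` to nc.Dataset() in both cases.
    with open(path, "rb") as f:
        return InputState(data=f.read(), file=path)


link = handle_href("https://example.org/dodsC/tas.nc", DODS_MIME,
                   fetch=lambda h: "/never/called")
path = os.path.join(tempfile.mkdtemp(), "tas.nc")
with open(path, "wb") as f:
    f.write(b"CDF\x01")
local = handle_file(path)
```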

Does anyone have a better design in mind, one that would extend gracefully to other mimetypes? Should ComplexInput be subclassed by mimetype, so that file, stream and data handling, as well as validation, are encapsulated in one class?

One problem I can see cropping up: as PyWPS extends support to other "special" mimetypes, the dependencies will become harder to maintain. The netCDF validator, for instance, requires netCDF4 to be installed, which is not a light dependency. My guess is that PyWPS should support the "light" mimetypes out of the box and offer a plugin mechanism for the heavier ones.
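One common way to keep a heavy dependency optional is to enable a format's extra machinery only when its library can be imported. A minimal sketch using `importlib.util.find_spec`; the `OPTIONAL_VALIDATORS` table and `available_validators` are hypothetical, `netCDF4` is simply the module named in this message, and `json` stands in for a light, always-available dependency in the usage line.

```python
import importlib.util

# mimetype -> module its validator needs (hypothetical table).
OPTIONAL_VALIDATORS = {
    "application/x-netcdf": "netCDF4",
    "application/x-ogc-dods": "netCDF4",
}


def available_validators(requirements=OPTIONAL_VALIDATORS):
    """Return the mimetypes whose extra dependency is importable here.

    find_spec() locates a top-level module without importing it and
    returns None when it is not installed.
    """
    return {mime for mime, module in requirements.items()
            if importlib.util.find_spec(module) is not None}


# Light formats stay on; heavy ones only if their library is installed.
enabled = available_validators({"text/plain": "json",
                                 "application/x-fake": "no_such_module_xyz"})
```

Checking with `find_spec` rather than importing keeps server start-up cheap; the actual import can happen lazily, the first time a validator for that mimetype runs.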

David
 

On Tue, Jun 26, 2018 at 9:23 AM David Huard wrote:

Re: Guidelines for netCDF file and OPeNDAP access within PyWPS

Jachym Cepicky
Hi David,
I don't have much insight into the netCDF format and OPeNDAP. I can imagine that, besides the current validators, which validate files already available on disk, we could add some pre-fetch validators too, which would check the URL before anything is downloaded.


If I understand correctly, PyWPS first parses the request and builds a WPSRequest object; then, based on this structure, the Process instance is constructed along with its inputs and outputs. We would need to rewrite the part of PyWPS that downloads the data [2] and then sets the file object on the complex input, so that the download can be skipped.

Could we add `set_url` and `get_url` setter and getter methods to IOHandler, analogous to `set_file`/`get_file` and `set_data`/`get_data` (and `memory_object`), and implement the special behaviour there?
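To picture the `set_url`/`get_url` suggestion, here is a toy handler in which the URL is a first-class source next to the file: reading `url` never touches the network, while reading `file` downloads lazily and caches the path. `UrlAwareHandler` and its `fetch` callable are hypothetical; this is not PyWPS' IOHandler, whose internals may differ.

```python
from typing import Callable, Optional


class UrlAwareHandler:
    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch              # downloads a URL, returns a local path
        self._url: Optional[str] = None
        self._file: Optional[str] = None

    def set_url(self, url: str) -> None:
        self._url = url
        self._file = None                # a new URL invalidates any cached copy

    def get_url(self) -> Optional[str]:
        return self._url                 # no network access here

    def set_file(self, path: str) -> None:
        self._file = path

    def get_file(self) -> str:
        if self._file is None:
            if self._url is None:
                raise ValueError("no source set")
            self._file = self._fetch(self._url)  # download on first request only
        return self._file

    url = property(get_url, set_url)
    file = property(get_file, set_file)


calls = []


def fake_fetch(u):
    calls.append(u)
    return "/tmp/copy.nc"


h = UrlAwareHandler(fake_fetch)
h.url = "https://example.org/dodsC/tas.nc"
first_url = h.url            # an OPeNDAP-aware process would stop here
n_before = len(calls)
path1 = h.file               # triggers the download
path2 = h.file               # served from the cached path
```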

J



On Wed, Jun 27, 2018 at 14:48, David Huard wrote:

Re: Guidelines for netCDF file and OPeNDAP access within PyWPS

David Huard
The URL idea sounds good. I'll try it.

How do you feel about the dependency issue, though?

One option I've been playing around with is to add methods to ComplexInput dynamically once the mimetype is discovered. That is, the various handlers (href, file, data) could be methods of a MimeInput class that can be specialized for different mimetypes. After create_complex_input determines the input's mimetype, instead of calling source.clone(), it would instantiate a mixin class combining ComplexInput and MimeInput. With a registry mapping mimetypes to their associated classes, users could special-case the handlers (and the validators) for mimetypes that PyWPS does not support out of the box.
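A rough sketch of the registry-plus-mixin idea: handler classes register themselves per mimetype, and a factory builds the combined class with `type()` instead of cloning. `ComplexInputStub`, `MimeInput`, `DodsInput`, `register` and `make_input` are all hypothetical stand-ins; the real `ComplexInput` and `create_complex_input` are not used here.

```python
MIME_REGISTRY = {}


def register(mime_type):
    """Class decorator: associate a MimeInput subclass with a mimetype."""
    def deco(cls):
        MIME_REGISTRY[mime_type] = cls
        return cls
    return deco


class MimeInput:
    """Default handlers: pretend to download the href to a local file."""
    def href_handler(self, href):
        self.file = f"/tmp/{href.rsplit('/', 1)[-1]}"  # stands in for a real download
        self.data = None


@register("application/x-ogc-dods")
class DodsInput(MimeInput):
    def href_handler(self, href):
        self.file = None
        self.data = href                               # keep the OPeNDAP link


class ComplexInputStub:
    def __init__(self, identifier):
        self.identifier = identifier
        self.file = None
        self.data = None


def make_input(identifier, mime_type, base=ComplexInputStub):
    handler = MIME_REGISTRY.get(mime_type, MimeInput)
    # Mixin first in the bases, so its handlers win in the MRO.
    cls = type(f"{handler.__name__}{base.__name__}", (handler, base), {})
    return cls(identifier)


dods = make_input("resource", "application/x-ogc-dods")
dods.href_handler("https://example.org/dodsC/tas.nc")
plain = make_input("resource", "application/x-netcdf")
plain.href_handler("https://example.org/files/tas.nc")
```

In practice the combined classes would probably be cached per mimetype rather than rebuilt for every input.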

These features could be provided as plugins, so that users would `pip install pywps.netcdf` to get netCDF support.
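If such plugins were separate packages (the `pywps.netcdf` name above is only a proposal), one standard way for them to announce themselves is a packaging entry point. A sketch with the standard library's `importlib.metadata`; the group name `pywps.formats` is made up for illustration, and each plugin is assumed to expose a callable that registers its handlers and validators.

```python
from importlib.metadata import entry_points

# Hypothetical group; a plugin's pyproject.toml would declare e.g.
#   [project.entry-points."pywps.formats"]
#   netcdf = "pywps_netcdf:register"
PLUGIN_GROUP = "pywps.formats"


def discover_plugins(group=PLUGIN_GROUP):
    """Return {name: loaded object} for every installed plugin in `group`."""
    eps = entry_points()
    # Python 3.10+ offers .select(); 3.8/3.9 return a plain dict of groups.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


plugins = discover_plugins()
for name, register_plugin in plugins.items():
    register_plugin()  # would add, say, netCDF handlers to the mimetype registry
```

With nothing installed the core stays dependency-free, and `pip install`-ing a plugin is enough to activate it.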





On Fri, Jun 29, 2018 at 10:42 AM Jachym Cepicky wrote:
Hi David,
I do not have much insight view to netCDF format and opendap. I can imagine, that beside current validators, which do validate on-drive-available files, we could add some pre_fetch validators too.


If I understand correctly, PyWPS first parses the request and makes WPSRequest object, then, based on this structure, Process instance along with in- and outputs is contstructed. We need to rewrite pywps, so it does not download data [2] and then the file object is set to the complex input

We could add set_url and get_url setter and getter methods to IOHandler, which could behave like set_file and get_file or set_data and get_data (and memory_object), which could implement the special behaviour ?

J



st 27. 6. 2018 v 14:48 odesílatel David Huard <[hidden email]> napsal:
I've got something working, but it's not pretty... If the input mime type is application/x-ogc-dods, the href handler skips the downloads and assigns the link to the `data` attribute. If I ask for the file attribute, pywps will download the file locally. 

Now if I want my process to support both netCDF files and opendap link, for file input it's the file_handler that'll set the file attribute, but then the data attribute will hold the actual file's content, not the path to the file. I guess I could special case the netcdf mime type in the file_handler to set data to the file path, but it feels clunky. 

I'm wondering if anyone has a better design idea in mind, that could extend gracefully to other mime types? Should ComplexInput be subclassed by mimetype, so that the file, stream and data handling as well as validation is encapsulated in a class ? 

One problem I can see cropping up is that as pywps extends support for other "special" mimetypes, the dependencies will become harder to maintain. Indeed, the netcdfvalidator requires netCDF4 to be installed, which is not a light dependency. My guess is that pywps should support out of the box the "light" mime types, and have a plugin mechanism for more complicated ones.  

David
 

On Tue, Jun 26, 2018 at 9:23 AM David Huard <[hidden email]> wrote:
Thanks !

I'll look at it and come back with a PR. 

On Tue, Jun 26, 2018 at 9:16 AM Jachym Cepicky <[hidden email]> wrote:
I belive,


út 26. 6. 2018 v 14:57 odesílatel David Huard <[hidden email]> napsal:
Hi Jachym, 

Thanks for the pointers, I've started writing validators for netCDF. I'm still wondering where the decision to download a file is made? Can I shortcut that decision and avoid a file download if the href is a valid opendap link, ie it passes the validatenetcdf checks?


On Fri, Jun 22, 2018 at 4:53 AM Jachym Cepicky <[hidden email]> wrote:
Hi,

yes ComplexInput should work for you - you can pass the url with the data using "<Reference ... />" element.. see [1] for example

Any Format can have (and has by default) `validator` function, which return's, whether the input data are valid or no [3]. You can also use `get_format` function [4] and set the validator there.

Example, how validating function can look can be shapefile or gml validators [5]

You should probably extend foramts [2] with NetCDF mimetype

But, this will check the file only after it was downloaded to PyWPS - not the URL. Still. is that sufficient?

Jachym




čt 21. 6. 2018 v 17:15 odesílatel David Huard <[hidden email]> napsal:
Hi all, 

I'd like to contribute a pull request to better handle netCDF files in pywps but I don't know where to start. 

We have a number of processes taking netCDF files as inputs. For those less familiar with the format, netCDF is based on HDF5 and a set of conventions. It is the standard data format in oceanography and climatology. netCDF files are usually stored on servers with support for opendap. This means that users can either download the netCDF file and then open it locally, or use the opendap protocol to open it remotely. What that means is that you can do 

from netCDF4 import nc
ds1 = nc.Dataset("<path to local file>")
ds2 = nc.Dataset("<link to opendap address>")

and both ds1 and ds2 will behave identically. However ds2 is not downloaded locally, but rather read remotely on demand. If a file contains a 3D matrix (time, lat, lon), you can read one slice of the matrix without downloading it all. 

Some of our pywps.Process support both netCDF file and opendap access. We define a ComplexInput for the address to an actual netCDF file, and a LiteralInput for the opendap address.

My question is whether there would be a clean way for pywps to support both modes with one ComplexInput? Internally, pywps would check if the address supports opendap (just check if nc.Dataset(url) works), and if not, would download the file locally to the server. 

In both cases, we could do 

ds = nc.Dataset(requests.inputs['resource'][0].file)

I'm willing to put the time to do it, I just don't know where to start. 

Thanks,

David







_______________________________________________
pywps-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/pywps-dev


--
Jachym Cepicky
e-mail: jachym.cepicky gmail com
URL: http://les-ejk.cz
GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp

Re: Guidelines for netCDF file and opendap access within pywps

Jachym Cepicky
Hmm,

I actually like the idea that Format [1] would "serve" the file/data/memory_object methods to the input (and output). By the way, this could also be used for a future implementation of WFS/WCS services as an output option.

The registry of mimetypes should be in pywps/inout/formats/__init__.py [2], and our GSoC student Jan Pišl is working on extending it too [3], so this would fit IMHO.
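One way to read the idea of a Format that "serves" accessors is a small registry keyed by mime type, where each entry carries its validator and the accessors an input should expose. This is a sketch under assumptions: the `Format` dataclass, `FORMAT_REGISTRY`, `register_format` and `get_format_for` are hypothetical and do not reflect the actual contents of pywps/inout/formats/__init__.py; `application/x-ogc-dods` is the mime type mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Format:
    """Hypothetical Format that also "serves" the accessors an input of
    this mime type should expose (e.g. "file", "data", "url")."""
    mime_type: str
    extension: str = ""
    validate: Optional[Callable] = None
    accessors: Dict[str, Callable] = field(default_factory=dict)

    def serve(self, name, io_object):
        """Call the accessor `name` on io_object, if this format defines it."""
        if name not in self.accessors:
            raise AttributeError(f"{self.mime_type} does not serve {name!r}")
        return self.accessors[name](io_object)


FORMAT_REGISTRY: Dict[str, Format] = {}


def register_format(fmt):
    """Add (or replace) a format in the registry, keyed by its mime type."""
    FORMAT_REGISTRY[fmt.mime_type] = fmt
    return fmt


def get_format_for(mime_type):
    """Look up a registered format; None if the mime type is unknown."""
    return FORMAT_REGISTRY.get(mime_type)


# Example registration: an OPeNDAP format whose "url" accessor returns the
# href untouched, so nothing is downloaded.
register_format(Format("application/x-ogc-dods",
                       accessors={"url": lambda inp: inp.href}))
```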

J


On Fri, 29 Jun 2018 at 17:03, David Huard <[hidden email]> wrote:
The URL idea sounds good. Will try it. 

How do you feel about the dependency issue, though?

One option I've been playing around with is to dynamically add methods to ComplexInput once the mimetype is known. That is, the various handlers (href, file, data) could be methods of a MimeInput class that can be specialized for different mimetypes. After create_complex_input determines the input's mimetype, instead of calling source.clone(), it would instantiate a mixin class combining ComplexInput and MimeInput. With a registry of mimetypes and their associated classes, users could special-case the handlers (and the validators) for mimetypes that pywps does not support out of the box.
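The mixin approach can be sketched with Python's `type()` class builder. `ComplexInput` below is a minimal hypothetical stand-in, not the real pywps class, and `MimeInput`, `mime_input`, `OpendapInput` and `specialize` are illustrative names. Copying the instance `__dict__` plays the role that `source.clone()` plays in the description above.

```python
class ComplexInput:
    """Minimal hypothetical stand-in for pywps.ComplexInput."""

    def __init__(self, identifier, href=None):
        self.identifier = identifier
        self.href = href

    def describe(self):
        return "generic input " + self.identifier


class MimeInput:
    """Base mixin: mimetype-specific overrides of handlers/validators."""
    mime_type = None


MIME_INPUTS = {}   # mime type -> MimeInput subclass
_COMBINED = {}     # cache of generated mixin classes


def mime_input(mime_type):
    """Class decorator registering a MimeInput subclass for a mime type."""
    def register(cls):
        cls.mime_type = mime_type
        MIME_INPUTS[mime_type] = cls
        return cls
    return register


@mime_input("application/x-ogc-dods")
class OpendapInput(MimeInput):
    def describe(self):
        return "OPeNDAP input at " + self.href


def specialize(inp, mime_type):
    """Return a copy of inp whose class mixes the registered MimeInput
    (listed first, so its methods win) with ComplexInput."""
    mixin = MIME_INPUTS.get(mime_type, MimeInput)
    if mixin not in _COMBINED:
        _COMBINED[mixin] = type(mixin.__name__ + "ComplexInput",
                                (mixin, ComplexInput), {})
    cls = _COMBINED[mixin]
    new = cls.__new__(cls)
    new.__dict__.update(vars(inp))  # shallow copy of the instance state
    return new
```

Unregistered mime types fall back to the plain `MimeInput` mixin, so they keep the default ComplexInput behaviour.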

These features could be provided as plugins, so that users would pip install pywps.netcdf to get netCDF support.
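Plugin discovery of the kind suggested here is commonly done with package entry points. The group name `pywps.formats` and the function `load_format_plugins` are hypothetical; a plugin distribution would advertise its format object under that group in its packaging metadata, and pywps would load whatever happens to be installed.

```python
from importlib.metadata import entry_points

PLUGIN_GROUP = "pywps.formats"  # hypothetical entry-point group name


def _select(group):
    """Entry points in `group`, across Python versions."""
    eps = entry_points()
    if hasattr(eps, "select"):        # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))   # Python 3.8/3.9 returned a dict


def load_format_plugins(group=PLUGIN_GROUP):
    """Load every plugin advertised under `group` and return
    {entry point name: loaded object}. Plugins that fail to import
    (e.g. because netCDF4 is missing) are skipped."""
    plugins = {}
    for ep in _select(group):
        try:
            plugins[ep.name] = ep.load()
        except ImportError:
            continue
    return plugins
```

A plugin package could then declare something like `netcdf = "pywps_netcdf:NETCDF_FORMAT"` under `[project.entry-points."pywps.formats"]` in its pyproject.toml, where `pywps_netcdf` and `NETCDF_FORMAT` are again placeholder names. Catching ImportError keeps a broken optional plugin from breaking server start-up.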





On Fri, Jun 29, 2018 at 10:42 AM Jachym Cepicky <[hidden email]> wrote:
Hi David,
I do not have much insight into the netCDF format and OPeNDAP. I can imagine that, besides the current validators, which validate files already available on disk, we could add some pre-fetch validators too.
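A pre-fetch validator works on the URL rather than on a downloaded file. The sketch below relies on a heuristic that is an assumption, not PyWPS behaviour: a DAP2 (OPeNDAP) server answers `<dataset URL>.dds` with a short text description beginning with `Dataset {`, so only that small response is read. `prefetch_validate_opendap`, `dds_url` and `fetch_text` are hypothetical names, and `fetch` is injectable for testing.

```python
import urllib.request
from urllib.parse import urlparse


def fetch_text(url, nbytes=256, timeout=10):
    """Read only the first nbytes of url, decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(nbytes).decode("utf-8", errors="replace")


def dds_url(href):
    """<dataset>.dds, keeping any DAP constraint expression after '?'."""
    base, sep, query = href.partition("?")
    return base + ".dds" + sep + query


def prefetch_validate_opendap(href, fetch=fetch_text):
    """Validate an href before any data are downloaded (DAP2 heuristic)."""
    if urlparse(href).scheme not in ("http", "https"):
        return False
    try:
        dds = fetch(dds_url(href))
    except Exception:  # network errors, HTTP 4xx/5xx, timeouts
        return False
    return dds.lstrip().startswith("Dataset {")
```

DAP4-only servers expose different endpoints and would need their own check.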


If I understand correctly, PyWPS first parses the request and builds a WPSRequest object; then, based on this structure, the Process instance is constructed along with its inputs and outputs. We would need to change pywps so that it does not download the data [2] before the file object is set on the complex input.

We could add set_url and get_url setter and getter methods to IOHandler, behaving like set_file/get_file or set_data/get_data (and memory_object), which could implement the special behaviour?
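The proposed set_url/get_url pair could look like the following hypothetical handler, which stores the URL untouched and only downloads when a local file is explicitly requested. `LazyIOHandler` is not the PyWPS IOHandler; it only illustrates the deferred-download behaviour, and the downloader callable is an assumption passed in by the caller.

```python
import os
import tempfile


class LazyIOHandler:
    """Hypothetical handler with set_url/get_url next to set_file/get_file:
    the URL is kept as-is and downloaded only when a file is requested."""

    def __init__(self, download, workdir=None):
        self._download = download  # callable(url, path)
        self._workdir = workdir or tempfile.mkdtemp(prefix="pywps_")
        self._url = None
        self._file = None

    def set_url(self, url):
        self._url = url
        self._file = None  # invalidate any previous local copy

    def get_url(self):
        return self._url

    def set_file(self, path):
        self._file = path

    def get_file(self):
        # Download on first access only, then reuse the local copy.
        if self._file is None and self._url is not None:
            name = os.path.basename(self._url.split("?", 1)[0]) or "input"
            self._file = os.path.join(self._workdir, name)
            self._download(self._url, self._file)
        return self._file

    url = property(get_url, set_url)
    file = property(get_file, set_file)
```

With this split, an OPeNDAP-aware process reads `.url` and never triggers a download, while existing processes that read `.file` keep working.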

J



On Wed, 27 Jun 2018 at 14:48, David Huard <[hidden email]> wrote:
I've got something working, but it's not pretty... If the input mime type is application/x-ogc-dods, the href handler skips the download and assigns the link to the `data` attribute. If I ask for the file attribute, pywps will download the file locally.

Now, if I want my process to support both netCDF files and OPeNDAP links: for a file input, it's the file_handler that sets the file attribute, but then the data attribute holds the actual file's content, not the path to the file. I guess I could special-case the netCDF mime type in the file_handler to set data to the file path, but it feels clunky.

I'm wondering if anyone has a better design idea in mind, one that could extend gracefully to other mime types. Should ComplexInput be subclassed by mimetype, so that the file, stream and data handling, as well as validation, is encapsulated in a class?

One problem I can see cropping up is that as pywps extends support for other "special" mimetypes, the dependencies will become harder to maintain. For instance, the netCDF validator requires netCDF4 to be installed, which is not a light dependency. My guess is that pywps should support the "light" mime types out of the box and have a plugin mechanism for the more complicated ones.
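The light-versus-heavy dependency split can be decided at runtime without importing anything. `has_module`, `available_formats` and `netcdf_validation_level` are hypothetical helpers built on `importlib.util.find_spec`; the idea is that a signature-only netCDF check needs no extra package, while deeper checks are enabled only when netCDF4 is present.

```python
import importlib.util


def has_module(name):
    """True if the top-level module `name` is importable (not imported)."""
    return importlib.util.find_spec(name) is not None


def available_formats(requirements):
    """Keep the mime types whose required modules are all installed.

    requirements: {mime_type: [module names]}; an empty list marks a
    "light" format that pywps could support out of the box.
    """
    return [mime for mime, modules in requirements.items()
            if all(has_module(m) for m in modules)]


def netcdf_validation_level():
    """'full' if netCDF4 could open and inspect files, else 'signature'
    (magic-byte check only, which needs no extra dependency)."""
    return "full" if has_module("netCDF4") else "signature"
```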

David
 

On Tue, Jun 26, 2018 at 9:23 AM David Huard <[hidden email]> wrote:
Thanks!

I'll look at it and come back with a PR. 

On Tue, Jun 26, 2018 at 9:16 AM Jachym Cepicky <[hidden email]> wrote:
I believe,


On Tue, 26 Jun 2018 at 14:57, David Huard <[hidden email]> wrote:
Hi Jachym,

Thanks for the pointers; I've started writing validators for netCDF. I'm still wondering where the decision to download a file is made. Can I shortcut that decision and avoid a file download if the href is a valid OPeNDAP link, i.e. if it passes the validatenetcdf checks?


On Fri, Jun 22, 2018 at 4:53 AM Jachym Cepicky <[hidden email]> wrote:
Hi,

Yes, ComplexInput should work for you: you can pass the URL of the data using the "<Reference ... />" element; see [1] for an example.
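For reference, a WPS 1.0.0 Execute request that passes a complex input by reference looks roughly like the XML produced below. The process identifier `subset` used in the usage line and the helper `execute_with_reference` are hypothetical; the namespaces are the standard WPS 1.0.0, OWS 1.1 and XLink ones.

```python
import xml.etree.ElementTree as ET

WPS = "http://www.opengis.net/wps/1.0.0"
OWS = "http://www.opengis.net/ows/1.1"
XLINK = "http://www.w3.org/1999/xlink"

for prefix, uri in (("wps", WPS), ("ows", OWS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def execute_with_reference(process_id, input_id, href, mime_type):
    """Build a WPS 1.0.0 Execute request whose complex input is passed by
    reference (<wps:Reference xlink:href=... />) instead of inline data."""
    root = ET.Element(f"{{{WPS}}}Execute", {"service": "WPS", "version": "1.0.0"})
    ET.SubElement(root, f"{{{OWS}}}Identifier").text = process_id
    inputs = ET.SubElement(root, f"{{{WPS}}}DataInputs")
    inp = ET.SubElement(inputs, f"{{{WPS}}}Input")
    ET.SubElement(inp, f"{{{OWS}}}Identifier").text = input_id
    ET.SubElement(inp, f"{{{WPS}}}Reference",
                  {f"{{{XLINK}}}href": href, "mimeType": mime_type})
    return ET.tostring(root, encoding="unicode")


print(execute_with_reference("subset", "resource",
                             "http://example.org/thredds/dodsC/a.nc",
                             "application/x-ogc-dods"))
```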

Any Format can have (and by default has) a `validator` function, which returns whether the input data are valid or not [3]. You can also use the `get_format` function [4] and set the validator there.

For examples of what a validating function can look like, see the shapefile or GML validators [5].

You should probably extend the formats [2] with a netCDF mimetype.

But this will check the file only after it has been downloaded to PyWPS, not the URL. Still, is that sufficient?

Jachym





Re: Guidelines for netCDF file and opendap access within pywps

David Huard
Excellent. Will do. 

On Fri, Jun 29, 2018 at 11:34 AM Jachym Cepicky <[hidden email]> wrote:
Hmm,

I actually like the idea that Format [1] would "serve" the file/data/memory_object methods to the input (and output). By the way, this could also be used for a future implementation of WFS/WCS services as an output option.

The registry of mimetypes should be in pywps/inout/formats/__init__.py [2], and our GSoC student Jan Pišl is working on extending it too [3], so this would fit IMHO.

J

