Containerization PyWPS processes

Containerization PyWPS processes

Adam Laža
Hi devs,

I am a student of geoinformatics at CTU in Prague. Currently I'm looking for a topic for my master's thesis. Yesterday I met with Jachym and we discussed containerizing PyWPS processes (probably with Docker). It could be handy for killing/pausing a process, which as far as I know is quite crucial in WPS 2.0.

I'd like to know whether somebody has already researched this possibility, or whether you have any suggestions or advice.
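
A rough sketch of the kill/pause idea, assuming each job runs in its own named container and the standard Docker CLI is on the PATH. `docker pause`, `docker unpause` and `docker kill` are real Docker subcommands; the action names and both helper functions are hypothetical and not part of PyWPS.

```python
import subprocess

# Map job-control actions to Docker CLI subcommands.
# The action names are illustrative labels, not PyWPS or WPS identifiers.
DOCKER_ACTIONS = {
    "pause": "pause",     # suspends all processes in the container
    "resume": "unpause",  # resumes a paused container
    "cancel": "kill",     # sends SIGKILL to the container by default
}


def docker_command(action, container):
    """Build the argv list for a job-control action on one container."""
    try:
        subcommand = DOCKER_ACTIONS[action]
    except KeyError:
        raise ValueError(f"unsupported action: {action!r}") from None
    return ["docker", subcommand, container]


def control_job(action, container):
    """Run the action; requires a working Docker installation."""
    return subprocess.run(
        docker_command(action, container),
        check=True,
        capture_output=True,
        text=True,
    )
```

Building the argv list separately from running it keeps the mapping testable on machines without Docker.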

Thanks in advance.
Adam

_______________________________________________
pywps-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/pywps-dev

Re: Containerization PyWPS processes

jorge.dejesus

Hi to all

Interesting research topic, but you have a problem with that approach: starting the process will have a massive overhead (compared with a thread) and will consume a lot of disk space and resources!

You would have to create the Docker image and process when you install PyWPS, and then start the container when the user calls the process. I am a bit against using such a big system in PyWPS unless someone tries to implement it, runs it, and shows whether or not it is feasible; we don't know until we try.

Those were my 2 cents :)

J.



On 20-09-17 15:41, Adam Laža wrote:
[...]

Re: Containerization PyWPS processes

Jachym Cepicky
Hi,

I'm in touch with Adam. 

It has a big impact on disk space, I agree, but AFAIK it opens new possibilities (imagine being able to deploy a running job to e.g. OpenShift instances), and the impact on other system resources should not be that big?

What can I say, all the points raised by Jorge are valid - so let's give it a try?

J

On Wed, 20 Sep 2017 at 18:07, jorge.dejesus <[hidden email]> wrote:

[...]

Re: Containerization PyWPS processes

jorge.dejesus

Hi to all

With new systems you get new problems but also new possibilities. Another possibility would be accountability: a process (mainly a scientific one) can be run with all the logs kept inside the Docker container, which is then committed (frozen) and used by another person to check the logs and/or data.
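
As a hedged illustration of the "commit it (freeze it)" step: `docker commit` snapshots a container into an image that someone else can later start and inspect. The helper name, the container name and the repository/tag values below are made up for the example.

```python
def commit_command(container, repository, tag, message=None):
    """Build the argv list that freezes a finished job container into an image.

    Uses `docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]`.
    """
    cmd = ["docker", "commit"]
    if message:
        # Optional commit message stored with the image
        cmd += ["--message", message]
    cmd += [container, f"{repository}:{tag}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical names: container "job-42", image "pywps-audit:job-42"
    print(" ".join(commit_command("job-42", "pywps-audit", "job-42")))
```

Note that this only works if the container still exists, so the job must not be started with automatic removal on exit.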

Disk space in Docker is a funny thing: an image can go from 6 MB to 600 MB in the blink of an eye by changing the OS or not cleaning up packages, etc., so a lot of effort has to go into optimizing it. Another advantage is that you can set CPU and other resource limits on a Docker container, which gives us very fine-grained control over job resources.
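
The resource control mentioned above could look roughly like this, assuming jobs are started through the Docker CLI. `--cpus`, `--memory` and `--name` are existing `docker run` options; the helper, the image name and the default limits are assumptions chosen for the example.

```python
def run_command(image, args, cpus=1.0, memory="512m", name=None):
    """Build a `docker run` argv list with CPU and memory limits for one job."""
    cmd = ["docker", "run", f"--cpus={cpus}", f"--memory={memory}"]
    if name:
        # A stable name makes it possible to pause, kill or commit the job later
        cmd += ["--name", name]
    cmd.append(image)
    cmd.extend(args)
    return cmd


if __name__ == "__main__":
    # Hypothetical image and entry point
    print(run_command("pywps-job:latest", ["python", "run.py"], cpus=2, memory="1g"))
```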

With the new support for batch jobs, we could extend this to run jobs in Docker swarms.

A bit from experience: Docker systems need a bit of "love and attention" in the beginning, and then things run without problems. Another issue is the extremely fast pace of Docker development: you prepare things, the Docker community makes some changes, and everything breaks. We once had a situation where docker-machine was internally calling some scripts for a package update; during the night someone in the Docker community made a small change, and for a couple of hours we couldn't run docker-machine (a big problem until we discovered the cause).

If this project goes ahead I would ask if Geocat could sponsor it with working hours. 

Cheers
Jorge

On Wed, Sep 20, 2017 at 6:38 PM, Jachym Cepicky <[hidden email]> wrote:
[...]

Re: Containerization PyWPS processes

Carsten Ehbrecht-3
I find it useful to have the possibility to launch processes via Docker
containers. I would see this as an *optional extension* to PyWPS, as I
have done for the scheduler extension; it shouldn't be the default. We
would then have three ways to launch a processing job:

1. running locally on the PyWPS server (default).
2. launching a docker container.
3. using a batch scheduler system like Slurm and GridEngine.

Options 2) and 3) might need additional Python dependencies, and of
course a lot more surrounding infrastructure, which needs to be
installed separately.

The "cancel" function necessary for WPS 2.0.0 needs to be implemented
differently for each of these "job delegation" mechanisms. I have
started this for the "batch scheduler" extension ... just the interface,
no implementation yet:

https://github.com/bird-house/pywps/blob/issue-277_scheduler-extension-v2/pywps/processing/basic.py#L21

I haven't looked into it yet, but I guess the Docker extension might
look similar to the scheduler extension in the way it is handled by the
PyWPS code:

https://github.com/bird-house/pywps/blob/issue-277_scheduler-extension-v2/docs/extensions.rst
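
To make the "job delegation" idea concrete, here is a minimal sketch of a common interface with one implementation per launch mechanism and "local" as the default. All class and function names are hypothetical and are not the interface in the linked branch. The launchers only record what they would do instead of executing anything; `sbatch` and `scancel` are Slurm's submit and cancel commands.

```python
from abc import ABC, abstractmethod


class Launcher(ABC):
    """Hypothetical job-delegation interface: each backend cancels differently."""

    def __init__(self):
        self.actions = []  # what a real implementation would execute

    @abstractmethod
    def start(self, job_id):
        ...

    @abstractmethod
    def cancel(self, job_id):
        ...


class LocalLauncher(Launcher):
    def start(self, job_id):
        self.actions.append(f"spawn local worker for {job_id}")

    def cancel(self, job_id):
        self.actions.append(f"terminate local worker for {job_id}")


class DockerLauncher(Launcher):
    def start(self, job_id):
        self.actions.append(f"docker run --name {job_id} <image>")

    def cancel(self, job_id):
        self.actions.append(f"docker kill {job_id}")


class SchedulerLauncher(Launcher):
    def start(self, job_id):
        self.actions.append(f"sbatch <script for {job_id}>")

    def cancel(self, job_id):
        self.actions.append(f"scancel <scheduler id of {job_id}>")


LAUNCHERS = {
    "local": LocalLauncher,
    "docker": DockerLauncher,
    "scheduler": SchedulerLauncher,
}


def get_launcher(mode="local"):
    """Pick a launcher by configuration value; 'local' stays the default."""
    try:
        return LAUNCHERS[mode]()
    except KeyError:
        raise ValueError(f"unknown launch mode: {mode!r}") from None
```

With this shape, the optional backends can live in separate modules with their own dependencies, and the core only needs the registry lookup.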

Cheers,
Carsten


On 09/21/2017 08:45 AM, Jorge Mendes de Jesus wrote:

> [...]

--
Carsten Ehbrecht
Abteilung Datenmanagement

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • D-20146 Hamburg • Germany

Phone: +49 40 460094-148
FAX:   +49 40 460094-270
Email: [hidden email]
URL:   www.dkrz.de

Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784

Re: Containerization PyWPS processes

Luís Moreira de Sousa
In reply to this post by jorge.dejesus
Dear all,

Using Docker as a mechanism to implement the CANCEL request is like using an RPG to kill a fly. It is possible to implement the CANCEL request with multiprocessing; it just needs implementing. We are then left with the PAUSE request, which cannot be solved directly with multiprocessing, but I would still like to try a Pythonic approach first.
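
A minimal sketch of what this could look like, assuming a POSIX host. Cancel uses `multiprocessing.Process.terminate()`. Pause is not offered by `multiprocessing` itself, so the example falls back on plain `SIGSTOP`/`SIGCONT` signals via `os.kill`; that is one possible approach chosen for illustration, not something proposed in this thread. All function names are hypothetical.

```python
import multiprocessing
import os
import signal
import time

# Assumption: POSIX host. The "fork" start method and SIGSTOP/SIGCONT
# are not available on Windows.
_ctx = multiprocessing.get_context("fork")


def long_running_job():
    """Stand-in for a WPS process handler that runs for a long time."""
    while True:
        time.sleep(0.1)


def start_job(target):
    """Run the job in its own OS process so it can be signalled."""
    proc = _ctx.Process(target=target, daemon=True)
    proc.start()
    return proc


def pause_job(proc):
    os.kill(proc.pid, signal.SIGSTOP)  # suspend the process


def resume_job(proc):
    os.kill(proc.pid, signal.SIGCONT)  # continue a suspended process


def cancel_job(proc, timeout=5.0):
    """Terminate the job; returns True once the process is gone."""
    proc.terminate()  # sends SIGTERM on POSIX
    proc.join(timeout)
    return not proc.is_alive()


if __name__ == "__main__":
    job = start_job(long_running_job)
    pause_job(job)
    resume_job(job)
    print("cancelled:", cancel_job(job))
```

A terminated job gets no chance to clean up temporary files or update its stored status, so a real implementation would still need to handle that on the server side.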

That said, Docker can be very useful for other tasks. Last year Benjamin Pross floated the idea of executing the same process with different backends. And Jachym is always reminding us of security issues: each execution must be sandboxed, isolated from the host system. In essence, it looks like an interesting development, but as Carsten writes, it will be better as an extension. Let's not scare away entry-level users.

Cheers.

--
Luís Moreira de Sousa
Im Grund 6
CH-8600 Dübendorf
Switzerland

Phone: +41 (0)79 812 62 65




Re: Containerization PyWPS processes

Jachym Cepicky
More on this topic for @Adam to read: https://wiki.python.org/moin/SandboxedPython may be useful (or maybe not)

On Thu, 21 Sep 2017 at 17:11, Luís Moreira de Sousa <[hidden email]> wrote:
[...]