[pdal] PDAL Pipeline Extensibility


[pdal] PDAL Pipeline Extensibility

Howard Butler-3
I'd like to find out whether people see a need to extend PDAL's pipeline JSON syntax and, if so, what that would let you do that you cannot do today. Consider the following pipeline, which uses writers.gdal to interpolate a raster surface from the 255.laz file. At the moment, with 1.4, if a stage is given an attribute it does not know, PDAL (rightly) fails with an error saying it did not recognize it.

> {
>   "pipeline":[
>     {
>         "type":"readers.las",
>         "filename":"/data/255.laz"
>     },
>     {
>         "type":"writers.gdal",
>         "radius":10.5,
>         "resolution":6,
>         "filename":"/data/dem.tif"
>     }
>   ]
> }


I would like to modify PDAL's Pipeline to support any stage having an 'application' node:

> {
>   "pipeline":[
>     {
>     "type" : "readers.las",
>     "filename" : "/data/255.laz",
>     "application": {
>         "something": 42,
>         "something_else": {"key":"value"},
>         "lots_of_things":[1,2,4,8]
>     }
>     },
>     {
>         "type":"writers.gdal",
>         "radius":10.5,
>         "resolution":6,
>         "filename":"/data/dem.tif",
>         "application": {
>             "comment": "a string",
>             "my_app": {"key":"value"},
>             "user_who_made_this": "howard",
>             "center_point":{ "type": "Point", "coordinates": [100.0, 0.0] },
>             "box":{ "type": "Polygon", "coordinates": [[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]]}
>         }
>     }
>  ]
> }

The idea is applications can use this 'application' node to transmit and communicate their own information through PDAL pipelines. I have three questions:

1) Is this useful enough to support?
2) Do you have a better name than 'application'?
3) Is there a standard convention that people use in JSON to do this kind of thing?

Thanks,

Howard
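
A minimal sketch of the producing side of this proposal, assuming the node keeps the name 'application' used above (the name was still an open question) and that PDAL would accept and ignore it. The helper attach_app_data and the values passed to it are hypothetical; nothing here calls PDAL itself.

```python
import json

def attach_app_data(stage, app_data, key="application"):
    """Return a copy of a stage dict with application data stored under `key`."""
    tagged = dict(stage)  # shallow copy so the input stage is left untouched
    tagged[key] = app_data
    return tagged

reader = {"type": "readers.las", "filename": "/data/255.laz"}
writer = {
    "type": "writers.gdal",
    "radius": 10.5,
    "resolution": 6,
    "filename": "/data/dem.tif",
}

pipeline = {
    "pipeline": [
        attach_app_data(reader, {"something": 42}),
        attach_app_data(writer, {"user_who_made_this": "howard"}),
    ]
}

# Serialize to the JSON text that would be handed to PDAL.
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

Keeping the extra data under a single key means the stage options PDAL validates stay separate from whatever the calling application wants to carry along.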
_______________________________________________
pdal mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/pdal

Re: [pdal] PDAL Pipeline Extensibility

Adam Steer
Just trying to unpack this idea: is the thinking around enabling some external program - for example some bespoke, super efficient, parallelised, on-the-fly classifier; or some other point modifier that isn’t, or won’t/can’t be in PDAL - to modify points as they’re read/written?

Without having actually had a need to use such a capacity, it seems pretty handy and might open doors to a whole host of new capabilities in PDAL.

So:
q1: without having actually had a need to use such a capacity, it seems useful
q2: keeping in mind the answer to q1 - nope. ‘application’ seems to be a good label for that particular box
q3: will need to defer to others - I consider myself a PDAL/JSON beginner

Standing back a bit, probably being quite ignorant about a bunch of stuff, and thinking about supporting PDAL in N years time - I see a small risk of slowing some feature development in PDAL (maybe? can I think of a case of how? possibly developer X’s killer app never makes it into PDAL as a feature because it no longer needs to?) and based on this a small risk of making more dependency issues (using developer X’s killer app, now anyone deploying it has to maintain versions of two things). However, in this scenario it is also reasonable to expect that an organisation relying on an OSS project would contribute some resources in order to mitigate those issues.

Hope that helps - and I hope more experienced folk chime in!




Re: [pdal] PDAL Pipeline Extensibility

Rob Emanuele
I haven't had a need for it, but I'll comment on this: "application" is a confusing term, and at first blush I read it as "shelling out to some external application". Re-reading your description, I believe what this feature would enable is to allow for JSON to be inserted into pipeline stages that PDAL will completely ignore but not complain about. So in case I was parsing the pipeline, or programmatically creating it/reading from it in my own application outside of PDAL, I could insert any JSON I wanted to and PDAL wouldn't have a problem with it (and not do anything with it). Is that a correct read?

If that's correct, I would say instead of "application", some better terms would be "tags", "userData", "userTags", or something along those lines.
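
A small sketch of that reading, seen from an application outside PDAL: it pulls its own data out of each stage and leaves the options PDAL itself would read. It assumes the key is named "userData" (one of the names suggested above); the function split_user_data is hypothetical and does not use any PDAL API.

```python
import copy
import json

def split_user_data(pipeline_doc, key="userData"):
    """Separate per-stage user data from the options PDAL itself would read.

    Returns (stripped_doc, user_data), where user_data maps the stage index
    to whatever JSON was stored under `key`.
    """
    stripped = copy.deepcopy(pipeline_doc)  # do not modify the caller's document
    user_data = {}
    for index, stage in enumerate(stripped["pipeline"]):
        # A stage may also be a bare filename string, which has no options.
        if isinstance(stage, dict) and key in stage:
            user_data[index] = stage.pop(key)
    return stripped, user_data

doc = json.loads("""
{
  "pipeline": [
    {"type": "readers.las", "filename": "/data/255.laz",
     "userData": {"something": 42}},
    {"type": "writers.gdal", "radius": 10.5, "resolution": 6,
     "filename": "/data/dem.tif",
     "userData": {"comment": "a string"}}
  ]
}
""")

stripped, user_data = split_user_data(doc)
print(user_data)
```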


On Wed, Dec 21, 2016 at 7:05 PM, Adam Steer <[hidden email]> wrote:
> Just trying to unpack this idea: is the thinking around enabling some external program - for example some bespoke, super efficient, parallelised, on-the-fly classifier; or some other point modifier that isn’t, or won’t/can’t be in PDAL - to modify points as they’re read/written?


Re: [pdal] PDAL Pipeline Extensibility

Howard Butler-3

> On Dec 21, 2016, at 9:36 PM, Rob Emanuele <[hidden email]> wrote:
>
> If that's correct, I would say instead of "application", some better terms would be "tags", "userData", "userTags", or something along those lines.

Yes, I was hoping for a better name here. I was also hoping there was a standard way that people were extending JSON like this, and that we would just support it. I haven't found anything thus far though.

Howard

Re: [pdal] PDAL Pipeline Extensibility

Howard Butler-3

> On Dec 22, 2016, at 10:51 AM, Howard Butler <[hidden email]> wrote:
>
>
>> On Dec 21, 2016, at 9:36 PM, Rob Emanuele <[hidden email]> wrote:
>>
>> If that's correct, I would say instead of "application", some better terms would be "tags", "userData", "userTags", or something along those lines.
>
> Yes, I was hoping for a better name here. I was also hoping there was a standard way that people were extending JSON like this, and that we would just support it. I haven't found anything thus far though.

I renamed it to 'userData'. Thanks for the feedback.

Howard

Re: [pdal] PDAL Pipeline Extensibility

Kristian Evers-2
Howard,
 
Recently I have needed to feed user data to the programmable filter. I have an application that creates a PDAL pipeline based on the input, with various settings determining the output. In some cases I use the programmable filter to do something that is outside the current scope of PDAL. If it were possible to access the pipeline JSON, including the userData section, from the Python function behind the filter, I would be able to write more generic filters.
A recent example is that I wanted to create a vertical gridshift filter (similar to what Proj.4 does). I would have liked to pass the grid name to the programmable filter as a PDAL pipeline parameter but that is not possible. Instead I ended up hard coding it. It did the job but wasn't a very satisfying solution. I have since realized that I can in fact use the reprojection filter to do this (via Proj.4). This is just the example that came to mind – I have been in similar situations before but the exact details escape my memory.
 
The Python function prototype will have to be changed. It is best demonstrated with an example. I have extended the multiply_z function from the filters.programmable doc page with my suggestion:
 
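import numpy as np  # ins/outs hold NumPy arrays keyed by dimension name

# `pipeline` is the proposed third argument (not part of the current prototype).
def multiply_z(ins, outs, pipeline):
    # Stage 1 is assumed to be this filter; read the factor from its userData.
    f = pipeline[1]['userData']['z_factor']
    Z = ins['Z']
    Z = Z * f
    outs['Z'] = Z
    return True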
 
Info about the other stages in a pipeline should also be available. For the above case the pipeline dictionary would be constructed with:
 
import json

with open('pipeline.json') as json_file:
    pipeline = json.load(json_file)['pipeline']
 
 
 
It is not exactly what you are asking about here but I think it is a nice addition to the "userData" concept that you are introducing. My suggestion can of course be extended to filters.predicate as well.
 
/Kristian
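
The lookup in multiply_z relies on the filter being the second stage. A hedged variation, assuming the same "userData" key, finds the stage by its "type" instead. The helper user_data_for is hypothetical, and the filter's other options are left out of the JSON for brevity.

```python
import json

def user_data_for(pipeline, stage_type, key="userData"):
    """Return the user data of the first stage whose "type" matches."""
    for stage in pipeline:
        if isinstance(stage, dict) and stage.get("type") == stage_type:
            return stage.get(key, {})
    return {}  # no matching stage, or the stage carries no user data

pipeline = json.loads("""
{
  "pipeline": [
    {"type": "readers.las", "filename": "/data/255.laz"},
    {"type": "filters.programmable",
     "userData": {"z_factor": 2.5}}
  ]
}
""")["pipeline"]

z_factor = user_data_for(pipeline, "filters.programmable")["z_factor"]
print(z_factor)
```

This keeps the filter working if stages are added or reordered, at the cost of picking the first match when a pipeline contains several stages of the same type.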
 
 
> -----Original message-----
> From: pdal [[hidden email]] On behalf of Howard Butler
> Sent: 29 December 2016 19:45
> To: pdal
> Subject: Re: [pdal] PDAL Pipeline Extensibility
>
> I renamed it to 'userData'. Thanks for the feedback.
>
> Howard


Re: [pdal] PDAL Pipeline Extensibility

Howard Butler-3

> On Dec 29, 2016, at 3:16 PM, Kristian Evers <[hidden email]> wrote:
>
> Recently I have had a need for feeding user data to the programmable filter. I have an application that creates a PDAL pipeline based on the input and various settings determine the output. In some cases I use the programmable filter to do something that is outside the current scope of PDAL. If it were possible to access the pipeline JSON, and the userData section of the pipeline, from the Python function behind the filter I would be able to make more generic filters.

This is a good idea.

I'll make a dict of globals called 'pipeline' available to both filters.programmable and filters.predicate that represent the following:

pipeline['schema'] -> json dict
pipeline['metadata'] -> json dict
pipeline['pipeline'] -> json dict
pipeline['log'] -> writeable log stream


> A recent example is that I wanted to create a vertical gridshift filter (similar to what Proj.4 does). I would have liked to pass the grid name to the programmable filter as a PDAL pipeline parameter but that is not possible. Instead I ended up hard coding it. It did the job but wasn't a very satisfying solution. I have since realized that I can in fact use the reprojection filter to do this (via Proj.4). This is just the example that came to mind – I have been in similar situations before but the exact details escape my memory.

Thanks for your documentation update that demonstrates how to do gridshifts and proj.4 strings in PDAL http://www.pdal.io/stages/filters.reprojection.html#examle-2 . I guess I assumed people familiar with GDAL's SRS handling tools would just know that the same skills and techniques would transfer.

Howard

Re: [pdal] PDAL Pipeline Extensibility

Kristian Evers-2
Also a good idea to extend it to include schema, metadata etc. Hadn't even crossed my mind! What about just calling the dictionary globals? The way you intend to implement it, it is more than just the pipeline and pipeline['pipeline'] is repeating itself.

About the SRS definition, that's what I eventually figured out because I knew that GDAL and Proj.4 are powering the coordinate transformation. Maybe this should be added to the docs as well? Ideally with a link to a relevant section of GDAL documentation.

/Kristian

> -----Original message-----
> From: pdal [mailto:[hidden email]] On behalf of Howard Butler
> Sent: 30 December 2016 16:53
> To: Kristian Evers
> Cc: [hidden email]
> Subject: Re: [pdal] PDAL Pipeline Extensibility
>
> I'll make a dict of globals called 'pipeline' available to both filters.programmable and filters.predicate that represent the following:
>
> pipeline['schema'] -> json dict
> pipeline['metadata'] -> json dict
> pipeline['pipeline'] -> json dict
> pipeline['log'] -> writeable log stream
>
> Thanks for your documentation update that demonstrates how to do gridshifts and proj.4 strings in PDAL http://www.pdal.io/stages/filters.reprojection.html#examle-2 . I guess I assumed people familiar with GDAL's SRS handling tools would just know that the same skills and techniques would transfer.
>
> Howard

Re: [pdal] PDAL Pipeline Extensibility

Howard Butler-3
After thinking about it a bit, my plan is to put everything in the kwargs dict of the function and simply call the function with it. This would allow you to set things like the stage metadata via this mechanism as well.

Please make a PR where you think the docs are deficient about GDAL + Proj.4. I'll happily merge stuff in that makes navigating that fog bank any easier.

Howard
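
A generic Python illustration of the kwargs idea, not PDAL's actual calling code: the caller forwards a context dict as keyword arguments and the filter picks out what it needs. The names call_filter and report, the "point_count" entry, and the use of plain lists in place of NumPy arrays are all assumptions made for this sketch.

```python
def call_filter(func, ins, outs, **context):
    """Invoke a filter function, forwarding context as keyword arguments."""
    return func(ins, outs, **context)

def report(ins, outs, **kwargs):
    # A filter reads only the entries it cares about.
    metadata = kwargs.get("metadata", {})
    # Mutating the dict in place is visible to the caller, which is one way
    # a function could hand metadata back.
    metadata["point_count"] = len(ins["Z"])
    outs["Z"] = ins["Z"]
    return True

ins = {"Z": [1.0, 2.0, 3.0]}  # plain lists stand in for NumPy arrays
outs = {}
metadata = {}
ok = call_filter(report, ins, outs, metadata=metadata, schema={})
print(ok, metadata)
```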



> On Dec 30, 2016, at 3:39 PM, Kristian Evers <[hidden email]> wrote:
>
> Also a good idea to extend it to include schema, metadata etc. Hadn't even crossed my mind! What about just calling the dictionary globals? The way you intend to implement it, it is more than just the pipeline and pipeline['pipeline'] is repeating itself.
>
> About the SRS definition, that's what I eventually figured out because I knew that GDAL and Proj.4 is powering coordinate transformation. Maybe this should be added to the docs as well? Ideally with a link to a relevant section of GDAL documentation.
>
> /Kristian

Re: [pdal] PDAL Pipeline Extensibility

Howard Butler-3

> On Dec 30, 2016, at 3:42 PM, Howard Butler <[hidden email]> wrote:
>
> After thinking about it a bit, my plan is to put everything in the kwargs dict of the function and simply call the function with it. This would allow you to set things like the stage metadata via this mechanism as well.

A follow-up to let you know this is now complete in master. 'schema', 'metadata', and 'spatialreference' dicts are now available to filters.programmable [1] and filters.predicate [2] Python filters. Additionally, you can modify or create inline metadata using this mechanism by updating the "global" metadata dict in your function. See the unreleased docs [3] for more detail.

Howard

[1] http://www.pdal.io/stages/filters.programmable.html
[2] http://www.pdal.io/stages/filters.predicate.html
[3] https://github.com/PDAL/PDAL/blob/master/doc/stages/filters.programmable.rst#module-globals
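
A rough sketch of what a filter function using these globals might look like. So that it can run outside PDAL, the three names are defined here as empty stand-ins; inside a real filter they would be supplied by PDAL. The function name tag_metadata, the contents of the stand-in dicts, and the shape of the metadata node are assumptions, so check the docs in [3] for the actual structure.

```python
# Stand-ins for the names PDAL is described as making available to the module.
schema = {}
spatialreference = {}
metadata = {}

def tag_metadata(ins, outs):
    global metadata
    # Rebinding the module-level name is how the function publishes metadata.
    metadata = {
        "name": "root",
        "value": "a string",
        "type": "string",
        "description": "a description",
        "children": [],
    }
    outs["Z"] = ins["Z"]  # pass the Z dimension through unchanged
    return True

outs = {}
result = tag_metadata({"Z": [1.0, 2.0]}, outs)  # lists stand in for NumPy arrays
print(result, metadata["name"])
```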