[pdal] Does Entwine support distributed builds?


[pdal] Does Entwine support distributed builds?

Piero Toffanin

Hi there,

I have a question regarding the usage of Entwine and was hoping somebody could help me. The use case is merging point clouds that have been generated on different machines; each of these point clouds is part of the same final dataset. Entwine works great with the current workflow:

entwine scan -i a.las b.las ... -o output/

for i in a b ...; do
    entwine build -i output/scan.json -o output/ --run 1
done

The "--run 1" is used to lower memory usage. On small datasets runtime is excellent, but as the number of models grows, runtime starts to increase quite a bit. I'm looking specifically for ways to speed up generation of the EPT index. In particular, since I generate the various LAS files on different machines, I was wondering whether there is a way to let each machine contribute its part of the index from its own LAS files (with the index mapped to a network location), or whether a workflow is supported in which each machine builds its own EPT index and all the EPT indexes are then merged into one. I don't think this is possible, but wanted to check.

Thank you for any help,

-Piero



_______________________________________________
pdal mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/pdal

Re: [pdal] Does Entwine support distributed builds?

Connor Manning
The `subset` option lets each iteration of the build handle a spatially distinct region that can be trivially merged afterward, which sounds like what you're after. Another option could be to simply use multiple indexes: Potree can accept multiple input EPT sources, and a PDAL pipeline may have multiple EPT readers.
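In command form, the two approaches might look something like this. This is a sketch, not exact syntax for any particular Entwine version: the paths `out`, `out1`, `out2`, and the input names are placeholders, and the subset builds are assumed to write to one shared output location (e.g. on S3 or a network mount) so that the merge can see all of them.

```shell
# Approach 1: one dataset, built as spatial subsets on separate machines.
# All machines index the SAME inputs into the SAME output; each takes one
# quarter of the spatial extent.
entwine build -i 1.las 2.las -o out --subset 1 4   # machine 1
entwine build -i 1.las 2.las -o out --subset 2 4   # machine 2
entwine build -i 1.las 2.las -o out --subset 3 4   # machine 3
entwine build -i 1.las 2.las -o out --subset 4 4   # machine 4

# When every subset has finished, unify them into a single EPT dataset:
entwine merge out

# Approach 2: keep separate indexes and read them together in PDAL.
# Hypothetical pipeline with two EPT readers feeding one writer:
cat > merged-pipeline.json <<'EOF'
[
    { "type": "readers.ept", "filename": "out1/ept.json" },
    { "type": "readers.ept", "filename": "out2/ept.json" },
    { "type": "writers.las", "filename": "merged.las" }
]
EOF
pdal pipeline merged-pipeline.json
```

Note that in the first approach every machine still reads the full input list; the subset only restricts which spatial region each build is responsible for.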

On Thu, Jun 13, 2019 at 6:46 AM Piero Toffanin <[hidden email]> wrote:


Re: [pdal] Does Entwine support distributed builds?

Piero Toffanin

Hey Connor,

thanks for the reply. I have looked at the subset option, and I think it would work well for the case where I have already computed all the models. For example, if I have a folder with:

1.las
2.las
...

Then I could spin up four machines and do:

1] entwine build -i 1.las 2.las --subset 1 4 -o out1
2] entwine build -i 1.las 2.las --subset 2 4 -o out2
3] entwine build -i 1.las 2.las --subset 3 4 -o out3
4] entwine build -i 1.las 2.las --subset 4 4 -o out4

Then merge the results. I've noticed two things with this approach. First, as the number of input files increased, the memory and time required to create each subset seemed to increase as well (that's why I opted to use scan + build --run 1). Second, I need to wait for all point clouds to be available (both 1.las and 2.las need to exist before I can start processing them).

I wanted to rule out whether it was possible to do something like this (on two separate machines):

1] entwine build -i 1.las -o out1
2] entwine build -i 2.las -o out2

And then merge the resulting EPT indexes into a "global" one:

entwine merge -i out1 out2 -o merged

But I don't think it's possible, correct?

-Piero



On 6/13/19 10:43 AM, Connor Manning wrote:
--

Piero Toffanin
Drone Solutions Engineer

masseranolabs.com
piero.dev




Re: [pdal] Does Entwine support distributed builds?

Connor Manning
Correct - that is not possible.

On Thu, Jun 13, 2019 at 10:16 AM Piero Toffanin <[hidden email]> wrote:


Re: [pdal] Does Entwine support distributed builds?

Piero Toffanin

Thanks, I suspected that was the case but wanted to confirm.

In regard to building subsets, is there a performance advantage to using "entwine scan" versus passing the input files directly to "entwine build" (or is scan simply a utility to simplify finding datasets within a folder)?

Are there any tips or tricks I should be aware of in terms of memory usage when building with subsets? For example, is it memory-efficient to do:

entwine build -i 1.las 2.las [...] 399.las 400.las --subset 1 64 -o out1

?

As compared to perhaps running 400 times:

entwine build -i 1.las 2.las [...] 399.las 400.las --subset 1 64 -o out1 --run 1

?

Sorry for all the questions!
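To make the second variant concrete, this is the loop I have in mind. It's a sketch: the input list and counts are placeholders, and it assumes --run behaves as documented, i.e. each invocation resumes the existing build at out1 and inserts the next unprocessed file.

```shell
# Hypothetical: build subset 1 of 64, one input file per invocation,
# resuming the same output between runs to cap peak memory usage.
for i in $(seq 1 400); do
    entwine build -i 1.las 2.las [...] 399.las 400.las \
        --subset 1 64 -o out1 --run 1
done
```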

On 6/13/19 11:39 AM, Connor Manning wrote:

Re: [pdal] Does Entwine support distributed builds?

adam steer
Hi Piero

I'm watching your questions with interest - many have been on my mind also!

...did your second proposal (run 400 times) work?

That would, on the surface, use less memory, since you're reading from one LAS file at a time rather than from (400/64) LAS files (potentially; this assumes a lot about how the data are distributed in space). But it would also mean partial writes of each Entwine chunk, which will eventually contain data from potentially (400/64) of your files...

...so the question there is: can Entwine support partial writing of subsets?



On Fri, 14 Jun 2019 at 01:57, Piero Toffanin <[hidden email]> wrote:
