[pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Howard Butler-3
All,

I am proposing that we remove the SQLite drivers in the upcoming PDAL 2.0 release. I do not think these drivers are seeing much use, they do not have a full feature set, and we are seeing some bugs creep in as PDAL interacts with other software libraries that use SQLite (notably GDAL).

The drivers have been their own little thing for quite a while and do not seem to have taken off as a useful container format for point cloud data. Given this experience, I would like to remove the burden of maintaining them from the project unless there are strong objections and corresponding avenues of support to rally to keep them in the project.

If you have an objection, or are using this capability, we would really like to hear about how you are using it, and what it allows you to do that cannot be achieved with some other format or approach.

Thanks,

Howard


_______________________________________________
pdal mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/pdal

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Howard Butler-3
On second thought, we'll table the removal until 2.1 to allow a full release cycle for everyone who might be impacted to catch up, but we will still plan to remove these drivers. Again, if you are going to be impacted by this change we would like to hear from you. We don't know of much use of these drivers, and their continued maintenance burden does not seem worth it given their limited use.

Howard

On Tue, Jul 30, 2019 at 9:43 PM Howard Butler <[hidden email]> wrote:
All,

I am proposing that we remove the SQLite drivers in the upcoming PDAL 2.0 release. I do not think these drivers are seeing much use, they do not have a full feature set, and we are seeing some bugs creep in as PDAL interacts with other software libraries that use SQLite (notably GDAL).

The drivers have been their own little thing for quite a while and do not seem to have taken off as a useful container format for point cloud data. Given this experience, I would like to remove the burden of maintaining them from the project unless there are strong objections and corresponding avenues of support to rally to keep them in the project.

If you have an objection, or are using this capability, we would really like to hear about how you are using it, and what it allows you to do that cannot be achieved with some other format or approach.

Thanks,

Howard



Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Jed
In reply to this post by Howard Butler-3
On Wed, 31 Jul 2019 08:28:09 -0500 Howard Butler wrote:
> On second thought, we'll table the removal until 2.1 to allow a full
> release cycle for everyone who might be impacted to catch up, but we will
> still plan to remove these drivers. Again, if you are going to be impacted
> by this change we would like to hear from you. We don't know of much use of
> these drivers, and their continued maintenance burden does not seem worth
> it given their limited use.

Good timing! In the next few days we were about to start building some
tools based on them.

The use case was to round-trip data to PDAL and back from inside a
couple of applications that include embedded Python interpreters, but
have limited support for other PDAL formats. In particular, although
the specific applications we're working with do have some level of
support for both LAS and E57, the built-in readers and writers don't
always handle arbitrary dimensions well, if at all.

Since an SQLite interface is included in the Python Standard Library
the idea was to use that rather than trying to muck about with adding
external modules to each application's Python install (and the
maintenance nightmare that entails). Based on PDAL's current docs,
SQLite looked like a good candidate for storing points with arbitrary
dimensions in a way that was both well supported by PDAL and easily
accessible by anything with a Python interpreter. If SQLite is
deprecated, then good old text files look like about the only other
container that would fill that niche.
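
To make the idea concrete, here is a minimal sketch of reading and writing points with arbitrary dimensions using only the standard library's sqlite3 module. The table name "points" and the one-row-per-point, one-column-per-dimension layout are assumptions made up for this illustration; they are not the schema that PDAL's SQLite writer produces, and "Reflectance" is just an example dimension name.

```python
import sqlite3


def write_points(conn, dims, rows):
    """Create a hypothetical 'points' table with one REAL column per dimension."""
    cols = ", ".join('"{}" REAL'.format(d) for d in dims)
    conn.execute("CREATE TABLE points ({})".format(cols))
    marks = ", ".join("?" for _ in dims)
    conn.executemany("INSERT INTO points VALUES ({})".format(marks), rows)
    conn.commit()


def read_points(conn):
    """Return (dimension names, list of row tuples) from the 'points' table."""
    cur = conn.execute("SELECT * FROM points")
    names = [col[0] for col in cur.description]
    return names, cur.fetchall()


# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
write_points(conn, ["X", "Y", "Z", "Reflectance"],
             [(1.0, 2.0, 3.0, -12.5), (4.0, 5.0, 6.0, -7.25)])
names, rows = read_points(conn)
print(names)
print(rows)
```

Nothing here needs a module outside the standard library, which is the whole appeal for an embedded interpreter.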

I should note that all of this was theoretical and we may well have
run into issues trying to put it into practice. Nonetheless, it
seemed like a reasonable approach and knowing that at least one
commercial lidar package had used its own flavor of SQLite as a
native file format gave me some additional confidence that it wasn't a
bad solution.

As an aside, I don't know if it makes sense for PDAL to have a
"native" container format or not, but it would be helpful to more
clearly document which writer/reader pairs can be expected to
losslessly round-trip data if PDAL is the only application involved.
With the number of useful filters PDAL already has, I don't think it
is unreasonable to start thinking about it as a central part of a
processing workflow rather than just a tool for moving data from
External Application A to External Application B. The more it takes on
a central role the more I think it makes sense to be asking "What's
the best container format to use with PDAL?" rather than just
conforming to whatever might be supported by external applications.

Best wishes,

--
Jed Frechette

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Howard Butler-3


> On Aug 1, 2019, at 1:13 PM, Jed Frechette <[hidden email]> wrote:
>
> On Wed, 31 Jul 2019 08:28:09 -0500 Howard Butler wrote:
>> On second thought, we'll table the removal until 2.1 to allow a full release cycle for everyone who might be impacted to catch up, but we will still plan to remove these drivers. Again, if you are going to be impacted by this change we would like to hear from you. We don't know of much use of these drivers, and their continued maintenance burden does not seem worth it given their limited use.
>
> The use case was to round-trip data to PDAL and back from inside a couple of applications that include embedded Python interpreters, but have limited support for other PDAL formats. In particular, although the specific applications we're working with do have some level of
> support for both las and e57, the built in readers and writers don't always handle arbitrary dimensions well, if at all.

I think TileDB is a better choice for this task at the moment than SQLite with the upcoming PDAL 2.0 release. Norman Barker is supporting it through PDAL's support venues, and I believe his firm is available for paid support opportunities. It is going to perform much better and it supports streaming. The SQLite drivers were a proof of concept that I developed based on our experience with both the Oracle and pgpointcloud drivers, and while they are interesting for database storage of point clouds, they have some significant downsides.

TileDB's interface in Python is much better than SQLite's and is going to give you much more convenient numpy access to the points and attributes. With SQLite you would have to build that all up yourself, and you would likely need to protect yourself from us making any schema changes to the SQLite storage layout too.

> As an aside, I don't know if it makes sense for PDAL to have a "native" container format or not, but it would be helpful to more
> clearly document which writer/reader pairs can be expected to losslessly round-trip data if PDAL is the only application involved.

We don't really have such a thing. LAS and LAZ with extra_bytes is likely the closest thing we have released at the moment, especially if you stuff metadata in VLRs, but it isn't pure by any means.
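
As a rough sketch of the LAS route, the snippet below builds a PDAL pipeline description in Python and prints it as JSON. The file names are placeholders, and the writer options shown (`extra_dims`, `minor_version`, `forward`) are what I understand writers.las to accept; treat the exact values as assumptions and check them against the docs for your PDAL version.

```python
import json


def las_roundtrip_pipeline(src, dst):
    """Build a pipeline dict that rewrites a LAS file and keeps extra dimensions.

    Option names are assumed from the writers.las documentation.
    """
    return {
        "pipeline": [
            {"type": "readers.las", "filename": src},
            {
                "type": "writers.las",
                "filename": dst,
                "minor_version": 4,    # assumed: write LAS 1.4
                "extra_dims": "all",   # assumed: store non-standard dimensions as extra bytes
                "forward": "all",      # assumed: carry over header values and VLRs
            },
        ]
    }


pipeline = las_roundtrip_pipeline("input.las", "output.las")
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

The printed JSON could be saved to a file and handed to `pdal pipeline`.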

> With the number of useful filters PDAL already has, I don't think it is unreasonable to start thinking about it as a central part of a processing workflow rather than just  a tool for moving data from External Application A to External Application B. The more it takes on
> a central role the more I think it makes sense to be asking "What's the best container format to use with PDAL?" rather than just conforming to whatever might be supported by external applications.

HDF or TileDB might make the most sense as the binary container for this task. Both would give you maximum flexibility, compressed binary storage with universal platform support, and the opportunity to advertise the data in other computing environments beyond PDAL. It hasn't been a requirement for us to develop something like this, however. You should explore whether or not the TileDB drivers that Norman Barker recently added are sufficient for this task, but I would suspect some of the finer points like PDAL's metadata and such might not fully survive a transit. It's hard work to get all of that stuff right. We haven't done it.

Howard

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Jed
On Thu, Aug 1, 2019 at 12:32 PM Howard Butler <[hidden email]> wrote:
> I think TileDB is a better choice for this task at the moment than SQLite with the upcoming PDAL 2.0 release. Norman Barker is supporting it through PDAL's support venues, and I believe his firm is available for paid support opportunities. It is going to perform much better and it supports streaming. The SQLite drivers were a proof of concept that I developed based on our experience with both the Oracle and pgpointcloud drivers, and while it is interesting for database storage of point clouds, it has some significant downsides.

Thanks for the good advice. I had looked into TileDB and it seems
really interesting but given how general it is I wasn't sure how well
the implementation played with PDAL. It's good to hear that it is at
least a candidate for a nativeish container format.

The biggest challenge I see with it for the use case I outlined is
that it is a compiled Python module. Unfortunately, we're mostly stuck
on Windows and even with the help of conda, keeping various modules
working together is a pain. Trying to do it while adding in Python
version mismatches and trying to match whatever compiler was used to
build the embedded interpreter pretty quickly negates any ease of use
advantages Python might have offered so it's probably time to look at
other ways of extending the applications in question.

Or just use text files ;-) Most of our data sets are only several
hundred million points, so although far from ideal it's certainly
manageable.
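
For what it's worth, the text-file fallback needs nothing beyond the standard library. The sketch below assumes a comma-delimited file whose first line is a header of (possibly quoted) dimension names and whose values are all numeric, which is what I believe PDAL's text writer produces by default; the sample data is made up.

```python
import csv
import io


def read_text_points(handle):
    """Read a CSV point file into a dict mapping dimension name to a list of floats."""
    reader = csv.reader(handle)
    names = next(reader)  # header row; csv strips the quotes around names
    columns = {name: [] for name in names}
    for row in reader:
        for name, value in zip(names, row):
            columns[name].append(float(value))
    return columns


# Made-up sample standing in for a file on disk.
sample = io.StringIO('"X","Y","Z","Intensity"\n1.5,2.5,3.5,120\n4.0,5.0,6.0,98\n')
cols = read_text_points(sample)
print(sorted(cols))
print(cols["X"])
```

Holding every value as a Python float is memory hungry, so for several hundred million points it would make sense to process the file in chunks rather than load it whole.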

Best,


--
Jed Frechette

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Howard Butler-3


> On Aug 1, 2019, at 2:09 PM, Jed Frechette <[hidden email]> wrote:
>
> On Thu, Aug 1, 2019 at 12:32 PM Howard Butler <[hidden email]> wrote:
>> I think TileDB is a better choice for this task at the moment than SQLite with the upcoming PDAL 2.0 release. Norman Barker is supporting it through PDAL's support venues, and I believe his firm is available for paid support opportunities. It is going to perform much better and it supports streaming. The SQLite drivers were a proof of concept that I developed based on our experience with both the Oracle and pgpointcloud drivers, and while it is interesting for database storage of point clouds, it has some significant downsides.
>
> Thanks for the good advice. I had looked in to TileDB and it seems
> really interesting but given how general it is I wasn't sure how well
> the implementation played with PDAL. It's good to hear that it is at
> least a candidate for a nativeish container format.

Numpy is also a candidate. We don't have a writers.numpy, but all the code to do so is in the Python extension. You could then save npz files and transit your data that way. Happy to merge a patch on it, but it won't make the 2.0 release cutoff.

Howard

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

andrew.bell.ia@gmail.com

I don't know if it's helpful, but PLY and PCD both support binary encoding of any set of dimensions.
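
As an illustration of how little is needed on the pure Python side, here is a sketch that writes and reads a binary little-endian PLY file with the standard struct module. It assumes every dimension is stored as a double on a single "vertex" element; the helper names and the "reflectance" property are made up for the example, and a real reader would need to handle the other PLY property types too.

```python
import os
import struct
import tempfile


def write_ply(path, dims, rows):
    """Write rows of floats as a binary PLY with one double property per dimension."""
    header = ["ply", "format binary_little_endian 1.0",
              "element vertex {}".format(len(rows))]
    header += ["property double {}".format(d) for d in dims]
    header.append("end_header")
    packer = struct.Struct("<{}d".format(len(dims)))
    with open(path, "wb") as f:
        f.write(("\n".join(header) + "\n").encode("ascii"))
        for row in rows:
            f.write(packer.pack(*row))


def read_ply(path):
    """Read back a file produced by write_ply (double properties only)."""
    with open(path, "rb") as f:
        dims, count = [], 0
        while True:
            raw = f.readline()
            if not raw:
                raise ValueError("end_header not found")
            line = raw.decode("ascii").strip()
            if line.startswith("element vertex"):
                count = int(line.split()[2])
            elif line.startswith("property double"):
                dims.append(line.split()[2])
            elif line == "end_header":
                break
        packer = struct.Struct("<{}d".format(len(dims)))
        rows = [packer.unpack(f.read(packer.size)) for _ in range(count)]
    return dims, rows


path = os.path.join(tempfile.mkdtemp(), "points.ply")
points = [(1.0, 2.0, 3.0, -12.5), (4.0, 5.0, 6.0, -7.25)]
write_ply(path, ["x", "y", "z", "reflectance"], points)
ply_dims, ply_rows = read_ply(path)
print(ply_dims)
print(ply_rows)
```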


--
Andrew Bell
[hidden email]


Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Jed
In reply to this post by Howard Butler-3
On Thu, Aug 1, 2019 at 1:26 PM Howard Butler <[hidden email]> wrote:
> Numpy is also a candidate.

How much should I be worried about this warning in the 1.9.1 docs?

"""
It is untested whether problems may occur if the versions of Python
used in writing the file and for reading the file don’t match.
"""

Is this referring to issues such as:

https://stackoverflow.com/questions/24105148/load-python-2-npy-file-in-python-3

or something else?

--
Jed Frechette

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Howard Butler-3


> On Aug 1, 2019, at 2:57 PM, Jed Frechette <[hidden email]> wrote:
>
> On Thu, Aug 1, 2019 at 1:26 PM Howard Butler <[hidden email]> wrote:
>> Numpy is also a candidate.
>
> How much should I be worried about this warning in the 1.9.1 docs?

Lots? ;)

More seriously, I think things are a little better than they used to be, but there were definitely dragons here in the past. This internal and external lack of interchange is why efforts like Arrow are gaining traction, but there have been bumps along that road too. I'd be curious to hear what Norman has to say on this topic as it is a space they are charging after quite intently.

>
> """
> It is untested whether problems may occur if the versions of Python
> used in writing the file and for reading the file don’t match.
> """
>
> Is this referring to issues such as:
>
> https://stackoverflow.com/questions/24105148/load-python-2-npy-file-in-python-3
>
> or something else?

Yeah.
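
The kind of mismatch in that Stack Overflow question can be reproduced with the standard library alone, since it comes down to how Python 3 unpickles Python 2 byte strings. A small sketch for Python 3 follows; the pickle bytes are hand-written to mimic what Python 2 would emit for a non-ASCII `str`, and this only illustrates the pickle behaviour, not what PDAL's numpy reader does internally.

```python
import pickle

# Hand-written protocol 2 pickle of the Python 2 byte string '\xc3\xa9'.
PY2_PICKLE = b"\x80\x02U\x02\xc3\xa9q\x00."

# With the default encoding (ASCII) the non-ASCII bytes cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

# latin1 maps every byte to a code point, so loading always succeeds.
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")

# encoding="bytes" keeps the raw bytes instead of decoding them.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(default_ok, repr(as_text), as_bytes)
```

Files that hold only plain numeric arrays do not go through pickle at all, so this particular problem should be limited to data containing pickled Python objects.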


Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Norman Barker-2
In reply to this post by Jed
Thanks for the mention Howard.

The TileDB driver is supported and is actively developed. TileDB is generic across other domains but the implementation in PDAL is solid and TileDB might be a good transfer format for your use case. If you decide to use TileDB within PDAL, I am happy to discuss it further.

Without meaning to hijack this thread, which toolchain (e.g. mingw64) are you using when you build from source and are not using conda? 




Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Norman Barker-2
In reply to this post by Howard Butler-3
> More seriously, I think things are a little better than they used to be, but there were definitely dragons here in the past. This internal and external lack of interchange is why efforts like Arrow are gaining traction, but there have been bumps along that road too. I'd be curious to hear what Norman has to say on this topic as it is a space they are charging after quite intently.

Round-tripping between PDAL and other tools using TileDB is something we have done. I will be creating a couple of example Python notebooks demonstrating some of this in the next few weeks. We have done a fair amount of testing of creating TileDB arrays with PDAL, accessing and modifying the arrays outside of PDAL with other tools, and then bringing them back into PDAL, which I think is the use case you are describing. It works well.
 

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Jed
In reply to this post by Norman Barker-2
On Thu, Aug 1, 2019 at 2:06 PM Norman Barker <[hidden email]> wrote:
> Without meaning to hijack this thread, which toolchain (e.g. mingw64) are you using when you build from source and are not using conda?

Not being able to control the toolchain was the problem I was hoping
to bypass. Ideally, by having something we can use in environments
where we have Python, but aren't guaranteed to have much else. To give
concrete examples, two of the applications we wanted to extend
initially were Agisoft and Houdini.

Agisoft includes a Python interpreter that sys.version reports as
"3.5.2 (default, Aug 28 2018, 15:41:10) [MSC v.1600 64 bit (AMD64)]".
The embedded Python doesn't ship with Numpy or much else in the way of
useful modules, and other than Python there is no other real extension
mechanism for the application.

Houdini also ships with Python, "2.7.15 (default, Apr  8 2019,
15:38:59) [MSC v.1916 64 bit (AMD64)]" in that case. Despite still
being on 2.7 (for ecosystem reasons) Houdini is a lot better. It does
at least include Numpy and also offers a bunch of other ways to extend
the application including a C++ SDK and even their own method of
making C++ functions accessible from Python [1].

In both cases, I know it is possible to add external Python modules to
the embedded distributions, but the process seems very brittle and
unsustainable, especially when you start getting into modules that
mix Python and C++ e.g. [2].
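
To show how far apart the two environments are, here is a small sketch that parses the two sys.version strings quoted above and compares the Python version and the MSC compiler number. The function names and the comparison rule are my own illustration; a matching minor version and compiler is a rough heuristic for whether one compiled module build could serve both interpreters, not a guarantee.

```python
import re

# Matches strings such as "3.5.2 (default, ...) [MSC v.1600 64 bit (AMD64)]".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+).*\[MSC v\.(\d+) (\d+) bit")


def parse_sys_version(text):
    """Extract the Python version, MSC compiler number, and bitness."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(text))
    major, minor, micro, msc, bits = (int(g) for g in match.groups())
    return {"python": (major, minor, micro), "msc": msc, "bits": bits}


def same_python_and_compiler(a, b):
    """Heuristic: same major.minor Python, same compiler, same bitness."""
    return (a["python"][:2] == b["python"][:2]
            and a["msc"] == b["msc"]
            and a["bits"] == b["bits"])


agisoft = parse_sys_version(
    "3.5.2 (default, Aug 28 2018, 15:41:10) [MSC v.1600 64 bit (AMD64)]")
houdini = parse_sys_version(
    "2.7.15 (default, Apr  8 2019, 15:38:59) [MSC v.1916 64 bit (AMD64)]")
print(agisoft)
print(houdini)
print(same_python_and_compiler(agisoft, houdini))
```

Here the two interpreters differ in both respects, so any module with a compiled component would need a separate build for each application.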

I don't know that this is necessarily a problem that PDAL needs to
solve. Although having a portable format that could easily be handled
by pure Python would be convenient, there are certainly downsides to
working within that restriction. If the current SQLite implementation
isn't a good solution, I'm not necessarily arguing to keep it. More
than anything I'm just glad this thread happened the week before we
started working on it. ;-)

TileDB does seem like a good working format for situations where we
have more control over the toolchain.

Best,

[1] https://www.sidefx.com/docs/houdini/hom/extendingwithcpp.html
[2] https://www.sidefx.com/forum/topic/58442/

--
Jed Frechette

Re: [pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Howard Butler-3


> On Aug 1, 2019, at 4:16 PM, Jed Frechette <[hidden email]> wrote:
>
> I'm not necessarily arguing to keep it. More than anything I'm just glad this thread happened the week before we started working on it. ;-)

Even if you had started with it, the SQLite drivers would have been moved to https://github.com/PDAL/unsuppported-plugins which would have still given you the option of building things up yourself and using them.
