[gdal-dev] VRT derived band pixel functions written in Python


[gdal-dev] VRT derived band pixel functions written in Python

Even Rouault-2
Hi,

I wanted to mention a new (and I think pretty cool) feature I've added in
trunk: the possibility to define pixel functions in Python in a VRT derived
band.

Let's start with a simple example to multiply a raster by 1.5 :

<VRTDataset rasterXSize="20" rasterYSize="20">
  <SRS>EPSG:26711</SRS>
  <GeoTransform>440720,60,0,3751320,0,-60</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>multiply</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionArguments factor="1.5"/>
    <PixelFunctionCode><![CDATA[
import numpy as np
def multiply(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                   raster_ysize, buf_radius, gt, **kwargs):
    factor = float(kwargs['factor'])
    out_ar[:] = np.round_(np.clip(in_ar[0] * factor,0,255))
]]>
    </PixelFunctionCode>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">byte.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>

or to add 2 (or more) rasters:

<VRTDataset rasterXSize="20" rasterYSize="20">
  <SRS>EPSG:26711</SRS>
  <GeoTransform>440720,60,0,3751320,0,-60</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>add</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[
import numpy as np
def add(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                   raster_ysize, buf_radius, gt, **kwargs):
    np.round_(np.clip(np.sum(in_ar, axis = 0, dtype = 'uint16'),0,255),
              out = out_ar)
]]>
    </PixelFunctionCode>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">byte.tif</SourceFilename>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">byte2.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>


You can put any Python module code inside PixelFunctionCode, with at least one
function taking the following arguments:
- in_ar: list of input numpy arrays (one numpy array for each source)
- out_ar: output numpy array to fill (initialized at the right dimensions and
with the VRTRasterBand.dataType)
- xoff: pixel offset to the top left corner of the accessed region of the band
- yoff: line offset to the top left corner of the accessed region of the band
- xsize: width of the accessed region of the band
- ysize: height of the accessed region of the band
- raster_xsize: total width of the raster band
- raster_ysize: total height of the raster band
- buf_radius: radius of the buffer (in pixels) added to the left, right, top
and bottom of in_ar / out_ar
- gt: geotransform
- kwargs: dictionary with user arguments defined in PixelFunctionArguments

For basic operations, you just need to care about in_ar and out_ar.
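For experimenting, a pixel function can be exercised outside GDAL by handing it plain numpy arrays. The little harness below is only an assumption about the calling convention described above, not driver code:

```python
import numpy as np

# The multiply pixel function from the first example above.
def multiply(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
             raster_ysize, buf_radius, gt, **kwargs):
    factor = float(kwargs['factor'])
    out_ar[:] = np.round(np.clip(in_ar[0] * factor, 0, 255))

# Hand-rolled stand-in for what the VRT driver does: one 4x4 Byte source,
# an output buffer of the band's data type, and the user argument as a string.
src = np.full((4, 4), 100, dtype=np.uint8)
out = np.zeros((4, 4), dtype=np.uint8)
gt = (440720, 60, 0, 3751320, 0, -60)
multiply([src], out, 0, 0, 4, 4, 4, 4, 0, gt, factor="1.5")
print(out[0, 0])  # 150
```

The assignment through `out_ar[:]` is what makes the result visible to the caller; rebinding the name `out_ar` would not.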

With all that, you can do interesting stuff like hillshading (code ported from
gdaldem):

<VRTDataset rasterXSize="121" rasterYSize="121">
  <SRS>EPSG:4326</SRS>
  <GeoTransform>-80.004166666666663,0.008333333333333,0,
44.004166666666663,0,-0.008333333333333</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <ColorInterp>Gray</ColorInterp>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">n43.dt0</SourceFilename>
    </SimpleSource>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionType>hillshade</PixelFunctionType>
    <PixelFunctionArguments scale="111120" z_factor="30" />
    <PixelFunctionCode>
      <![CDATA[
# Licence: X/MIT
# Copyright 2016, Even Rouault
import math

def hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                         raster_ysize, radius, gt, z, scale):
    ovr_scale_x = float(out_ar.shape[1] - 2 * radius) / xsize
    ovr_scale_y = float(out_ar.shape[0] - 2 * radius) / ysize
    ewres = gt[1] / ovr_scale_x
    nsres = gt[5] / ovr_scale_y
    inv_nsres = 1.0 / nsres
    inv_ewres = 1.0 / ewres

    az = 315
    alt = 45
    degreesToRadians = math.pi / 180

    sin_alt = math.sin(alt * degreesToRadians)
    azRadians = az * degreesToRadians
    z_scale_factor = z / (8 * scale)
    cos_alt_mul_z_scale_factor = \
              math.cos(alt * degreesToRadians) * z_scale_factor
    cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \
                254 * math.cos(azRadians) * cos_alt_mul_z_scale_factor
    sin_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \
                254 * math.sin(azRadians) * cos_alt_mul_z_scale_factor
    square_z_scale_factor = z_scale_factor * z_scale_factor
    sin_alt_mul_254 = 254.0 * sin_alt

    for j in range(radius, out_ar.shape[0]-radius):
        win_line = in_ar[0][j-radius:j+radius+1,:]
        for i in range(radius, out_ar.shape[1]-radius):
            win = win_line[:,i-radius:i+radius+1].tolist()
            x = inv_ewres * ((win[0][0] + win[1][0] + win[1][0] + win[2][0])-\
                             (win[0][2] + win[1][2] + win[1][2] + win[2][2]))
            y = inv_nsres * ((win[2][0] + win[2][1] + win[2][1] + win[2][2])-\
                             (win[0][0] + win[0][1] + win[0][1] + win[0][2]))
            xx_plus_yy = x * x + y * y
            cang_mul_254 = (sin_alt_mul_254 - \
                (y * cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 - \
                    x * sin_az_mul_cos_alt_mul_z_scale_factor_mul_254)) / \
                math.sqrt(1 + square_z_scale_factor * xx_plus_yy)
            if cang_mul_254 < 0:
                out_ar[j,i] = 1
            else:
                out_ar[j,i] = 1 + round(cang_mul_254)

def hillshade(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
              raster_ysize, radius, gt, **kwargs):
    z = float(kwargs['z_factor'])
    scale= float(kwargs['scale'])
    hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                  raster_ysize, radius, gt, z, scale)

]]>
    </PixelFunctionCode>
    <BufferRadius>1</BufferRadius>
    <SourceTransferType>Int16</SourceTransferType>
  </VRTRasterBand>
</VRTDataset>

You can also move the Python code itself into a proper my_lib.py file and just
specify
    <PixelFunctionType>my_lib.hillshade</PixelFunctionType>
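A minimal sketch of what such a my_lib.py could look like; the function body here is a trivial placeholder (copying the clipped first source), not the real hillshade port:

```python
# my_lib.py -- sketch of an external pixel-function module that a VRT could
# reference as <PixelFunctionType>my_lib.hillshade</PixelFunctionType>,
# provided the file is importable from the Python path.
import numpy as np

def hillshade(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
              raster_ysize, buf_radius, gt, **kwargs):
    # A real implementation would port the gdaldem code shown above;
    # here we just clip the first source into the output as a placeholder.
    out_ar[:] = np.clip(in_ar[0], 0, 255)
```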

Technically, the interfacing with Python is done at run time by dynamically
loading Python symbols with dlopen()/GetProcAddress() when they are already
available in the process, for example when libgdal is loaded from a Python
interpreter, or from a binary like QGIS that has already loaded the Python
library. Otherwise a few shared objects ("libpython2.7.so", "python27.dll",
"libpython3.4m.so", etc.) are tried, unless the PYTHONSO config option is
set to point to a precise filename. The advantage of this approach is that
the same GDAL library binary is compatible with all Python 2.X (tested: 2.6,
2.7) and 3.X (tested: 3.1, 3.4) versions, and with whatever numpy version
comes with the Python environment used (at compilation time you don't need any
python/numpy development package). The numpy dependency is not a critical one:
one could imagine a fallback mode where Python arrays would be used instead,
but that would likely be of little interest.
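The same run-time symbol lookup can be illustrated from Python itself with ctypes. This is only an analogy for the dlopen()/dlsym() dance GDAL does in C (using the C math library rather than libpython), not GDAL code:

```python
import ctypes
import ctypes.util

# Locate and load a shared library at run time, then resolve a symbol from
# it -- in spirit, the same pattern GDAL applies to libpythonX.Y.
name = ctypes.util.find_library("m")  # the C math library
libm = ctypes.CDLL(name if name else "libm.so.6")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0
```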

Successfully tested on Linux, MacOSX, FreeBSD and Windows. Some extra tweaking
of the predefined set of shared object names - that are probed when no already
loaded Python environment is found - might be needed.
The implementation should be thread-safe regarding use of the Python Global
Interpreter Lock (GIL).

I've also tested with the numba Just-In-Time compiler
(http://numba.pydata.org/) and it provides major performance improvements for
highly computational code (example given below).
With numba enabled, I found that my above Python hillshading on a 10Kx10K
float32 raster was even faster than gdaldem (the reason being that the Python
version is a bit simplified: it doesn't take input nodata values into
account), whereas the non-JIT'ed version is 100x slower. When removing the
nodata flag, gdaldem is only about twice as fast as the JIT'ed Python code. So
this is a good sign that such an approach isn't only a toy or just for
prototyping.
Speaking of JIT, there's no provision (yet?) for interfacing with PyPy. That
would require a new backend, as the PyPy C API has nothing to do with the
CPython one.

There are obvious security concerns in allowing Python code to be run when
reading the content of a VRT file. The GDAL_VRT_ENABLE_PYTHON config option =
IF_SAFE / NO / YES can be set to control the behaviour. The default is IF_SAFE
(this can be changed at compilation time by defining, e.g.,
-DGDAL_VRT_ENABLE_PYTHON_DEFAULT="NO", and Python code execution can be
completely disabled with -DGDAL_VRT_DISABLE_PYTHON). "Safe" must be understood
as: the code will not read, write or delete files, spawn external code, do
network activity, etc. Said otherwise, the code will only do "maths". But
infinite looping is definitely possible in the safe mode. The heuristics of
the IF_SAFE mode are rather basic and I'd be grateful if people could point
out ways of breaking them. If any of the following strings - "import" (unless
it is "import numpy" / "from numpy import ...", "import math" / "from
math import ..." or "from numba import jit"), "eval", "compile", "open",
"load", "file", "input", "save", "memmap", "DataSource", "genfromtxt",
"getattr", "ctypeslib", "testing", "dump", "fromregex" - is found anywhere in
the code, then the code is considered unsafe (there are, interestingly, a lot
of methods in numpy that do file I/O; hopefully I've captured them with the
previous filters). Another 'unsafe' pattern is when the pixel function
references an external module, like my my_lib.hillshade example above (who
knows whether there won't some day be a shutil.reformat_your_hard_drive
function with the right prototype...)
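To make the shape of that heuristic concrete, here is a naive re-creation in Python. The real check lives in GDAL's C++ code and differs in detail; this is purely an illustration of the string-matching idea:

```python
# Naive sketch of an IF_SAFE-style check: reject any import outside a small
# allow-list, and reject any occurrence of the forbidden substrings.
FORBIDDEN = ["eval", "compile", "open", "load", "file", "input", "save",
             "memmap", "DataSource", "genfromtxt", "getattr", "ctypeslib",
             "testing", "dump", "fromregex"]
ALLOWED_IMPORTS = ("import numpy", "from numpy import", "import math",
                   "from math import", "from numba import jit")

def looks_safe(code):
    for line in code.splitlines():
        stripped = line.strip()
        if "import" in stripped and not stripped.startswith(ALLOWED_IMPORTS):
            return False
    return not any(token in code for token in FORBIDDEN)

print(looks_safe("import numpy as np\nout_ar[:] = np.clip(in_ar[0], 0, 255)"))  # True
print(looks_safe("f = open('/etc/passwd')"))  # False
```

As the text says, such substring filters are easy to get wrong in both directions, which is exactly why feedback on ways of breaking the real heuristic is solicited.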

This new capability isn't yet documented in the VRT doc, although this message
will be a start.

I'm interested in feedback you may have.

And to conclude with a fun example: a raster with a Mandelbrot fractal. Just a
grey-level version; adding a color table is left to the reader as an exercise.
Open it for example in QGIS and enjoy the almost-infinite-zoom feeling. Make
sure to disable contrast enhancement. It uses numba when available, and when
that is the case, it's really fast when panning/zooming.

<VRTDataset rasterXSize="100000000" rasterYSize="100000000">
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionType>mandelbrot</PixelFunctionType>
    <PixelFunctionCode>
      <![CDATA[
try:
    from numba import jit
    #print('Using numba')
    g_max_iterations = 100
except:
    class jit(object):
        def __init__(self, nopython = True, nogil = True):
            pass

        def __call__(self, f):
            return f

    #print('Using non-JIT version')
    g_max_iterations = 25

# Use a wrapper since the VRT code cannot access the jit decorated function
def mandelbrot(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                        raster_ysize, r, gt, **kwargs):
    mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize,
g_max_iterations)

@jit(nopython=True, nogil=True)
def mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                        raster_ysize, max_iterations):
    ovr_factor_y = float(out_ar.shape[0]) / ysize
    ovr_factor_x = float(out_ar.shape[1]) / xsize
    for j in range( out_ar.shape[0]):
        y0 = 2.0 * (yoff + j / ovr_factor_y) / raster_ysize - 1
        for i in range(out_ar.shape[1]):
            x0 = 3.5 * (xoff + i / ovr_factor_x) / raster_xsize - 2.5
            x = 0.0
            y = 0.0
            x2 = 0.0
            y2 = 0.0
            iteration = 0
            while x2 + y2 < 4 and iteration < max_iterations:
                y = 2*x*y + y0
                x = x2 - y2 + x0
                x2 = x * x
                y2 = y * y
                iteration += 1

            out_ar[j][i] = iteration * 255 / max_iterations
]]>
    </PixelFunctionCode>
    <Metadata>
      <MDI key="STATISTICS_MAXIMUM">255</MDI>
      <MDI key="STATISTICS_MEAN">127</MDI>
      <MDI key="STATISTICS_MINIMUM">0</MDI>
      <MDI key="STATISTICS_STDDEV">127</MDI>
    </Metadata>
    <ColorInterp>Gray</ColorInterp>
    <Histograms>
      <HistItem>
        <HistMin>-0.5</HistMin>
        <HistMax>255.5</HistMax>
        <BucketCount>256</BucketCount>
        <IncludeOutOfRange>0</IncludeOutOfRange>
        <Approximate>1</Approximate>
        <HistCounts>0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0</HistCounts>
      </HistItem>
    </Histograms>
  </VRTRasterBand>
</VRTDataset>

Statistics have been added just to make QGIS open the file a bit quicker.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: VRT derived band pixel functions written in Python

Poughon Victor
Hi Even,

This is a really cool and really impressive feature!
Calling Python code from C++ without development packages as a dependency sounds like black magic to me. Obviously Python symbols have to be there at some point to execute Python code, so is this only usable from a binary that happens to have loaded them already (CPython, QGIS, etc.)? In particular you say that it is "done at run-time by dynamically loading Python symbols". But I'm confused because later you mention incompatibility issues between the CPython and PyPy APIs. GDAL's secret sauce, I guess...?
I'm also curious why it's possible to use numba but not PyPy, which AFAIK are both Python JITs? And finally, did you consider using Cython (which claims PyPy compatibility)?

Cheers,

Victor Poughon

________________________________________
From: gdal-dev [[hidden email]] on behalf of Even Rouault [[hidden email]]
Sent: Monday, September 12, 2016 14:31
To: [hidden email]
Subject: [gdal-dev] VRT derived band pixel functions written in Python


Re: VRT derived band pixel functions written in Python

Sean Gillies-3
In reply to this post by Even Rouault-2
Hi Even,

On Mon, Sep 12, 2016 at 2:31 PM, Even Rouault <[hidden email]> wrote:

I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html to be a good intro to the risks of eval'ing untrusted Python code. Mentioned in there is a notable attempt to make a secure subset of Python called "pysandbox", but its developer has since declared it "broken by design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough about sandboxing (OS or otherwise) to say if that's right.

I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the reader's own trust policies? My ignorance of how GDAL separates "open" options from "config" options might be on display in this question.

My $.02: since "is safe" will be hard to guarantee (it's an outstanding unsolved problem in the Python community), removing "IF_SAFE" from the options would be a good thing, and the default for GDAL_VRT_ENABLE_PYTHON should be "NO".

--
Sean Gillies


Re: VRT derived band pixel functions written in Python

Even Rouault-2
In reply to this post by Poughon Victor
On Monday, September 12, 2016 at 16:59:34, Poughon Victor wrote:
> Hi Even,
>
> This is a really cool and really impressive feature!
> Calling Python code from C++ without development packages as a dependency
> sounds like black magic to me.

No black magic, just setting function pointers using dlopen():
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L390

> Obviously Python symbols have to be there
> at some point to execute Python code, so this is only usable from a binary
> that happens to load them already (CPython, QGIS, etc.), correct?

No, there are 2 modes:
- GDAL is loaded after something else has already loaded Python (your CPython
interpreter, QGIS, etc.)
- GDAL cannot find any Python symbols among the symbols already available in the
process, in which case it will try to load a few reasonable shared objects like
"libpython2.7.so", etc...

See
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L238 
and below lines

> But I'm confused because later you mention
> incompatibilities issues between the CPython and Pypy API. GDAL's secret
> sauce, I guess...?

Nothing GDAL-specific here. It is just that the way to embed PyPy from C is
completely different from CPython:

http://doc.pypy.org/en/latest/embedding.html
vs
https://docs.python.org/2/extending/embedding.html

> I'm also curious why it's possible to use numba but no
> pypy, which AFAIK are both python JITs?

numba is "just" a CPython extension (a non-trivial one, dragging in llvmlite
etc...), so once you support CPython, using it comes for free, provided
you use the right imports and function decorators in your Python code.

> And finally did you consider using
> Cython (which claims pypy compatibility)?

No, I didn't investigate that.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

Even Rouault-2
In reply to this post by Sean Gillies-3
> I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
> to be a good intro to the risks of eval'ing untrusted Python code.
> Mentioned in there is a notable attempt to make a secure subset of Python
> called "pysandbox", but its developer has since declared it "broken by
> design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough
> about sandboxing (OS or otherwise) to say if that's right.

Those links are fascinating. I wouldn't have imagined that there was valid
Python code that could crash the Python interpreter, almost by design! (so
the osgeo.gdal gotchas are not so bad after all :-) )
OK, so I've followed your suggestion:
* IF_SAFE mode is removed (actually between #if 0 / #endif if someone wants to
pursue the sandboxing effort),
* and the default of GDAL_VRT_ENABLE_PYTHON is now NO.

>
> I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it
> possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the
> reader's own trust policies?

No, that's not possible. GDAL_VRT_ENABLE_PYTHON is a config option only, so it can
only be set as an environment variable or through CPLSetConfigOption().
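
Concretely, opting in looks like this (a sketch; GDAL_VRT_ENABLE_PYTHON is the
only name taken from this thread, and the osgeo import assumes the Python
bindings are installed):

```python
import os

# Before any dataset is opened, opt in via the environment...
os.environ["GDAL_VRT_ENABLE_PYTHON"] = "YES"

# ...or, when the osgeo bindings are available, through the config API,
# which wraps CPLSetConfigOption():
# from osgeo import gdal
# gdal.SetConfigOption("GDAL_VRT_ENABLE_PYTHON", "YES")
```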

We could indeed imagine providing it as an option of the VRT, but then there
would be interesting situations like the following. Imagine something called
"my.tif" with the following content:

<VRTDataset rasterXSize="20" rasterYSize="20">
  <VRTRasterBand dataType="Byte" band="1">
    <ColorInterp>Gray</ColorInterp>
    <SimpleSource>
      <SourceFilename relativeToVRT="1"><![CDATA[
<VRTDataset .... evil Python code here ... .... >]]></SourceFilename>
      <OpenOptions>
        <OOI key="ENABLE_PYTHON">YES</OOI>
      </OpenOptions>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>

The internalized VRT would then be allowed with ENABLE_PYTHON=YES.

So it's probably a good idea not to provide an open option for now. One could
perhaps imagine introducing one, provided we make it illegal as an item in
<OpenOptions>, so that it can only be provided by "top-level" GDALOpenEx() calls.

> My ignorance of how GDAL separates "open"
> options from "config" options might be on display in this question.

They are just 2 different mechanisms for providing options. There's no automatic
bridge between the two concepts. An option may exist in both worlds
(because historically there were only global config options, but in some use
cases an option mostly makes sense per dataset) or in just one of them.

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

jramm
Incredible!!!!
Really cool initiative. 

I had a play around and wanted a way to write the pixel function in a 'stand alone' Python module and generate the VRT from it. This would allow independent testing and easier maintenance of the Python code. It is fairly easy using lxml to build up the VRT dynamically:

https://github.com/JamesRamm/gdal_pixel_functions

In the example I more-or-less recreate the hillshade example Even posted, although there is a lot of functionality missing from the VRT-builder still, so some nodes are not present. 
It would be quite easy therefore, to expand this to a sort of GDAL 'algorithms library' (I think it can wholesale replace what I had tried here: https://github.com/JamesRamm/GeoAlg).
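A minimal sketch of that idea, using only the standard library (the node names
are copied from Even's examples; note that xml.etree cannot emit CDATA
sections, which is one reason to prefer lxml for the real thing):

```python
import xml.etree.ElementTree as ET

def build_derived_vrt(xsize, ysize, func_name, code, source):
    """Assemble a minimal Python derived-band VRT as a string."""
    root = ET.Element("VRTDataset", rasterXSize=str(xsize),
                      rasterYSize=str(ysize))
    band = ET.SubElement(root, "VRTRasterBand", dataType="Byte", band="1",
                         subClass="VRTDerivedRasterBand")
    ET.SubElement(band, "PixelFunctionType").text = func_name
    ET.SubElement(band, "PixelFunctionLanguage").text = "Python"
    ET.SubElement(band, "PixelFunctionCode").text = code
    src = ET.SubElement(band, "SimpleSource")
    fname = ET.SubElement(src, "SourceFilename", relativeToVRT="1")
    fname.text = source
    return ET.tostring(root, encoding="unicode")

vrt_xml = build_derived_vrt(20, 20, "multiply",
                            "def multiply(in_ar, out_ar, *args, **kwargs): ...",
                            "byte.tif")
```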

One thing I would note (if I understand correctly) is that you have specified any "import" statement as unsafe except numpy/math and numba? It might be better to either allow all imports or allow none at all (unless the config option is set), as I don't see why those libraries would be any more or less 'safe' than another library, e.g. another standard library module.

I would agree with Sean and remove 'IF_SAFE', leaving it entirely to the user's discretion. Many users who are working with their own local data will likely have no problem just setting it to 'YES', and I imagine users for whom it would be a problem might already have their own rules on what constitutes 'safe'?

On a sidenote - does this new functionality overlap considerably with the gdal map algebra project?




Re: [gdal-dev] VRT derived band pixel functions written in Python

Rutger
In reply to this post by Even Rouault-2
Hi Even,

This is really amazing. I'm becoming more and more a fan of Numba for number crunching, so this certainly makes my day. As soon as I can find a Win x64 dev version on a Conda channel I'll give it a try.

A use case that comes to mind, and which I run into regularly, is when I want to do some simple aggregation before using something like gdalwarp. For example, when you have a file containing 24 hourly temperature values and you are only interested in the daily mean. Currently I either aggregate before warping and write the intermediates to disk, or aggregate after warping, which is computationally inefficient. Neither is optimal.

A few questions: can you access a file's metadata from within a pixel function? This would perhaps allow, for example, interpolating atmospheric data to the overpass time of a satellite image.

Do the pixel functions also work with @numba.vectorize(), in particular when targeting 'parallel' or 'cuda'? And would that give parallel processing for both IO and calculations?

You should give the folks at Continuum a heads up, I'm sure they'd appreciate seeing Numba used like this.


Regards,
Rutger



Re: VRT derived band pixel functions written in Python

Even Rouault-2
In reply to this post by jramm
On Monday 12 September 2016 at 23:38:41, James Ramm wrote:

> Incredible!!!!
> Really cool initiative.
>
> I had a play around and wanted a way to be able to write the pixel function
> in a 'stand alone' python module and generate the VRT from it. This would
> allow independent testing and easier maintenance of the python code. It is
> fairly easy using lxml to build up the VRT dynamically:
>
> https://github.com/JamesRamm/gdal_pixel_functions
>
> In the example I more-or-less recreate the hillshade example Even posted,
> although there is a lot of functionality missing from the VRT-builder
> still, so some nodes are not present.
> It would be quite easy therefore, to expand this to a sort of GDAL
> 'algorithms library' (I think it can wholesale replace what I had tried
> here: https://github.com/JamesRamm/GeoAlg).

a "-of VRT" option in the current gdal_calc.py could also be cool (at least for
simple one-line operations)

Programmatic creation of a VRT derived band, with the previous scope (i.e. calling
a natively registered pixel function), already existed. See
https://github.com/OSGeo/gdal/blob/trunk/autotest/gdrivers/vrtderived.py#L175
For now I haven't extended the VRTDataset::AddBand implementation to set the new
elements needed for the Python support, but that would certainly be doable.

>
> One thing I would note (if I understand correctly) is that you have
> specified any "import" statement as unsafe except numpy/math and numba? It
> might be better to either allow imports or not allow them at all (unless
> the config option is set) as I don't see why those libraries would be any
> more or any less 'safe' than another library...e.g. another standard
> library module.

Well, the idea was that those modules should only do math stuff and not
interact with the rest of the system (contrary to 'os', 'system' and the like).
But in the numpy case this is not actually true, since it has I/O methods,
hence my naive attempt to blacklist those by name.

>
> I would agree with Sean and remove 'IF_SAFE' and leave it entirely to the
> users discretion.

That's what I've done (see previous message)

> Many users who are working with their own local data will
> likely have no problems about just setting it to 'YES' and I imagine users
> for whom it would be a probably might already have their own rules on what
> constitutes 'safe'?
>
> On a sidenote - does this new functionality overlap considerably with the
> gdal map algebra project?

Indeed. I think there might be several incarnations of such a capability, and I'm
not sure we can come up with a single one that will satisfy all needs and
constraints. The VRT way has some use cases where it shines (on-the-fly
computation of a region of interest; easy integration with upper levels in the
stack: QGIS, MapServer, etc...), whereas you could imagine something else,
efficient at doing whole-raster processing with parallelism etc...

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

Even Rouault-2
In reply to this post by Rutger
On Tuesday 13 September 2016 at 09:02:09, Rutger wrote:

> Hi Even,
>
> This is really amazing. I'm becoming more and more a fan of Numba for
> number crunching, so this certainly makes my day. As soon as i can find a
> Win x64 dev version on a Conda channel I'll give it a try.
>
> A use case that comes to mind, and which i run into regularly, is when i
> want to do some simple aggregation before using something like gdalwarp.
> For example when you have a file containing 24 hourly temperature values,
> and you are only interested in the daily mean. Currently i either
> aggregate before warping and write the intermediates to disk, or aggregate
> after warping which is computationally inefficient. Neither is optimal.
>
> A few questions, can you access a files metadata from within a pixel
> function? This would perhaps allow for example interpolating atmospheric
> data to the overpass time of a satellite image.

This is indeed a question I've asked myself: does the prototype of the pixel
function contain enough information? I had not initially put in the
geotransform and whole raster dimensions, for example, but found they were
necessary for correct behaviour of hillshading at different zoom scales.
I also added afterwards the xoff, yoff to be able to do Mandelbrot
generation (I guess there might be some more valid use cases ;-)). And then
the user-provided dictionary, to allow some parametrization of the algorithm
instead of hardcoding constants in it, so that you can off-load it into a
general lib. And I stopped at that point.

For your use case, there are currently 2 possibilities, using the
<PixelFunctionArguments> capability:
- either you fetch the metadata items you need at VRT generation time and put
them as arguments,
- or you only pass the names of the sources as arguments, and you use the
GDAL Python bindings themselves inside the pixel function to get access
to everything needed. But I suspect it might be a bit inefficient to do
that for each RasterIO() request.
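
For the first possibility, the pixel function just pulls the values out of
kwargs. A sketch (the scale/offset argument names are made up for
illustration; the VRT generator would have read them from the source's
metadata, and argument values always arrive as strings):

```python
import numpy as np

# Hypothetical pixel function matching
# <PixelFunctionArguments scale="0.01" offset="-40"/>
def apply_scale(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                raster_ysize, buf_radius, gt, **kwargs):
    scale = float(kwargs["scale"])    # all arguments come in as strings
    offset = float(kwargs["offset"])
    out_ar[:] = in_ar[0] * scale + offset
```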

<loud_thinking>

Perhaps a compromise would be to allow, in addition to simple functions, to
specify a class name, where the constructor would receive values that don't
change from one call to another one, and a calc() method would receive the
ones that change at each RasterIO() request

class MyProcessingClass:
    def __init__(self, raster_xsize, raster_ysize, buf_radius, gt,
                 source_filenames, **kwargs):
        ...  # save the above parameters that you may need in calc(), and do
             # one-time things like gdal.Open()'ing sources to get metadata

    def calc(self, in_ar, out_ar, xoff, yoff, xsize, ysize):
        ...  # do your computation

We could possibly pass the sources as datasets themselves, since they are
actually already opened, but that would make the availability of the osgeo.gdal
bindings a requirement (well, as a fallback we could pass the filenames if
the import fails).

</loud_thinking>

>
> Do the pixel functions also work with @numba.vectorize(), in particular
> when targeting 'parallel' or 'cuda'.

I've only scratched the surface of Numba (didn't know it 2 days ago). I
guess this might work (it might only be interesting if
VRTDerivedRasterBand::IRasterIO() is called with a big enough region, which
depends on the pixel access pattern of the upper layers). The VRT driver just
calls a Python function that takes numpy arrays and a few extra args.

> And would that give parallel
> processing for both IO and calculations?

Only calculations. I/O is done before going to Python and after returning from
it.

Actually... if you specify zero sources in the VRT, then it is up to you to do
the read operations the way you like in Python, so you could possibly
parallelize them there.
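
A sketch of that zero-source variant (assuming the osgeo bindings can be
imported inside the pixel function, and with the filename passed as a
made-up "filename" pixel function argument):

```python
def read_myself(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                raster_ysize, buf_radius, gt, **kwargs):
    # With no <SimpleSource> elements, in_ar is empty and all reading is
    # up to us, so we could also parallelize the reads here with threads.
    from osgeo import gdal  # deferred import: only needed at compute time
    ds = gdal.Open(kwargs["filename"])
    out_ar[:] = ds.GetRasterBand(1).ReadAsArray(xoff, yoff, xsize, ysize)
```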

>You should give the folks at Continuum a heads up, i'm sure they appreciate
> seeing Numba used like this.

I have no connection with them, but yes their project rocks.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

Rutger
I overlooked the fact that it still moves through Python. Is that the 'only' hurdle preventing parallel IO? Since gdalwarp, for example, has the -multi flag, it seems as if GDAL is capable of it, or is that a specific/specialized implementation?

Numba has several options which might eliminate going through Python during execution. There are C callbacks:
http://numba.pydata.org/numba-doc/dev/user/cfunc.html

It probably works by using the Numpy C-API (which I have zero experience with). I don't know if it's possible for other programs, like GDAL, to also use those compiled functions without moving through Python first.

There is also ahead-of-time compilation (AOT):
http://numba.pydata.org/numba-doc/dev/user/pycc.html

AOT has the benefit that users only need Numpy as a dependency, and don't need Numba/LLVM. There are some drawbacks as well, like no longer being able to compile optimizations for the hardware it's running on.


Regards,
Rutger


Re: VRT derived band pixel functions written in Python

Even Rouault-2
On Tuesday 13 September 2016 at 11:07:39, Rutger wrote:
> I overlooked the fact that it still moves through Python, is that the
> 'only' hurdle preventing parallel IO?

Not sure I understand your question. But if you have several sources, you
could potentially do parallelized reading of them from the Python code by
using Python threads and the GDAL Python API. But looking at the SWIG-generated
code, it doesn't seem that SWIG releases the GIL automatically before calling
native code. Hum... So that should probably be added manually, at least around
GDALRasterIO() calls, otherwise you'll get zero perf improvement.

> Since gdalwarp for example has the
> -multi flag, it seems as if GDAL is capable of it, or is that a
> specific/specialized implementation?

Parallelized I/O doesn't mean much by itself without more context. You may
want to parallelize reading of different regions of the same dataset, or
parallelize reading of different datasets. Due to GDAL objects not being
thread-safe, the first case (reading of different regions of the same dataset)
can be solved with the second one by opening several datasets for the same
filename.

Regarding gdalwarp -multi, here's how that works. When you warp a dataset,
there's a list of all chunks (windows) to be processed that is generated.
gdalwarp -multi does the following

Thread I/O                              Thread computation
Read data for chunk 1
Read data for chunk 2                   Do calculations for chunk 1
Write output of chunk 1                 Do calculations for chunk 2
Read data for chunk 3
Write output of chunk 2                 Do calculations for chunk 3
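
The overlap in that schedule can be mimicked with a toy producer/consumer
sketch (plain Python threads and a queue standing in for the warper's chunk
windows; this is an illustration, not GDAL code):

```python
import queue
import threading

def pipeline(chunks, read, compute, write):
    """Overlap I/O with computation: a reader thread prefetches the next
    chunk while the current one is being computed in the main thread."""
    q = queue.Queue(maxsize=1)  # allow at most one chunk of lookahead

    def reader():
        for c in chunks:
            q.put(read(c))
        q.put(None)  # end-of-stream marker

    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (data := q.get()) is not None:
        results.append(write(compute(data)))
    return results
```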


>
> Numba has several options which might eliminate using Python during
> execution. There are c-callbacks:
> http://numba.pydata.org/numba-doc/dev/user/cfunc.html

You can also use @jit(nopython=True, nogil=True) and your Python method will
end up being pure native code (provided that you don't use too high-level stuff,
otherwise the jit'ification will fail with an exception).

And for code that is not inlined in the VRT, you can also add cache=True so
that the jit'ification can be reused.
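
Put together, a jit'ed helper might look like this (a sketch; the fallback
stub is my addition so the code degrades to plain Python when numba isn't
installed):

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    def jit(**_kwargs):          # fallback stub: run as plain Python
        return lambda func: func

# nopython=True + nogil=True compile the body to native code that releases
# the GIL; cache=True (useful when the function lives in a module rather
# than inline in the VRT) persists the compiled result between runs.
@jit(nopython=True, nogil=True, cache=True)
def scaled_sum(a, b, factor):
    return (a + b) * factor
```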

With all that, the cost of the Python layer becomes negligible (except loading
the Python environment the first time, if not already loaded, but for a
computation that will be longer than a few seconds, that's not really a big
deal)

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

jramm
I think you can call SWIG with the -threads argument on the command line so it will always release the GIL. Could be an easy option if it works



Re: VRT derived band pixel functions written in Python

Even Rouault-2
On Tuesday 13 September 2016 at 21:22:20, James Ramm wrote:
> I think you can call SWIG with the -threads argument on the command line so
> it will always release the GIL. Could be an easy option if it works

That's mostly what I've done. See my other message :
https://lists.osgeo.org/pipermail/gdal-dev/2016-September/045155.html


--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

Rutger
In reply to this post by Even Rouault-2
Then I guess I'm mistaken in thinking that Python would become a slowdown in such a case. It's been a while since I tested it. Anyway, thanks for your comments and efforts to improve the performance.

I couldn't find any Conda channels* which build from trunk, so I'll probably have to wait a while before I can
try it out; something to look forward to. :)

* https://anaconda.org/search?q=gdal


Regards,
Rutger


Re: VRT derived band pixel functions written in Python

Even Rouault-2
On Wednesday 14 September 2016 at 09:02:07, Rutger wrote:
> Then I guess i'm mistaken in thinking that Python would become a slowdown
> in such a case. It's been a while since i tested it. Anyway, thanks for
> your comments and efforts to improve the performance.
>
> I couldnt find any Conda channels* which build from trunk, so i probably
> have to wait a while before i can
> try it out, something to look forward to. :)
>
> * https://anaconda.org/search?q=gdal

You can also grab a nightly build at
http://www.gisinternals.com/development.php


--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

jramm
Trying to run this using a function relying on scipy.ndimage...

When running gdal_translate on the VRT, I get: ImportError: No module named scipy.ndimage
This comes after numpy is imported successfully, and scipy.ndimage will happily import within the Python interpreter.
Any tips on how to track this down and debug it?

The entire VRT file is as follows:

<VRTDataset RasterXSize="111090" RasterYSize="259376">
  <SRS>PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB 1936",DATUM["OSGB_1936",SPHEROID["Airy 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",400000],PARAMETER["false_northing",-100000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]</SRS>
  <GeoTransform>100180.0,5.0,0.0,1215730.0,0.0,-5.0</GeoTransform>
  <VRTRasterBand dataType="Float32" band="1" subClass="VRTDerivedRasterBand">
    <SimpleSource>
      <SourceFilename relativeToVrt="0">F:\tif_data\large_sparse.tif</SourceFilename>
      <SourceProperties BlockXSize="256" BlockYSize="256" RasterXSize="111090" RasterYSize="259376"/>
          <OpenOptions>
                <OOI key="NUM_THREADS">4</OOI>
                <OOI key="SPARSE_OK">TRUE</OOI>
          </OpenOptions>
    </SimpleSource>
    <PixelFunctionType>extract_blobs</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode>
<![CDATA[
import numpy
import scipy.ndimage

def label_sum(input_array, labels, index):
    counts = numpy.bincount(labels.ravel())
    sums = numpy.bincount(labels.ravel(), weights=input_array.ravel())
    found = (index >= 0) & (index < counts.size)
    index[~found] = 0

    sums = sums[index]
    sums[~found] = 0
    return sums

def blob_extraction(input_array, count_threshold, nodata=0, all_connected=True, invert=False, replacement_value=-1.0):
    if count_threshold <= 0:
        raise ValueError('count_threshold must be > 0')

    # Convert the background value to 0
    if invert:
        # 'Invert' the array, such that the data values (foreground) are made background
        # and blobs are detected in the nodata regions
        # In order to recover the original input data, we make a copy of the array
        original_array = input_array.copy()
        input_array[input_array != numpy.float32(nodata)] = 0
    else:
        input_array[input_array == numpy.float32(nodata)] = 0

    if all_connected:
        # all_connected implies using 8-connectivity and groups will be
        # generated considering orthogonal and diagonal neighbours.
        connectivity_struct = [[1, 1, 1],
                               [1, 1, 1],
                               [1, 1, 1]]
    else:
        # Groups will be generated using only orthogonal neighbours
        connectivity_struct = None

    # Get labelled groups of values
    label_array, n_regions = scipy.ndimage.label(input_array,
                                                 structure=connectivity_struct)
    # Create a copy of the input array converted to bools
    input_as_bool = numpy.array(input_array != 0)
    # Get sizes of each group and remove values from groups where the size is
    # is less than the threshold
    sizes = label_sum(
        input_as_bool, label_array, numpy.arange(n_regions + 1, dtype=numpy.int))
    mask_small_blobs = sizes < count_threshold

    remove_pixels = mask_small_blobs[label_array]

    if invert:
        remove_pixels[input_array == 0] = False
        input_array = original_array
        input_array[remove_pixels] = replacement_value
    else:
        input_array[remove_pixels] = nodata
        # Convert background value back to nodata
        input_array[input_array == 0] = nodata

    return input_array


def extract_blobs(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                  raster_ysize, radius, gt, **kwargs):
    # Fill the provided output buffer in place; plain assignment to out_ar
    # would only rebind the local name and GDAL would never see the result.
    out_ar[:] = blob_extraction(in_ar[0], int(kwargs["count_threshold"]),
                                int(kwargs["nodata"]))

]]>

        </PixelFunctionCode>
    <BufferRadius>5</BufferRadius>
    <PixelFunctionArguments nodata="0" count_threshold="5"/>
  </VRTRasterBand>
</VRTDataset>
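As an aside, the examples earlier in the thread all write into out_ar in place (out_ar[:] = ... or out=out_ar); plain assignment to out_ar only rebinds the local name, and the buffer GDAL passed in stays untouched. A minimal sketch of the convention (function name hypothetical):

```python
import numpy as np

def identity(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
             raster_ysize, buf_radius, gt, **kwargs):
    # Correct: fills the buffer GDAL provided, in place.
    out_ar[:] = in_ar[0]
    # Wrong: would only rebind the local name inside this function.
    # out_ar = in_ar[0].copy()
```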

Re: VRT derived band pixel functions written in Python

Even Rouault-2
In reply to this post by jramm
On Wednesday 14 September 2016 17:24:53, jramm wrote:
> Trying to run this using a function relying on scipy.ndimage...
>
> When running gdal_translate on the VRT, I get ImportError: No module named
> scipy.ndimage
> This comes after successfully import numpy. scipy.ndimage will happily
> import within the python interpreter.

Works for me with both inline and out-of-line functions.

Are you sure GDAL loads the same Python library as the one used by the
python interpreter? (check the debug traces with CPL_DEBUG=ON)

You can also add at the top of your script

import sys
print(sys.path)

and check if the output points to a location where your scipy package can be
found.
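To make that check a little more systematic, here is a small diagnostic you could paste at the top of the pixel function code (a sketch; importlib.util.find_spec needs Python >= 3.4):

```python
import sys
import importlib.util

# Which interpreter/library did GDAL actually load, and where will it
# look for packages?
print(sys.version)
print(sys.prefix)
print(sys.path)

# Where (if anywhere) would scipy be found by this interpreter?
spec = importlib.util.find_spec("scipy")
print("scipy:", spec.origin if spec else "not found on sys.path")
```

If the printed version or paths differ from what your interactive interpreter reports, GDAL has picked up a different Python than you expect.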

> Any tips on how to track this down/debug?
>
> The entire VRT file is as follows:

I guess this is not the entire VRT, since it refers to an inline definition of
the script but <PixelFunctionCode> has empty content.

>
> <VRTDataset RasterXSize="111090" RasterYSize="259376">
>   <SRS>PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB
> 1936",DATUM["OSGB_1936",SPHEROID["Airy
> 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-12
> 5.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Gr
> eenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHO
> RITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Merca
> tor"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],P
> ARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",400000],PAR
> AMETER["false_northing",-100000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],A
> XIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]</SRS>
> <GeoTransform>100180.0,5.0,0.0,1215730.0,0.0,-5.0</GeoTransform>
> <VRTRasterBand dataType="Float32" band="1"
> subClass="VRTDerivedRasterBand">
>     <SimpleSource>
>       <SourceFilename
> relativeToVrt="0">F:\tif_data\large_sparse.tif</SourceFilename>
>       <SourceProperties BlockXSize="256" BlockYSize="256"
> RasterXSize="111090" RasterYSize="259376"/>
>  <OpenOptions>
> <OOI key="NUM_THREADS">4</OOI>
> <OOI key="SPARSE_OK">TRUE</OOI>
>  </OpenOptions>
>     </SimpleSource>
>     <PixelFunctionType>extract_blobs</PixelFunctionType>
>     <PixelFunctionLanguage>Python</PixelFunctionLanguage>
>     <PixelFunctionCode>
> </PixelFunctionCode>
>     <BufferRadius>5</BufferRadius>
>     <PixelFunctionArguments nodata="0" count_threshold="5"/>
>   </VRTRasterBand>
> </VRTDataset>
>
>
>
>
> --
> View this message in context:
> http://osgeo-org.1560.x6.nabble.com/gdal-dev-VRT-derived-band-pixel-functi
> ons-written-in-Python-tp5285323p5285882.html Sent from the GDAL - Dev
> mailing list archive at Nabble.com.
> _______________________________________________
> gdal-dev mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/gdal-dev

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

Even Rouault-2
On Wednesday 14 September 2016 20:50:04, James Ramm wrote:

> Yes, it is loading a different Python... The path printed by sys.path is
> different from what I get if I open the command window and type:
>
> python
>
> >>> import sys
> >>> print(sys.path)
>
> GDAL and the Python bindings were compiled with VS2015 and Python 3.5, and
> I can correctly import python in the 3.5 interpreter, yet somehow a
> different Python DLL (2.7) is being loaded at runtime.

Might be related to the default try order in

        const char* const apszPythonSO[] = { "python27.dll",
                                            "python26.dll",
                                            "python34.dll",
                                            "python35.dll",
                                            "python36.dll",
                                            "python33.dll",
                                            "python32.dll" };

First found, first served.

Hmm, maybe we should try to match the version that running python on the
command line would start. We could potentially look at PATH to see if
there's something like "bla:\pythonXX" and try the related DLL... Or, more
costly but more reliable, run a 'python -c "import sys;
print(str(sys.version_info[0]) + str(sys.version_info[1]))"' command.

That's one of the downsides of requiring no dependencies at build time.

Anyway, you can override the default guess by setting the PYTHONSO config
option to point to the desired Python DLL.
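For example (the DLL path and file names here are hypothetical; like other GDAL config options, PYTHONSO can also be set as an environment variable):

```shell
:: Windows (cmd): force GDAL to load the Python 3.5 DLL
set PYTHONSO=C:\Python35\python35.dll
gdal_translate input.vrt output.tif
```

On Unix builds the same option would point at the shared library instead, e.g. a libpython3.5m.so under the distro's library directory.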

By the way, I've committed the doc in
https://trac.osgeo.org/gdal/changeset/35441 . It should be reflected online in
a few hours.


> I am on a
> 'inherited' PC right now, so final thing to do is to ensure that the
> gdal_translate I am running is the one I compiled and there isn't another
> version lurking somewhere....
>
>
> On 14 September 2016 at 16:50, Even Rouault <[hidden email]>
>
> wrote:
> > [quoted message snipped]

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: VRT derived band pixel functions written in Python

jramm

Ah, that makes sense. I'll have to try the config option in the morning, but it sounds like that could be it.


On 14 Sep 2016 8:06 p.m., "Even Rouault" <[hidden email]> wrote:
[quoted message snipped]