Hi,
I wanted to mention a new (and I think pretty cool) feature I've added in trunk: the possibility to define pixel functions in Python in a VRT derived band.

Let's start with a simple example to multiply a raster by 1.5:

<VRTDataset rasterXSize="20" rasterYSize="20">
  <SRS>EPSG:26711</SRS>
  <GeoTransform>440720,60,0,3751320,0,-60</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>multiply</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionArguments factor="1.5"/>
    <PixelFunctionCode><![CDATA[
import numpy as np

def multiply(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
             raster_ysize, buf_radius, gt, **kwargs):
    factor = float(kwargs['factor'])
    out_ar[:] = np.round_(np.clip(in_ar[0] * factor, 0, 255))
]]>
    </PixelFunctionCode>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">byte.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>

or to add 2 (or more) rasters:

<VRTDataset rasterXSize="20" rasterYSize="20">
  <SRS>EPSG:26711</SRS>
  <GeoTransform>440720,60,0,3751320,0,-60</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>add</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[
import numpy as np

def add(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
        raster_ysize, buf_radius, gt, **kwargs):
    np.round_(np.clip(np.sum(in_ar, axis=0, dtype='uint16'), 0, 255),
              out=out_ar)
]]>
    </PixelFunctionCode>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">byte.tif</SourceFilename>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">byte2.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>

You can put any Python module code inside PixelFunctionCode, with at least one function taking the following arguments:
- in_ar: list of input numpy arrays (one numpy array for each source)
- out_ar: output numpy array to fill (initialized with the right dimensions and the VRTRasterBand.dataType)
- xoff: pixel offset to the top left corner of the accessed region of the band
- yoff: line offset to the top left corner of the accessed region of the band
- xsize: width of the accessed region of the band
- ysize: height of the accessed region of the band
- raster_xsize: total width of the raster band
- raster_ysize: total height of the raster band
- buf_radius: radius of the buffer (in pixels) added to the left, right, top and bottom of in_ar / out_ar
- gt: geotransform
- kwargs: dictionary with user arguments defined in PixelFunctionArguments

For basic operations, you just need to care about in_ar and out_ar.
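To sanity-check such a VRT from the GDAL Python bindings, something as simple as this works (a trivial sketch; it assumes the first example above was saved as multiply.vrt next to byte.tif, both names being just examples):

from osgeo import gdal

ds = gdal.Open('multiply.vrt')
data = ds.GetRasterBand(1).ReadAsArray()
# the pixel function has been applied on the fly
print(data.min(), data.max())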
With all that, you can do interesting stuff like hillshading (code ported from gdaldem):

<VRTDataset rasterXSize="121" rasterYSize="121">
  <SRS>EPSG:4326</SRS>
  <GeoTransform>-80.004166666666663,0.008333333333333,0,
                44.004166666666663,0,-0.008333333333333</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <ColorInterp>Gray</ColorInterp>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">n43.dt0</SourceFilename>
    </SimpleSource>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionType>hillshade</PixelFunctionType>
    <PixelFunctionArguments scale="111120" z_factor="30" />
    <PixelFunctionCode>
    <![CDATA[
# Licence: X/MIT
# Copyright 2016, Even Rouault
import math

def hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                  raster_ysize, radius, gt, z, scale):
    ovr_scale_x = float(out_ar.shape[1] - 2 * radius) / xsize
    ovr_scale_y = float(out_ar.shape[0] - 2 * radius) / ysize
    ewres = gt[1] / ovr_scale_x
    nsres = gt[5] / ovr_scale_y
    inv_nsres = 1.0 / nsres
    inv_ewres = 1.0 / ewres

    az = 315
    alt = 45
    degreesToRadians = math.pi / 180

    sin_alt = math.sin(alt * degreesToRadians)
    azRadians = az * degreesToRadians
    z_scale_factor = z / (8 * scale)
    cos_alt_mul_z_scale_factor = \
        math.cos(alt * degreesToRadians) * z_scale_factor
    cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \
        254 * math.cos(azRadians) * cos_alt_mul_z_scale_factor
    sin_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \
        254 * math.sin(azRadians) * cos_alt_mul_z_scale_factor
    square_z_scale_factor = z_scale_factor * z_scale_factor
    sin_alt_mul_254 = 254.0 * sin_alt

    for j in range(radius, out_ar.shape[0] - radius):
        win_line = in_ar[0][j - radius:j + radius + 1, :]
        for i in range(radius, out_ar.shape[1] - radius):
            win = win_line[:, i - radius:i + radius + 1].tolist()
            x = inv_ewres * ((win[0][0] + win[1][0] + win[1][0] + win[2][0]) -
                             (win[0][2] + win[1][2] + win[1][2] + win[2][2]))
            y = inv_nsres * ((win[2][0] + win[2][1] + win[2][1] + win[2][2]) -
                             (win[0][0] + win[0][1] + win[0][1] + win[0][2]))
            xx_plus_yy = x * x + y * y
            cang_mul_254 = (sin_alt_mul_254 -
                (y * cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 -
                 x * sin_az_mul_cos_alt_mul_z_scale_factor_mul_254)) / \
                math.sqrt(1 + square_z_scale_factor * xx_plus_yy)
            if cang_mul_254 < 0:
                out_ar[j, i] = 1
            else:
                out_ar[j, i] = 1 + round(cang_mul_254)

def hillshade(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
              raster_ysize, radius, gt, **kwargs):
    z = float(kwargs['z_factor'])
    scale = float(kwargs['scale'])
    hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                  raster_ysize, radius, gt, z, scale)
]]>
    </PixelFunctionCode>
    <BufferRadius>1</BufferRadius>
    <SourceTransferType>Int16</SourceTransferType>
  </VRTRasterBand>
</VRTDataset>

You can completely offload the Python code itself into a proper my_lib.py file and just specify
<PixelFunctionType>my_lib.hillshade</PixelFunctionType>
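The module presumably just needs to be importable (e.g. findable through PYTHONPATH) and to expose a function with the expected signature. A minimal skeleton - with a placeholder pass-through body rather than the real hillshade - could look like:

# my_lib.py - skeleton of an offloaded pixel function module
import numpy as np

def hillshade(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
              raster_ysize, buf_radius, gt, **kwargs):
    # placeholder body: copy the single source through unchanged
    out_ar[:] = in_ar[0]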
Technically, the interfacing with Python is done at run-time by dynamically loading Python symbols with dlopen()/GetProcAddress() when they are already available in the process, for example if libgdal is loaded from a Python interpreter, or from a binary like QGIS which has already loaded the Python lib. Otherwise a few shared objects ("libpython2.7.so", "python27.dll", "libpython3.4m.so", etc.) are tried, unless the PYTHONSO config option is specified to point to a precise filename.

The advantage of this approach is that the same GDAL library binary is compatible with all Python 2.X (tested: 2.6, 2.7) and 3.X (tested: 3.1, 3.4) versions, and any numpy version that comes with the Python environment used (at compilation time you don't need any python/numpy development package). The numpy dependency is not a critical one: one could imagine a fallback mode where Python arrays would be used instead, but this is likely of little interest.

Successfully tested on Linux, MacOSX, FreeBSD and Windows. Some extra tweaking of the predefined set of shared object names - the ones that are probed when no already loaded Python environment is found - might be needed. The implementation should be thread-safe regarding use of the Python Global Interpreter Lock (GIL).

I've also tested with the numba Just-In-Time compiler (http://numba.pydata.org/) and it provides major performance improvements for highly computational code (example given below). With numba enabled, I found that my above Python hillshading on a 10Kx10K float32 raster was even faster than gdaldem (the reason being that the Python version is a bit simplified, as it doesn't take input nodata values into account), whereas the non-jit'ed one is 100x slower. When removing the nodata flag, gdaldem is only twice as fast as the jit'ed Python code. So this is a good sign that such an approach isn't only a toy or just for prototyping.

Speaking of JIT, there's no provision (yet?) for interfacing with PyPy. That would require a new backend, as the PyPy C API has nothing to do with the CPython one.

There are obvious security concerns in allowing Python code to be run when opening the content of a VRT file. The GDAL_VRT_ENABLE_PYTHON config option = IF_SAFE / NO / YES can be set to control the behaviour. The default is IF_SAFE (it can be changed at compilation time by defining e.g. -DGDAL_VRT_ENABLE_PYTHON_DEFAULT="NO", and Python code execution can be completely disabled with -DGDAL_VRT_DISABLE_PYTHON). "Safe" must be understood as: the code will not read, write, delete... files, spawn external code, do network activity, etc. Said otherwise, the code will only do "maths". But infinite looping is definitely possible in the safe mode.

The heuristics of the IF_SAFE mode are rather basic, and I'd be grateful if people could point out ways of breaking them. If any of the following strings - "import" (unless it is "import numpy" / "from numpy import ...", "import math" / "from math import ..." or "from numba import jit"), "eval", "compile", "open", "load", "file", "input", "save", "memmap", "DataSource", "genfromtxt", "getattr", "ctypeslib", "testing", "dump", "fromregex" - is found anywhere in the code, then the code is considered unsafe (there are, interestingly, a lot of methods in numpy to do file I/O; hopefully I've captured them with the previous filters). Another 'unsafe' pattern is when the pixel function references an external module, like my my_lib.hillshade example above (who knows if there won't be some day a shutil.reformat_your_hard_drive function with the right prototype...).
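When you do want to opt in explicitly, the config option can be set from the Python bindings before opening the file; a trivial sketch (the filename is just an example):

from osgeo import gdal

# explicitly allow Python pixel functions for this process
gdal.SetConfigOption('GDAL_VRT_ENABLE_PYTHON', 'YES')
ds = gdal.Open('pixfn.vrt')

The same can be done by setting a GDAL_VRT_ENABLE_PYTHON environment variable before running gdal_translate and friends.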
This new capability isn't yet documented in the VRT doc, although this message will be a start. I'm interested in any feedback you may have.

And to conclude with a fun example: a raster with a Mandelbrot fractal. Just a grey-level version; adding a color table is left to the reader as an exercise. To be opened for example in QGIS, to enjoy the almost-infinite-zoom feeling (make sure to disable contrast enhancement). It uses numba when available, and when that is the case, it's really fast when panning/zooming.

<VRTDataset rasterXSize="100000000" rasterYSize="100000000">
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionType>mandelbrot</PixelFunctionType>
    <PixelFunctionCode>
    <![CDATA[
try:
    from numba import jit
    #print('Using numba')
    g_max_iterations = 100
except:
    class jit(object):
        def __init__(self, nopython=True, nogil=True):
            pass
        def __call__(self, f):
            return f
    #print('Using non-JIT version')
    g_max_iterations = 25

# Use a wrapper since the VRT code cannot access the jit decorated function
def mandelbrot(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
               raster_ysize, r, gt, **kwargs):
    mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                   raster_ysize, g_max_iterations)

@jit(nopython=True, nogil=True)
def mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize,
                   raster_xsize, raster_ysize, max_iterations):
    ovr_factor_y = float(out_ar.shape[0]) / ysize
    ovr_factor_x = float(out_ar.shape[1]) / xsize
    for j in range(out_ar.shape[0]):
        y0 = 2.0 * (yoff + j / ovr_factor_y) / raster_ysize - 1
        for i in range(out_ar.shape[1]):
            x0 = 3.5 * (xoff + i / ovr_factor_x) / raster_xsize - 2.5
            x = 0.0
            y = 0.0
            x2 = 0.0
            y2 = 0.0
            iteration = 0
            while x2 + y2 < 4 and iteration < max_iterations:
                y = 2 * x * y + y0
                x = x2 - y2 + x0
                x2 = x * x
                y2 = y * y
                iteration += 1
            out_ar[j][i] = iteration * 255 / max_iterations
]]>
    </PixelFunctionCode>
    <Metadata>
      <MDI key="STATISTICS_MAXIMUM">255</MDI>
      <MDI key="STATISTICS_MEAN">127</MDI>
      <MDI key="STATISTICS_MINIMUM">0</MDI>
      <MDI key="STATISTICS_STDDEV">127</MDI>
    </Metadata>
    <ColorInterp>Gray</ColorInterp>
    <Histograms>
      <HistItem>
        <HistMin>-0.5</HistMin>
        <HistMax>255.5</HistMax>
        <BucketCount>256</BucketCount>
        <IncludeOutOfRange>0</IncludeOutOfRange>
        <Approximate>1</Approximate>
        <HistCounts>0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0</HistCounts>
      </HistItem>
    </Histograms>
  </VRTRasterBand>
</VRTDataset>

Statistics have been added just to make QGIS open the file a bit quicker.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
Hi Even,
This is a really cool and really impressive feature! Calling Python code from C++ without development packages as a dependency sounds like black magic to me. Obviously Python symbols have to be there at some point to execute Python code, so this is only usable from a binary that happens to load them already (CPython, QGIS, etc.), correct? In particular you say that it is "done at run-time by dynamically loading Python symbols". But I'm confused, because later you mention incompatibility issues between the CPython and PyPy APIs. GDAL's secret sauce, I guess...?

I'm also curious why it's possible to use numba but not PyPy, which AFAIK are both Python JITs? And finally, did you consider using Cython (which claims PyPy compatibility)?

Cheers,

Victor Poughon

________________________________________
De : gdal-dev [[hidden email]] de la part de Even Rouault [[hidden email]]
Envoyé : lundi 12 septembre 2016 14:31
À : [hidden email]
Objet : [gdal-dev] VRT derived band pixel functions written in Python

> [original announcement quoted in full; snipped]
In reply to this post by Even Rouault-2
Hi Even,
On Mon, Sep 12, 2016 at 2:31 PM, Even Rouault <[hidden email]> wrote:
> Hi,

I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html to be a good intro to the risks of eval'ing untrusted Python code. Mentioned in there is a notable attempt to make a secure subset of Python called "pysandbox", but its developer has since declared it "broken by design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough about sandboxing (OS or otherwise) to say if that's right.

I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the reader's own trust policies? My ignorance of how GDAL separates "open" options from "config" options might be on display in this question.

My $.02 is that since "is safe" will be hard to guarantee (it's an outstanding unsolved Python community issue), removing "IF_SAFE" from the options would be a good thing, and that the default for GDAL_VRT_ENABLE_PYTHON should be "NO".

Sean Gillies
In reply to this post by Poughon Victor
Le lundi 12 septembre 2016 16:59:34, Poughon Victor a écrit :
> Hi Even,
>
> This is a really cool and really impressive feature!
> Calling Python code from C++ without development packages as a dependency
> sounds like black magic to me.

No black magic. Just setting function pointers using dlopen():
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L390

> Obviously Python symbols have to be there at some point to execute Python
> code, so this is only usable from a binary that happens to load them
> already (CPython, QGIS, etc.), correct?

No, there are 2 modes:
- GDAL is loaded after something else has already loaded Python (so your CPython and QGIS cases),
- GDAL cannot find any Python symbols among the already available symbols of the process, in which case it will try to load a few reasonable shared objects like "libpython2.7.so", etc. See
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/vrt/vrtderivedrasterband.cpp#L238
and the lines below.

> But I'm confused because later you mention incompatibility issues between
> the CPython and PyPy API. GDAL's secret sauce, I guess...?

Nothing GDAL specific here. It is just that the way to embed PyPy from C is completely different from CPython:
http://doc.pypy.org/en/latest/embedding.html vs
https://docs.python.org/2/extending/embedding.html

> I'm also curious why it's possible to use numba but not PyPy, which AFAIK
> are both Python JITs?

numba is "just" a CPython extension (a non-trivial one, dragging in llvmlite etc.), so once you support CPython, using it comes for free, provided you use the right import and function decorations in your Python code.

> And finally did you consider using Cython (which claims PyPy
> compatibility)?

No, I didn't investigate that.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
In reply to this post by Sean Gillies-3
> I found http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
> to be a good intro to the risks of eval'ing untrusted Python code.
> Mentioned in there is a notable attempt to make a secure subset of Python
> called "pysandbox", but its developer has since declared it "broken by
> design": https://lwn.net/Articles/574215/. I'm not knowledgeable enough
> about sandboxing (OS or otherwise) to say if that's right.

Those links are fascinating. I wouldn't have imagined that there was valid Python code that could crash the Python interpreter, almost by design! (so the osgeo.gdal gotchas are not so bad after all :-) )

OK, so I've followed your suggestion:
* IF_SAFE mode is removed (actually kept between #if 0 / #endif, in case someone wants to pursue the sandboxing effort),
* and the default of GDAL_VRT_ENABLE_PYTHON is now NO.

> I see that in GDAL 2.0+ we can set options in the VRT XML itself. Is it
> possible to set GDAL_VRT_ENABLE_PYTHON=YES in a VRT and thus override the
> reader's own trust policies?

No, that's not possible. GDAL_VRT_ENABLE_PYTHON is a config option only, so it can only be set as an environment variable or through CPLSetConfigOption(). We could indeed imagine providing it as an option of the VRT, but then there would be interesting situations like the following. Imagine something called "my.tif" with the following content:

<VRTDataset rasterXSize="20" rasterYSize="20">
  <VRTRasterBand dataType="Byte" band="1">
    <ColorInterp>Gray</ColorInterp>
    <SimpleSource>
      <SourceFilename relativeToVRT="1"><![CDATA[
<VRTDataset ....
evil Python code here
...
>]]></SourceFilename>
      <OpenOptions>
        <OOI key="ENABLE_PYTHON">YES</OOI>
      </OpenOptions>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>

The internalized VRT would then be allowed with ENABLE_PYTHON=YES. So it's probably a good idea not to provide an open option for now. One could perhaps imagine introducing one, provided we make it illegal as an item in <OpenOptions>, so that it can only be provided by "top-level" GDALOpenEx() calls.

> My ignorance of how GDAL separates "open" options from "config" options
> might be on display in this question.

They are just 2 different mechanisms of providing options. There's no automatic bridge between the two concepts. Sometimes an option may exist in both worlds (because historically there were only global config options, but in some use cases an option mostly makes sense per dataset), or in just one of them.

--
Spatialys - Geospatial professional services
http://www.spatialys.com
Incredible!!!! Really cool initiative.

I had a play around and wanted a way to be able to write the pixel function in a 'stand alone' python module and generate the VRT from it. This would allow independent testing and easier maintenance of the python code. It is fairly easy using lxml to build up the VRT dynamically (a rough sketch of the core idea is at the end of this message):

https://github.com/JamesRamm/gdal_pixel_functions

In the example I more-or-less recreate the hillshade example Even posted, although there is a lot of functionality missing from the VRT-builder still, so some nodes are not present. It would therefore be quite easy to expand this to a sort of GDAL 'algorithms library' (I think it can wholesale replace what I had tried here: https://github.com/JamesRamm/GeoAlg).

One thing I would note (if I understand correctly) is that you have specified any "import" statement as unsafe except numpy/math and numba? It might be better to either allow imports or not allow them at all (unless the config option is set), as I don't see why those libraries would be any more or any less 'safe' than another library, e.g. another standard library module.

I would agree with Sean and remove 'IF_SAFE', leaving it entirely to the user's discretion. Many users who are working with their own local data will likely have no problem just setting it to 'YES', and I imagine users for whom it would be a problem might already have their own rules on what constitutes 'safe'?

On a sidenote - does this new functionality overlap considerably with the gdal map algebra project?
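Here is roughly the core of the VRT-building idea (an untested sketch; the function and file names are just examples):

from lxml import etree

def build_vrt(pixel_fn_name, code_path, xsize, ysize):
    # Build a minimal VRT with a Python derived band
    root = etree.Element('VRTDataset', rasterXSize=str(xsize),
                         rasterYSize=str(ysize))
    band = etree.SubElement(root, 'VRTRasterBand', dataType='Byte',
                            band='1', subClass='VRTDerivedRasterBand')
    etree.SubElement(band, 'PixelFunctionType').text = pixel_fn_name
    etree.SubElement(band, 'PixelFunctionLanguage').text = 'Python'
    code = etree.SubElement(band, 'PixelFunctionCode')
    with open(code_path) as f:
        code.text = etree.CDATA(f.read())  # inline the module source
    return etree.tostring(root, pretty_print=True).decode()

print(build_vrt('multiply', 'my_pixel_function.py', 20, 20))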
In reply to this post by Even Rouault-2
Hi Even,
This is really amazing. I'm becoming more and more a fan of Numba for number crunching, so this certainly makes my day. As soon as I can find a Win x64 dev version on a Conda channel I'll give it a try.

A use case that comes to mind, and which I run into regularly, is when I want to do some simple aggregation before using something like gdalwarp. For example when you have a file containing 24 hourly temperature values, and you are only interested in the daily mean. Currently I either aggregate before warping and write the intermediates to disk, or aggregate after warping, which is computationally inefficient. Neither is optimal.

A few questions: can you access a file's metadata from within a pixel function? This would perhaps allow, for example, interpolating atmospheric data to the overpass time of a satellite image. Do the pixel functions also work with @numba.vectorize(), in particular when targeting 'parallel' or 'cuda'? And would that give parallel processing for both IO and calculations?

You should give the folks at Continuum a heads up, I'm sure they appreciate seeing Numba used like this.

Regards,
Rutger
In reply to this post by jramm
Le lundi 12 septembre 2016 23:38:41, James Ramm a écrit :
> Incredible!!!!
> Really cool initiative.
>
> I had a play around and wanted a way to be able to write the pixel
> function in a 'stand alone' python module and generate the VRT from it.
> This would allow independent testing and easier maintenance of the python
> code. It is fairly easy using lxml to build up the VRT dynamically:
>
> https://github.com/JamesRamm/gdal_pixel_functions
>
> In the example I more-or-less recreate the hillshade example Even posted,
> although there is a lot of functionality missing from the VRT-builder
> still, so some nodes are not present.
> It would be quite easy therefore, to expand this to a sort of GDAL
> 'algorithms library' (I think it can wholesale replace what I had tried
> here: https://github.com/JamesRamm/GeoAlg).

A "-of VRT" option in the current gdal_calc.py could also be cool (at least for simple one-line operations).

Programmatic creation of a VRT derived band, with the previous scope (i.e. calling a natively registered pixel function), already existed. See
https://github.com/OSGeo/gdal/blob/trunk/autotest/gdrivers/vrtderived.py#L175
I didn't extend the VRTDataset::AddBand implementation for now to set the new elements needed for the Python stuff, but that would certainly be doable.

> One thing I would note (if I understand correctly) is that you have
> specified any "import" statement as unsafe except numpy/math and numba? It
> might be better to either allow imports or not allow them at all (unless
> the config option is set) as I don't see why those libraries would be any
> more or any less 'safe' than another library... e.g. another standard
> library module.

Well, the idea was that those modules should only do math stuff and not interact with the rest of the system (contrary to 'os', 'system' and the like). But as the numpy case shows, that is not actually true, since numpy has I/O methods, hence my naive attempt to blacklist those by name.

> I would agree with Sean and remove 'IF_SAFE' and leave it entirely to the
> users discretion.

That's what I've done (see previous message).

> Many users who are working with their own local data will likely have no
> problems about just setting it to 'YES' and I imagine users for whom it
> would be a problem might already have their own rules on what constitutes
> 'safe'?
>
> On a sidenote - does this new functionality overlap considerably with the
> gdal map algebra project?

Indeed. I think there might be several incarnations of such a capability, and I'm not sure we can come up with a single one that will satisfy all needs and constraints. The VRT way has some use cases where it shines (on-the-fly computation of a region of interest, easy integration with upper levels in the stack: QGIS, MapServer, etc.), whereas you could imagine something else that is efficient at whole-raster processing with parallelism etc.

--
Spatialys - Geospatial professional services
http://www.spatialys.com
In reply to this post by Rutger
Le mardi 13 septembre 2016 09:02:09, Rutger a écrit :
> Hi Even,
>
> This is really amazing. I'm becoming more and more a fan of Numba for
> number crunching, so this certainly makes my day. As soon as I can find a
> Win x64 dev version on a Conda channel I'll give it a try.
>
> A use case that comes to mind, and which I run into regularly, is when I
> want to do some simple aggregation before using something like gdalwarp.
> For example when you have a file containing 24 hourly temperature values,
> and you are only interested in the daily mean. Currently I either
> aggregate before warping and write the intermediates to disk, or
> aggregate after warping, which is computationally inefficient. Neither is
> optimal.
>
> A few questions: can you access a file's metadata from within a pixel
> function? This would perhaps allow for example interpolating atmospheric
> data to the overpass time of a satellite image.

This is indeed a question I've asked myself: whether the prototype of the pixel function contains enough information. I had not initially put in the geotransform and the whole raster dimensions, for example, but found they were necessary to get correct behaviour for hillshading at different zoom scales. And I also added afterwards the xoff, yoff to be able to do the Mandelbrot generation (I guess there might be some more valid use cases ;-)). And then the user-provided dictionary, to allow some parametrization of the algorithm instead of hardcoding constants in it, so that you can off-load it into a general lib. And I stopped at that point.

For your use case, there are 2 possibilities currently, using the <PixelFunctionArguments> capability:
- either you get the metadata items you need at VRT generation time and pass them as arguments,
- or you only pass the names of the sources as arguments, and you use the GDAL Python bindings themselves inside the pixel function to get access to everything needed (see the sketch below). But I'm just thinking it might be a bit inefficient to do that for each RasterIO() request.

<loud_thinking>
Perhaps a compromise would be to allow, in addition to simple functions, specifying a class name, where the constructor would receive the values that don't change from one call to another, and a calc() method would receive the ones that change at each RasterIO() request:

class MyProcessingClass:
    def __init__(self, raster_xsize, raster_ysize, buf_radius, gt,
                 source_filenames, **kwargs):
        # save the above parameters that you may need in calc(), and do
        # one-time things like gdal.Open()'ing sources to get metadata
        pass

    def calc(self, in_ar, out_ar, xoff, yoff, xsize, ysize):
        # do your computation
        pass

We could possibly pass the sources as datasets themselves, since they are actually already opened, but that would make the availability of the osgeo.gdal bindings a requirement (well, we could pass the filenames as a fallback if the import fails).
</loud_thinking>
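To make the second possibility above concrete, an untested sketch (the 'source_filename' argument and the 'TEMP_OFFSET' metadata item are made up for the example):

import numpy as np

def mean_plus_offset(in_ar, out_ar, xoff, yoff, xsize, ysize,
                     raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    # Re-opening the source at every RasterIO() request just to read one
    # metadata item is the inefficiency mentioned above, but it works.
    from osgeo import gdal
    ds = gdal.Open(kwargs['source_filename'])
    offset = float(ds.GetMetadataItem('TEMP_OFFSET') or 0)
    out_ar[:] = np.mean(in_ar, axis=0) + offset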
> Do the pixel functions also work with @numba.vectorize(), in particular
> when targeting 'parallel' or 'cuda'?

I've only scratched the surface of Numba (didn't know it 2 days ago). I guess this might work (it might only be interesting if VRTDerivedRasterBand::IRasterIO() is called with a big enough region, which depends on the pixel access pattern of the upper layers). The VRT driver just calls a Python function that takes numpy arrays and a few extra args.

> And would that give parallel processing for both IO and calculations?

Only calculations. I/O is done before going to Python and after returning from it. Actually, if you specify zero sources in the VRT, then it is up to you to do the read operations the way you like in Python, so you could possibly parallelize them there.

> You should give the folks at Continuum a heads up, I'm sure they
> appreciate seeing Numba used like this.

I have no connection with them, but yes, their project rocks.

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
I overlooked the fact that it still moves through Python. Is that the 'only' hurdle preventing parallel IO? Since gdalwarp for example has the -multi flag, it seems as if GDAL is capable of it, or is that a specific/specialized implementation?

Numba has several options which might eliminate moving through Python during execution. There are c-callbacks:
http://numba.pydata.org/numba-doc/dev/user/cfunc.html
It probably works by using the Numpy C-API (which I have zero experience with). I don't know if it's possible for other programs, like GDAL, to use those compiled functions without moving through Python first.
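For what it's worth, the c-callback side of it is tiny (a toy sketch, not wired into GDAL in any way):

from numba import cfunc

# compile a C-callable function with a fixed signature
@cfunc("float64(float64, float64)")
def add(x, y):
    return x + y

# add.address is the raw function pointer a C caller could use
print(add.address)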
There is also ahead-of-time compilation (AOT):
http://numba.pydata.org/numba-doc/dev/user/pycc.html
AOT has the benefit that users only need Numpy as a dependency, and don't need Numba/llvm. There are some drawbacks as well, like no longer being able to compile optimizations for the hardware it's running on.

Regards,
Rutger
Le mardi 13 septembre 2016 11:07:39, Rutger a écrit :
> I overlooked the fact that it still moves through Python, is that the
> 'only' hurdle preventing parallel IO?

Not sure I understand your question. But if you have several sources, you could potentially do parallelized reading of them from the Python code by using Python threads and the GDAL Python API. But looking at the SWIG generated code, it doesn't seem that SWIG releases the GIL automatically before calling native code. Hum... So that should probably be added manually, at least around GDALRasterIO() calls, otherwise you'll get zero perf improvement.

> Since gdalwarp for example has the -multi flag, it seems as if GDAL is
> capable of it, or is that a specific/specialized implementation?

Parallelized I/O doesn't mean much by itself without more context. You may want to parallelize reading of different regions of the same dataset, or parallelize reading of different datasets. Due to GDAL objects not being thread-safe, the first case (reading different regions of the same dataset) can be reduced to the second one by opening several datasets for the same filename.

Regarding gdalwarp -multi, here's how that works. When you warp a dataset, a list of all chunks (windows) to be processed is generated, and gdalwarp -multi does the following:

Thread I/O                 Thread computation
Read data for chunk 1
Read data for chunk 2      Do calculations for chunk 1
Write output of chunk 1    Do calculations for chunk 2
Read data for chunk 3
Write output of chunk 2    Do calculations for chunk 3

> Numba has several options which might eliminate using Python during
> execution. There are c-callbacks:
> http://numba.pydata.org/numba-doc/dev/user/cfunc.html

You can also use @jit(nopython=True, nogil=True) and your Python method will end up being pure native code (provided that you don't use too high level stuff, otherwise the jit'ification will fail with an exception).

And for code that is not inlined in the VRT, you can also add cache=True so that the jit'ification can be reused.
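So a typical decoration would look like this (an illustrative function, not one from this thread):

from numba import jit

@jit(nopython=True, nogil=True, cache=True)
def scale(block, factor):
    # compiled to native code on first call; cached on disk with cache=True
    return block * factor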
With all that, the cost of the Python layer becomes negligible (except loading the Python environment the first time, if not already loaded; but for a computation that takes longer than a few seconds, that's not really a big deal).

--
Spatialys - Geospatial professional services
http://www.spatialys.com
I think you can call SWIG with the -threads argument on the command line so it will always release the GIL. Could be an easy option if it works
On Tuesday, 13 September 2016, Even Rouault <[hidden email]> wrote:
> [snip]
Le mardi 13 septembre 2016 21:22:20, James Ramm a écrit :
> I think you can call SWIG with the -threads argument on the command line
> so it will always release the GIL. Could be an easy option if it works

That's mostly what I've done. See my other message:
https://lists.osgeo.org/pipermail/gdal-dev/2016-September/045155.html

--
Spatialys - Geospatial professional services
http://www.spatialys.com
In reply to this post by Even Rouault-2
Then I guess I'm mistaken in thinking that Python would become a slowdown in such a case. It's been a while since I tested it. Anyway, thanks for your comments and efforts to improve the performance.

I couldn't find any Conda channels* which build from trunk, so I'll probably have to wait a while before I can try it out. Something to look forward to. :)

* https://anaconda.org/search?q=gdal

Regards,
Rutger
Le mercredi 14 septembre 2016 09:02:07, Rutger a écrit :
> Then I guess I'm mistaken in thinking that Python would become a slowdown
> in such a case. It's been a while since I tested it. Anyway, thanks for
> your comments and efforts to improve the performance.
>
> I couldn't find any Conda channels* which build from trunk, so I'll
> probably have to wait a while before I can try it out. Something to look
> forward to. :)
>
> * https://anaconda.org/search?q=gdal

You can also grab a nightly build at http://www.gisinternals.com/development.php

--
Spatialys - Geospatial professional services
http://www.spatialys.com
Trying to run this using a function relying on scipy.ndimage...

When running gdal_translate on the VRT, I get:

ImportError: No module named scipy.ndimage

This comes after successfully importing numpy. scipy.ndimage will happily import within the python interpreter. Any tips on how to track this down/debug?

The entire VRT file is as follows:

<VRTDataset RasterXSize="111090" RasterYSize="259376">
  <SRS>PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB 1936",DATUM["OSGB_1936",SPHEROID["Airy 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",400000],PARAMETER["false_northing",-100000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]</SRS>
  <GeoTransform>100180.0,5.0,0.0,1215730.0,0.0,-5.0</GeoTransform>
  <VRTRasterBand dataType="Float32" band="1" subClass="VRTDerivedRasterBand">
    <SimpleSource>
      <SourceFilename relativeToVrt="0">F:\tif_data\large_sparse.tif</SourceFilename>
      <SourceProperties BlockXSize="256" BlockYSize="256" RasterXSize="111090" RasterYSize="259376"/>
      <OpenOptions>
        <OOI key="NUM_THREADS">4</OOI>
        <OOI key="SPARSE_OK">TRUE</OOI>
      </OpenOptions>
    </SimpleSource>
    <PixelFunctionType>extract_blobs</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode>
    <![CDATA[
import numpy
import scipy.ndimage

def label_sum(input_array, labels, index):
    counts = numpy.bincount(labels.ravel())
    sums = numpy.bincount(labels.ravel(), weights=input_array.ravel())
    found = (index >= 0) & (index < counts.size)
    index[~found] = 0
    sums = sums[index]
    sums[~found] = 0
    return sums

def blob_extraction(input_array, count_threshold, nodata=0,
                    all_connected=True, invert=False,
                    replacement_value=-1.0):
    if count_threshold <= 0:
        raise ValueError('count_threshold must be >= 0')
    # Convert the background value to 0
    if invert:
        # 'Invert' the array, such that the data values (foreground) are
        # made background and blobs are detected in the nodata regions.
        # In order to recover the original input data, we make a copy of
        # the array
        original_array = input_array.copy()
        input_array[input_array != numpy.float32(nodata)] = 0
    else:
        input_array[input_array == numpy.float32(nodata)] = 0
    if all_connected:
        # all_connected implies using 8-connectivity and groups will be
        # generated considering orthogonal and diagonal neighbours.
        connectivity_struct = [[1, 1, 1],
                               [1, 1, 1],
                               [1, 1, 1]]
    else:
        # Groups will be generated using only orthogonal neighbours
        connectivity_struct = None
    # Get labelled groups of values
    label_array, n_regions = scipy.ndimage.label(
        input_array, structure=connectivity_struct)
    # Create a copy of the input array converted to bools
    input_as_bool = numpy.array(input_array != 0)
    # Get sizes of each group and remove values from groups where the size
    # is less than the threshold
    sizes = label_sum(input_as_bool, label_array,
                      numpy.arange(n_regions + 1, dtype=numpy.int))
    mask_small_blobs = sizes < count_threshold
    remove_pixels = mask_small_blobs[label_array]
    if invert:
        remove_pixels[input_array == 0] = False
        input_array = original_array
        input_array[remove_pixels] = replacement_value
    else:
        input_array[remove_pixels] = nodata
        # Convert background value back to nodata
        input_array[input_array == 0] = nodata
    return input_array

def extract_blobs(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                  raster_ysize, radius, gt, **kwargs):
    # assign into the output buffer, not to the local name
    out_ar[:] = blob_extraction(in_ar[0], int(kwargs["count_threshold"]),
                                int(kwargs["nodata"]))
]]>
    </PixelFunctionCode>
    <BufferRadius>5</BufferRadius>
    <PixelFunctionArguments nodata="0" count_threshold="5"/>
  </VRTRasterBand>
</VRTDataset>
In reply to this post by Even Rouault-2
Trying to run this using a function relying on scipy.ndimage...
When running gdal_translate on the VRT, I get ImportError: No module named scipy.ndimage. This comes after numpy is imported successfully; scipy.ndimage will happily import within the Python interpreter. Any tips on how to track this down/debug?

The entire VRT file is as follows:

<VRTDataset RasterXSize="111090" RasterYSize="259376">
  <SRS>PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB 1936",DATUM["OSGB_1936",SPHEROID["Airy 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",400000],PARAMETER["false_northing",-100000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]</SRS>
  <GeoTransform>100180.0,5.0,0.0,1215730.0,0.0,-5.0</GeoTransform>
  <VRTRasterBand dataType="Float32" band="1" subClass="VRTDerivedRasterBand">
    <SimpleSource>
      <SourceFilename relativeToVRT="0">F:\tif_data\large_sparse.tif</SourceFilename>
      <SourceProperties BlockXSize="256" BlockYSize="256" RasterXSize="111090" RasterYSize="259376"/>
      <OpenOptions>
        <OOI key="NUM_THREADS">4</OOI>
        <OOI key="SPARSE_OK">TRUE</OOI>
      </OpenOptions>
    </SimpleSource>
    <PixelFunctionType>extract_blobs</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode>
      <![CDATA[
import numpy
import scipy.ndimage

def label_sum(input_array, labels, index):
    counts = numpy.bincount(labels.ravel())
    sums = numpy.bincount(labels.ravel(), weights=input_array.ravel())
    found = (index >= 0) & (index < counts.size)
    index[~found] = 0
    sums = sums[index]
    sums[~found] = 0
    return sums

def blob_extraction(input_array, count_threshold, nodata=0,
                    all_connected=True, invert=False,
                    replacement_value=-1.0):
    if count_threshold <= 0:
        raise ValueError('count_threshold must be > 0')
    # Convert the background value to 0
    if invert:
        # 'Invert' the array, such that the data values (foreground) are
        # made background and blobs are detected in the nodata regions.
        # In order to recover the original input data, we make a copy of
        # the array
        original_array = input_array.copy()
        input_array[input_array != numpy.float32(nodata)] = 0
    else:
        input_array[input_array == numpy.float32(nodata)] = 0
    if all_connected:
        # all_connected implies using 8-connectivity: groups will be
        # generated considering orthogonal and diagonal neighbours
        connectivity_struct = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    else:
        # Groups will be generated using only orthogonal neighbours
        connectivity_struct = None
    # Get labelled groups of values
    label_array, n_regions = scipy.ndimage.label(
        input_array, structure=connectivity_struct)
    # Create a copy of the input array converted to bools
    input_as_bool = numpy.array(input_array != 0)
    # Get sizes of each group and remove values from groups where the size
    # is less than the threshold
    sizes = label_sum(input_as_bool, label_array,
                      numpy.arange(n_regions + 1, dtype=int))
    mask_small_blobs = sizes < count_threshold
    remove_pixels = mask_small_blobs[label_array]
    if invert:
        remove_pixels[input_array == 0] = False
        input_array = original_array
        input_array[remove_pixels] = replacement_value
    else:
        input_array[remove_pixels] = nodata
        # Convert background value back to nodata
        input_array[input_array == 0] = nodata
    return input_array

def extract_blobs(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                  raster_ysize, radius, gt, **kwargs):
    # Assign into the provided output buffer rather than rebinding the
    # local name, otherwise the result is silently discarded
    out_ar[:] = blob_extraction(in_ar[0], int(kwargs["count_threshold"]),
                                int(kwargs["nodata"]))
]]>
    </PixelFunctionCode>
    <BufferRadius>5</BufferRadius>
    <PixelFunctionArguments nodata="0" count_threshold="5"/>
  </VRTRasterBand>
</VRTDataset>
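Incidentally, blob_extraction can be sanity-checked outside GDAL before wiring it into the VRT, which separates algorithm errors from interpreter-loading issues. A minimal sketch with a made-up toy array (assuming numpy and scipy import cleanly, and blob_extraction is defined as above):

import numpy

# Toy 4x4 raster: a 3-pixel blob of 1s and an isolated 2
a = numpy.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 2],
                 [0, 0, 0, 0]], dtype=numpy.float32)

# With count_threshold=3 and nodata=0, the 3-pixel blob survives and the
# single-pixel blob is replaced by nodata
print(blob_extraction(a, count_threshold=3, nodata=0))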
In reply to this post by jramm
On Wednesday 14 September 2016 17:24:53, jramm wrote:
> Trying to run this using a function relying on scipy.ndimage...
>
> When running gdal_translate on the VRT, I get ImportError: No module named
> scipy.ndimage
> This comes after numpy is imported successfully. scipy.ndimage will happily
> import within the Python interpreter.

Works for me with both inline and offloaded functions.

Are you sure GDAL loads the same Python library as the version used in the Python interpreter? (Check the debug traces with CPL_DEBUG=ON.)

You can also add at the top of your script

import sys
print(sys.path)

and check if the output points to a location where your scipy package can be found.

> Any tips on how to track this down/debug?
>
> The entire VRT file is as follows:

I guess this is not the entire VRT, since it refers to an inline definition of the script but <PixelFunctionCode> has empty content.

> <VRTDataset RasterXSize="111090" RasterYSize="259376">
>   <SRS>PROJCS["OSGB 1936 / British National Grid",GEOGCS["OSGB 1936",DATUM["OSGB_1936",SPHEROID["Airy 1830",6377563.396,299.3249646,AUTHORITY["EPSG","7001"]],TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489],AUTHORITY["EPSG","6277"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4277"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.9996012717],PARAMETER["false_easting",400000],PARAMETER["false_northing",-100000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","27700"]]</SRS>
>   <GeoTransform>100180.0,5.0,0.0,1215730.0,0.0,-5.0</GeoTransform>
>   <VRTRasterBand dataType="Float32" band="1" subClass="VRTDerivedRasterBand">
>     <SimpleSource>
>       <SourceFilename relativeToVRT="0">F:\tif_data\large_sparse.tif</SourceFilename>
>       <SourceProperties BlockXSize="256" BlockYSize="256" RasterXSize="111090" RasterYSize="259376"/>
>       <OpenOptions>
>         <OOI key="NUM_THREADS">4</OOI>
>         <OOI key="SPARSE_OK">TRUE</OOI>
>       </OpenOptions>
>     </SimpleSource>
>     <PixelFunctionType>extract_blobs</PixelFunctionType>
>     <PixelFunctionLanguage>Python</PixelFunctionLanguage>
>     <PixelFunctionCode>
>     </PixelFunctionCode>
>     <BufferRadius>5</BufferRadius>
>     <PixelFunctionArguments nodata="0" count_threshold="5"/>
>   </VRTRasterBand>
> </VRTDataset>

--
Spatialys - Geospatial professional services
http://www.spatialys.com
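To make that check concrete, a throwaway diagnostic band can be used: a pixel function along the following lines (the name "diag" and the pass-through behaviour are just placeholders, not part of any API) prints what the GDAL-embedded interpreter actually sees, for comparison with the same prints from your command-line Python:

import sys

def diag(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
         raster_ysize, radius, gt, **kwargs):
    # Report the interpreter version and module search paths that the
    # embedded Python is using
    print(sys.version)
    print(sys.path)
    # Pass the first source through unchanged so the band still renders
    out_ar[:] = in_ar[0]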
On Wednesday 14 September 2016 20:50:04, James Ramm wrote:
> Yes, it is loading a different python... The path printed by sys.path is
> different to what I get if I open the command window and type:
>
> python
> >>> import sys
> >>> print(sys.path)
>
> GDAL and the Python bindings were compiled with VS2015 and Python 3.5, and
> I can correctly import scipy in the 3.5 interpreter, yet somehow a
> different Python DLL (2.7) is being loaded at runtime.

Might be related to the default try order in

const char* const apszPythonSO[] = {
    "python27.dll", "python26.dll",
    "python34.dll", "python35.dll",
    "python36.dll", "python33.dll",
    "python32.dll" };

First found, first served.

Hmm, maybe we should try to match the version that issuing python on the command line would start. We could potentially look at the PATH to see if there's something like "bla:\pythonXX" and try the related .dll... Or, more costly but more reliable, try issuing a

python -c "import sys; print(str(sys.version_info[0]) + str(sys.version_info[1]))"

command. That's one of the downsides of requiring no dependency at build time.

Anyway, you can override the default guess by setting the PYTHONSO config option to point to the desired Python DLL.

By the way, I've committed the doc in https://trac.osgeo.org/gdal/changeset/35441 . It should be reflected online in a few hours.

> I am on an 'inherited' PC right now, so the final thing to do is to ensure
> that the gdal_translate I am running is the one I compiled and there isn't
> another version lurking somewhere....
>
> On 14 September 2016 at 16:50, Even Rouault <[hidden email]> wrote:
> > On Wednesday 14 September 2016 17:24:53, jramm wrote:
> > > Trying to run this using a function relying on scipy.ndimage...
> > >
> > > When running gdal_translate on the VRT, I get ImportError: No module
> > > named scipy.ndimage
> >
> > Works for me with both inline and offloaded functions.
> >
> > Are you sure GDAL loads the same Python library as the version used in
> > the Python interpreter? (Check the debug traces with CPL_DEBUG=ON.)
> >
> > [...]

--
Spatialys - Geospatial professional services
http://www.spatialys.com
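For reference, GDAL's general command-line switches let config options be set per invocation, so no environment changes are needed; a hypothetical run (the DLL path and filenames below are made up for illustration) that forces the 3.5 DLL and enables debug traces would look like:

gdal_translate --config PYTHONSO "C:\Python35\python35.dll" --config CPL_DEBUG ON large_sparse_blobs.vrt out.tif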
Ah, that makes sense. I'll have to try the config option in the morning, but it sounds like that could be it.

On 14 Sep 2016 8:06 p.m., "Even Rouault" <[hidden email]> wrote:
> On Wednesday 14 September 2016 20:50:04, James Ramm wrote:
> [...]