lecture: Development of a new framework for Distributed Processing of Big Geospatial Data
The geospatial community still lacks well-established distributed processing solutions tailored to the volume and heterogeneity of geodata, especially when fast data processing is a must. Moreover, most current distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Hence, this paper presents a prototype for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept (https://github.com/posseidon/IQLib/) developed within the IQmulus EU FP7 research and development project (http://www.iqmulus.eu). The data distribution framework places no restrictions on the programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. Our intention is to provide a solution that supports a wide range of geospatial processing tasks in a distributed environment with no restrictions on data storage concepts. Our research also covers methods for controlling data partitioning, distributed processing and data assimilation. Partitioning (also referred to as “Tiling”) is a very delicate yet crucial step that affects the whole processing chain. After the algorithms have processed the resulting “chunks” or “tiles” of data, the partial results are collected to carry out data assimilation, or “Stitching”.
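The tiling, processing and stitching steps described above can be illustrated with a minimal sketch. It is not the IQLib API: the function names (tile, process, stitch, run), the threshold operation and the local thread pool standing in for distributed workers are all assumptions made for illustration. The example uses a per-pixel operation, so tiles need no overlap; neighbourhood-based algorithms would require overlapping tiles, which this sketch does not handle.

```python
from concurrent.futures import ThreadPoolExecutor


def tile(raster, tile_h, tile_w):
    """Split a 2D raster (list of rows) into tiles keyed by their (row, col) origin."""
    tiles = {}
    for r0 in range(0, len(raster), tile_h):
        for c0 in range(0, len(raster[0]), tile_w):
            # Edge tiles may be smaller than tile_h x tile_w.
            tiles[(r0, c0)] = [row[c0:c0 + tile_w] for row in raster[r0:r0 + tile_h]]
    return tiles


def process(tile_data):
    """Hypothetical per-pixel algorithm: binary threshold at 128."""
    return [[1 if value >= 128 else 0 for value in row] for row in tile_data]


def stitch(results, height, width):
    """Write each processed tile back at its origin to rebuild the full raster."""
    out = [[None] * width for _ in range(height)]
    for (r0, c0), tile_data in results.items():
        for dr, row in enumerate(tile_data):
            out[r0 + dr][c0:c0 + len(row)] = row
    return out


def run(raster, tile_h, tile_w, workers=4):
    """Tile, process tiles concurrently, then stitch the partial results."""
    tiles = tile(raster, tile_h, tile_w)
    # A thread pool stands in for remote workers (assumption, not the real system).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = dict(zip(tiles.keys(), pool.map(process, tiles.values())))
    return stitch(processed, len(raster), len(raster[0]))


if __name__ == "__main__":
    # Small synthetic raster: 5 rows x 7 columns, values 0..255.
    demo = [[(r * 7 + c) * 7 % 256 for c in range(7)] for r in range(5)]
    result = run(demo, tile_h=2, tile_w=3)
    # For a per-pixel operation, the tiled result equals whole-raster processing.
    print(result == process(demo))
```

Keying each tile by its origin lets partial results arrive in any order and still be stitched correctly, which is the property a distributed setting would rely on.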
The paper presents the above-mentioned prototype through a case study dealing with country-wide processing of raster imagery. The prototype is assessed by comparing its results (computing time, accuracy, etc.) with those of competing solutions. Further investigation of algorithmic and implementation details is planned for the near future.