
FOSS4G 2022 | Semantic querying in earth observation data cubes


Earth observation (EO) imagery has become an essential source of information for monitoring and understanding the impact of major social and environmental issues. In recent years we have seen significant improvements in the availability and accessibility of these data. Programs like Landsat and Copernicus release new images every day, freely and openly available to everyone. Technological improvements such as data cubes (e.g. OpenDataCube), scalable cloud-based analysis platforms (e.g. Google Earth Engine) and standardized data access APIs (e.g. OpenEO) are easing the retrieval of the data and enabling higher processing speeds. All these developments have lowered the barriers to utilizing the value of EO imagery, yet translating EO imagery directly into information using automated and repeatable methods remains a major challenge. Imagery lacks inherent semantic meaning, and thus requires interpretation. For example, consider someone who uses EO imagery to monitor vegetation loss. A multi-spectral satellite image of a location may consist of an array of digital numbers representing the intensity of reflected radiation at different wavelengths. The user, however, is not interested in digital numbers; they are interested in a semantic categorical value stating whether vegetation was observed. Inferring this semantic variable from the reflectance values is an inherently ill-posed problem, since it requires bridging a gap between the two-dimensional image domain and the four-dimensional spatio-temporal real-world domain. Advanced technical expertise in the field of EO analytics is needed for this task, making it a remaining barrier to the broad utilization of EO imagery across a wide range of application domains. We propose a semantic querying framework for extracting information from EO imagery as a tool to help bridge the gap between imagery and semantic concepts.
The novelty of this framework is that it makes a clear separation between the image domain and the real-world domain. There are three main components in the framework. The first component forms the real-world domain. This is where EO data users interact with the system. They can express their queries in the real-world domain, meaning that they directly reference semantic concepts that exist in the real world (e.g. forest, fire). For simplicity, we currently work at a higher level of abstraction, and focus on concepts that correspond to land-cover classes (e.g. vegetation). For example, a user can query how often vegetation was observed at a certain location during a certain timespan. These queries do not contain any information on how the semantic concepts are represented by the underlying data. The second component forms the image domain. This is where the EO imagery is stored in a data cube, a multi-dimensional array organizing the data in a way that simplifies storage, access and analysis. Besides the imagery itself, the data cube may be enriched with automatically generated layers that already offer a first degree of interpretation for each pixel (i.e. a semantically-enabled data cube [1]), as well as with additional data sources that can be utilized to better represent certain properties of real-world semantic concepts (e.g. digital elevation models). The third component serves as the mapping between the real-world domain and the image domain. This is where EO data experts bring their expertise into the system, by formalizing relationships between the observed data values and the presence of a real-world semantic concept. In our current work these relationships are always binary, meaning that the concept is marked either as present or not present. However, the structure also allows for non-binary relationships, e.g. probabilities that a concept is present given the observed data values.
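To make the mapping component concrete, here is a minimal conceptual sketch in NumPy (not the semantique API itself): an expert-defined binary rule that marks the concept "vegetation" as present wherever an NDVI computed from the stored reflectance values exceeds a threshold. The band values and the 0.4 threshold are illustrative assumptions.

```python
import numpy as np

# Hypothetical red and near-infrared reflectances for a 2x2 spatial grid.
red = np.array([[0.10, 0.40], [0.08, 0.35]])
nir = np.array([[0.60, 0.45], [0.55, 0.30]])

# Expert-defined mapping rule: "vegetation" is present where NDVI > 0.4.
ndvi = (nir - red) / (nir + red)
vegetation = ndvi > 0.4  # boolean array: True where the concept is observed

print(vegetation)  # [[ True False], [ True False]]
```

A non-binary variant of the same rule could instead return, for example, a membership or probability value per pixel rather than a hard True/False.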
We implemented a proof-of-concept of our proposed framework as an open-source Python library (see https://github.com/ZGIS/semantique). The library contains functions and classes that allow users to formulate their queries and call a query processor to execute them with respect to a specific mapping. Queries are formulated by chaining together semantic concept references and analytical processes. The query processor will translate each referenced semantic concept into a multi-dimensional array covering the spatio-temporal extent of the query. It does so by retrieving the relevant data values from the data storage, and subsequently applying the rules that are specified in the mapping. If the relationships are binary, the resulting array will be boolean, with “true” values for those pixels that are identified as being an observation of the referenced concept, and “false” values for all other pixels. Analytical processes can then be applied to this array. Each process is a well-defined array operation performing a single task. For example, applying a…
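The query-processing step described above can be illustrated with a small NumPy sketch (again a conceptual stand-in, not the library's own API): the referenced concept has been translated into a boolean array over the spatio-temporal extent of the query, and an analytical process then counts, per pixel, how often the concept was observed. The cube contents here are invented for illustration.

```python
import numpy as np

# Hypothetical boolean cube (time, y, x) produced by applying a binary
# mapping for the concept "vegetation" to the stored data values.
vegetation = np.array([
    [[True,  False], [True,  True ]],   # timestep 1
    [[True,  False], [False, True ]],   # timestep 2
    [[False, False], [True,  True ]],   # timestep 3
])

# Analytical process: count per pixel how often vegetation was observed
# during the queried timespan (a reduction over the time dimension).
count = vegetation.sum(axis=0)

print(count)  # [[2 0], [2 3]]
```

Because each process is a well-defined array operation, such reductions and other operations can be chained freely after the concept has been translated into an array.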

Lucas van der Meer

https://talks.osgeo.org/foss4g-2022-academic-track/talk/VSSKSB/

#foss4g2022 #academictrack
