openEO-based processing
The architecture of WorldCereal phase II is based upon the guidelines provided by the ESA APEx project. The key goal is to build a system and processing workflows that can be used beyond the project lifetime.
In accordance with the APEx framework, a few project-specific choices have been made:
- openEO is selected as the standard in which processing workflows are described
- The Copernicus Data Space Ecosystem (CDSE) openEO federation is selected as the target platform
openEO as API
WorldCereal selects openEO as its processing standard because it allows both the training and inference workflows to be expressed at a high level. In phase I, the consortium had to invest substantially in writing data access code; this is now handled by openEO.
The use of openEO helps generate results that follow FAIR data and open science principles, which is crucial for the transparency and reproducibility of WorldCereal results. It also makes it straightforward for anyone to inspect workflows visually.
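To illustrate why openEO workflows are easy to inspect: an openEO workflow is ultimately a JSON process graph. The sketch below builds a minimal graph as a plain Python dictionary, without any client library. The collection id, band names, and extent values are illustrative assumptions, not the actual WorldCereal workflow.

```python
import json


def build_process_graph(west, south, east, north, start, end):
    """Return a minimal openEO-style process graph as a plain dict.

    The collection id and band names are assumptions for illustration.
    """
    return {
        "process_graph": {
            # Node 1: data access is a single declarative step.
            "load": {
                "process_id": "load_collection",
                "arguments": {
                    "id": "SENTINEL2_L2A",  # assumed collection id
                    "spatial_extent": {
                        "west": west, "south": south,
                        "east": east, "north": north,
                    },
                    "temporal_extent": [start, end],
                    "bands": ["B04", "B08"],  # assumed band names
                },
            },
            # Node 2: consumes node 1 through a "from_node" reference.
            "save": {
                "process_id": "save_result",
                "arguments": {"data": {"from_node": "load"}, "format": "GTiff"},
                "result": True,
            },
        }
    }


graph = build_process_graph(5.0, 51.0, 5.1, 51.1, "2021-01-01", "2021-12-31")
print(json.dumps(graph, indent=2))
```

Because the whole workflow is plain JSON, it can be stored next to the results, compared between runs, or rendered as a node diagram by generic tooling.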
CDSE as processing platform
The scalable processing needed to generate a global map is offered by the CDSE openEO federation. This covers the software to manage cloud resources and parallelize workflows, disaster recovery across two datacenters, and the operational burden of monitoring the system.
The CDSE open source deployment is described in more detail in the CDSE openEO documentation. The most important element is that CDSE components are contractually required to maintain an uptime of 99.5% on a monthly basis; lower uptimes result in penalties for the platform operators. The contract also has a long lifetime, ensuring that WorldCereal workflows can be executed beyond the project lifetime.
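To put the 99.5% figure in perspective, the small calculation below converts a monthly uptime percentage into the downtime it still permits. It assumes a 30-day month and that uptime is measured over all wall-clock hours; how the contract actually measures uptime is not specified here.

```python
def allowed_downtime_hours(uptime_percent, days_in_month=30):
    """Hours of downtime still permitted at a given monthly uptime.

    Assumes uptime is measured over every hour of the month.
    """
    hours_in_month = days_in_month * 24
    return hours_in_month * (100.0 - uptime_percent) / 100.0


print(allowed_downtime_hours(99.5))      # 3.6 hours in a 30-day month
print(allowed_downtime_hours(99.5, 31))  # 3.72 hours in a 31-day month
```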
Finally, it is important to note that CDSE is the only EU platform that offers access to the full archives of Sentinel-2 L2A and Sentinel-1 GRD data. Hence, for global map production there are currently no alternatives, except for non-EU commercial cloud providers.
STAC-based data management
The project handles a number of assets that need to be managed:
- Final workflow output, in the form of GeoTIFF raster files
- Raw EO data sampled over reference data locations and relevant time periods, stored as netCDF
- Reference data, stored as vector files (GeoJSON, GeoParquet)
- Trained models, stored as binary files
A key design element is that all these assets are described using STAC metadata. Once the metadata is ingested into a STAC catalog, the assets can be discovered and visualized.
The general design pattern for WorldCereal components is to read data from STAC and to write it as STAC again when generating derived assets. This simple but powerful approach makes WorldCereal components compatible with external components that also support STAC. It is also an important requirement of the FAIR data principles.
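The read-from-STAC, write-as-STAC pattern can be sketched with plain dictionaries. The example below creates a minimal STAC Item for an input asset and a second Item for a derived asset that links back to its source. The helper name, ids, and URLs are hypothetical; a real component would more likely use a STAC library and add further fields.

```python
import json


def make_stac_item(item_id, bbox, when, asset_href, media_type, source_href=None):
    """Build a minimal STAC Item dict (hypothetical helper for illustration)."""
    west, south, east, north = bbox
    item = {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": when},
        "links": [],
        "assets": {"data": {"href": asset_href, "type": media_type, "roles": ["data"]}},
    }
    if source_href is not None:
        # Record provenance: the derived item points back to its input.
        item["links"].append({"rel": "derived_from", "href": source_href})
    return item


# Input: raw EO data sampled over reference locations (all URLs are made up).
extraction = make_stac_item(
    "extraction-001", (5.0, 51.0, 5.1, 51.1), "2021-06-01T00:00:00Z",
    "https://example.org/data/extraction-001.nc", "application/x-netcdf",
)

# Output: a derived map, described as STAC again and linked to its source.
crop_map = make_stac_item(
    "cropmap-001", tuple(extraction["bbox"]), "2021-12-31T00:00:00Z",
    "https://example.org/data/cropmap-001.tif", "image/tiff; application=geotiff",
    source_href="https://example.org/stac/items/extraction-001.json",
)
print(json.dumps(crop_map["links"], indent=2))
```

Keeping the provenance link in the metadata is one way to make the chain from reference data to final map traceable.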
Processing system design
The design of the processing system is focused on three main use cases:
- Generating maps based on existing models
- Training new models based on a user-defined selection of reference data
- Disseminating products
These processes and their links to the various components are shown in the diagram below. Key operational components are supported by the CDSE and APEx projects to ensure that the system remains operational beyond the project lifetime.
The WorldCereal Toolbox is an open source Python library and set of Jupyter notebooks that can be used to train custom models. It offers a user-friendly API and example code to make the process as easy as possible.
This toolbox also contains other WorldCereal-specific source code that was used to develop the system or to generate openEO UDPs. It is based on GFMap, a general-purpose, openEO-based library for mapping applications.
The WorldCereal openEO UDPs are user-defined processes that encapsulate specific WorldCereal workflows. These workflows can be executed from generic tools such as the openEO web editor, and are hosted in APEx.
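As a rough sketch of what invoking a hosted UDP can look like, the process graph below consists of a single node that refers to a process by id and by the URL of its published definition. The process id, URL, and parameter names are hypothetical, and referencing a definition by URL assumes that the backend supports remote process definitions.

```python
import json


def udp_call_graph(process_id, namespace_url, arguments):
    """Wrap a single UDP invocation in a process graph (illustrative only)."""
    return {
        "process_graph": {
            "udp": {
                "process_id": process_id,
                "namespace": namespace_url,  # where the UDP definition is hosted
                "arguments": arguments,
                "result": True,
            }
        }
    }


graph = udp_call_graph(
    "example_crop_map",                            # hypothetical UDP id
    "https://example.org/udp/example_crop_map.json",  # hypothetical URL
    {
        "spatial_extent": {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
        "temporal_extent": ["2021-01-01", "2021-12-31"],
    },
)
print(json.dumps(graph, indent=2))
```

The caller only supplies parameters; the workflow logic stays inside the hosted definition, which is what allows generic tools to run it.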
The APEx Algorithm hosting service maintains a catalog of openEO UDPs, ensuring that they remain discoverable beyond the project lifetime and allowing initiatives such as the ESA Stakeholder engagement facility to promote the use of WorldCereal services. The hosting service also runs regular benchmarks and tests so that it can flag when an algorithm is no longer functional.
The APEx Upscaling service makes it possible to produce results over a larger area, based on openEO UDPs. This allows WorldCereal users to generate larger maps.
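One common way to scale a workflow to a larger area is to split the area into tiles and run the same UDP once per tile. The sketch below shows only that general idea; it is not the APEx implementation, and the function name and tile size are assumptions.

```python
import math


def split_bbox(west, south, east, north, tile_size):
    """Split a bounding box into tiles of at most tile_size degrees per side."""
    n_cols = math.ceil((east - west) / tile_size)
    n_rows = math.ceil((north - south) / tile_size)
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            # Compute each corner from the index to avoid accumulating float drift.
            x0 = west + col * tile_size
            y0 = south + row * tile_size
            tiles.append({
                "west": x0,
                "south": y0,
                "east": min(x0 + tile_size, east),    # clip at the outer edge
                "north": min(y0 + tile_size, north),
            })
    return tiles


# Each tile would become the spatial extent of one UDP invocation.
tiles = split_bbox(4.0, 50.0, 6.0, 51.0, tile_size=0.5)
print(len(tiles))  # 8
print(tiles[0])
```

Each tile can then be submitted as an independent job, so that failures can be retried per tile rather than for the whole map.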
The STAC catalog is used for dissemination and long-term archiving of maps generated by the project. An openEO backend can export results directly to STAC, or the export can be performed by the WorldCereal Toolbox.