Uploading user datasets to the RDM

Introduction

To generate accurate cropland and crop type maps, high-quality reference data are indispensable, both for training classification algorithms and for validating the final products. WorldCereal therefore wants to engage with the global agricultural community to stimulate and facilitate the opening and sharing of reference data.

Users can upload datasets covering their area of interest or data-poor regions, either to contribute to high-quality global WorldCereal products or to generate their own custom, high-quality maps.

Uploading through User Interface

To upload a dataset, you first need to sign up for a Copernicus Data Space Ecosystem (CDSE) account. This is completely free of charge!

A user can upload a dataset by clicking the “Contribute” button on the home page of the RDM website. We provide an intuitive, AI-assisted workflow to facilitate fast upload and harmonization of user datasets.

Please check out our tutorial video which guides you through this procedure step-by-step:

For completeness, the different steps have also been documented below.

Step 1: Dataset Qualification Check

The dataset must adhere to certain formats and contain minimum attributes. The user must ensure the dataset meets the following requirements before being able to continue:

- The dataset should have spatial geometry (polygons or points).
- The dataset must include information on land cover/crop types.
- The dataset should have information on observation time.
- The dataset must cover years 2017 onwards (due to restricted availability of Sentinel imagery prior to 2017).

If your dataset meets all of the above requirements, it qualifies for upload to the RDM. These checks mainly serve to prevent errors and to ensure the data can be used for training and validating crop models.
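
As a rough illustration, the four checks above can be run locally before starting the upload. The sketch below is not part of the RDM. It assumes the dataset has already been loaded as GeoJSON-like feature dictionaries, that observation dates are ISO strings (YYYY-MM-DD), and that multi-part points and polygons count as valid geometry. The function name and the attribute names `crop` and `date` are hypothetical.

```python
from datetime import date

# Assumption: multi-part variants are treated like plain points/polygons.
ALLOWED_GEOMETRIES = {"Point", "MultiPoint", "Polygon", "MultiPolygon"}
MIN_YEAR = 2017  # Sentinel imagery availability is restricted before 2017


def qualification_issues(features, label_field, date_field):
    """Return a list of problems; an empty list means all checks pass."""
    issues = []
    for i, feature in enumerate(features):
        geometry = feature.get("geometry") or {}
        properties = feature.get("properties") or {}

        # Check 1: spatial geometry must be a point or a polygon.
        if geometry.get("type") not in ALLOWED_GEOMETRIES:
            issues.append(f"feature {i}: geometry must be a point or polygon")

        # Check 2: a land cover/crop type label must be present.
        if properties.get(label_field) in (None, ""):
            issues.append(f"feature {i}: missing land cover/crop type label")

        # Check 3: an observation time must be present and parseable.
        try:
            observed = date.fromisoformat(str(properties.get(date_field)))
        except ValueError:
            issues.append(f"feature {i}: missing or unparseable observation date")
            continue

        # Check 4: observations must date from 2017 onwards.
        if observed.year < MIN_YEAR:
            issues.append(f"feature {i}: observed in {observed.year}, before {MIN_YEAR}")
    return issues


# Hypothetical single-sample dataset.
sample = [{
    "geometry": {"type": "Point", "coordinates": [4.4, 51.2]},
    "properties": {"crop": "maize", "date": "2021-06-15"},
}]
for problem in qualification_issues(sample, "crop", "date"):
    print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, makes it easier to repair a dataset in a single pass.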

Step 2: Prepare Your Dataset

Follow the guidelines below to ensure a smooth upload procedure:

Supported Dataset Formats:
- GeoPackage (.gpkg): multi-layer geopackage files will be rejected. Make sure your file only contains one layer!
- ESRI shapefile (.shp): shapefiles typically consist of multiple files. All files related to the shapefile need to be zipped together into one .zip file.
- GeoParquet (.geoparquet)
- Parquet (.parquet)
No strict requirements are imposed regarding the dataset projection system.
All uploaded datasets are automatically converted to EPSG:4326 (WGS84).
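
For shapefiles, the zipping step described above can be scripted. The following is a minimal sketch using only the Python standard library. It assumes all component files (.shp, .shx, .dbf, .prj, ...) sit in the same folder and share the base name of the .shp file. The function name and the example path are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_shapefile(shp_path, zip_path=None):
    """Bundle a .shp file and its sidecar files into a single .zip archive."""
    shp_path = Path(shp_path)
    zip_path = Path(zip_path) if zip_path else shp_path.with_suffix(".zip")

    # Sidecar files share the base name, e.g. fields.shp, fields.dbf, fields.shp.xml.
    prefix = shp_path.stem + "."
    parts = sorted(
        p for p in shp_path.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix.lower() != ".zip"
    )

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store files at the archive root, without their folder structure.
            archive.write(part, arcname=part.name)
    return zip_path


# Usage with a hypothetical path:
# zip_shapefile("data/fields.shp")  # writes data/fields.zip
```
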
Land cover/crop type information:

Crop type or land cover labels in your dataset will be automatically converted to the WorldCereal crop type legend to ensure compatibility with other datasets. Make sure you know which attribute (column) of your dataset contains this information, because you will be asked to select this attribute during the upload procedure.

Supported data types for this attribute: String (preferred) or Integer.

NOTE: If land cover/crop type information is spread across multiple attributes, you will need to merge these attributes into a single attribute before proceeding with the upload.
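
One possible way to merge label attributes is sketched below. It assumes the attribute table has been read into a list of dictionaries. The attribute names (`land_cover`, `crop`), the target name `label` and the separator are hypothetical choices, not RDM requirements.

```python
def merge_label_attributes(records, source_fields, target_field="label", sep=" - "):
    """Combine several label attributes into one string attribute per record."""
    merged = []
    for record in records:
        parts = []
        for field in source_fields:
            value = record.get(field)
            text = "" if value is None else str(value).strip()
            if text:  # skip missing or empty values
                parts.append(text)
        # Copy the record so the input data stays untouched.
        merged.append({**record, target_field: sep.join(parts)})
    return merged


rows = [
    {"land_cover": "cropland", "crop": "maize"},
    {"land_cover": "grassland", "crop": None},
]
for row in merge_label_attributes(rows, ["land_cover", "crop"]):
    print(row["label"])
```

Writing the merged value as a string matches the preferred data type for this attribute.
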
Validity Time (observation time):

There are two ways to specify the observation time for the samples in your dataset:
- Specific observation time for each sample: In this case, make sure you know which dataset attribute (column) contains this information.

  Supported data types for this attribute: Date or String. You will be asked to select the dataset attribute that contains the observation date for each individual sample.
- Specify one observation time for all samples contained in the dataset. We provide specific guidelines to help you assign a reasonable observation time.
NOTE: The WorldCereal RDM does not support multi-year datasets.
In case your dataset contains samples gathered across multiple years, you will receive a message during upload asking which part of the dataset needs to be processed.
Alternatively, you can split your dataset according to calendar year before proceeding with the upload.
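
Splitting by calendar year before upload could look like the sketch below. It assumes the samples are available as a list of dictionaries whose observation time is either a `date` object or an ISO string (YYYY-MM-DD). The function name and the attribute name `obs_date` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def split_by_year(records, date_field):
    """Group records by the calendar year of their observation date."""
    by_year = defaultdict(list)
    for record in records:
        value = record[date_field]
        # Accept date objects as-is; parse ISO strings such as "2021-06-15".
        observed = value if isinstance(value, date) else date.fromisoformat(value)
        by_year[observed.year].append(record)
    return dict(by_year)


samples = [
    {"id": 1, "obs_date": "2020-07-01"},
    {"id": 2, "obs_date": "2021-06-15"},
    {"id": 3, "obs_date": date(2021, 8, 3)},
]
for year, subset in sorted(split_by_year(samples, "obs_date").items()):
    print(year, len(subset))
```

Each yearly subset can then be written to its own file and uploaded as a separate dataset.
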
Irrigation Status (optional):

You will be asked to select the dataset attribute containing information on irrigation (if present). You will be guided to map the original irrigation labels to the WorldCereal irrigation legend.

Supported data types for this attribute: String (preferred) or Integer.

Step 3: Dataset Upload & Harmonization

The next step is to upload the dataset through the user interface (accessed through the “Contribute” button, here).

Drag and drop the dataset file.
Dataset Naming:

Your dataset will automatically receive a standardized name according to our dataset naming convention.
You will be asked to specify an “identifier”, which should refer to the origin of the dataset (e.g. organization, project). This will be automatically supplemented with year, region, type and information content of the dataset.
Select key dataset attributes:

You will be presented with a list of dataset attributes. Select those attributes referring to land cover/crop type information, validity time (if present) and irrigation status (if present).

Alternatively, specify one validity time for the entire dataset.
Review and submit the harmonization:

In case your dataset contains observations across multiple years, select the year that needs to be processed.

You will be presented with the result of the AI-based automated mapping of crop type and irrigation labels to the respective WorldCereal legends.
Review the mapping (pay specific attention to fields mapped to “unknown”) and submit.

After the file has been uploaded successfully, the RDM processes it and adds your dataset to the community store as a fully private dataset (only accessible by you).

Using your data in the processing module

As soon as your private dataset has been uploaded successfully, you can use either the RDM web interface or the RDM REST API services to interact with your data.
More information and dedicated guidelines can be found here.

Users will be able to use the uploaded datasets to train cropland/crop type models in the processing module.
More details on this will be added once this functionality is fully operational.