Uploading user datasets to the RDM
Introduction
To generate accurate cropland and crop type maps, high quality reference data is indispensable for both training classification algorithms and validation of the final products.
Users of the WorldCereal system can upload their own reference datasets to:
- Create fully customized cropland and crop type models tuned to their region, season and crops of interest. Adding local, high quality reference data for the specific task at hand will always result in more accurate custom products compared to using the generic, globally representative, default WorldCereal models.
- Contribute to high quality global WorldCereal cropland and crop type maps, particularly for your area of interest, and receive proper attribution for your valuable contributions.
Dataset upload is accomplished through our user-friendly, highly automated web tool that takes care of data ingestion and harmonization to WorldCereal standards.
The following sections provide more information on data licensing, practical instructions for using the upload tool, and how your data can be used by the WorldCereal classification module.
Note on data license and protection for uploaded datasets
Default terms: All rights reserved
Upon successful upload to the WorldCereal RDM, each user dataset is by default treated as a private dataset. This effectively means that all rights are reserved by the data contributor and no part of the dataset may be copied, reproduced, publicly displayed, distributed, published, adapted, translated, or otherwise used in any form or by any means by any other party.
To guarantee proper protection of uploaded datasets, the WorldCereal consortium has established a dedicated authentication and data access system:
- Uploading of datasets can only be done after authentication through a valid Copernicus Data Space Ecosystem (CDSE) account.
- Once uploaded, the dataset is tied to the CDSE account of the data contributor.
- All data interaction tools provided by the RDM to explore, filter and download reference data automatically take into account data ownership: non-authenticated users can only view fully public datasets, while authenticated users only see public datasets and their own private datasets linked to their account.
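The visibility rule described above can be illustrated with a minimal sketch. This is not the RDM implementation: the `Dataset` class, its fields and the `visible_datasets` function are hypothetical names introduced only to make the rule concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    name: str
    owner: str       # hypothetical field: CDSE account the dataset is tied to
    is_public: bool  # False corresponds to the default private state


def visible_datasets(datasets: list[Dataset], user: Optional[str]) -> list[Dataset]:
    """Return the datasets a user may see; user=None means not authenticated."""
    return [
        d for d in datasets
        if d.is_public or (user is not None and d.owner == user)
    ]


# Illustrative catalogue with made-up accounts
catalogue = [
    Dataset("public_a", owner="alice@example.org", is_public=True),
    Dataset("private_a", owner="alice@example.org", is_public=False),
    Dataset("private_b", owner="bob@example.org", is_public=False),
]
```

In this sketch, a non-authenticated visitor would only get `public_a`, while the account `alice@example.org` would additionally see `private_a`.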
What happens if your CDSE account gets removed?
Your private datasets are retained in the RDM, without any changes to their respective access policies.
To access your datasets again, you will need to create a new CDSE account with the same email address you originally used for uploading the datasets.
Data publication options
Data contributors are invited to publish their dataset(s) and make them available to the wider agricultural monitoring community, thereby supporting our push towards open data and science. Further instructions and data license options are specified on this dedicated page.
Uploading through User Interface
Note that in order to be able to upload your dataset, you need to sign up for a Copernicus Data Space Ecosystem (CDSE) account. This is completely free of charge!
A user can upload a dataset by clicking the “Contribute” button on the home page of the RDM website. We provide an intuitive, AI-assisted workflow to facilitate fast upload and harmonization of user datasets.
Please check out our tutorial video which guides you through this procedure step-by-step:
For completeness, the different steps have also been documented below.
Step 1: Dataset Qualification Check
The dataset must adhere to certain formats and contain minimum attributes. The user must ensure the dataset meets the following requirements before being able to continue:
The dataset should have spatial geometry (polygons or points).
The dataset must include information on land cover/crop types.
The dataset should have information on observation time.
The dataset must cover years 2017 onwards (due to restricted availability of Sentinel imagery prior to 2017).
If the dataset meets all of the above requirements, it qualifies for upload to the RDM. These checks are mainly intended to prevent errors and to ensure the data can be used for training and validating crop models.
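Before starting the upload, the requirements above can be checked locally. The following is a rough sketch, not part of the RDM tooling. It assumes the samples are available as GeoJSON-like feature dictionaries and that observation dates are ISO strings (`YYYY-MM-DD`). The function name, the field names and the acceptance of multi-part geometries are assumptions made for illustration.

```python
from datetime import date

# Assumption: multi-part variants are treated like points/polygons here
ALLOWED_GEOMETRIES = {"Point", "MultiPoint", "Polygon", "MultiPolygon"}
MIN_YEAR = 2017  # Sentinel imagery availability is restricted before 2017


def qualification_issues(features, label_field, date_field):
    """Return a list of problems found; an empty list means all checks passed."""
    issues = []
    for i, feat in enumerate(features):
        geom = feat.get("geometry") or {}
        props = feat.get("properties") or {}
        if geom.get("type") not in ALLOWED_GEOMETRIES:
            issues.append(f"feature {i}: geometry must be a point or polygon")
        if props.get(label_field) in (None, ""):
            issues.append(f"feature {i}: missing land cover/crop type label")
        try:
            observed = date.fromisoformat(props.get(date_field))
        except (TypeError, ValueError):
            issues.append(f"feature {i}: missing or invalid observation date")
            continue
        if observed.year < MIN_YEAR:
            issues.append(f"feature {i}: observation year {observed.year} is before {MIN_YEAR}")
    return issues


# Made-up sample features for illustration
good = {"geometry": {"type": "Point", "coordinates": [4.7, 50.9]},
        "properties": {"crop": "maize", "obs_date": "2021-07-15"}}
too_old = {"geometry": {"type": "Point", "coordinates": [4.7, 50.9]},
           "properties": {"crop": "wheat", "obs_date": "2015-06-01"}}
```

Calling `qualification_issues([good, too_old], "crop", "obs_date")` would report one issue, for the sample observed in 2015.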
Step 2: Prepare Your Dataset
Follow the guidelines below to ensure a smooth upload procedure:
Supported Dataset Formats:
- GeoPackage (.gpkg): multi-layer geopackage files will be rejected. Make sure your file only contains one layer!
- ESRI shapefile (.shp): shapefiles typically consist of multiple files. All files related to the shapefile need to be zipped together into one .zip file.
- GeoParquet (.geoparquet)
- Parquet (.parquet)
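For the shapefile case in the list above, the component files can be bundled with Python's standard `zipfile` module. This is a minimal sketch under the assumption that all components share the base name of the `.shp` file and sit in the same folder. The function name `zip_shapefile` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_shapefile(shp_path, zip_path=None):
    """Zip a .shp file together with its sidecar files (.shx, .dbf, .prj, ...)."""
    shp = Path(shp_path)
    if not shp.is_file():
        raise FileNotFoundError(shp)
    target = Path(zip_path) if zip_path else shp.with_suffix(".zip")
    # Collect every file in the same folder whose name starts with "<basename>."
    # and skip existing .zip archives so a previous output is not re-packed.
    parts = sorted(
        p for p in shp.parent.iterdir()
        if p.is_file()
        and p.name.startswith(shp.stem + ".")
        and p.suffix.lower() != ".zip"
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # arcname keeps the files at the top level of the archive
            archive.write(part, arcname=part.name)
    return target
```

For example, `zip_shapefile("data/fields.shp")` would write `data/fields.zip` containing `fields.shp` and the files that accompany it.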
No strict requirements are imposed regarding the dataset projection system.
All uploaded datasets are automatically converted to EPSG:4326 (WGS84).
Land cover/crop type information:
Crop type or land cover labels in your dataset will be automatically converted to the WorldCereal crop type legend to ensure compatibility with other datasets. Make sure you know which attribute (column) of your dataset contains this information, as you will be asked to select this attribute during the upload procedure.
Supported data types for this attribute: String (preferred) or Integer.
!! In case land cover/crop type information is spread across multiple attributes, you will need to merge these attributes together before proceeding with the upload.
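Merging several label attributes into one, as required above, can be done in many ways. The sketch below assumes each sample is represented as a plain dictionary of attributes. The function name, the field names and the separator are illustrative choices, not a WorldCereal convention.

```python
def merge_label_fields(record, fields, target="label", sep=" - "):
    """Return a copy of `record` with the non-empty values of `fields` joined into `target`."""
    parts = [
        str(record[f]).strip()
        for f in fields
        if record.get(f) not in (None, "")
    ]
    merged = dict(record)  # leave the input record untouched
    merged[target] = sep.join(p for p in parts if p)
    return merged


# Made-up sample: land cover and crop type stored in separate attributes
sample = {"landcover": "cropland", "crop": "maize", "obs_date": "2021-07-15"}
merged_sample = merge_label_fields(sample, ["landcover", "crop"])
```

Here `merged_sample["label"]` holds the combined value `cropland - maize`, which can then be selected as the single land cover/crop type attribute during upload.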
Validity Time (observation time):
There are two options to specify the observation time for the samples in your dataset:
Specific observation time for each sample: In this case, make sure you know which dataset attribute (column) contains this information.
Supported data types for this attribute: Date or String.
You will be asked to select the dataset attribute that contains the observation date for each individual sample.
Specify one observation time for all samples contained in the dataset. We provide specific guidelines to help you assign a reasonable observation time.
NOTE: The WorldCereal RDM does not support multi-year datasets.
In case your dataset contains samples gathered across multiple years, you will receive a message during upload asking which part of the dataset needs to be processed.
Alternatively, you can split your dataset according to calendar year before proceeding with the upload.
Irrigation Status (optional):
You will be asked to select the dataset attribute containing information on irrigation (if present). You will be guided to map the original irrigation labels to the WorldCereal irrigation legend.
Supported data types for this attribute: String (preferred) or Integer.
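The option mentioned in the validity time guidelines above, splitting a multi-year dataset by calendar year before upload, could look roughly like this. The sketch assumes samples are dictionaries with an ISO-formatted date string (`YYYY-MM-DD`). The function and field names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def split_by_year(records, date_field):
    """Group records by the calendar year of their observation date."""
    groups = defaultdict(list)
    for rec in records:
        year = date.fromisoformat(rec[date_field]).year
        groups[year].append(rec)
    return dict(groups)


# Made-up samples spanning two years
samples = [
    {"crop": "maize", "obs_date": "2019-08-01"},
    {"crop": "wheat", "obs_date": "2020-05-20"},
    {"crop": "barley", "obs_date": "2020-06-03"},
]
per_year = split_by_year(samples, "obs_date")
```

Each value in `per_year` could then be written to its own file and uploaded as a separate single-year dataset.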
Step 3: Dataset Upload & Harmonization
The next step is to upload the dataset through the user interface, accessed through the “Contribute” button.
Drag and drop the dataset file.
Dataset Naming:
Your dataset will automatically receive a standardized name according to our dataset naming convention.
You will be asked to specify an “identifier”, which should refer to the origin of the dataset (e.g. organization, project). This will be automatically supplemented with year, region, type and information content of the dataset.
Select key dataset attributes:
You will be presented with a list of dataset attributes. Select those attributes referring to land cover/crop type information, validity time (if present) and irrigation status (if present).
Alternatively, specify one validity time for the entire dataset.
Review and submit the harmonization:
In case your dataset contains observations across multiple years, select the year that needs to be processed.
You will be presented with the result of the AI-based automated mapping of crop type and irrigation types to the respective WorldCereal legends.
Review the mapping (pay specific attention to fields mapped to “unknown”) and submit.
After the file has been uploaded successfully, the RDM processes it and adds your dataset to the community store as a fully private dataset (only accessible by you).
Step 4: Publish your data
Datasets can be published and shared with the broader community after upload. More information and instructions are available on this dedicated publication page.
Using your data in the processing module
As soon as your private dataset has been uploaded successfully, you can either use the RDM web interface, or the RDM REST API services to interact with your data.
Users will be able to use the uploaded datasets to train cropland/crop type models in the WorldCereal processing module.
A first step in this procedure would be to extract Earth Observation time series data matching the observations in your private datasets. This procedure is explained in this interactive notebook.
Based on the extracted data, along with publicly available reference data, users can then continue to train and apply customized crop models. A complete walk-through can be found here.