Dataset Specifications
Overview
GreenEarthNet is an updated version of the EarthNet2021 dataset. The main improvements are:
- Files are now netCDF (having proper georeferencing)
- Landcover map included
- New cloud mask
- New Scoring with a focus on vegetation modeling
- No more mesoscale weather
It contains minicubes at the same locations as EarthNet2021.
NOTE: In the paper the dataset is called GreenEarthNet, but in the codebase you will also find the following acronyms that were used during development: earthnet2021x, en21x.
One Minicube
One Minicube (one sample) of GreenEarthNet contains 20 variables of different dimensions:
- Spatio-temporal:
  - Sentinel 2
    - Bands B02, B03, B04, B8A (blue, green, red, near-infrared)
    - 20m resolution
    - 5-daily (with NaN in between)
    - Variable names ["s2_B02", "s2_B03", "s2_B04", "s2_B8A"]
  - Sentinel 2 Auxiliary information
    - Improved cloud mask (variable name "s2_mask")
    - Scene classification layer SCL (variable name "s2_SCL")
    - Availability indicator (only temporal, variable name "s2_avail")
- Temporal:
  - E-OBS meteorology
    - Wind speed "eobs_fg" (often missing!)
    - Relative humidity "eobs_hu"
    - Rainfall "eobs_rr"
    - Sea-level pressure "eobs_pp"
    - Shortwave downwelling radiation "eobs_qq"
    - Temperature (Daily Avg, Min, Max: "eobs_tg", "eobs_tn", "eobs_tx")
    - daily
- Spatial:
  - Digital Elevation models
    - from NASA, ESA and JAXA
    - Variable names ["nasa_dem", "cop_dem", "alos_dem"]
    - Resampled to 20m
  - ESA WorldCover Landcover map "esawc_lc"
  - Geomorpho90m terrain classification "geom_cls"
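To see which variables of a minicube are spatio-temporal, temporal or spatial, one option is to group them by their dimensions. The following is a minimal sketch on a synthetic dataset: the helper `group_variables_by_dims` is hypothetical (not part of the GreenEarthNet codebase), and the dimension names `time`, `lat` and `lon` as well as the array sizes are assumptions made for illustration.

```python
import numpy as np
import xarray as xr


def group_variables_by_dims(ds):
    """Map each dimension tuple to the sorted names of the variables using it."""
    groups = {}
    for name, var in ds.data_vars.items():
        groups.setdefault(var.dims, []).append(name)
    return {dims: sorted(names) for dims, names in groups.items()}


# Synthetic stand-in for a minicube; dimension names and sizes are assumptions.
toy = xr.Dataset(
    {
        "s2_B04": (("time", "lat", "lon"), np.zeros((10, 4, 4))),
        "eobs_tg": (("time",), np.zeros(10)),
        "cop_dem": (("lat", "lon"), np.zeros((4, 4))),
    }
)

groups = group_variables_by_dims(toy)
for dims, names in groups.items():
    print(dims, names)
```

On a real minicube, the same helper can be applied to the dataset returned by `xr.open_dataset`.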
Computing NDVI
The recommended way for computing the Normalized Difference Vegetation Index (NDVI) is using the Python package xarray:
import numpy as np
import xarray as xr

minicube = xr.open_dataset("path_to_minicube")
nir = minicube.s2_B8A
red = minicube.s2_B04
mask = minicube.s2_mask
# Keep NDVI only where the cloud mask is 0; the small epsilon avoids division by zero
ndvi = ((nir - red) / (nir + red + 1e-8)).where(mask == 0, np.nan)
minicube["s2_ndvi"] = ndvi
Getting Sentinel 2 dates
Sentinel 2 observations are available at most every 5 days within the minicube. They fall on every 5th time step, i.e. each Sentinel 2 observation is preceded by 4 days of meteorological observations.
You may select only dates with Sentinel 2 observations using the Python package xarray:
import xarray as xr
minicube = xr.open_dataset("path_to_minicube")
minicube_on_sen2_dates = minicube.isel(time = slice(4, None, 5))
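To illustrate which time steps `slice(4, None, 5)` picks, here is a small sketch on a synthetic daily time axis. The dates, the 20-day length and the placeholder reflectance values are made up; only the indexing pattern mirrors the selection above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily time axis standing in for a minicube (hypothetical dates).
time = pd.date_range("2020-01-01", periods=20, freq="D")

# Pretend Sentinel 2 was observed on every 5th day and is NaN in between.
s2 = np.full(20, np.nan)
s2[4::5] = 0.5
toy = xr.Dataset({"s2_B04": (("time",), s2)}, coords={"time": time})

# Same selection as for a real minicube: indices 4, 9, 14, 19.
on_sen2_dates = toy.isel(time=slice(4, None, 5))
selected_days = on_sen2_dates.time.dt.day.values.tolist()
print(selected_days)
```

In this toy series, the selection keeps only the time steps that carry a Sentinel 2 value and drops the NaN-filled days in between.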
Aggregating E-OBS data to 5-daily
You may want to aggregate E-OBS data to 5-daily to match with Sentinel 2 observations. This is possible using the Python package xarray:
import xarray as xr
minicube = xr.open_dataset("path_to_minicube")
minicube_5daily = minicube.coarsen(time = 5, coord_func = "max").mean()
Instead of mean(), you may use other aggregation functions such as min() or max().
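Different variables may call for different aggregation functions. The sketch below, on a synthetic 10-day series with made-up values, sums rainfall and averages temperature over each 5-day window. Which aggregation suits which variable is a modelling choice assumed here for illustration, not something the dataset prescribes.

```python
import numpy as np
import xarray as xr

# Synthetic 10-day series; values are invented for illustration.
toy = xr.Dataset(
    {
        "eobs_rr": (("time",), np.ones(10)),                 # constant daily rainfall
        "eobs_tg": (("time",), np.arange(10, dtype=float)),  # rising temperature
    },
    coords={"time": np.arange(10)},
)

# coord_func="max" labels each 5-day window with its last time step.
rain_5daily = toy["eobs_rr"].coarsen(time=5, coord_func="max").sum()
temp_5daily = toy["eobs_tg"].coarsen(time=5, coord_func="max").mean()

aggregated = xr.Dataset({"eobs_rr": rain_5daily, "eobs_tg": temp_5daily})
print(aggregated)
```

With the default `boundary="exact"`, `coarsen` raises an error if the length of the time dimension is not a multiple of the window size.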
Folder structure
After downloading GreenEarthNet to your data_dir, you will have the following folder structure:
data_dir
├── train # training set
| ├── 29SND # Sentinel 2 tile with samples at lon 29, lat S, subquadrant ND
| | ├── 29SND_2017...124.nc # First training minicube, name cubename.nc
| | └── ...
| ├── 29SPC # there are 85 tiles in the train set
| └── ... # with 23816 .nc train minicubes in total
├── val_chopped # validation set
| └── ... # same as train, but with test samples
├── ood-t_chopped # ood-t test set
| └── ...
├── ood-s_chopped # ood-s test set
| └── ...
└── ood-st_chopped # ood-st test set
| └── ...
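Given this layout, the minicube files of one split can be collected with the standard library. This is a minimal sketch: the helper `list_minicubes` is hypothetical (not part of the GreenEarthNet codebase), and the demo runs on a throwaway directory with made-up file names rather than on the real dataset.

```python
import tempfile
from pathlib import Path


def list_minicubes(data_dir, split="train"):
    """Return the sorted paths of all .nc files under data_dir/split/<tile>/."""
    return sorted((Path(data_dir) / split).glob("*/*.nc"))


# Demo on a temporary directory that mimics the structure above.
with tempfile.TemporaryDirectory() as tmp:
    tile_dir = Path(tmp) / "train" / "29SND"
    tile_dir.mkdir(parents=True)
    (tile_dir / "cube_a.nc").touch()  # made-up file names
    (tile_dir / "cube_b.nc").touch()
    found = [path.name for path in list_minicubes(tmp)]

print(found)
```

For the real dataset, pass your data_dir and one of the split folder names shown in the tree, for example `list_minicubes(data_dir, split="val_chopped")`.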