Structure
Nothing clever here 😉. Every sample is stored as a compressed NumPy archive (.npz). The file structure follows <split_name>/<tile_name>/<cube_name>.npz; the test splits add a context/ or target/ level between the split and the tile.
EarthNet2021
├── train                                # training set of the EarthNet2021 challenge
│   ├── 29SND                            # Sentinel-2 tile with samples at lon 29, lat S, subquadrant ND
│   │   ├── 29SND_2017...124.npz         # first training sample, name: cubename.npz
│   │   └── ...
│   ├── 29SPC                            # there are 85 tiles in the train set
│   └── ...                              # with 23904 .npz train samples in total
├── iid_test_split                       # main track testing samples
│   ├── context                          # input data ("context") for models
│   │   ├── 29SND
│   │   │   └── context_29SND_....npz    # context cube, name: context_cubename.npz
│   │   └── ...
│   └── target                           # target/output data for models
│       ├── 29SND                        # tiles containing iid_test samples
│       │   └── target_29SND_....npz     # target cube, name: target_cubename.npz
│       └── ...                          # there are 4219 iid_test samples in total
├── ood_test_split                       # robustness track testing samples
│   ├── context
│   └── target                           # there are 4214 ood_test samples in total
├── extreme_test_split                   # extreme weather testing samples
│   ├── context
│   └── target                           # there are 4000 extreme_test samples in total
└── seasonal_test_split                  # seasonal testing samples
    ├── context
    └── target                           # there are 4000 seasonal_test samples in total
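The layout above can be walked with a few lines of Python. The following is only a sketch, not part of any official EarthNet2021 tooling: the helper names (`train_cubes`, `paired_test_cubes`, `array_names`) are made up for illustration, and `root` is assumed to point at the `EarthNet2021` folder. Since the names of the arrays inside each .npz are not listed here, `array_names` simply reports whatever the file contains.

```python
from pathlib import Path


def train_cubes(root):
    """Return all training cubes, sorted: <root>/train/<tile>/<cubename>.npz."""
    return sorted(Path(root, "train").glob("*/*.npz"))


def paired_test_cubes(root, split="iid_test_split"):
    """Pair each context cube of a test split with its target cube.

    Pairing relies on the naming shown in the tree:
    context/<tile>/context_<cubename>.npz <-> target/<tile>/target_<cubename>.npz
    """
    split_dir = Path(root, split)
    pairs = []
    for context_path in sorted(split_dir.glob("context/*/context_*.npz")):
        tile = context_path.parent.name
        cubename = context_path.name[len("context_"):]  # still ends in ".npz"
        target_path = split_dir / "target" / tile / f"target_{cubename}"
        if target_path.exists():  # skip context cubes without a target
            pairs.append((context_path, target_path))
    return pairs


def array_names(npz_path):
    """List the arrays stored in one cube without assuming their names."""
    import numpy as np  # local import: the path helpers above need no NumPy

    with np.load(npz_path) as cube:
        return list(cube.files)
```

For example, `paired_test_cubes("EarthNet2021", "ood_test_split")` would return `(context, target)` path pairs for the robustness track, and `array_names(train_cubes("EarthNet2021")[0])` would show which arrays the first training sample holds.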
ProTip: Each cubename has the format tile_startyear_startmonth_startday_endyear_endmonth_endday_hrxmin_hrxmax_hrymin_hrymax_mesoxmin_mesoxmax_mesoymin_mesoymax, so the name itself encodes the exact spatiotemporal footprint of the particular sample.
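Because the footprint is encoded in the name, it can be recovered without opening the file. Below is a minimal sketch of such a parser; `CubeFootprint` and `parse_cubename` are hypothetical names, not an official API. It assumes that every field after the tile is an integer, and it splits on both underscores and hyphens as a precaution in case the date parts are hyphen-separated on disk.

```python
import re
from datetime import date
from pathlib import Path
from typing import NamedTuple, Tuple


class CubeFootprint(NamedTuple):
    tile: str
    start: date
    end: date
    hr: Tuple[int, int, int, int]    # (hrxmin, hrxmax, hrymin, hrymax)
    meso: Tuple[int, int, int, int]  # (mesoxmin, mesoxmax, mesoymin, mesoymax)


def parse_cubename(name):
    """Split a cube file name into the 15 fields listed in the ProTip."""
    stem = Path(name).name
    if stem.endswith(".npz"):
        stem = stem[: -len(".npz")]
    # Test-split files carry a context_/target_ prefix before the cubename.
    for prefix in ("context_", "target_"):
        if stem.startswith(prefix):
            stem = stem[len(prefix):]
            break

    parts = re.split(r"[_-]", stem)
    if len(parts) != 15:
        raise ValueError(f"expected 15 fields, got {len(parts)}: {name!r}")

    tile = parts[0]
    nums = [int(p) for p in parts[1:]]  # assumption: all remaining fields are integers
    return CubeFootprint(
        tile=tile,
        start=date(*nums[0:3]),
        end=date(*nums[3:6]),
        hr=tuple(nums[6:10]),
        meso=tuple(nums[10:14]),
    )
```

With a made-up name such as `context_29SND_2017_06_10_2017_11_06_1_2_3_4_5_6_7_8.npz`, the parser yields tile `29SND`, a start date of 2017-06-10, an end date of 2017-11-06, `hr == (1, 2, 3, 4)` and `meso == (5, 6, 7, 8)`.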