# TUEG Dataset

- The data are acquired from the official [TUH EEG](https://isip.piconepress.com/projects/nedc/html/tuh_eeg/) repository.
- The dataset is used for the masked autoencoding pre-training task.
- We select only EEG channels (between 18 and 36 depending on the sample) at 256 Hz. See 
- To enforce disjointness between our pre-training and fine-tuning data, we remove all TUAB patients from the TUEG corpus.
- Each sample consists of 5 seconds of EEG recording.
- We additionally apply min-max normalization per channel each time a data sample is retrieved.
- The data can be stored in different file formats. We recommend HDF5 (see `datasets/hdf5_dataset.py`).
- To create the dataset after acquiring the relevant files from the TUH repository, see `make_datasets/make_tueg.py`.
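
The preprocessing steps listed above can be illustrated with a minimal NumPy sketch. This is not the code in `make_datasets/make_tueg.py` or `datasets/hdf5_dataset.py`; the function names, the non-overlapping windowing, the `[0, 1]` target range, and the `eps` guard against flat channels are all assumptions made for illustration.

```python
import numpy as np

SAMPLING_RATE_HZ = 256  # sampling rate stated above
WINDOW_SECONDS = 5      # sample duration stated above
WINDOW_LENGTH = SAMPLING_RATE_HZ * WINDOW_SECONDS  # 1280 time points per sample


def exclude_patients(tueg_patient_ids, tuab_patient_ids):
    """Keep only TUEG patient IDs that do not appear in TUAB (hypothetical helper)."""
    tuab = set(tuab_patient_ids)
    return [pid for pid in tueg_patient_ids if pid not in tuab]


def split_into_windows(recording: np.ndarray) -> np.ndarray:
    """Cut a (channels, time) recording into non-overlapping 5 s windows.

    Trailing time points that do not fill a whole window are dropped
    (an assumption; the actual pipeline may treat remainders differently).
    Returns an array of shape (windows, channels, WINDOW_LENGTH).
    """
    n_channels, n_times = recording.shape
    n_windows = n_times // WINDOW_LENGTH
    trimmed = recording[:, : n_windows * WINDOW_LENGTH]
    # (channels, windows, length) -> (windows, channels, length)
    return trimmed.reshape(n_channels, n_windows, WINDOW_LENGTH).transpose(1, 0, 2)


def min_max_normalize(sample: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each channel of a (channels, time) sample to roughly [0, 1].

    `eps` avoids division by zero for constant channels (an assumption).
    """
    lo = sample.min(axis=-1, keepdims=True)
    hi = sample.max(axis=-1, keepdims=True)
    return (sample - lo) / (hi - lo + eps)
```

In this sketch, normalization is applied to a single window at retrieval time, for example `min_max_normalize(split_into_windows(recording)[0])`, so the stored data stay unnormalized.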
