{COMPANYNAME}11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

Sep 25, 2019 (edited Dec 24, 2019) | ICLR 2020 Conference Blind Submission | Readers: Everyone
  • TL;DR: We release a dataset constructed from single-lead ECG data from 11,000 patients who were prescribed the {DEVICENAME}(TM) device.
  • Abstract: We release the largest public ECG dataset of continuous raw signals for representation learning, containing over 11k patients and 2 billion labelled beats. Our goal is to enable the development of semi-supervised ECG models and the discovery of unknown arrhythmia subtypes and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task that is evaluated in a semi-supervised fashion. We provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations of PCA embeddings, in which we identify some clustering of known subtypes, indicating the potential of representation learning for arrhythmia subtype discovery.
  • Code: https://drive.google.com/file/d/1nwF-yGrDUIiBa15fcaPOXxuasn_6B50M/view
  • Keywords: representation learning, healthcare, medical, clinical, dataset, ecg, cardiology, heart, discovery, anomaly detection, out of distribution
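
The evaluation idea in the abstract (learn a representation without labels, then score it with a small labelled subset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the beats are synthetic Gaussian-shaped windows rather than data from the released dataset, PCA is computed with a plain NumPy SVD, and a k-nearest-neighbour vote stands in for whatever semi-supervised evaluator the paper actually uses. The helper names (`make_synthetic_beats`, `pca_fit`, `pca_transform`, `knn_predict`) are hypothetical.

```python
import numpy as np


def make_synthetic_beats(n_per_class, length=64, seed=0):
    """Hypothetical stand-in for beat windows: two classes of noisy peaks."""
    rng = np.random.default_rng(seed)
    t = np.linspace(-1.0, 1.0, length)
    widths = [0.05, 0.25]  # assumed: narrow vs. wide peak morphology
    X, y = [], []
    for label, width in enumerate(widths):
        base = np.exp(-t**2 / (2.0 * width**2))
        X.append(base + 0.1 * rng.standard_normal((n_per_class, length)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)


def pca_fit(X, n_components):
    """Fit PCA via SVD of the mean-centred data; labels are never used."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the retained principal components."""
    return (X - mean) @ components.T


def knn_predict(train_z, train_y, test_z, k=3):
    """Majority vote among the k nearest labelled embeddings."""
    dists = np.linalg.norm(test_z[:, None, :] - train_z[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in train_y[nearest]])


# Unsupervised step: embed every beat without looking at the labels.
X, y = make_synthetic_beats(n_per_class=200)
mean, components = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)

# Semi-supervised evaluation: reveal only 20 labels, score on the rest.
rng = np.random.default_rng(1)
order = rng.permutation(len(y))
labelled, held_out = order[:20], order[20:]
pred = knn_predict(Z[labelled], y[labelled], Z[held_out], k=3)
accuracy = float((pred == y[held_out]).mean())
print(f"held-out accuracy: {accuracy:.3f}")
```

On real recordings the embedding step would be replaced by one of the feature extractors under comparison, while the evaluation step stays fixed so that different representations can be compared on equal terms.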