[CS582 Reading Assignment 1-3] DNA language model GROVER learns sequence context in the human genome

18 Sept 2024 (modified: 08 Oct 2024), UIUC Fall 2024 CS582 MLCB Submission, CC BY 4.0
Keywords: tokenization, human genome LM
Abstract: **Additional question 1** What might be the potential downside or limitation of using BPE tokenization and next k-mer prediction as training tasks?
Submission Number: 3