# Dataset collections overview

Currently, the datasets can be divided into the following classes:

- language knowledge

  - summarization

  - translation

- dialogue: don't let the user know you are a robot

- STEM: knowledge about the world

  - code

  - world knowledge (ideally we want to handle this via prefix context)

- qa

Issues and TODOs:

- as the number of datasets grows, how can we avoid updating this section so often?

- ideally we only update the config YAML, and new datasets are downloaded from
  the hub

  - one possible idea is to upload these datasets in their transformed format
    to the OA hub
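
The config-driven idea in the TODO list could look roughly like the sketch below. It is only an illustration under stated assumptions: the config layout (a `datasets` mapping from category to entries), the `DatasetSpec` class, the `parse_config` and `load_all` helpers, and the `example-org/...` hub ids are all hypothetical and do not describe the project's actual config or code. The config is written as a plain dict here, standing in for the parsed YAML file.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class DatasetSpec:
    """One dataset entry from the (hypothetical) config."""
    name: str
    hub_id: str
    category: str
    split: str = "train"


def parse_config(config: Dict[str, Any]) -> List[DatasetSpec]:
    """Flatten the category -> entries mapping into a list of specs."""
    specs = []
    for category, entries in config["datasets"].items():
        for entry in entries:
            specs.append(
                DatasetSpec(
                    name=entry["name"],
                    hub_id=entry["hub_id"],
                    category=category,
                    split=entry.get("split", "train"),
                )
            )
    return specs


def load_all(
    config: Dict[str, Any], loader: Callable[[str, str], Any]
) -> Dict[str, Any]:
    """Load every configured dataset through the injected loader.

    The loader receives (hub_id, split), so the download mechanism stays
    swappable and this function can be exercised without network access.
    """
    return {
        spec.name: loader(spec.hub_id, spec.split)
        for spec in parse_config(config)
    }


# Stand-in for the parsed YAML config; names and hub ids are made up.
EXAMPLE_CONFIG = {
    "datasets": {
        "summarization": [
            {"name": "example_summaries", "hub_id": "example-org/example_summaries"},
        ],
        "qa": [
            {
                "name": "example_qa",
                "hub_id": "example-org/example_qa",
                "split": "validation",
            },
        ],
    }
}
```

With the Hugging Face `datasets` library installed, the loader could be something like `lambda hub_id, split: datasets.load_dataset(hub_id, split=split)`. Adding a new dataset would then only require a new config entry, not a change to this document.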
