# Stack Overflow Dataset (Preprocessed for Oort)

Stack Overflow (StackOverflow.com) is an online question-and-answer site for programmers. This dataset is an archive of Stack Overflow content: posts, votes, tags, and badges.


## Organization

The [dataset](stackoverflow_preprocessed.tar.gz) is split into a training set and a testing set. Each client was assigned a random ID, and the client ID is encoded in the file name.
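
The archive's internal directory layout and the exact file-name scheme are not described here, so a reasonable first step is to list a few member names and inspect how the split and the client IDs appear. The following is a minimal sketch using only Python's standard `tarfile` module; the helper name `peek_archive` and its `limit` parameter are hypothetical and chosen for illustration.

```python
import tarfile
from itertools import islice


def peek_archive(path, limit=10):
    """Return the names of the first `limit` members of a tar archive.

    Nothing is extracted to disk; the archive is only read far enough
    to collect the requested number of member names.
    """
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(path, "r:*") as archive:
        return [member.name for member in islice(archive, limit)]
```

Calling `peek_archive("stackoverflow_preprocessed.tar.gz")` after downloading the archive should show enough file names to see how the training and testing files are organized and where the client ID sits in each name.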

## References
This dataset is covered in more detail at [https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/stackoverflow/load_data](https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/stackoverflow/load_data). Its original location is
[https://storage.googleapis.com/tff-datasets-public/stackoverflow.tar.bz2](https://storage.googleapis.com/tff-datasets-public/stackoverflow.tar.bz2).