Abstract: Highlights•Release of pretrained BERT models for Brazilian Portuguese trained on the brWaC corpus.•State-of-the-art performance on three Portuguese NLP tasks: STS, RTE and NER.•Tokenization analysis reveals correlation between task performance and subword splits.
Loading