---
language: ca
---

# CALBERT: a Catalan Language Model

## Introduction

CALBERT is an open-source language model for Catalan based on the ALBERT architecture. 

The `base-uncased` version is available on Hugging Face and was pretrained on the [OSCAR dataset](https://traces1.inria.fr/oscar/).

For further information or requests, please go to the [GitHub repository](https://github.com/codegram/calbert).

## Pre-trained models

| Model                           | Architecture   | Training data          |
|---------------------------------|----------------|------------------------|
| `codegram/calbert-base-uncased` | Base (uncased) | OSCAR (4.3 GB of text) |
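
As a minimal sketch of how the model listed above might be loaded, the snippet below uses the `transformers` auto classes with the model ID `codegram/calbert-base-uncased`. It assumes `transformers`, `torch`, and `sentencepiece` are installed; the helper names `load_calbert`, `embed`, and `demo` and the sample sentence are illustrative and not part of the project.

```python
MODEL_ID = "codegram/calbert-base-uncased"


def load_calbert(model_id: str = MODEL_ID):
    """Download (or load from the local cache) the tokenizer and the encoder."""
    # Imported lazily so that this module can be imported without the
    # heavy dependencies being installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return the last hidden state, shaped (1, sequence_length, hidden_size)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # no gradients are needed for feature extraction
        outputs = model(**inputs)
    return outputs[0]


def demo():
    """Hypothetical usage: embed one Catalan sentence (needs network access)."""
    tokenizer, model = load_calbert()
    hidden = embed("M'agrada molt aquesta ciutat.", tokenizer, model)
    print(hidden.shape)
```

Calling `demo()` downloads the weights on first use and prints the shape of the contextual embeddings, one vector per input token.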


## Authors 

CALBERT was trained and evaluated by [Txus Bach](https://twitter.com/txustice), as part of [Codegram](https://www.codegram.com)'s applied research.

