## ✨ CoDiEmb

`CoDiEmb` is a high-performance framework for training unified text embedding models. It is particularly suited to **Information Retrieval (IR)** and **Semantic Textual Similarity (STS)** tasks.

<p align="center">
    <img src="./imgs/CoDiEmb.png" width="100%" height="100%">
</p>

## 🌟 Highlights

- **A Powerful Unified Framework**: Train a single model that converges effectively on both IR and STS tasks, providing a strong foundation for representation learning research.

- **Advanced Optimization Techniques**: Combine novel loss functions with a dynamic sampler designed to balance the differing requirements of IR and STS and to reduce training noise.

- **No Data Discarded**: Our unified data format handles training corpora of any granularity, ensuring no valuable training samples are wasted.
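
To make the sampler and data-format ideas above more concrete, here is a minimal sketch of one way a sampler could group mixed IR and STS records into single-task batches without dropping any sample. It is an illustration under stated assumptions, not CoDiEmb's actual implementation: the function name `task_homogeneous_batches` and the record keys (`task`, `query`, `positive`, `negatives`, `sentence1`, `sentence2`, `score`) are hypothetical and may differ from the repository's real schema.

```python
import random
from collections import defaultdict


def task_homogeneous_batches(examples, batch_size, seed=0):
    """Return shuffled batches that each contain examples from one task only.

    `examples` is a list of dicts with a hypothetical "task" key
    ("ir" or "sts"); the key names are illustrative, not CoDiEmb's schema.
    """
    rng = random.Random(seed)

    # Group records by task so that a batch never mixes IR and STS samples.
    by_task = defaultdict(list)
    for ex in examples:
        by_task[ex["task"]].append(ex)

    batches = []
    for items in by_task.values():
        rng.shuffle(items)
        # Keep the final, possibly shorter, batch so no example is dropped.
        for start in range(0, len(items), batch_size):
            batches.append(items[start:start + batch_size])

    # Interleave tasks across training steps.
    rng.shuffle(batches)
    return batches


# Hypothetical records: IR uses query/positive/negatives, STS uses a scored pair.
examples = [
    {"task": "ir", "query": f"q{i}", "positive": f"d{i}", "negatives": []}
    for i in range(5)
] + [
    {"task": "sts", "sentence1": f"a{i}", "sentence2": f"b{i}", "score": 0.5}
    for i in range(3)
]

batches = task_homogeneous_batches(examples, batch_size=2)
for batch in batches:
    print(batch[0]["task"], len(batch))
```

Keeping each batch within a single task lets a task-specific loss be applied per batch, and retaining the short trailing batch is one simple way to ensure every record is used.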