FLARE: Task-Agnostic Embedding Model Evaluation via Normalizing Flows

ACL ARR 2026 January Submission4688 Authors

05 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | CC BY 4.0
Keywords: Text Embedding, Task-Agnostic Evaluation, Normalizing Flow, Generalization Bound
Abstract: Despite the widespread adoption of text embedding models, selecting the optimal model for a specific target corpus remains challenging due to the lack of task-specific labels. While task-agnostic evaluation offers a promising solution by relying on unlabeled data, existing approaches based on kernel estimators or Gaussian mixtures fail to model high-dimensional distributions effectively, resulting in unstable rankings. To address this limitation, we propose \textbf{FLARE} (\textbf{F}low-based \textbf{L}abel-free \textbf{A}ssessment of \textbf{R}epresentation \textbf{E}mbeddings), which leverages normalizing flows to estimate information sufficiency in high-dimensional spaces. By learning invertible transformations, flows enable exact density estimation while mitigating the instability inherent in distance-based methods. We provide theoretical guarantees showing that our estimation error depends on the data's intrinsic structure rather than its raw dimensionality. Experiments across 11 datasets demonstrate that FLARE achieves strong rank agreement with supervised benchmarks (Spearman's $\rho$ up to 0.90), remaining robust even for high-dimensional embeddings ($d \ge 3{,}584$).
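As a rough illustration of what "exact density estimation" through an invertible transformation means, the sketch below evaluates the change-of-variables formula $\log p_X(x) = \log p_Z(f(x)) + \log\left|\det \partial f / \partial x\right|$ for the simplest possible flow, an elementwise affine map with a standard normal base. This is not the FLARE architecture or estimator, which the abstract does not specify; the function name `affine_flow_logpdf` and its parameters `scale` and `shift` are hypothetical and chosen only for this example.

```python
import numpy as np


def affine_flow_logpdf(x, scale, shift):
    """Exact log-density of x under the flow z = (x - shift) / scale.

    Assumptions for this sketch: a standard normal base density and an
    elementwise affine map (hypothetical, not the paper's model).
    """
    x = np.asarray(x, dtype=float)
    scale = np.asarray(scale, dtype=float)
    shift = np.asarray(shift, dtype=float)

    # Forward pass of the invertible map.
    z = (x - shift) / scale
    d = x.shape[-1]

    # Log-density of z under the standard normal base distribution.
    log_base = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)

    # Log absolute determinant of the Jacobian dz/dx, which is diagonal here.
    log_det = -np.sum(np.log(np.abs(scale)))

    return log_base + log_det


# Usage: the density is evaluated exactly, with no kernel bandwidth or
# pairwise distances involved.
point = np.array([1.0, -1.0])
value = affine_flow_logpdf(point, scale=[2.0, 0.5], shift=[1.0, -1.0])
```

Practical flows stack many learned invertible layers, but the log-density is still obtained in the same way: a base log-density term plus the accumulated log-determinant terms.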
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: evaluation methodologies, metrics
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4688