\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{hyperref}

\title{Scratchpad}
\author{}
\date{\today}

\begin{document}

\maketitle

\section*{Notes}

% Start your notes here

\subsection*{Section 1}


\textsc{\subsection{Model Types}

We categorize the models evaluated in this study according to their underlying architecture and training strategies.

\textbf{Transformer Base Models.}
As a baseline, we include transformer-based encoder models that have not been fine-tuned for any downstream task. For these models, we extract the final \textit{[CLS]} token representation of the input text and of each candidate label, and use the cosine similarity between the two as the relevance score. This straightforward approach provides a point of reference for subsequent comparisons. The base models considered in this category are the original BERT (\textit{bert-large-uncased} \cite{Devlin2019BERT}), the increasingly adopted ModernBERT (\textit{ModernBERT-large} \cite{ModernBERT2024}), and DeBERTa-v3 (\textit{deberta-v3-large} \cite{He2023DeBERTaV3}), a popular and robust modification of BERT that has demonstrated strong performance on a variety of NLP benchmarks.
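
As a minimal sketch of this scoring rule, write $h(\cdot) \in \mathbb{R}^{d}$ for the final \textit{[CLS]} representation, $x$ for the input text, and $\ell_1, \dots, \ell_K$ for the candidate labels; this notation is introduced here only for illustration. The relevance score is then
\[
  s(x, \ell_k) \;=\; \frac{h(x)^{\top} h(\ell_k)}{\lVert h(x) \rVert_2 \, \lVert h(\ell_k) \rVert_2},
  \qquad k = 1, \dots, K .
\]
If a single label is required, one natural choice, stated here as an assumption rather than as part of the setup, is $\hat{y} = \arg\max_{k} s(x, \ell_k)$.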

\textbf{NLI-based Cross-Encoders.}
These models are trained on natural language inference (NLI) datasets and perform classification by assessing the degree of entailment between an input text and each candidate label, formulated as a premise–hypothesis pair. \textit{BART-Large-MNLI} \cite{Lewis2020BART} is included as the canonical representative, being the first widely used NLI-based cross-encoder for zero-shot classification. We also consider \textit{NLI-RoBERTa-base}, following \cite{Reimers2019SBERT}, as well as a set of custom-trained cross-encoders using \textit{BERT}, \textit{DeBERTa-v3}, and \textit{ModernBERT} backbones. Both base and large versions are evaluated to analyze the effect of model scale, and two loss variants are tested to assess the impact of training objectives. Full details of the training procedure are provided in Section~\ref{sec:exp_setup}. In total, 11 NLI-based cross-encoders are benchmarked, covering the most widely used configurations in the literature.
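
For illustration, suppose the cross-encoder returns three logits $z_{\mathrm{ent}}$, $z_{\mathrm{neu}}$, and $z_{\mathrm{con}}$ (entailment, neutral, contradiction) for a premise $x$ and a hypothesis built from a candidate label $\ell$. The three-way output head and these symbols are assumptions made for this sketch and may not match every checkpoint. One possible label score is the entailment probability
\[
  p_{\mathrm{ent}}(x, \ell) \;=\;
  \frac{\exp(z_{\mathrm{ent}})}{\exp(z_{\mathrm{ent}}) + \exp(z_{\mathrm{neu}}) + \exp(z_{\mathrm{con}})} .
\]
Other normalizations, for example over the entailment and contradiction logits only, are equally plausible; the formula above should be read as one hedged instance and not as the exact scoring used in our experiments.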

\textbf{Embedding Models.}
This category comprises models optimized to produce fixed-size vector representations of text for a range of downstream tasks, including classification. As a canonical embedding model, \textit{all-MiniLM-L6-v2} \cite{SBERT2019} is included for its efficiency and strong empirical results, serving as a baseline for this model family. Additionally, we evaluate both base and large variants of BGE, GTE, and E5, all of which use variations of transformer encoders as backbones. To provide contrast, we also include embedding models that leverage large language model architectures, such as Qwen3-Embedding and e5-mistral-7b-instruct; for Qwen3-Embedding, both 0.6B and 8B parameter variants are tested to study the effect of scale. Overall, the embedding model category comprises 11 distinct models.

\textbf{Rerankers.}
Reranker models are typically employed in information retrieval, where they re-score candidate documents for relevance to a given query. The \textit{ms-marco-MiniLM-L6-v2} model serves as the reranker counterpart to \textit{all-MiniLM-L6-v2} and is used as the baseline for this group. Similarly, \textit{gte-reranker-modernbert-base} and \textit{bge-reranker-base/large} serve as reranking counterparts to their respective embedding models. We further include \textit{Qwen3-Reranker}, which scores a query--document pair by prompting the model to decide whether the document is relevant. The final relevance score is the probability assigned to the ``yes'' token, computed by a softmax over the model's vocabulary logits in which every token except ``yes'' and ``no'' is masked out. Both the 0.6B and 8B variants of \textit{Qwen3-Reranker} are evaluated to analyze the impact of model size. In total, 6 reranker models are benchmarked.
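
The \textit{Qwen3-Reranker} score can be written out as follows, where $z_{\mathrm{yes}}$ and $z_{\mathrm{no}}$ denote the logits of the ``yes'' and ``no'' tokens at the answer position for query $q$ and document $d$ (the symbols are our own shorthand, not taken from the model documentation). Masking out all other tokens reduces the softmax to two terms:
\[
  s(q, d) \;=\; \frac{\exp(z_{\mathrm{yes}})}{\exp(z_{\mathrm{yes}}) + \exp(z_{\mathrm{no}})}
  \;=\; \frac{1}{1 + \exp\bigl(-(z_{\mathrm{yes}} - z_{\mathrm{no}})\bigr)} .
\]
The score therefore depends only on the logit difference. As a purely hypothetical example, $z_{\mathrm{yes}} = 2.0$ and $z_{\mathrm{no}} = 0.5$ give $s(q, d) = 1 / (1 + e^{-1.5}) \approx 0.818$.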

Table~\ref{tab:tbl_02_models_overview} summarizes the models included in our experiments, listing their architecture, training data, and parameter count. In total, the benchmark covers 29 models.
}

\subsection*{Section 2}

% ...

\end{document}
