Abstract: Domain adaptation of Large Language Models (LLMs) produces models better suited to a particular domain by capturing patterns from domain text, which leads to improvements in downstream tasks. These improvements are visible to the naked eye; the patterns behind them are not. How can we know which patterns contribute to changes in downstream scores, and by how much? Through a Multilevel Analysis, we discover and quantify the effect of text patterns on the downstream scores of domain-adapted Llama 2 for the task of sentence similarity (BIOSSES dataset). We show that text patterns from PubMed abstracts, such as clear writing and simplicity, as well as the amount of biomedical information, are key to improving downstream scores. We also show that another factor, not usually quantified, contributes equally to downstream scores: the choice of hyperparameters for both domain adaptation and fine-tuning.