Keywords: Large Language Models, Hyperbolic Space, Low-Rank Adaptation, Embedding Space
TL;DR: A method for efficiently fine-tuning Large Language Models directly on the hyperbolic manifold, exploiting their latent tree-like structure and significantly boosting complex reasoning performance.
Abstract: Large language models (LLMs) have demonstrated remarkable performance across various tasks. However, it remains an open question whether the default Euclidean space is the most suitable choice for LLMs.
In this study, we investigate the geometric characteristics of LLMs, focusing specifically on tokens and their embeddings.
Our findings reveal that token frequency follows a power-law distribution: high-frequency tokens (e.g., *the*, *that*) make up a small minority of the vocabulary, while low-frequency tokens (e.g., *apple*, *dog*) make up the majority. Furthermore, high-frequency tokens cluster near the origin of the embedding space, whereas low-frequency tokens lie farther away.
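This frequency–norm relationship can be checked directly from a model's input embedding matrix. The sketch below is illustrative rather than the analysis code used in the study; the model name and toy corpus are placeholders, and it simply correlates corpus token frequencies with embedding norms.

```python
# Illustrative sketch (not the paper's analysis code): correlate token frequency
# with the distance of each token embedding from the origin.
from collections import Counter

import torch
from scipy.stats import spearmanr
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM with an input embedding matrix works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Token frequencies over an arbitrary text corpus (here a toy stand-in).
corpus = ["The dog chased the ball.", "That apple is red."]
counts = Counter(tid for text in corpus for tid in tokenizer(text)["input_ids"])

# Distance of each observed token's embedding from the origin.
emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)
ids = list(counts.keys())
norms = emb[ids].norm(dim=-1)
freqs = torch.tensor([counts[i] for i in ids], dtype=torch.float)

# A negative correlation is consistent with frequent tokens sitting near the
# origin and rare tokens lying farther away.
rho, _ = spearmanr(freqs.numpy(), norms.numpy())
print(f"Spearman correlation (frequency vs. embedding norm): {rho:.3f}")
```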
Additionally, token embeddings exhibit hyperbolic characteristics, indicating a latent tree-like structure within the embedding space.
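One common way to quantify such tree-like, hyperbolic characteristics is Gromov's four-point condition. The sketch below is a generic estimator, not necessarily the one used for the paper's measurements; it assumes random quadruple sampling and Euclidean pairwise distances between token embeddings, and reports a relative δ where values near zero indicate tree-like geometry.

```python
# Illustrative sketch: sample-based estimate of relative delta-hyperbolicity.
import itertools

import numpy as np


def delta_hyperbolicity(points: np.ndarray, n_samples: int = 2000, seed: int = 0) -> float:
    """Estimate a relative Gromov delta-hyperbolicity from sampled quadruples."""
    rng = np.random.default_rng(seed)

    def dist(a, b):
        return float(np.linalg.norm(a - b))

    deltas, diam = [], 1e-12
    for _ in range(n_samples):
        x, y, z, w = points[rng.choice(len(points), size=4, replace=False)]
        # Four-point condition: the two largest pairwise sums differ by at most 2*delta.
        sums = sorted([dist(x, y) + dist(z, w),
                       dist(x, z) + dist(y, w),
                       dist(x, w) + dist(y, z)])
        deltas.append((sums[2] - sums[1]) / 2.0)
        # Rough diameter estimate taken from the same sampled points.
        diam = max(diam, max(dist(a, b) for a, b in itertools.combinations((x, y, z, w), 2)))
    return max(deltas) / diam  # near 0 => tree-like geometry
```

Running this on (a sample of) the embedding matrix from the previous sketch, e.g. `delta_hyperbolicity(emb.numpy())`, gives the kind of relative δ that supports a tree-likeness claim; the exact estimator used in the paper may differ.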
Motivated by these observations, we propose **HypLoRA**, an efficient fine-tuning approach that operates in hyperbolic space to better exploit this underlying hierarchical structure.
HypLoRA performs low-rank adaptation directly in hyperbolic space, thereby preserving hyperbolic modeling capabilities throughout the fine-tuning process.
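While the exact formulation is detailed in the paper, a low-rank adapter that keeps its update in hyperbolic space could, in spirit, look like the sketch below. This is an illustrative approximation, not the authors' HypLoRA layer: it assumes a Poincaré ball of curvature c with exponential/logarithmic maps at the origin, and the names `HyperbolicLoRALinear`, `expmap0`, and `logmap0`, as well as the rank, scaling, and residual form, are hypothetical.

```python
# Illustrative sketch of a hyperbolic low-rank adapter (not the exact HypLoRA layer).
import math

import torch
import torch.nn as nn


def expmap0(v: torch.Tensor, c: float) -> torch.Tensor:
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    return torch.tanh(math.sqrt(c) * norm) * v / (math.sqrt(c) * norm)


def logmap0(x: torch.Tensor, c: float) -> torch.Tensor:
    """Logarithmic map at the origin of the Poincare ball with curvature -c."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    scaled = (math.sqrt(c) * norm).clamp(max=1 - 1e-5)  # keep inside the ball numerically
    return torch.atanh(scaled) * x / (math.sqrt(c) * norm)


class HyperbolicLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update computed through the hyperbolic maps."""

    def __init__(self, base: nn.Linear, r: int = 8, c: float = 1.0, alpha: float = 16.0):
        super().__init__()
        self.base, self.c, self.scale = base, c, alpha / r
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.zeros(r, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))  # standard LoRA-style init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = expmap0(x, self.c)            # lift activations onto the Poincare ball
        h = h @ self.A.T @ self.B.T       # low-rank map applied in the lifted coordinates
        delta = logmap0(h, self.c)        # project the update back before the residual add
        return self.base(x) + self.scale * delta
```

In practice such an adapter would typically wrap the frozen query/value (or other) projection matrices, mirroring standard LoRA placement; HypLoRA's actual maps, curvature handling, and placement may differ.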
Extensive experiments across various base models and reasoning benchmarks, covering both arithmetic and commonsense reasoning, demonstrate that HypLoRA substantially improves LLM performance.
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 20160