HyST: LLM-Powered Hybrid Retrieval over Semi-Structured Tabular Data

Jiyoon Myung, Jihyeon Park, Joohyung Han

Published: 2025, Last Modified: 27 Feb 2026, CoRR 2025, License: CC BY-SA 4.0
Abstract: User queries in real-world recommendation systems often combine structured constraints (e.g., category, attributes) with unstructured preferences (e.g., product descriptions or reviews). We introduce HyST (Hybrid retrieval over Semi-structured Tabular data), a hybrid retrieval framework that combines LLM-powered structured filtering with semantic embedding search to support complex information needs over semi-structured tabular data. HyST uses large language models (LLMs) to extract attribute-level constraints from the natural-language query and applies them as metadata filters, while the remaining unstructured query components are handled by embedding-based retrieval. Experiments on a semi-structured benchmark show that HyST consistently outperforms traditional baselines, highlighting the importance of structured filtering for retrieval precision. The framework offers a scalable and accurate solution for real-world user queries.
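The two-stage flow described in the abstract (extract structured constraints, filter on them, then rank the survivors by semantic similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the `extract_constraints` function is a keyword-matching placeholder standing in for the LLM call, `embed` is a toy bag-of-words vector standing in for a real embedding model, and the table, schema, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical semi-structured table: structured columns plus free text.
ITEMS = [
    {"id": 1, "category": "laptop", "brand": "acme",
     "description": "lightweight with long battery life for travel"},
    {"id": 2, "category": "laptop", "brand": "zenith",
     "description": "heavy gaming machine with loud fans"},
    {"id": 3, "category": "phone", "brand": "acme",
     "description": "lightweight phone with long battery life"},
]

# Hypothetical schema: the allowed values of each structured attribute.
SCHEMA = {"category": {"laptop", "phone"}, "brand": {"acme", "zenith"}}


def extract_constraints(query, schema):
    """Placeholder for the LLM step: split the query into attribute
    constraints and leftover free text by simple keyword matching."""
    constraints, remaining = {}, []
    for token in query.lower().split():
        matched = False
        for attr, values in schema.items():
            if token in values:
                constraints[attr] = token
                matched = True
        if not matched:
            remaining.append(token)
    return constraints, " ".join(remaining)


def embed(text):
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query, items, schema, top_k=2):
    """Filter on extracted constraints, then rank the rest by similarity."""
    constraints, free_text = extract_constraints(query, schema)
    candidates = [
        item for item in items
        if all(item[attr] == value for attr, value in constraints.items())
    ]
    query_vec = embed(free_text)
    ranked = sorted(
        candidates,
        key=lambda item: cosine(query_vec, embed(item["description"])),
        reverse=True,
    )
    return ranked[:top_k]


constraints, free_text = extract_constraints(
    "lightweight laptop with long battery life", SCHEMA
)
results = hybrid_search("lightweight laptop with long battery life", ITEMS, SCHEMA)
result_ids = [item["id"] for item in results]
```

In this toy example the phone (id 3) matches the free-text preference closely but is removed by the `category` filter before ranking, which is the kind of precision gain from structured filtering that the abstract points to.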