MOGIC: METADATA-INFUSED ORACLE GUIDANCE FOR IMPROVED EXTREME CLASSIFICATION

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: recommendation systems, auxiliary information, extreme classification, metadata
TL;DR: Oracle guided enhancement of memory representations improves task performance
Abstract:

Retrieval-augmented classification and generation models benefit significantly from early-stage fusion of high-quality text-based auxiliary metadata, often called memory, but they suffer from high inference latency and poor robustness to noise. In classification tasks, particularly the extreme classification (XC) setting where low latency is critical, existing methods incorporate metadata for context enrichment via an XC-based retriever and obtain representations of the relevant memory items, performing late-stage fusion to keep latency low. Aiming for higher accuracy while still meeting these low-latency constraints, we propose MOGIC, an approach for Metadata-infused Oracle Guidance for XC tasks. In particular, we train an early-fusion Oracle classifier with access to both query-side and label-side ground-truth metadata in textual form. The Oracle is subsequently used to guide the training of any existing memory-based XC Disciple model via regularization. When applied to memory-based XC Disciple models such as OAK, the MOGIC algorithm improves precision@1 and propensity-scored precision@1 by ~2% on four standard datasets, at no additional inference-time cost to the Disciple model. We also show the feasibility of applying MOGIC to improve state-of-the-art memory-free XC approaches such as NGAME and DEXA, demonstrating that MOGIC can be used atop any existing XC approach in a plug-and-play manner. Finally, we show the robustness of the MOGIC method to missing and noisy metadata. We will release code on acceptance.
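The Oracle-guidance idea in the abstract, a frozen early-fusion Oracle regularizing a Disciple's training, can be sketched as below. This is a minimal illustration, not the paper's exact formulation: the squared-Euclidean alignment term, the function names, and the trade-off weight `lam` are all assumptions made for the example.

```python
import math

def cross_entropy(scores, target_idx):
    """Softmax cross-entropy over a list of label scores (the Disciple's task loss)."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target_idx]

def alignment_loss(disciple_emb, oracle_emb):
    """Squared Euclidean distance pulling the Disciple's query embedding
    toward the (frozen) Oracle's metadata-infused embedding."""
    return sum((d - o) ** 2 for d, o in zip(disciple_emb, oracle_emb))

def mogic_loss(disciple_scores, target_idx, disciple_emb, oracle_emb, lam=0.5):
    """Hypothetical total objective: task loss plus Oracle-guidance regularizer,
    weighted by an illustrative coefficient `lam`."""
    return (cross_entropy(disciple_scores, target_idx)
            + lam * alignment_loss(disciple_emb, oracle_emb))
```

Because the regularizer only shapes the Disciple's representations during training, inference runs the Disciple alone, which is consistent with the claim of no added inference-time cost.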

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7374
