Rotative Factorization Machines

16 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Feature Interaction, Neural Networks, Self-Attentive Rotation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Relation-Aware Feature Interaction Learning via Self-Attentive Rotations
Abstract: Feature interaction learning, which focuses on capturing the complex relationships among multiple features, is crucial in various real-world predictive tasks. However, most feature interaction approaches empirically enumerate all feature interactions within a predefined maximal order, which leads to suboptimal results due to their restricted learning capacity. Some recent studies propose intricate transformations that convert the feature interaction orders into learnable parameters, enabling the interactions to be learned automatically from data. Despite this progress, the interaction order of each feature is typically learned independently, which lacks the flexibility to capture feature dependencies in varying contexts. In addition, these methods can only model feature interactions within a bounded order due to the exponential growth of interaction terms. To address these issues, we present the Rotative Factorization Machine (RFM). Unlike prior studies, RFM represents each feature as a polar angle in the complex plane. Feature interactions are thus converted into a series of complex rotations, where the orders are cast into rotation coefficients, allowing interactions of arbitrarily large order to be learned. Further, we propose a novel self-attentive rotation function that models the rotation coefficients through a rotation-based attention mechanism, which can adaptively learn the interaction orders from different interaction contexts. Moreover, RFM incorporates a modulus amplification network that learns the moduli of the complex features, further enhancing the representations. Such a network can adaptively capture feature interactions in varying contexts, without requiring predefined order coefficients. Extensive experiments conducted on five widely used datasets demonstrate the effectiveness of our approach.
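To make the rotation view concrete, the following is a minimal PyTorch sketch of the core idea stated in the abstract: each feature embedding is mapped to a polar angle, and a cross term exp(i · Σ_j a_j θ_j) composes rotations whose real-valued coefficients a_j act as learnable, unbounded interaction orders. The module name, the angle projection, and all shapes are illustrative assumptions; the paper's full model additionally derives the coefficients from a self-attentive rotation function and applies a modulus amplification network, neither of which is reproduced here.

```python
import torch
import torch.nn as nn

class RotativeInteraction(nn.Module):
    """Hypothetical sketch: features as polar angles in the complex plane.
    Raising a unit-modulus feature exp(i * theta_j) to a real power a_j is a
    rotation by a_j * theta_j, so a cross term exp(i * sum_j a_j * theta_j)
    turns interaction orders into continuous, learnable rotation coefficients."""

    def __init__(self, num_fields: int, dim: int, num_terms: int):
        super().__init__()
        self.angle = nn.Linear(dim, dim)  # assumed projection of embeddings to angles
        # one vector of rotation coefficients per interaction term;
        # these play the role of (now continuous) interaction orders
        self.coef = nn.Parameter(torch.randn(num_terms, num_fields))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, num_fields, dim) field embeddings
        theta = self.angle(emb)  # (batch, num_fields, dim)
        # a weighted sum of angles is a composition of rotations
        phase = torch.einsum("tf,bfd->btd", self.coef, theta)
        # real and imaginary parts of exp(i * phase); unit modulus for simplicity
        return torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)
```

Because the coefficients are unconstrained real numbers rather than enumerated integer orders, no maximal interaction order has to be fixed in advance, which is the property the abstract contrasts against enumeration-based factorization machines.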
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 608