ICML 2025 Workshop MOSS Submissions
Understanding Attention Glitches with Threshold Relative Attention
Mattia Opper, Roland Fernandez, Paul Smolensky, Jianfeng Gao
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Review, Remask, Refine: Process-Guided Block Diffusion for Text Generation
Nikita Mounier, Parsa Idehpour
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs
Behnoush Khavari, Jayesh Khullar, Mehran Shakerinava, Jerry Huang, Siamak Ravanbakhsh, Sarath Chandar
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Learning Gaussian Mixture Models via Transformer Measure Flows
Aleksandr Zimin, Anastasiia Kutakh, Yury Polyanskiy, Philippe Rigollet
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
SynDaCaTE: A Synthetic Dataset For Evaluating Part-Whole Hierarchical Inference
Jake Levi, Mark van der Wilk
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Restoring Task-Relevant Information in Synthetic Data: A Small-Scale V-Information View
Sid Bharthulwar
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
From SGD to Spectra: A Theory of Neural Network Weight Dynamics
Brian Richard Olsen, Sam Fatehmanesh, Frank Xiao, Adarsh Kumarappan, Anirudh Gajula
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Gradient descent in presence of extreme flatness and steepness
Dravyansh Sharma
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Approximate Message Passing on General Factor Graphs using Shallow Neural Networks
Leonhard Hennicke, Jan Lemcke, Rainer Schlosser, Ralf Herbrich
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba E. Ba
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Improving Pathfinding with Anchoring Tokens
Huaqing Zhang, Bingbin Liu, Juno Kim, Andrej Risteski
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Personalizing AI Interventions in Multiple Health Behavioral Change Settings
Samantha Marks, Michelle Chang, Eura Nofshin, Weiwei Pan, Finale Doshi-Velez
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Towards Understanding Self-Pretraining for Sequence Classification
Omar Coser, Antonio Orvieto
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
Valérie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba E. Ba
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Geometry of Rank Constraints in Shallow Polynomial Neural Networks
Param Mody, Maksym Zubkov
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
Jaeha Lee, Gio Huh, Ning Su, Tony Yue YU
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
How Much Context Does Natural Language Actually Require? An Analysis Using LLMs as Statistical Oracles
Vala Vakilian, Sadegh Mahdavi, Christos Thrampoulidis
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Decomposed Learning: An Avenue for Mitigating Grokking
Gabryel Mason-Williams, Israel Mason-Williams
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Pruning Increases Orderedness in Weight-Tied Recurrent Computation
YIDING SONG
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models
Tina Behnia, Puneesh Deora, Christos Thrampoulidis
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025 Oral
Readers: Everyone
In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly
Puneesh Deora, Bhavya Vasudeva, Tina Behnia, Christos Thrampoulidis
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025 Oral
Readers: Everyone
Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features
Yize Zhao, Christos Thrampoulidis
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Halil Alperen Gozeten, Muhammed Emrullah Ildiz, Xuechen Zhang, Hrayr Harutyunyan, Ankit Singh Rawat, Samet Oymak
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
The Necessity for Intervention Fidelity: Unintended Side Effects When Steering LLMs
Jonas B Raedler, Weiyue Li, Alyssa Mia Taliotis, Manasvi Goyal, Siddharth Swaroop, Weiwei Pan
Published: 10 Jun 2025, Last Modified: 15 Jul 2025
MOSS@ICML2025
Readers: Everyone
Optimizing Explanations: Nuances Matter When Evaluation Metrics Become Loss Functions
Jonas B Raedler, Hiwot Belay Tadesse, Weiwei Pan, Finale Doshi-Velez
Published: 10 Jun 2025, Last Modified: 17 Jul 2025
MOSS@ICML2025
Readers: Everyone