ICML 2024 Workshop ES-FoMo-II Submissions

Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
Alessandro Pierro, Steven Abreu
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones
Mehrnaz Mofakhami, Reza Bayat, Ioannis Mitliagkas, Joao Monteiro, Valentina Zantedeschi
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Just Read Twice: Closing the Recall Gap for Recurrent Language Models
Simran Arora, Aman Timalsina, Aaryan Singhal, Sabri Eyuboglu, Xinyi Zhao, Ashish Rao, Atri Rudra, Christopher Ré
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Mobile and Edge Evaluation of Large Language Models
Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, Hamed Haddadi
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Low-rank Linearization of Large Language Models
Michael Zhang, Aaryan Singhal, Benjamin Frederick Spector, Simran Arora, Christopher Ré
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Towards Smaller Language Models via Layer Looping
Sabri Eyuboglu, Dylan Zinsley, Jon Saad-Falcon, Simran Arora, Atri Rudra, James Zou, Christopher Ré
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff
Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré
ES-FoMo-II 2024 Oral · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Fast and Memory-Efficient Multi-Sequence Generation via Structured Masking
Daniel Mingyi Israel, Siyan Zhao, Guy Van den Broeck, Aditya Grover
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 24 Jul 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

TinyAgent: Quantization-aware Model Compression and Adaptation for On-device LLM Agent Deployment
Jason Kong, Lanxiang Hu, Flavio Ponzina, Tajana Rosing
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 24 Jul 2024

Efficient LLM Pruning with Global Token-Dependency Awareness and Hardware-Adapted Inference
Oshin Dutta, Ritvik Gupta, Sumeet Agarwal
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement
Eashan Adhikarla, Kai Zhang, John Nicholson, Brian D. Davison
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

NVDSL: Simplifying Tensor Cores with Python-Driven MLIR Metaprogramming
Guray Ozen
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Characterizing Prompt Compression Methods for Long Context Inference
Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami
ES-FoMo-II 2024 Oral · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Does Your Data Spark Joy? Performance Gains from Domain Upsampling at the End of Training
Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Hydragen: High-Throughput LLM Inference with Shared Prefixes
Jordan Juravsky, Bradley Brown, Ryan Saul Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 24 Jul 2024

Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
Siyan Zhao, Daniel Mingyi Israel, Guy Van den Broeck, Aditya Grover
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 24 Jul 2024

In Defense of Structural Sparse Adapters for Concurrent LLM Serving
Junda Su, Zirui Liu, Zeju Qiu, Weiyang Liu, Zhaozhuo Xu
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Can Transformers Solve Least Squares to High Precision?
Jerry Weihong Liu, Jessica Grogan, Owen M. Dugan, Simran Arora, Atri Rudra, Christopher Ré
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Revisiting Cascaded Ensembles for Efficient Inference
Steven Kolawole, Don Dennis, Ameet Talwalkar, Virginia Smith
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

MoRe Fine-Tuning with 10x Fewer Parameters
Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

GRASS: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Aashiq Muhamed, Oscar Li, David Woodruff, Mona T. Diab, Virginia Smith
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
Harry Dong, Beidi Chen, Yuejie Chi
ES-FoMo-II 2024 Oral · Published: 21 Jun 2024, Last Modified: 26 Jul 2024

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
Sami Jaghouar, Johannes Hagemann
ES-FoMo-II 2024 Poster · Published: 21 Jun 2024, Last Modified: 26 Jul 2024
(Page 1 of 4)