OpenReview.net
ICLR 2024 Workshop SeT LLM Submissions
Toward Robust Unlearning for LLMs
ICLR 2024 Workshop SeT LLM Submission114 Authors
Published: 04 Mar 2024, Last Modified: 06 May 2024 (SeT LLM @ ICLR 2024)
Readers: Everyone
PANDORA: Detailed LLM Jailbreaking via Collaborated Phishing Agents with Decomposed Reasoning
Zhaorun Chen, Zhuokai Zhao, Wenjie Qu, Zichen Wen, Zhiguang Han, Zhihong Zhu, Jiaheng Zhang, Huaxiu Yao
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation
Yixin Wan, Fanyou Wu, Weijie Xu, Srinivasan H. Sengamedu
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Differentially Private Synthetic Data via Foundation Model APIs 2: Text
Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs
Yavuz Faruk Bakman, Duygu Nur Yaldiz, Baturalp Buyukates, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Quantitative Certification of Knowledge Comprehension in LLMs
Isha Chaudhary, Vedaant V Jain, Gagandeep Singh
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao, John Dang, Aditya Grover
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Source-Aware Training Enables Knowledge Attribution in Language Models
Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Self-evaluation and self-prompting to improve the reliability of LLMs
Alexandre Piché, Aristides Milios, Dzmitry Bahdanau, Christopher Pal
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
The Effect of Model Size on LLM Post-hoc Explainability via LIME
Henning Heyen, Amy Widdicombe, Noah Yamamoto Siegel, Philip Colin Treleaven, Maria Perez-Ortiz
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Attacks on Third-Party APIs of Large Language Models
Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
ICLR 2024 Workshop SeT LLM Submission95 Authors
Published: 04 Mar 2024, Last Modified: 15 Apr 2024 (SeT LLM @ ICLR 2024)
Are Large Language Models Bayesian? A Martingale Perspective on In-Context Learning
Fabian Falck, Ziyu Wang, Christopher C. Holmes
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task
Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Single-pass detection of jailbreaking input in large language models
Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios Chrysos, Volkan Cevher
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Is Your Jailbreaking Prompt Truly Effective for Large Language Models?
ICLR 2024 Workshop SeT LLM Submission90 Authors
Published: 04 Mar 2024, Last Modified: 19 Apr 2024 (SeT LLM @ ICLR 2024)
I'm not familiar with the name Harry Potter: Prompting Baselines for Unlearning in LLMs
Pratiksha Thaker, Yash Maurya, Virginia Smith
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Enhancing and Evaluating Logical Reasoning Abilities of Large Language Models
Shujie Deng, Honghua Dong, Xujie Si
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
How Susceptible are Large Language Models to Ideological Manipulation?
Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Bayesian reward models for LLM alignment
Adam X. Yang, Maxime Robeyns, Thomas Coste, Jun Wang, Haitham Bou Ammar, Laurence Aitchison
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Simple Permutations Can Fool LLaMA: Permutation Attack and Defense for Large Language Models
Liang CHEN, Yatao Bian, Li Shen, Kam-Fai Wong
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Character-level robustness should be revisited
Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios Chrysos, Volkan Cevher
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
An Assessment of Model-on-Model Deception
Julius Heitkoetter, Michael Gerovitch, Laker Newhouse
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Krueger
Published: 04 Mar 2024, Last Modified: 14 Apr 2024 (SeT LLM @ ICLR 2024)