EMNLP 2024 Workshop BlackBoxNLP Submissions
Does Alignment Tuning Really Break LLMs’ Internal Confidence?
Hongseok Oh, Wonseok Hwang
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
How LLMs Reinforce Political Misinformation: Insights from the Analysis of False Presuppositions
Judith Sieker, Clara Lachenmaier, Sina Zarrieß
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Uncovering Syllable Constituents in the Self-Attention-Based Speech Representations of Whisper
Erfan A Shams, Iona Gessinger, Julie Carson-Berndsen
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
An Adversarial Example for Direct Logit Attribution: Memory Management in GELU-4L
Jett Janiak, Can Rager, James Dao, Yeu-Tong Lau
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
On the alignment of LM language generation and human language comprehension
Lena Sophia Bolliger, Patrick Haller, Lena Ann Jäger
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Transformers Learn Transition Dynamics when Trained to Predict Markov Decision Processes
Yuxi Chen, Suwei Ma, Tony Dear, Xu Chen
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations
Supriya Manna, Niladri Sett
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Compositional Cores: Persistent Attention Patterns in Compositionally Generalizing Subnetworks
Michael Y. Hu, Chuan Shi, Tal Linzen
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Exploring Alignment in Shared Cross-Lingual Spaces
Basel Mousi, Nadir Durrani, Fahim Dalvi, Majd Hawasly, Ahmed Abdelali
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Enhancing Question Answering on Charts Through Effective Pre-training Tasks
Ashim Gupta, Vivek Gupta, Shuo Zhang, Yujie He, Ning Zhang, Shalin Shah
Published: 21 Sept 2024, Last Modified: 22 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers
Amit Ben Artzy, Roy Schwartz
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Can We Statically Locate Knowledge in Large Language Models? Financial Domain and Toxicity Reduction Case Studies
Jordi Armengol-Estapé, Lingyu Li, Sebastian Gehrmann, Achintya Gopal, David S Rosenberg, Gideon S. Mann, Mark Dredze
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Latent Concept-based Explanation of NLP Models
Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
MultiContrievers: Analysis of Dense Retrieval Representations
Seraphina Goldfarb-Tarrant, Pedro Rodriguez, Jane Dwivedi-Yu, Patrick Lewis
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Implicit Meta-Learning in Small Transformer Models: Insights from a Toy Task
Luan Fletcher, Victor Levoso, Kunvar Thaman, Misha Kilianovski
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Enhancing adversarial robustness in Natural Language Inference using explanations
Alexandros Koulakos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
LLM Internal States Reveal Hallucination Risk Faced With a Query
Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung
Published: 21 Sept 2024, Last Modified: 23 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Language Models Linearly Represent Sentiment
Oskar John Hollinsworth, Curt Tigges, Atticus Geiger, Neel Nanda
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Inducing Induction in Llama via Linear Probe Interventions
Sheridan Feucht, Byron C Wallace, David Bau
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Learning, Forgetting, Remembering: Insights From Tracking LLM Memorization During Training
Danny D. Leybzon, Corentin Kervadec
Published: 21 Sept 2024, Last Modified: 11 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
Francesco Ortu, Zhijing Jin, Diego Doimo, Mrinmaya Sachan, Alberto Cazzaniga, Bernhard Schölkopf
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Are there identifiable structural parts in the sentence embedding whole?
Vivi Nastase, Paola Merlo
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Routing in Sparsely-gated Language Models responds to Context
Stefan Arnold, Marian Fietta, Dilara Yesilbas
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Optimal and efficient text counterfactuals using Graph Neural Networks
Dimitris Lymperopoulos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone
Fifty shapes of BLiMP: syntactic learning curves in language models are not uniform, but sometimes unruly
Bastian Bunzeck, Sina Zarrieß
Published: 21 Sept 2024, Last Modified: 06 Oct 2024
BlackboxNLP 2024
Readers: Everyone