Resolving Lexical Bias in Edit Scoping with Projector Editor Networks

Hammad Rizwan; Domenic Rosati; Ga Wu; Hassan Sajjad

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Representation Learning, Model Editing, LLMs

TL;DR: PENME is a model editing technique that overcomes limitations of distance-based scoping by using a projector network. It effectively handles lexical biases, improving performance while remaining efficient and adaptable.

Abstract: Weight-preserving large language model editing techniques rely heavily on the scoping mechanism that decides when to apply an edit to the base model. These scoping mechanisms use distance functions in the representation space. In this work, we show that distance-based scoping functions suffer from strong lexical biases, leading to issues such as applying an edit to irrelevant prompts merely because they share words with an edited prompt. We address these problems by introducing Projector Editor Networks for Model Editing (PENME), a principled model editing approach designed to learn the optimal representation space for scoping via contrastive learning. We show that PENME achieves state-of-the-art model editing results while being more compute-efficient at inference time than previous methods and flexible enough to adapt across architectures.
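
The scoping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`in_scope_edit`, `semantic_only`), the two-dimensional toy vectors, and the thresholds are all hypothetical, and the hand-written projection only stands in for a projector that PENME would learn via contrastive learning. The first coordinate of each toy vector is assumed to encode lexical overlap and the second to encode meaning.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_scope_edit(query, edit_keys, threshold, project=lambda v: v):
    """Return the index of the nearest stored edit key if it lies within
    `threshold` of the query, otherwise None (the base model is used).

    `project` maps a representation into the space where distances are
    measured. The default is the identity, i.e. raw-space scoping.
    """
    q = project(query)
    best_index, best_dist = None, float("inf")
    for i, key in enumerate(edit_keys):
        d = euclidean(q, project(key))
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index if best_dist <= threshold else None


# Toy representations (assumed): [lexical-overlap feature, semantic feature].
edit_keys = [[1.0, 1.0]]        # representation of the edited prompt
paraphrase = [0.2, 1.0]         # same meaning, few shared words
lexical_neighbor = [1.0, 0.4]   # different meaning, many shared words


def semantic_only(v):
    """Hypothetical stand-in for a learned projector: keeps only the
    semantic coordinate and discards the lexical one."""
    return [v[1]]


# Raw-space scoping with threshold 0.7: the lexical neighbor is wrongly
# accepted (distance 0.6) and the paraphrase is wrongly rejected (0.8).
raw_paraphrase = in_scope_edit(paraphrase, edit_keys, 0.7)
raw_neighbor = in_scope_edit(lexical_neighbor, edit_keys, 0.7)

# Projected-space scoping with threshold 0.3: the paraphrase is accepted
# (distance 0.0) and the lexical neighbor is rejected (distance 0.6).
proj_paraphrase = in_scope_edit(paraphrase, edit_keys, 0.3, project=semantic_only)
proj_neighbor = in_scope_edit(lexical_neighbor, edit_keys, 0.3, project=semantic_only)
```

In this toy setting no single raw-space threshold separates the two prompts correctly, since the irrelevant prompt is closer to the edit key than the paraphrase is. Measuring distance after projection removes that ordering problem, which is the motivation the abstract gives for learning the representation space used for scoping.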

Primary Area: generative models

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 11599
