NeurIPS 2023 Workshop SoLaR Submissions
Welfare Diplomacy: Benchmarking Language Model Cooperation
Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan, Jesse Clifton
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone
Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features
Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, Bogdan-Ionut Cirstea
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone
The Effect of Group Status on the Variability of Group Representations in LLM-generated Text
Messi Lee, Jacob Montgomery, Calvin Lai
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Alan Chan, Benjamin Bucknall, Herbie Bradley, David Krueger
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Spotlight
Readers: Everyone
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone
Low-Resource Languages Jailbreak GPT-4
Zheng Xin Yong, Cristina Menghini, Stephen Bach
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Spotlight
Readers: Everyone
Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang, Guozheng Ma, Cong Yu, Ning Gui, Linrui Zhang, Zhiqi Huang, Suwei Ma, Yongzhe Chang, Sen Zhang, Li Shen, Xueqian Wang, Peilin Zhao, Dacheng Tao
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone
Weakly Supervised Detection of Hallucinations in LLM Activations
Miriam Rateike, Celia Cintas, John Wamburu, Tanya Akumu, Skyler Speakman
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone
A Divide-Conquer-Reasoning Approach to Consistency Evaluation and Improvement in Blackbox Large Language Models
Wendi Cui, Jiaxin Zhang, Zhuohang Li, Damien Lopez, Kamalika Das, Bradley Malin, Sricharan Kumar
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone
KoMultiText: Large-Scale Korean Text Dataset for Classifying Biased Speech in Real-World Online Services
Dasol Choi, Jooyoung Song, Eunsun Lee, Seo Jin woo, HeeJune Park, Dongbin Na
Published: 23 Oct 2023, Last Modified: 28 Nov 2023
SoLaR Poster
Readers: Everyone