NeurIPS 2025 Workshop LLM Evaluation Submissions
Uncertainty Quantification for Language Models: Standardizing and Evaluating Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard, Mohit Singh Chauhan
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Scaling Laws for Upcycling Mixture-of-Experts Language Models
Seng Pei Liew, Takuya Kato, Sho Takase
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Demystify the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Bohan Lyu, Siqiao Huang, Zichen Liang, Wenjia Yang, Qian Sun, Jiaming Zhang
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
Jacob Portes, Connor Jennings, Erica Ji Yuen, Sasha Doubov, Michael Carbin
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Search-Time Data Contamination
Ziwen Han, Meher Mankikar, Julian Michael, Zifan Wang
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song, Xuwei Ding, Jieyu Zhang, Taiwei Shi, Ryotaro Shimizu, Rahul Gupta, Yang Liu, Jian Kang, Jieyu Zhao
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
MindVote: When AI Meets the Wild West of Social Media Opinion
Xutao Mao, Ezra Xuanru Tao, Leyao Wang
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
CivicParse: A Benchmark and Pipeline for Structured Online Deliberation
Abhay Gupta, Mark Klein
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?
Bo Feng, Zhengfeng Lai, Shiyu Li, Zizhen Wang, Xiaoming Simon Wang, Ping Huang, Meng Cao
Published: 24 Sept 2025, Last Modified: 16 Oct 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Mitigating Self-Preference by Authorship Obfuscation
Taslim Mahbub, Shi Feng
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Evaluating AI Alignment Using Adapted Clinical Empathy Assessments
Cassandra Feilbach
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas
Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken
Published: 24 Sept 2025, Last Modified: 24 Oct 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Context-Masked Meta-Prompting for Privacy-Preserving LLM Adaptation in Finance
Sayash Raaj Hiraou
Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone
Schema Lineage Extraction at Scale: Multilingual Pipelines, Composite Evaluation, and Language-Model Benchmarks
Jiaqi Yin, Yi-Wei Chen, MENG-LUNG LEE, Xiya Liu
Published: 24 Sept 2025, Last Modified: 18 Oct 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
Readers: Everyone