MAMORX: Multi-agent Multi-Modal Scientific Review Generation with External Knowledge

Pawin Taechoyotin; Guanchao Wang; Tong Zeng; Bradley Sides; Daniel Acuna

Published: 11 Oct 2024, Last Modified: 12 Nov 2024, NeurIPS 2024 Workshop FM4Science Poster, License: CC BY 4.0

Keywords: Multi-agent systems, Multi-modal Foundation Models, Scientific review generation

TL;DR: MAMORX is an AI system that improves scientific review quality by integrating multi-agent, multi-modal analysis with external knowledge sources.

Abstract: The deluge of scientific papers has made it challenging for researchers to thoroughly evaluate their own and others' ideas with regard to novelty and improvements. We propose *MAMORX*, an automated scientific review generation system that relies on multi-modal foundation models to address this challenge. *MAMORX* replicates key aspects of human review by attending to text, figures, and citations, and by accessing external knowledge sources. Compared to previous work, it takes advantage of large context windows to significantly reduce the number of agents and the processing time needed. The system relies on structured outputs and function calling to handle figures, evaluate novelty, and build general and domain-specific knowledge bases from external scholarly search systems. To test our method, we conducted an arena-style competition between several baselines and human reviews on diverse papers from general machine learning and NLP, calculating Elo ratings based on human preferences. *MAMORX* has a high win rate against human reviews and outperforms the next-best model, a multi-agent system. We share our system (code: https://github.com/sciosci/mamorx-review-system; an example implementation is running at https://rev0.ai) and discuss further applications of foundation models for scientific evaluation.
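The abstract states that systems were ranked with Elo ratings computed from human preferences in an arena-style competition, but it does not give the update rule or its parameters. As a rough illustration only, the standard sequential Elo update is sketched here. The K-factor of 32, the initial rating of 1000, scoring ties as 0.5, and the function names elo_ratings and expected_score are assumptions for this sketch; the paper may use a different variant (for example, a batch Bradley-Terry fit or bootstrapped ratings).

```python
from collections import defaultdict
from typing import Iterable, Literal

# Winner of one pairwise human judgment: system A, system B, or a tie.
Outcome = Literal["a", "b", "tie"]


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(
    comparisons: Iterable[tuple[str, str, Outcome]],
    k: float = 32.0,         # assumed K-factor (not specified in the abstract)
    initial: float = 1000.0,  # assumed starting rating for every system
) -> dict[str, float]:
    """Sequentially update Elo ratings from pairwise preference judgments.

    Each comparison is (system_a, system_b, winner). This is a hypothetical
    sketch of a standard Elo update, not the paper's published procedure.
    """
    ratings: dict[str, float] = defaultdict(lambda: initial)
    for a, b, winner in comparisons:
        e_a = expected_score(ratings[a], ratings[b])
        s_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]  # actual score for A
        delta = k * (s_a - e_a)
        # Zero-sum update: A gains exactly what B loses.
        ratings[a] += delta
        ratings[b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical judgments for illustration only (not data from the paper).
    judgments = [
        ("system_x", "human_review", "a"),
        ("system_x", "baseline_y", "tie"),
        ("baseline_y", "human_review", "b"),
    ]
    print(elo_ratings(judgments))
```

Because each update moves the two ratings by equal and opposite amounts, the total rating mass is conserved. Sequential Elo is also order-dependent, which is why some arena-style evaluations average ratings over shuffled orderings or bootstrap samples.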

Submission Number: 17
