CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature

ACL ARR 2025 July Submission 90 Authors

22 Jul 2025 (modified: 03 Sept 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: A hallmark of human innovation is *recombination*: the creation of novel ideas by integrating elements from existing concepts and mechanisms. In this work, we introduce CHIMERA, a large-scale Knowledge Base (KB) of over 28K recombination examples automatically mined from the scientific literature. CHIMERA enables large-scale empirical analysis of how scientists recombine concepts and draw inspiration from different areas, and supports training models that propose novel, cross-disciplinary research directions. To construct this KB, we define a new information extraction task: identifying recombination instances in scientific abstracts. We curate a high-quality, expert-annotated dataset and use it to fine-tune a large language model, which we apply to a broad corpus of AI papers. We showcase the utility of CHIMERA through two applications. First, we analyze patterns of recombination across AI subfields. Second, we train a scientific hypothesis generation model using the KB, showing that it can propose novel research directions that researchers rate as inspiring. We release our data and code at https://anonymous.4open.science/r/CHIMERA-0510.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: knowledge graphs
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Previous URL: https://openreview.net/forum?id=EKHv05vmeG
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: We request reassignment of both reviewers and the meta-reviewer. Reviews had the following issues: shallow engagement with the paper, lack of demonstrated expertise (including factual misunderstandings of standard methodology and dismissing well-established practices), signs of AI-generated content, and a failure to acknowledge key clarifications and evidence provided in our rebuttal, including the qualifications of our annotators and the clearly stated motivation, novelty, and value of our dataset.
Software: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3, Appendix B.2, Appendix E.2, Section 5
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Ethical Considerations
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: We collect scientific data (the language is professional). We do not disclose annotators' personal information, as discussed in our Ethics Statement.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 3, Tables 2 and 3, and the accompanying text describing the knowledge base.
B6 Statistics For Data: Yes
B6 Elaboration: Section 3, Tables 2 and 3; Section 5, Table 5
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix B.2 (extraction baselines implementation), Appendix E.2 (prediction baselines implementation)
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4.1 (experimental settings). Additional implementation details, such as hyperparameters, are in Appendix B.2 and Appendix E.2. We also provide a highly documented code repository for reproducing our experiments: https://anonymous.4open.science/r/CHIMERA-0510
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4 (results), Section 4.2 (Extraction Results)
C4 Parameters For Packages: Yes
C4 Elaboration: We provide implementation details, including specific parameters, in Appendix B.2 and Appendix E.2. We also provide a highly documented code repository for reproducing our experiments: https://anonymous.4open.science/r/CHIMERA-0510
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Section 3.1 (recombination mining) references our annotation guidelines for the extraction task. We provide our user study guidelines in Appendix F.
D2 Recruitment And Payment: Yes
D2 Elaboration: We detail our recruitment process in Section 3.1. Annotators' payment is discussed in the Ethical Considerations section.
D3 Data Consent: Yes
D3 Elaboration: Section 3.1 (recombination mining) references our annotation guidelines for the extraction task. We provide our user study guidelines in Appendix F. Both describe the intended use of the collected data.
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: Yes
D5 Elaboration: The annotators' characteristics are described in Section 3.1. Section 5.1 describes the characteristics of the user-study participants.
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Ethical Considerations
Author Submission Checklist: yes
Submission Number: 90