Trusted Convergence and Knowing What We Know Together: Privacy-Preserving Knowledge Discovery Across Neurodegenerative Disease Institutes

Published: 30 May 2026, Last Modified: 30 May 2026 | ICML2026-AI4Science Poster | Everyone | CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
TL;DR: Giovanna is a privacy-preserving, domain-adapted retrieval-augmented generation system for neurodegenerative research that supports literature search and collaborator discovery.
Abstract: The rapid growth and disciplinary diversification of the neuroscience literature make it increasingly difficult for research institutes to identify where their expertise converges with that of peer institutions, a critical barrier to accelerating therapies for conditions such as Alzheimer's disease. Existing AI-assisted discovery tools rely on unconstrained, unverified web sources and offer no mechanism for secure, quality-controlled knowledge sharing. We present Giovanna, a domain-adapted retrieval-augmented generation (RAG) framework built on curated institutional corpora from Institute A and Institute B, designed to surface latent connections, support grounded hypothesis generation, and reveal cross-institutional research convergence within a trusted research environment. We contribute: (i) a neuroscience-specific embedding model fine-tuned on institutional corpora, achieving a recall@1 of 0.678 and an MRR@20 of 0.780, both exceeding the baseline; (ii) a privacy-preserving embedding-sharing approach that identifies shared research themes without exchanging raw text; and (iii) an empirical comparison of reasoning and non-reasoning language models across query-complexity tiers, showing that RAG over trusted institutional knowledge is essential for complex queries, while selective generative fine-tuning yields gains only on domain-specific synthesis tasks. We release Giovanna as a lightweight application for faster insight discovery in neuroscience, both within and across research institutions.
Keywords: large language models, privacy-preserving machine learning, neurodegenerative diseases, biomedical natural language processing, scientific knowledge discovery
Submission Number: 106