NutriSync AI: Graph-Augmented Retrieval for Personalized Food-Safety Guidance
Grounding LLM Recommendations in Biomedical Knowledge-Graph Evidence and Quantitative RAG Evaluation
Keywords: food safety; nutrition informatics; ingredient normalization; retrieval-augmented generation (RAG); knowledge graphs; FAISS; Neo4j; Comparative Toxicogenomics Database (CTD); Open Food Facts; RAGAS evaluation.
TL;DR: We propose a hybrid evidence pipeline for personalized food-safety decisions that couples high-recall vector retrieval with evidence-bearing biomedical knowledge-graph traversal, and we evaluate grounding quality using established RAG metrics.
Abstract: Lexical variability in ingredient labels and food product metadata (e.g., brand-specific naming, additive codes, scientific synonyms, and spelling variants) poses a persistent challenge for computational food-safety systems. For consumers managing chronic conditions (e.g., diabetes, irritable bowel syndrome, fatty liver disease) or allergies, this variability can obscure risk-relevant signals and hinder reliable, individualized guidance. Existing nutrition applications often provide coarse summaries (calories, macronutrients) but struggle to connect ingredient mentions to mechanistic or evidence-backed health associations, and they rarely provide transparent provenance that users can audit.
NutriSync AI addresses this gap by combining semantic retrieval with explicit biomedical evidence paths. The system begins with mobile capture of a barcode or ingredient panel and applies optical character recognition (OCR) and named-entity recognition (NER) to extract, normalize, and canonicalize ingredient mentions. Normalization includes alias resolution (scientific names, E-numbers, common trade names), token-level cleanup, and mapping to canonical identifiers when possible. This produces a stable ingredient representation that can support downstream retrieval and reasoning.
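The normalization step can be illustrated with a minimal Python sketch. The alias table, the cleanup rules, and the function names `clean_token` and `normalize_panel` are hypothetical; a real deployment would draw aliases from curated open resources and map canonical names onward to external identifiers instead of stopping at a name. Mentions with no alias come back as `None`, so they could be handed to the high-recall retrieval step rather than silently dropped.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical alias table: cleaned surface form -> canonical ingredient name.
ALIASES = {
    "sugar": "sucrose",
    "sucrose": "sucrose",
    "e330": "citric acid",
    "citric acid": "citric acid",
    "e300": "ascorbic acid",
    "vitamin c": "ascorbic acid",
    "ascorbic acid": "ascorbic acid",
    "e621": "monosodium glutamate",
    "msg": "monosodium glutamate",
    "monosodium glutamate": "monosodium glutamate",
}


def clean_token(raw: str) -> str:
    """Token-level cleanup: Unicode folding, lowercasing, quantity and punctuation removal."""
    text = unicodedata.normalize("NFKC", raw).lower()
    text = re.sub(r"\(?\d+(\.\d+)?\s*%\)?", " ", text)  # drop quantities such as "(12%)"
    text = re.sub(r"\be\s*-?\s*(\d{3,4}[a-z]?)\b", r"e\1", text)  # "E 330" / "E-330" -> "e330"
    text = re.sub(r"[^\w\s]", " ", text)  # remaining punctuation -> space
    return re.sub(r"\s+", " ", text).strip()


def normalize_panel(panel: str) -> list[tuple[str, Optional[str]]]:
    """Split an OCR'd ingredient panel and map each cleaned mention to a canonical name or None."""
    results = []
    for part in re.split(r"[,;]", panel):
        token = clean_token(part)
        if token:
            results.append((token, ALIASES.get(token)))
    return results
```

For example, `normalize_panel("Sugar (12%), E 330, Vitamin C; Salt")` resolves the first three mentions and leaves `salt` unmapped, because this toy alias table has no entry for it.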
Given a user profile (dietary constraints, allergies, and optionally condition-specific preferences), NutriSync AI executes a hybrid retrieval step. First, it performs vector similarity search using FAISS over a curated text corpus of ingredient and chemical descriptions to obtain high-recall context for ambiguous or novel ingredient variants. In parallel, it traverses a Neo4j biomedical knowledge graph that links ingredients to constituent chemicals and diseases via evidence-bearing chemical–disease associations from the Comparative Toxicogenomics Database (CTD). The graph traversal returns interpretable evidence trails (ingredient → chemical → disease), together with association metadata that can be surfaced in the final explanation.
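The hybrid retrieval step can be sketched as two independent retrievers over toy data. The vector side computes exact cosine-similarity top-k over L2-normalized vectors, which is what a FAISS flat inner-product index (`faiss.IndexFlatIP`) returns when its vectors are normalized; plain Python is used here so the sketch runs without dependencies. The graph side replaces Neo4j with in-memory dictionaries. Passage IDs, vectors, the ingredient/chemical/disease placeholders, and the curated/inferred evidence labels (loosely mirroring CTD's distinction between curated and inferred associations) are all assumptions, not real CTD records.

```python
import math
from dataclasses import dataclass

# Toy data only: hypothetical passage embeddings and graph contents.
PASSAGE_VECTORS = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.9, 0.2],
    "p3": [0.0, 0.2, 0.9],
}
CONTAINS = {"ingredient_1": ["chemical_A", "chemical_B"]}  # ingredient -> chemicals
ASSOCIATED_WITH = {  # chemical -> [(disease, evidence type)]
    "chemical_A": [("disease_X", "curated"), ("disease_Y", "inferred")],
    "chemical_B": [("disease_X", "inferred")],
}


@dataclass(frozen=True)
class EvidenceTrail:
    ingredient: str
    chemical: str
    disease: str
    evidence: str


def _unit(v):
    """L2-normalize a vector so that inner product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def vector_search(query, k=2):
    """Exact top-k by cosine similarity (brute force, like a flat inner-product index)."""
    q = _unit(query)
    scored = [(sum(a * b for a, b in zip(q, _unit(v))), pid) for pid, v in PASSAGE_VECTORS.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pid for _, pid in scored[:k]]


def graph_trails(ingredient, conditions):
    """Two-hop traversal ingredient -> chemical -> disease, kept only for the user's conditions."""
    return [
        EvidenceTrail(ingredient, chem, disease, evidence)
        for chem in CONTAINS.get(ingredient, [])
        for disease, evidence in ASSOCIATED_WITH.get(chem, [])
        if disease in conditions
    ]


def hybrid_retrieve(query_vec, ingredient, conditions, k=2):
    """Run both retrievers (sequentially here; they are independent and could run in parallel)."""
    return {"passages": vector_search(query_vec, k), "trails": graph_trails(ingredient, conditions)}
```

Against a live graph, the same two-hop pattern could be written in Cypher as `MATCH (i:Ingredient {name: $name})-[:CONTAINS]->(c:Chemical)-[a:ASSOCIATED_WITH]->(d:Disease) RETURN c, a, d`, where the node labels and relationship types are again hypothetical. Filtering trails by the user's conditions keeps the evidence personalized, while the vector side stays condition-agnostic to preserve recall.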
The retrieved text passages and knowledge-graph evidence are fused in a graph-augmented retrieval-augmented generation (RAG) step. The generation component is constrained to produce machine-readable outputs (e.g., structured recommendation fields covering flagged ingredients, condition-relevant concerns, uncertainty notes, and safer alternatives). Importantly, the response is designed to remain grounded in retrieved passages and explicit graph evidence, enabling downstream auditing and error analysis. This structure supports not only end-user guidance but also systematic evaluation and iterative improvement.
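One way to keep the structured output auditable is to tag every piece of retrieved evidence with an ID and reject responses whose concerns cite nothing, or cite IDs that were never retrieved. The sketch below assumes a JSON output whose field names paraphrase the categories listed above; the schema, `build_context`, and `validate_output` are hypothetical, and the LLM call itself is omitted.

```python
import json

# Hypothetical output schema; field names are illustrative only.
REQUIRED_FIELDS = ("flagged_ingredients", "condition_concerns", "uncertainty_notes", "safer_alternatives")


def build_context(passages, trails):
    """Fuse text passages and graph trails into one ID-tagged evidence list plus the set of valid IDs."""
    lines = [f"[{pid}] {text}" for pid, text in passages.items()]
    lines += [f"[trail:{i}] {ing} -> {chem} -> {dis}" for i, (ing, chem, dis) in enumerate(trails)]
    ids = set(passages) | {f"trail:{i}" for i in range(len(trails))}
    return "\n".join(lines), ids


def validate_output(raw, evidence_ids):
    """Parse model JSON; report missing fields and concerns that cite no or unknown evidence."""
    data = json.loads(raw)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    for concern in data.get("condition_concerns", []):
        cited = set(concern.get("evidence", []))
        if not cited:
            problems.append(f"uncited concern: {concern.get('text', '')}")
        elif not cited <= evidence_ids:
            problems.append(f"unknown evidence: {sorted(cited - evidence_ids)}")
    return data, problems
```

A non-empty `problems` list can trigger a retry with tighter prompt constraints or be logged for error analysis; either way, every accepted concern traces back to a passage or graph trail that was actually retrieved.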
Because food-safety guidance is health-adjacent and errors may have practical consequences, we treat evaluation as a first-class component. We propose measuring retrieval and response quality using RAGAS-style metrics, including context precision and recall, response relevance, and faithfulness to retrieved evidence. These metrics enable targeted improvements (e.g., better canonicalization rules, deeper or narrower graph traversal, improved retrieval corpora, and prompt constraints) and help reduce ungrounded generation.
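Once per-chunk relevance and per-claim support judgments are available, the retrieval-side and generation-side metrics reduce to simple ratios. RAGAS obtains those judgments with an LLM judge; this sketch takes them as inputs. The functions are hypothetical helpers, not the `ragas` library API. `context_precision` follows the rank-aware form (the mean of precision@k taken at each rank k that holds a relevant chunk), while context recall and faithfulness are claim-level fractions; response relevance is omitted because it depends on an embedding model.

```python
def context_precision(relevance):
    """Rank-aware context precision: mean of precision@k over ranks k holding a relevant chunk."""
    hits, total = 0, 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k, accumulated only at relevant ranks
    return total / hits if hits else 0.0


def claim_ratio(claims, supported):
    """Fraction of distinct claims judged supported."""
    unique = set(claims)
    return len(unique & set(supported)) / len(unique) if unique else 0.0


def context_recall(reference_claims, attributable_to_context):
    """Share of reference-answer claims that the retrieved context can account for."""
    return claim_ratio(reference_claims, attributable_to_context)


def faithfulness(response_claims, supported_by_context):
    """Share of response claims supported by the retrieved context."""
    return claim_ratio(response_claims, supported_by_context)
```

For instance, relevance labels `[1, 0, 1]` give a context precision of (1/1 + 2/3) / 2 = 5/6: the irrelevant chunk at rank 2 lowers the score even though both relevant chunks were retrieved, which is the signal that motivates reranking or narrower graph traversal.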
We plan to extend NutriSync AI along three directions: (i) expanded synonym and ingredient-to-chemical mapping coverage using additional open resources, (ii) evidence weighting and thresholding strategies that reflect association strength and context, and (iii) agentic coordination (retrieval agent, biomedical evidence agent, and mediator agent) with user feedback loops for continuous refinement. Collectively, this work aims to provide a transparent, evidence-grounded pathway for personalized food-safety decisions at consumer scale.
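Direction (ii) could take many forms. One simple possibility is to weight each evidence trail by its evidence type, count each chemical once per disease at its strongest weight, and flag a condition only when the aggregate score clears a threshold. The weights, threshold, trail format, and function names below are hypothetical and would need calibration; this is a sketch of the idea, not the planned method.

```python
from collections import defaultdict

# Hypothetical weights and threshold; real values would need calibration on labeled cases.
EVIDENCE_WEIGHTS = {"curated": 1.0, "inferred": 0.3}
FLAG_THRESHOLD = 0.8


def score_concerns(trails, weights=EVIDENCE_WEIGHTS):
    """Aggregate (chemical, disease, evidence_type) trails into per-disease scores."""
    best = defaultdict(dict)  # disease -> chemical -> strongest weight seen
    for chem, disease, evidence in trails:
        w = weights.get(evidence, 0.0)
        best[disease][chem] = max(w, best[disease].get(chem, 0.0))
    return {disease: round(sum(chems.values()), 3) for disease, chems in best.items()}


def flagged(trails, threshold=FLAG_THRESHOLD):
    """Diseases whose aggregate evidence score meets the threshold, sorted by name."""
    return sorted(d for d, s in score_concerns(trails).items() if s >= threshold)
```

Keeping only the strongest weight per chemical prevents one heavily annotated chemical from dominating a score, while several independent weaker associations can still add up past the threshold.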
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 60