Keywords: Multimodal AI, Video Content Analysis, Knowledge Base Construction, Large Language Models (LLMs), Gemini, LangGraph, Agentic AI, Automated Documentation, Personalized Learning, Educational Technology, Streamlit, Natural Language Processing.
TL;DR: A system that ingests video (and audio), uses Google's Gemini to build a rich knowledge base, and employs two specialized LangGraph subgraphs (one for corporate product documentation, one for educational materials), all exposed through a Streamlit chat interface.
Abstract: The exponential growth of video content across diverse domains presents both an opportunity and a challenge for efficient knowledge extraction and utilization. This project proposes a novel framework to harness the rich information embedded in video data by developing an intelligent system capable of transforming raw video inputs into a structured knowledge base, which then fuels specialized content generation agents.
The core of this system involves leveraging the advanced multimodal understanding capabilities of Google's Gemini model. Gemini will be employed to perform deep analysis of video frames and, where available, accompanying audio tracks, to extract key entities, concepts, semantic relationships, and contextual information. This extracted data will populate a comprehensive and dynamic knowledge base, serving as the single source of truth for subsequent applications.
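As a rough illustration of this extraction step, the sketch below uses the `google-generativeai` SDK to upload a video (the SDK's file upload also carries the audio track) and asks the model for structured output. The prompt wording, JSON schema, model name, and function names here are illustrative assumptions, not the project's actual implementation; the parsing helper is kept separate from the API call so the knowledge-base format is explicit.

```python
import json

# Hypothetical extraction prompt; the real schema would be richer.
EXTRACTION_PROMPT = (
    "Watch this video and return JSON with keys 'entities', 'concepts', "
    "and 'relationships' (a list of [subject, predicate, object] triples)."
)

def parse_kb_entries(response_text: str) -> dict:
    """Parse the model's JSON reply into knowledge-base records."""
    data = json.loads(response_text)
    return {
        "entities": data.get("entities", []),
        "concepts": data.get("concepts", []),
        "relationships": [tuple(t) for t in data.get("relationships", [])],
    }

def analyze_video(path: str, api_key: str) -> dict:
    """Upload a video to Gemini and extract knowledge-base entries."""
    import google.generativeai as genai  # imported here; requires an API key

    genai.configure(api_key=api_key)
    video = genai.upload_file(path=path)  # upload includes the audio track
    model = genai.GenerativeModel("gemini-1.5-pro")
    reply = model.generate_content([video, EXTRACTION_PROMPT])
    return parse_kb_entries(reply.text)
```

In the full system, the returned entities, concepts, and relationship triples would be merged into the persistent knowledge base rather than returned directly.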
Building upon this knowledge base, the project will develop two distinct, specialized intelligent agents using LangGraph, a library for building stateful, multi-actor applications with Large Language Models (LLMs).
Corporate Documentation Agent: This agent will be engineered to query the knowledge base and generate a variety of product documentation tailored for corporate use cases. Potential outputs include technical specifications, user manuals, API documentation, feature summaries, and troubleshooting guides, thereby streamlining documentation workflows.
Educational Content Agent: This agent will focus on transforming video-derived knowledge into valuable educational resources. It will be capable of generating student-centric materials such as concise study notes, summaries, multiple-choice questions (MCQs) for assessment, flashcards, and other personalized learning aids to enhance comprehension and engagement for students.
To facilitate intuitive user interaction with the system and its agents, a conversational interface will be developed using Streamlit. This will allow users to input videos, interact with the knowledge base, and request specific content generation tasks from the specialized agents.
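A minimal version of such a Streamlit front-end might look as follows. The routing stub and session keys are hypothetical placeholders; a real version would dispatch to the two agents described above.

```python
def answer(user_msg: str) -> str:
    """Stub router: a real version would call the KB and the LangGraph agents."""
    if "quiz" in user_msg.lower() or "mcq" in user_msg.lower():
        return "Routing to the Educational Content Agent..."
    return "Routing to the Corporate Documentation Agent..."

def main() -> None:
    import streamlit as st  # imported here so the stub above stays testable

    st.title("Video Knowledge Chat")
    st.file_uploader("Upload a video", type=["mp4", "mov"])
    if "history" not in st.session_state:
        st.session_state.history = []
    for role, text in st.session_state.history:
        st.chat_message(role).write(text)
    if prompt := st.chat_input("Ask about the video..."):
        st.session_state.history.append(("user", prompt))
        st.session_state.history.append(("assistant", answer(prompt)))
        st.rerun()

if __name__ == "__main__":
    main()
```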
This research aims to contribute to the fields of multimodal AI, automated knowledge engineering, and applied agentic systems. The project will explore the efficacy of state-of-the-art models like Gemini for deep video understanding and the practical application of LangGraph for creating robust, task-oriented AI agents. The anticipated outcome is a versatile platform demonstrating a seamless pipeline from video input to actionable, context-aware content generation for both corporate and educational settings.
Submission Number: 8