# USER:

I am writing a paper with the theme: "Research on How AI Conducts Research,"
which focuses on the deficiencies of AI agents in scientific discovery.
I hope to discover some **underlying mathematical algorithmic laws**, and if possible, prove some **mathematical assumptions** at the foundation of LLMs, and propose improvement methods. Let's first think about this question: what do you think can be done? What aspects of Agents can be improved from a mathematical perspective?

--------------------------------------------------------
# Agent:

It's an honor to explore this cutting-edge and profound topic with you. Researching "how AI conducts research" and its deficiencies in scientific discovery, especially from the perspective of underlying mathematical algorithms and assumptions, is a highly valuable direction.

AI agents show great potential in scientific discovery, but their shortcomings are also obvious. These shortcomings often trace back to the limitations of their underlying mathematical models and algorithms. To improve agents mathematically, we can conduct in-depth research from the following perspectives:

1. Exploration Space and Combinatorial Explosion
Current agents, including LLM-based agents, often face enormous exploration spaces when conducting scientific discovery. For example, when searching for new materials, designing new drugs, or proving new mathematical theorems, the number of candidate combinations is astronomical.

Root cause: When generating and evaluating hypotheses, LLMs rely on probability and pattern matching rather than deep, purposeful search. They might get stuck in local optima, or generate a large number of seemingly reasonable but actually useless hypotheses, lacking efficient pruning strategies.

Mathematical improvement directions:

Combining Graph Theory and Combinatorial Optimization: We can model the scientific discovery process as a graph, where nodes represent concepts, data or hypotheses, and edges represent relationships between them. The agent's "research" process is essentially path search on this graph. We can introduce more efficient graph search algorithms, such as variants of the A* algorithm, Monte Carlo Tree Search (MCTS), or combine reinforcement learning with Graph Neural Networks (GNNs) to guide agents in conducting purposeful, non-random searches in complex exploration spaces.

Information Theory and Heuristics: We can design a mathematical function of "novelty" or "information gain" for agents to quantify the value of each new hypothesis or experimental result. This function can be based on concepts like Shannon information entropy or KL divergence, helping agents prioritize paths that maximize information gain.
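As a minimal sketch of such an "information gain" function, the snippet below scores an experimental result by the KL divergence between the agent's belief over hypotheses after and before a Bayesian update. The function names, the four-hypothesis belief vector, and the likelihood values are illustrative assumptions, not part of any existing agent framework.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in bits for two discrete distributions given as lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bayes_update(prior, likelihood):
    """Posterior over hypotheses, where likelihood[i] = P(result | hypothesis i)."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)  # assumed non-zero: the result is possible under the prior
    return [u / total for u in unnormalized]


def information_gain(prior, likelihood):
    """Information gained from one observed result: D_KL(posterior || prior)."""
    return kl_divergence(bayes_update(prior, likelihood), prior)


# Hypothetical belief: four equally likely hypotheses.
prior = [0.25, 0.25, 0.25, 0.25]

# A result that rules out two hypotheses versus one that is equally likely under all.
gain_decisive = information_gain(prior, [1.0, 1.0, 0.0, 0.0])
gain_useless = information_gain(prior, [0.5, 0.5, 0.5, 0.5])
print(gain_decisive, gain_useless)
```

Halving the set of live hypotheses yields one bit, while a result that every hypothesis predicts equally yields none, which is the ordering an agent would need to prioritize paths.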

2. Limitations of Causal Reasoning and Inductive Logic
The core of scientific discovery lies in causal reasoning—identifying causal relationships from phenomena rather than simple correlations. Current LLMs are primarily based on correlations, good at induction but often struggling with causal reasoning.

Root cause: LLMs are autoregressive models trained to predict the next token from the preceding ones, an objective built on sequential correlation. Such training can capture "A often appears with B," but the objective by itself does not distinguish this from "because of A, therefore B."

Mathematical improvement directions:

Introducing Causal Graph Models (Causal Bayesian Networks): We can integrate the mathematical framework of causal reasoning, namely causal graph models such as Judea Pearl's framework, into agents. When generating hypotheses, agents should not only consider word associations but also construct causal graphs to clarify causal relationships between variables. This requires agents to be capable of counterfactual reasoning, that is, asking "if A had not happened, would B still have happened?"
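To make the gap between correlation and causation concrete, here is a small sketch of a causal Bayesian network with a hidden common cause Z that drives both A and B, and no edge from A to B. The variable names and all probabilities are assumed purely for illustration. Conditioning on A changes the probability of B, while intervening on A does not.

```python
# Hypothetical causal model: Z -> A and Z -> B, with no edge A -> B.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z)
P_A_GIVEN_Z = {0: 0.1, 1: 0.9}    # P(A = 1 | Z = z)
P_B_GIVEN_Z = {0: 0.2, 1: 0.8}    # P(B = 1 | Z = z); B never looks at A


def bern(p_one, value):
    """Probability that a Bernoulli(p_one) variable takes the given value."""
    return p_one if value == 1 else 1.0 - p_one


def p_b_given_a(a):
    """Observational P(B=1 | A=a): seeing A also shifts our belief about Z."""
    num = sum(P_Z[z] * bern(P_A_GIVEN_Z[z], a) * P_B_GIVEN_Z[z] for z in (0, 1))
    den = sum(P_Z[z] * bern(P_A_GIVEN_Z[z], a) for z in (0, 1))
    return num / den


def p_b_do_a(a):
    """Interventional P(B=1 | do(A=a)): forcing A cuts Z -> A, so Z keeps its prior.

    The argument a is unused because A has no causal effect on B in this model.
    """
    return sum(P_Z[z] * P_B_GIVEN_Z[z] for z in (0, 1))


print(p_b_given_a(1), p_b_given_a(0))  # differ: A and B are correlated
print(p_b_do_a(1), p_b_do_a(0))        # equal: A does not cause B
```

A model that only learns the observational distribution would see a strong association between A and B here, even though setting A by hand changes nothing about B.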

Symbolic Logic and Formal Methods: For tasks requiring rigorous logical reasoning like mathematical theorem proving, relying solely on probabilistic models is insufficient. We can explore how to combine symbolic logic and formal proof tools (like Coq, Isabelle/HOL) with the generative capabilities of agents. This requires agents to be able to convert natural language hypotheses into rigorous mathematical or logical expressions and use existing automated theorem provers for verification.
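As a toy illustration of turning a natural-language hypothesis into a machine-checkable statement, the sketch below uses Lean 4 rather than the Coq or Isabelle/HOL tools named above, simply because the example is short. The theorem name is made up for this example; `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- Natural-language hypothesis:
--   "Adding two natural numbers gives the same result in either order."
-- Formal statement and proof, checked by the Lean kernel:
theorem hypothesis_add_order (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

In an agent pipeline, the generative model would propose the formal statement and a candidate proof, and the proof checker would accept or reject it, so a wrong step cannot slip through on plausibility alone.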

3. Breaking Through the Underlying Mathematical Assumptions of LLMs
As you said, breakthroughs from underlying mathematical assumptions are at the core of the research. We can start with the most fundamental assumptions of LLMs—the Transformer architecture and attention mechanism.

LLM assumptions: The core assumption of the Transformer is "everything can be serialized": through the attention mechanism, it can capture dependencies between any two positions in a sequence. This is very effective when processing language, but when dealing with complex scientific problems, this assumption may be too simple.

Assumptions that may need correction or supplementation:

Non-Euclidean Space Assumption: The structure of scientific knowledge is not a simple linear sequence. For example, chemical molecular structures, social networks, biological gene networks, etc., are essentially graph structures, belonging to non-Euclidean spaces. When LLMs flatten them into sequences for processing, they lose a lot of topological information. We can assume that scientific knowledge exists on some manifold with nonlinear intrinsic structure, and try to use Geometric Deep Learning to build new agent architectures.

"Everything is Autoregressive" Assumption: In scientific research, many problems are bidirectional or global. For example, an experimental result might modify the initial hypothesis. This is an iterative, nonlinear, global optimization process, not a simple unidirectional sequence generation. We can propose new mathematical models, such as those based on Markov Random Fields or Boltzmann Machines, to model such global dependencies rather than local autoregression.

Summary and Outlook
In your paper, you can choose one or two of the above directions for in-depth exploration:

Mathematically define and quantify the "exploration space," and propose a new search algorithm based on graph theory or information theory to guide agents. You can prove that under certain assumptions, this new algorithm can find effective solutions more efficiently than traditional LLM-based random generation strategies.

Build a mathematical framework for causal reasoning, and prove that autoregressive models of LLMs cannot capture certain key causal relationships. You can design a simple scientific discovery task that requires causal reasoning to solve, then compare traditional LLM agents with your new agent to demonstrate the superiority of the new agent.

Challenge the underlying mathematical assumptions of LLMs. For example, you can propose a theory that when the intrinsic structure of scientific problems is graphs or manifolds, serializing them leads to information loss and thus limits the discovery capability of agents. Then, propose a preliminary model based on geometric deep learning to verify your theory.

These directions are full of challenges, but if breakthroughs can be achieved, they will have a profound impact on the application of AI in the field of scientific discovery.

----------------------------------------------------------------------------------------------------------------
# USER:

First, please help me model mathematically how an AI scientist works. Then, analyze what small innovative parts we can work on.

# Agent:

Let's break down the workflow of an "AI scientist" into a mathematical model, so we can more clearly see its shortcomings and find breakthrough points for small innovations.

1. Mathematical Modeling of an AI Scientist
We can model the workflow of an AI scientist agent as an iterative optimization process. The goal of this process is to maximize a certain **"knowledge gain" function** G(K), where K represents the scientific knowledge base.

Core Model:

Assume that the AI scientist's work cycle consists of a series of iterations. In each iteration t, the agent performs the following steps:

Observation & Perception: The agent acquires information from the existing knowledge base K_{t-1}. We can model the knowledge base as a collection containing propositions, data, models, etc.

O_t = Perceive(K_{t-1})

Hypothesis Generation: Based on the observed information O_t, the agent generates a series of new hypotheses H_t = {h_{t,1}, h_{t,2}, ..., h_{t,n}}. This process can be seen as a generation function F_gen, with the core being a Large Language Model (LLM).

H_t = F_gen(O_t)

Here, each hypothesis h is a sample drawn from the LLM's conditional distribution P(h|O_t), so its generation probability is determined by the internal mechanism of the LLM.

Experiment Design & Execution: The agent designs one or more experiments E_{t,h} for each hypothesis h ∈ H_t to test whether it is true or false. The experimental result R_{t,h} is either success (R_{t,h} = 1) or failure (R_{t,h} = 0).

E_{t,h} = Design(h, O_t)
R_{t,h} = Execute(E_{t,h})

Knowledge Update: The agent updates its knowledge base K_{t-1} based on the experimental results R_{t,h} to obtain a new knowledge base K_t.

K_t = K_{t-1} ∪ {Update(h, R_{t,h}) : h ∈ H_t}

The ultimate goal of this process is to find an optimal sequence of hypotheses that maximizes the total knowledge gain:

max ∑_{t=1}^{T} ΔG_t = max ∑_{t=1}^{T} (G(K_t) - G(K_{t-1}))
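The iteration above can be sketched as a small loop. Everything in this snippet is a toy stand-in chosen for illustration: the candidate pool, the hidden ground truth, and the bodies of `perceive`, `f_gen`, `design`, `execute`, and `gain` are assumptions, with `f_gen` replacing the LLM by a fixed proposal rule.

```python
from typing import FrozenSet, List, Tuple

Fact = Tuple[str, int]          # (hypothesis, result) with result in {0, 1}
Knowledge = FrozenSet[Fact]     # the knowledge base K

CANDIDATES = ["h1", "h2", "h3", "h4", "h5", "h6"]   # hypothetical hypothesis pool
GROUND_TRUTH = {"h2", "h5"}                         # hypothetical hidden truth


def perceive(k: Knowledge) -> List[str]:
    """O_t = Perceive(K_{t-1}): here, the names of hypotheses already tested."""
    return sorted(h for h, _ in k)


def f_gen(o: List[str], n: int = 2) -> List[str]:
    """H_t = F_gen(O_t): stand-in for the LLM; proposes n untested candidates."""
    return [h for h in CANDIDATES if h not in o][:n]


def design(h: str, o: List[str]) -> str:
    """E_{t,h} = Design(h, O_t): a trivial experiment descriptor."""
    return f"test:{h}"


def execute(e: str) -> int:
    """R_{t,h} = Execute(E_{t,h}): 1 if the tested hypothesis holds, else 0."""
    return int(e.split(":", 1)[1] in GROUND_TRUTH)


def gain(k: Knowledge) -> int:
    """G(K): toy knowledge gain, the number of confirmed hypotheses in K."""
    return sum(r for _, r in k)


def run(T: int) -> Tuple[Knowledge, List[int]]:
    """Run T iterations and record the per-step gain G(K_t) - G(K_{t-1})."""
    k: Knowledge = frozenset()
    deltas = []
    for _ in range(T):
        o = perceive(k)
        new_facts = {(h, execute(design(h, o))) for h in f_gen(o)}
        k_next = k | new_facts   # K_t = K_{t-1} ∪ {Update(h, R_{t,h}) : h ∈ H_t}
        deltas.append(gain(k_next) - gain(k))
        k = k_next
    return k, deltas


k_final, deltas = run(T=3)
print(deltas, gain(k_final))
```

Because the per-step gains telescope, their sum equals G(K_T) - G(K_0), so in this toy setting the objective reduces to reaching the most valuable final knowledge base within T steps.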

2. Small Innovative Parts We Can Work On
Based on the above model, we can clearly see that the current shortcomings of AI scientists are mainly concentrated in the **"hypothesis generation" and "experiment design"** steps. This is exactly where we can make small innovations.

Innovation Point 1: Improving the Hypothesis Generation Function F_gen

The current LLM-driven F_gen has a fatal flaw: it tends to generate the highest-probability hypotheses, which are often already known, common, or cliché. However, true scientific breakthroughs often come from hypotheses that are low-probability, novel, or even counterintuitive.

Innovation Idea: Introduce a "novelty" scoring function.
We can design a function S(h,K) to quantify the novelty of hypothesis h relative to the existing knowledge base K. The simplest information-theoretic choice is the surprisal (self-information) of h given K:

S(h,K) = -log P(h|K)

A more elaborate alternative measures how much uncertainty hypothesis h would remove, for example through an entropy reduction or a KL divergence between the agent's beliefs after and before accepting h.

Then, we modify the hypothesis generation strategy to not only focus on probability P(h|O_t) but also consider novelty S(h,K). The new optimization objective can be a weighted sum:

max_h [α·P(h|O_t) + β·S(h,K)]

By adjusting the values of α and β, we can control whether the agent tends to explore (high novelty) or exploit (high probability). This small change can transform the AI scientist from a "knowledge repeater" to a "new knowledge explorer."
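A minimal sketch of this weighted selection rule follows. The three hypotheses and their probabilities are invented for illustration, and the function names are hypothetical. In a real agent, P(h|O_t) and P(h|K) would have to be estimated, for instance from model log-probabilities.

```python
import math


def novelty(p_h_given_k: float) -> float:
    """S(h, K) = -log P(h | K): the surprisal of h under the knowledge base."""
    return -math.log(p_h_given_k)


def score(p_h_given_o: float, p_h_given_k: float, alpha: float, beta: float) -> float:
    """Weighted objective alpha * P(h | O_t) + beta * S(h, K)."""
    return alpha * p_h_given_o + beta * novelty(p_h_given_k)


def select(hypotheses: dict, alpha: float, beta: float) -> str:
    """Return the name of the hypothesis with the highest weighted score."""
    return max(hypotheses, key=lambda name: score(*hypotheses[name], alpha, beta))


# Hypothetical candidates: name -> (P(h | O_t), P(h | K)).
hypotheses = {
    "known": (0.6, 0.5),
    "plausible_new": (0.3, 0.05),
    "wild": (0.1, 0.001),
}

exploit_choice = select(hypotheses, alpha=1.0, beta=0.0)   # pure exploitation
explore_choice = select(hypotheses, alpha=1.0, beta=0.1)   # novelty-weighted
print(exploit_choice, explore_choice)
```

One design caveat this toy run exposes: the probability term is bounded by 1 while the surprisal is unbounded, so even a small β can push the agent toward the most implausible candidate. Normalizing or capping S(h,K) is one possible remedy.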

Innovation Point 2: Optimizing the Experiment Design Function Design
The current experiment design function is usually heuristic and lacks mathematical rigor. The agent might design experiments that cannot be falsified or have low information content.

Innovation Idea: Introduce an "information gain" maximizing experiment design.
We can model the experiment design process as an information theory problem. For a given hypothesis h, we choose the experiment E whose result e shares the most mutual information I(e;h) with h, that is, whose result tells us the most about h.

E_best = argmax_E I(e;h) = argmax_E ∑_{e,h} P(e,h) log[P(e,h) / (P(e)P(h))]

Here, e is the experimental result of E, and the joint distribution P(e,h) depends on which experiment E is run. The goal of this formula is to find an experiment whose results can most reduce our uncertainty about hypothesis h. By calculating and comparing the mutual information of different experimental schemes, the agent can prioritize the most decisive experiments, thus greatly improving research efficiency.
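The selection rule can be sketched with the standard definition of mutual information, I(e;h) = ∑ P(e,h) log[P(e,h) / (P(e)P(h))]. The two-hypothesis prior and the three candidate experiments below, each described by an assumed likelihood table P(e|h), are hypothetical values for illustration.

```python
import math


def mutual_information(prior, likelihood):
    """I(e; h) in bits, with prior[h] = P(h) and likelihood[h][e] = P(e | h)."""
    n_results = len(likelihood[0])
    # Marginal over results: P(e) = sum_h P(h) * P(e | h)
    p_e = [
        sum(prior[h] * likelihood[h][e] for h in range(len(prior)))
        for e in range(n_results)
    ]
    mi = 0.0
    for h, p_h in enumerate(prior):
        for e in range(n_results):
            joint = p_h * likelihood[h][e]  # P(e, h)
            if joint > 0:
                mi += joint * math.log2(joint / (p_e[e] * p_h))
    return mi


# Hypothetical setup: h is either false (index 0) or true (index 1).
prior = [0.5, 0.5]

# Each experiment is a table of P(e | h) over two possible results.
experiments = {
    "decisive": [[1.0, 0.0], [0.0, 1.0]],        # result reveals h exactly
    "noisy": [[0.8, 0.2], [0.2, 0.8]],           # result is right 80% of the time
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],   # result ignores h
}

scores = {name: mutual_information(prior, lik) for name, lik in experiments.items()}
e_best = max(scores, key=scores.get)
print(scores, e_best)
```

An experiment whose outcome is the same whatever h is scores zero, which gives the agent a quantitative reason to reject unfalsifiable designs.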

Innovation Point 3: Constructing a Knowledge Graph to Assist Perception and Update
The current knowledge base K is just a text collection without structured relationships. This leads to inefficient "perception" for the agent and makes it difficult to accurately "update" knowledge.

Innovation Idea: Model the knowledge base as a Dynamic Knowledge Graph.
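We can model the knowledge base K as a graph (V, L), where nodes V represent concepts (such as chemical substances, diseases, mathematical theorems), and labeled edges L represent relationships between them (such as "catalyzes," "causes," "proves"). The edge set is written L rather than E because E already denotes experiments, and the graph is left unnamed because G already denotes the knowledge gain function.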

When "perceiving," the agent no longer performs full-text search but conducts path search or subgraph matching on the graph, which can more quickly discover deep relationships between concepts. When "updating," experimental results can be directly added to the graph as new edges or nodes, such as "We proved that substance A catalyzes reaction B."
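A minimal sketch of such a dynamic knowledge graph, with "update" as edge insertion and "perceive" as breadth-first path search, might look as follows. The `KnowledgeGraph` class, its method names, and the example triples are illustrative assumptions rather than an existing library.

```python
from collections import deque


class KnowledgeGraph:
    """Directed, labeled graph: node -> list of (relation, neighbor)."""

    def __init__(self):
        self.edges = {}

    def add(self, head, relation, tail):
        """Update step: record a fact as a labeled edge, creating nodes as needed."""
        self.edges.setdefault(head, []).append((relation, tail))
        self.edges.setdefault(tail, [])

    def find_path(self, start, goal):
        """Perception step: shortest chain of triples from start to goal, or None."""
        if start not in self.edges:
            return None
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for relation, neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, path + [(node, relation, neighbor)]))
        return None


kg = KnowledgeGraph()
kg.add("reaction B", "produces", "compound C")

# Before the new experimental result, A and C are not connected.
before = kg.find_path("substance A", "compound C")

# An experiment confirms a new fact, which is added as an edge.
kg.add("substance A", "catalyzes", "reaction B")
after = kg.find_path("substance A", "compound C")
print(before, after)
```

After the update, the search returns a two-step chain linking substance A to compound C, a relationship that was not stated directly in either fact.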

This small innovation upgrades the knowledge representation from linear text to structured graphs, providing a more powerful foundation for subsequent reasoning and hypothesis generation.