\documentclass{article}


% if you need to pass options to natbib, use, e.g.:
%     \PassOptionsToPackage{numbers, compress}{natbib}
% before loading maeb_2025


% ready for submission
\usepackage[final]{maeb_2025}
\usepackage{subfig}
\usepackage{xcolor}
\usepackage{listings}
\definecolor{codegray}{gray}{0.9}
\definecolor{keywordcolor}{rgb}{0.75,0.2,0.2}
\definecolor{stringcolor}{rgb}{0.2,0.6,0.2}

\lstset{
    backgroundcolor=\color{codegray}, % Light gray background
    basicstyle=\ttfamily\small,       % Monospaced font
    keywordstyle=\color{keywordcolor}\bfseries, % Keywords in bold red
    stringstyle=\color{stringcolor},  % Strings in green
    commentstyle=\color{blue},        % Comments in blue
    frame=single,                     % Adds a frame around the code
    rulecolor=\color{black},          % Frame border color
    breaklines=true,                  % Automatic line breaking
    captionpos=b,                      % Caption below the listing
    tabsize=4,                         % Tab width
    showspaces=false,                  % Do not show spaces
    showstringspaces=false,            % Do not show string spaces
}
% to compile a preprint version, e.g., for submission to arXiv, add the
% [preprint] option:
%     \usepackage[preprint]{maeb_2025}


% to compile a camera-ready version, add the [final] option, e.g.:
%\usepackage[final]{maeb_2025}


% to avoid loading the natbib package, add option nonatbib:
%    \usepackage[nonatbib]{maeb_2025}


\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{amsmath}
\usepackage{graphicx}

\renewcommand{\sectionautorefname}{Section}
\renewcommand{\subsectionautorefname}{Section}
\renewcommand{\subsubsectionautorefname}{Section}

%Balancing Quality and Efficiency in AI Image Generation with Multi-Objective Evolutionary Algorithms
\title{DeepStableYolo: DeepSeek-Driven Prompt Engineering and Search-based Optimization for AI Image Generation}


% The \author macro works with any number of authors. There are two commands
% used to separate the names and addresses of multiple authors: \And and \AND.
%
% Using \And between authors leaves it to LaTeX to determine where to break the
% lines. Using \AND forces a line break at that point. So, if LaTeX puts 3 of 4
% authors names on the first line, and the last on the second line, try using
% \AND instead of \And before the third author name.


\author{%
  Héctor D. Menéndez \\%\thanks{Use footnote for providing further information
    %about author (webpage, alternative address)---\emph{not} for acknowledging
    %funding agencies.} \\
  Department of Informatics \\
  King's College London \\
  London, United Kingdom \\
  \texttt{hector.menendez@kcl.ac.uk} \\
  % examples of more authors
  \And
   Gema Bello-Orgaz,  Cristian Ramirez-Atencia\\
   ETSI de Sistemas Inform\'{a}ticos \\
   Universidad Polit\'{e}cnica de Madrid  \\
   Departamento de Sistemas Inform\'{a}ticos, Madrid, Spain \\
  \texttt{\{gema.borgaz,cristian.ramirez\}@upm.es} 
  % \AND
  % Coauthor \\
  % Affiliation \\
  % Address \\
  % \texttt{email} \\
  % \And
  % Coauthor \\
  % Affiliation \\
  % Address \\
  % \texttt{email} \\
  % \And
  % Coauthor \\
  % Affiliation \\
  % Address \\
  % \texttt{email} \\
}


\begin{document}


\maketitle


\begin{abstract}


AI-driven image generation heavily relies on effective prompt engineering and precise tuning of model parameters. The StableYolo framework addressed these challenges by integrating evolutionary computation with Stable Diffusion, enabling simultaneous optimization of both prompts and model parameters while using YOLO as a guiding metric to enhance image quality. In this work, we extend the capabilities of StableYolo by introducing mechanisms for prompt improvement through large language models (LLMs), aiming to maximize image generation quality. We incorporate DeepSeek to enhance prompt engineering, ensuring more effective and context-aware prompt generation. However, our refined approach demonstrates that enhancing prompts does not yield significant improvements in either the efficiency or quality of AI-generated images, suggesting that clear and concise prompts are equally effective in the process.


  % The abstract paragraph should be indented \nicefrac{1}{2}~inch (3~picas) on
  % both the left- and right-hand margins. Use 10~point type, with a vertical
  % spacing (leading) of 11~points.  The word \textbf{Abstract} must be centered,
  % bold, and in point size 12. Two line spaces precede the abstract. The abstract
  % must be limited to one paragraph.
\end{abstract}



\section{Introduction}

AI-based image generation has achieved remarkable progress, with models like Stable Diffusion at the forefront  \cite{rombach2022high}. However, generating high-quality images still depends heavily on two critical aspects: effective prompt engineering and precise tuning of model parameters. Both tasks require significant manual effort and expertise, presenting ongoing challenges for researchers and users alike  \cite{wang2023review}. The StableYolo framework \cite{berger2023stableyolo} addressed these challenges by integrating evolutionary computation with Stable Diffusion, simultaneously optimizing prompts and model parameters while using YOLO as a guiding metric to enhance image quality  \cite{redmon2016you}.

%Building on this foundation, we propose an extension to StableYolo that introduces a multi-objective evolutionary algorithm (MOGA) based on the SPEA2 (Strength Pareto Evolutionary Algorithm 2) algorithm \cite{zitzler2001spea2}. This extension aims to optimize two key objectives: minimizing inference steps (to reduce computational cost) and maximizing image quality (using YOLO as the evaluation metric). By transitioning from a single-objective genetic algorithm (GA) to a multi-objective approach, we enable a more balanced exploration of the trade-offs between computational efficiency and output quality \cite{deb2002fast}.

Building on this foundation, we propose an extension to StableYolo that enhances the optimization process by integrating DeepSeek \cite{liu2024deepseek}, a state-of-the-art large language model (LLM), to generate and refine both the positive and the negative prompts. DeepSeek's advanced natural language capabilities enable the automatic generation of context-aware and semantically rich prompts, reducing the need for manual intervention \cite{brown2020language}. Meanwhile, StableYolo's search process optimizes the remaining parameters of Stable Diffusion, such as the guidance scale, the number of inference steps, and other hyperparameters, ensuring a comprehensive and automated tuning process \cite{rombach2022high}.


This refined approach aims to enhance the efficiency and quality of AI-generated images by improving the exploration of the search space through optimized prompt descriptions. By combining DeepSeek's prompt-enhancing capabilities with the optimization power of search algorithms, we evaluate whether the original prompts can be improved and how this impacts the final quality of the generated images. However, our experiments showed that enhancing prompts with DeepSeek yields no noticeable improvement in StableYolo's search process. The main contributions of this extended work are:
\begin{enumerate}
     \item DeepStableYolo, which integrates DeepSeek into the prompt engineering process of StableYolo for enhanced generation of positive and negative prompts.\footnote{The code (\url{https://github.com/hdg7/stableyolo}) and data (\url{https://zenodo.org/records/14933760}) are openly available.}
     %\item The direct application of
     \item A comprehensive analysis to evaluate whether enriching prompts with LLMs can expand the search space and reveal unexplored areas in the image generation process, while also addressing the associated limitations and trade-offs.
     \item The identification of the most frequently used words in prompts that enhance image quality, offering insights into how prompt structure influences model performance.
\end{enumerate}

\section{Background}

Recent advancements in artificial intelligence have led to the development of powerful image generation models capable of creating high-quality images from textual descriptions, revolutionizing fields such as graphic design, advertising, and entertainment \cite{ramesh2021zero,rombach2022high}. However, achieving optimal results still depends on two critical factors: effective prompt engineering and precise parameter tuning. Both require significant expertise and manual effort \cite{beyan2023review}.

To enhance text-based image generation, various prompt engineering techniques have been explored. Large language models (LLMs) like GPT-3 \cite{brown2020language} and DeepSeek \cite{lu2024deepseek} have been utilized to generate more coherent and contextually relevant prompts, thereby improving image generation outcomes. While these advancements are promising, integrating LLMs into image generation workflows presents challenges related to scalability and computational efficiency \cite{hagos2024recent}.

A promising approach to improving AI-driven image generation involves combining genetic algorithms with LLMs to optimize both prompts and model parameters. For example, the Multimodal LLM Adapter (MoMA) framework \cite{song2024moma} introduces a multimodal LLM adapter for personalized image generation, synergizing reference images and text prompts to produce high-quality images. Additionally, iterative prompting techniques using multimodal LLMs have been employed to reproduce both natural and AI-generated images, demonstrating the potential of integrating evolutionary algorithms with LLMs for prompt engineering \cite{naseh2024iteratively}. These approaches aim to automate and enhance the image generation process while balancing factors such as image quality and computational efficiency.

Models like DeepSeek \cite{liu2024deepseek} have demonstrated advanced natural language capabilities, making them well-suited for automating and improving the prompt engineering process. Their integration into image generation workflows not only reduces manual intervention but also enables the creation of semantically richer and contextually relevant prompts. Furthermore, combining StableYolo \cite{berger2023stableyolo} with these advanced LLMs offers a significant advantage by optimizing multiple aspects of the image generation process simultaneously. By balancing objectives such as image quality, computational efficiency, and semantic relevance, this combination should, in principle, provide a more refined and automated workflow. However, this is not always the case, as optimization does not necessarily require additional boosting to create suitable prompts for generating high-quality images, as we will see in \autoref{sec:results}.


\section{Methodology}
With a user-provided prompt for generating photo-realistic images, this work aims to enhance image quality through evolutionary search. Each individual in the population is represented as a dictionary object that incorporates parameters based on StableDiffusion's documentation \cite{stablediffusion}. The structure of each individual includes the following key parameters:
 
 \begin{itemize}
     \item \textbf{Number of Iterations}: The number of diffusion steps required for the AI to process the image (range: [1, 100]). 
     
     \item \textbf{Classifier-Free Guidance Scale (CFG)}: A parameter that controls the influence of the prompt on the generated image. Higher values increase the prompt's influence but may also reduce image quality if set too high (range: [1, 20]). 
     
     \item \textbf{Seed}: The generation seed used for randomization. The same seed reproduces the same image, while different seeds introduce variation. It is included in the search to guarantee consistency between genotypes and phenotypes.
     
     \item \textbf{Guidance Rescale}: A parameter that prevents overexposure by rescaling the guidance factor (range: [0, 1]). 
     
     \item \textbf{Positive Prompt}: The text or set of words describing the desired image and its details, aimed at enhancing the realism of the generated image. 
     
     \item \textbf{Negative Prompt}: A sequence of keywords to be excluded during image generation to reduce non-realistic components in the final output.
 \end{itemize}

DeepSeek is employed to refine both the positive and negative prompts, producing more contextually relevant and semantically enriched text that enhances the realism of the generated images. This approach extends the original StableYolo engine, which solely employs a search process to identify suitable prompts \cite{berger2023stableyolo}. The prompt used to enhance the positive and negative prompts for StableDiffusion is as follows:  


\begin{lstlisting}[caption=Enhanced Prompt]
" The following prompt aims to generate a good quality photograph from StableDiffussion. It is a < positive | negative > prompt. Please provide an enhanced version as a string starting with ```txt and ending with ```. The text is: "
\end{lstlisting}

The rest of the prompt consists of a list created by StableYolo after selecting the prompt keywords. An example of an individual in the population could be represented as:  
 
\begin{verbatim}
{
    'iterations': 50,
    'cfg_scale': 15,
    'seed': 12345,
    'guidance_rescale': 0.5,
    'positive_prompt': 'a person, photograph, digital, color, blended visuals',
    'negative_prompt': 'illustration, painting, drawing, art'
}
\end{verbatim}
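
As a minimal sketch of how such an individual could be sampled at random, the Python snippet below draws each numeric parameter from the ranges listed above. The function name \texttt{random\_individual}, the 32-bit seed range, the number of sampled keywords, and the two short vocabularies are illustrative assumptions rather than the exact StableYolo implementation.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of random individual creation (names and vocabularies are assumptions).}]
import random

# Hypothetical keyword vocabularies, shortened for illustration.
POSITIVE_VOCAB = ["photograph", "digital", "color", "blended visuals"]
NEGATIVE_VOCAB = ["illustration", "painting", "drawing", "art"]

def random_individual(goal, rng):
    """Sample one individual within the ranges given in the text."""
    return {
        "iterations": rng.randint(1, 100),        # diffusion steps
        "cfg_scale": rng.uniform(1.0, 20.0),      # guidance scale
        "seed": rng.randrange(2**32),             # assumed 32-bit seed
        "guidance_rescale": rng.uniform(0.0, 1.0),
        # The user goal comes first, followed by sampled keywords.
        "positive_prompt": ", ".join([goal] + rng.sample(POSITIVE_VOCAB, 2)),
        "negative_prompt": ", ".join(rng.sample(NEGATIVE_VOCAB, 2)),
    }

individual = random_individual("a person", random.Random(0))
\end{lstlisting}

Passing an explicit \texttt{random.Random} instance keeps the sampling reproducible, which is convenient when experiments are repeated.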


As presented in Algorithm \ref{algo:DSY}, the image generation process follows an evolutionary framework, where the population aims to optimize image quality using YOLO as the evaluation metric. Initially, the algorithm creates the population by setting parameters at random, defining a user-specified prompt goal (e.g., ``a person''), and selecting prompt variations from a fixed list of topics. Once these variations are chosen, DeepSeek is employed for prompt enrichment each time a new individual is generated. The population then evolves through a genetic algorithm that applies extended crossover and mutation operators.

The crossover operator facilitates the exchange of values between two individuals by selecting which values to swap uniformly at random. Mutation modifies these values within their allowable ranges. Specifically, for the prompts, mutation selects alternate word sets from the predefined vocabulary available for both positive and negative prompts.
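
The two operators can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the per-parameter application of the mutation probability, and the 32-bit seed range are hypothetical, and applying crossover with probability $p_c$ is left to the caller.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of uniform crossover and range-bounded mutation (names are assumptions).}]
import random

# Allowed ranges of the numeric parameters; the seed range is assumed.
RANGES = {
    "iterations": (1, 100),
    "cfg_scale": (1.0, 20.0),
    "guidance_rescale": (0.0, 1.0),
    "seed": (0, 2**32 - 1),
}

def uniform_crossover(p1, p2, rng):
    """Take each value from either parent with equal probability."""
    return {k: (p1[k] if rng.random() < 0.5 else p2[k]) for k in p1}

def mutate(ind, vocab, p_m, rng):
    """Resample each value with probability p_m, inside its range."""
    child = dict(ind)
    for key, (low, high) in RANGES.items():
        if rng.random() < p_m:
            if isinstance(low, int):
                child[key] = rng.randint(low, high)
            else:
                child[key] = rng.uniform(low, high)
    for key in ("positive_prompt", "negative_prompt"):
        if rng.random() < p_m:
            # Pick an alternate word set from the predefined vocabulary.
            child[key] = rng.choice(vocab[key])
    return child
\end{lstlisting}

Here \texttt{vocab} is expected to map each prompt key to a list of candidate word sets, mirroring the predefined vocabulary described above.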
 
%%% Review the structure of the algorithm once the extension has been developed
\begin{algorithm}
\caption{DeepStableYolo algorithm}
\label{algo:DSY}
\begin{algorithmic}[1]
\Require Population size $N$, number of generations $G$, crossover rate $p_c$, mutation rate $p_m$, the original user prompt $prompt$
\Ensure Optimised model parameters and prompt variations
\State $P \gets \text{CreateRandomPopulation}(N)$ \Comment{Initialize population}
\State $Fitness \gets \text{YOLO\_score}(P)$ \Comment{Evaluate YOLO score}
\For{$t = 1$ to $G$} \Comment{Main loop}
    \State $parents \gets \text{SelectionTournament}(N, P)$ \Comment{Select parents}
    \State $offspring \gets \emptyset$ \Comment{Initialize offspring set}
    \ForAll{$p_1, p_2 \in parents$} \Comment{Generate offspring}
        \State $child \gets \text{CrossoverAndMutation}(p_1, p_2, p_c, p_m)$
        \State $child.prompts \gets \text{DeepSeekEnhance}(child.prompts)$ \Comment{LLM-based prompt improvement}
        \State $offspring \gets offspring \cup \{child\}$
    \EndFor
    \State $Fitness \gets \text{YOLO\_score}(offspring)$ \Comment{Evaluate YOLO score for offspring}
    \State $P \gets \text{Replacement}(P, offspring)$ \Comment{Replace population using $\mu + \lambda$}
\EndFor
\State \Return Population $P$ \Comment{Return final solutions}
\end{algorithmic}
\end{algorithm}
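
The LLM-based prompt improvement step of Algorithm~\ref{algo:DSY} depends on recovering the enhanced text from the model's reply. Since the instruction asks for the answer between a \texttt{txt}-tagged opening fence of three backticks and a closing fence, one possible way to extract it is sketched below. The helper name \texttt{extract\_enhanced\_prompt} and the fallback to the unmodified prompt are assumptions made for illustration, not a description of the released code.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of extracting the enhanced prompt from an LLM reply (helper name is an assumption).}]
import re

TICKS = "`" * 3  # the three-backtick fence delimiter

# Matches the first block delimited by the txt-tagged fence.
FENCE = re.compile(
    re.escape(TICKS) + r"txt\s*(.*?)\s*" + re.escape(TICKS),
    re.DOTALL,
)

def extract_enhanced_prompt(reply, fallback):
    """Return the fenced text, or the fallback if no fence is found."""
    match = FENCE.search(reply)
    if match is None or not match.group(1):
        return fallback
    return match.group(1)
\end{lstlisting}

Falling back to the original prompt keeps the individual valid when the reply does not follow the requested format.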




% \begin{algorithm}
% \caption{DeepStableYolo algorithm}
% \label{algo:DSY}
% \begin{algorithmic}[1]
% \Require Population size $N$, number of generations $G$, crossover rate $p_c$, mutation rate $p_m$, the original user prompt $prompt$
% \Ensure Optimised model parameters and prompts variations
% \State $P \gets \text{CreateRandomPopulation}(N)$ \Comment{Initialize population}
% \State $Fitness \gets \text{YOLO\_score}(P)$ \Comment{Evaluate YOLO score}
% \State $F_2 \gets \text{Inference\_Steps}(P)$ \Comment{Evaluate inference steps}
% \State $A_t \gets \text{POF}(P, F_1, F_2)$ \Comment{Compute Pareto-optimal front}
% \For{$t = 1$ to $G$} \Comment{Main loop}
%     \State $parents \gets \text{SelectionTournament}(N, A_t)$ \Comment{Select parents}
%     \State $offspring \gets \emptyset$ \Comment{Initialize offspring set}
%     \ForAll{$p_1, p_2 \in parents$} \Comment{Generate offspring}
%         \State $child \gets \text{CrossoverAndMutation}(p_1, p_2, p_c, p_m)$
%         \State $offspring \gets offspring \cup \{child\}$
%     \EndFor
%     \State $F_1 \gets \text{YOLO\_score}(offspring)$ \Comment{Evaluate YOLO score for offspring}
%     \State $F_2 \gets \text{Inference\_Steps}(offspring)$ \Comment{Evaluate inference steps for offspring}
%     \State $A_t \gets \text{POF}(P \cup offspring, F_1, F_2)$ \Comment{Update Pareto-optimal front}
%     \State $P \gets \text{Replacement}(P, offspring, A_t)$ \Comment{Replace population}
% \EndFor
% \State \Return Non-dominated solutions from $P$ \Comment{Return final Pareto-optimal solutions}
% \end{algorithmic}
% \end{algorithm}


The YOLO\_score, utilized as part of the fitness evaluation process, is calculated through four distinct steps:
\begin{enumerate}
    \item \textbf{Prompt Generation and Configuration}: StableYolo generates both positive and negative prompts. These prompts are then used to configure the parameters of Stable Diffusion, resulting in the production of four images per prompt.
    \item \textbf{Prompt Enhancement for Photorealism}: DeepSeek enhances these generated prompts with the objective of producing photorealistic images.
    \item \textbf{Object Detection and Confidence Scoring}: YOLO processes each image, detecting objects within them. For each detection, a confidence score is assigned, which serves as an indicator of the image's quality.
    \item \textbf{Final Score Calculation}: The final value is determined by averaging these confidence scores across all detected objects and generated images.
\end{enumerate}
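
Under the assumption that all detections are pooled before averaging, the final score of an individual can be written as
\begin{equation*}
\text{YOLO\_score} = \frac{1}{\sum_{i \in I} |D_i|} \sum_{i \in I} \sum_{d \in D_i} c_d ,
\end{equation*}
where $I$ is the set of four images generated for the individual, $D_i$ is the set of objects detected in image $i$, and $c_d$ is the confidence that YOLO assigns to detection $d$. The symbols $I$, $D_i$, and $c_d$ are introduced here only for illustration, and the treatment of images without any detection is not covered by this expression.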


\section{Experimental Setup}

To evaluate the new approach for the StableYolo framework thoroughly, we utilized 10 distinct categories of objects, animals, and people recognized by YOLO (concretely: banana, bear, bird, cat, dog, elephant, giraffe, train, person, and zebra). These categories were selected to investigate how prompts influence achieving optimal image quality. To gauge improvements, we compared the baseline prompt with enhanced (DeepStableYolo) and StableYolo's prompts. Our experiments aimed to address the following research questions:


\textbf{RQ1}: Do StableYolo and DeepStableYolo enhance baseline results?

\textbf{RQ2}: What are the most common words in the prompts, and what improvements do they bring compared to DeepStableYolo?

\textbf{RQ3}: How do images differ between StableYolo and DeepStableYolo?

\textbf{RQ4}: Do prompts reduce the need for other parameters, such as inference steps?

The genetic algorithm (GA) settings are outlined in Table \ref{tab:gaconfiguration}. Each experiment was repeated four times to ensure reliability. All experiments were conducted on a workstation with Ubuntu 20.04 LTS, equipped with 40 CPU cores, 128 GB RAM, and an NVIDIA A30 GPU featuring 24 GB memory.


% \textbf{RQ1}: \textit{How does the multi-objective MOGA-based approach improve image quality compared to StableYolo across various object categories?}

% \textbf{RQ2}: \textit{Does the new MOGA-based approach effectively balance computational efficiency and image quality?}

% \textbf{RQ3}: \textit{How does the integration of DeepSeek for prompt engineering affect the quality and efficiency?}


%%% Check which parameters are used
\begin{table}
  \caption{Genetic parameters of DeepStableYolo algorithm, following the original parameters of StableYolo \cite{berger2023stableyolo}}
  \label{tab:gaconfiguration}
  \centering
  \begin{tabular}{lll}
    \toprule

    Name     & Description     & Value \\
    \midrule
    N & Population size  & 25     \\
    G     & Maximum number of generations  & 50      \\
    $\mu$     & Number of individuals to select for the next generation      & 5  \\
    $p_c$     & Crossover probability      & 0.2  \\
    $p_m$     & Mutation probability      & 0.2  \\
    \bottomrule
  \end{tabular}
\end{table}

\section{Results}
\label{sec:results}


Our aim is to investigate whether enhanced prompts can improve image generation outcomes using StableYolo. To accomplish this, we integrated DeepSeek into our prompt enhancement process. DeepSeek enriches the text of the prompt by extending its length and encouraging broader exploration, enabling us to uncover areas of the search space that were previously unexplored with StableYolo's limited prompts. However, this exploration expands the search space in unforeseen ways, providing minimal control over the prompt beyond a baseline level. With this objective in mind, we present the experimental results conducted to address each research question outlined in this study.


\subsection{RQ1: Comparative Results}

To address our first research question, we compare StableYolo and DeepStableYolo. For a fair comparison, both techniques utilize identical parameters as those used in the original evaluation of StableYolo \cite{berger2023stableyolo}. Additionally, we establish a baseline by presenting results from random parameterizations adhering to the same guidelines.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{images/results}
    \caption{Comparative results among StableYolo, DeepStableYolo, and the baseline values from random parametrizations.}
    \label{fig:results}
\end{figure}


\autoref{fig:results} shows that both techniques improve on the baseline. DeepStableYolo achieves an average population fitness of 0.889 $\pm$ 0.055 across the 10 categories, while StableYolo attains 0.887 $\pm$ 0.043. The baseline, however, yields a considerably lower average fitness of 0.512 $\pm$ 0.097.
 
The results vary depending on the image category. For instance, the `banana' category produces the poorest outcomes (approximately 0.72 for StableYolo, 0.77 for DeepStableYolo, and 0.4 for the baseline), whereas the `bear' category shows the best performance (0.93 for DeepStableYolo and 0.95 for StableYolo). The remaining categories exhibit stable results around 0.9 for both techniques. In contrast, the baseline results are more variable, with image quality scores ranging from 0.35 to 0.7.
 
Interestingly, no significant improvements in image generation were observed between StableYolo and DeepStableYolo. Applying the Wilcoxon test \cite{cuzick1985wilcoxon} to their image quality scores resulted in a p-value of 0.90, which is considerably higher than the 0.05 threshold required to reject the null hypothesis. Thus, we cannot consider these results significantly different.
Regarding convergence points, StableYolo converges on average at generation 26.3 $\pm$ 15.09, while DeepStableYolo converges at 30.1 $\pm$ 12.26. The Wilcoxon test revealed no statistically significant difference between their convergence points (p-value = 0.65). These findings suggest that enhancing prompts does not significantly influence either the search quality or the convergence performance of the approaches. Furthermore, although enhancing prompts enlarges the search space, the original search space for image generation is already sufficient to yield highly competitive results.


\subsection{RQ2: Prompt Comparison}

DeepSeek uses the prompt created by StableYolo to develop an enhanced prompt. This enhancement may introduce specific terms that can improve image quality but were not considered in StableYolo's design. Additionally, it improves the quality of the prompt text, since the LLM generates coherent prompts and applies enhancements based on its own knowledge, similarly to the work of Yao et al.~\cite{yao2024fabrication}.
 
The average positive prompt length is 45.0 $\pm$ 21.7 for DeepStableYolo and 14.86 $\pm$ 2.45 for StableYolo. For negative prompts, the averages are 21.6 $\pm$ 15.5 for DeepStableYolo and 4.4 $\pm$ 1.1 for StableYolo. The prompt lengths differ significantly between both approaches (positive and negative prompts) with a p-value of $2.30 \times 10^{-15}$, which is less than 0.05.

To analyze the most frequent words in each prompt, we examined the top ten terms for every approach and category. Figure \ref{fig:DSYpromtps} illustrates the frequencies of these terms in their positive and negative prompts. The results show some similarities between the two approaches.
 
The most frequent word in the positive prompts is `photograph' for both DeepStableYolo and StableYolo. In StableYolo's case, other common terms include `depth', `field', `100mm', and `color'. For DeepStableYolo, while `depth' and `field' are also common, the term `vibrant' emerges as frequent and is not present in StableYolo's prompts.
 
For negative prompts, both approaches commonly use the term `illustration', although StableYolo includes it in nearly every case. DeepStableYolo's related terms include `art', `low', and `blurry', whereas StableYolo uses terms like `drawing', `painting', and `cropped', which are not closely tied to `illustration'.
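
A term-frequency count of this kind can be sketched in a few lines of Python. The snippet below is only an illustration: the function name \texttt{top\_terms}, the lowercasing, the whitespace tokenization, and the absence of stop-word filtering are assumptions, and the two input strings are the positive prompts reported in \autoref{lst:bird} and \autoref{lst:zebra}.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of counting the most frequent prompt terms (tokenization is an assumption).}]
from collections import Counter

def top_terms(prompts, k=10):
    """Return the k most common lowercased words over all prompts."""
    counts = Counter()
    for prompt in prompts:
        # Treat commas as separators, then split on whitespace.
        for word in prompt.lower().replace(",", " ").split():
            counts[word] += 1
    return counts.most_common(k)

prompts = [
    "bird, photograph, color, Kodak portra 800, Depth of field 100mm, "
    "overlapping compositions, blended visuals",
    "zebra, photograph, Ultra Real, Depth of field 100mm, "
    "trending on artstation, award winning",
]
top = top_terms(prompts, k=5)
\end{lstlisting}

With this simple tokenization, function words such as `of' are counted like any other term, so a stop-word list would be needed to restrict the ranking to descriptive keywords.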


\begin{figure}
    \centering
    \includegraphics[width=0.75\linewidth]{images/DeepStableYolowords.pdf}
  \includegraphics[width=0.75\linewidth]{images/StableYolowords.pdf}
    \caption{Most frequent words for DeepStableYolo (top) and StableYolo (bottom) according to the top 10 solutions for each category search. The green bars represent the positive prompt while the red ones represent the negative prompts.}
    \label{fig:DSYpromtps}
    \label{fig:SYpromtps}
\end{figure}

\subsection{RQ3: Image Comparison} 

The final part of the evaluation involves a comparison of the top images across each category. From a quality perspective, there is no clear distinction in image quality between the two techniques. However, some interesting aspects emerge when examining DeepStableYolo's approach. For instance, as shown in Figure \ref{fig:imagencomparision}, the `banana' generated by DeepStableYolo appears mature, and the `elephant' image is rendered in grayscale. Additionally, the `bird' image captures its reflection in the water.

When comparing the `bear' and `zebra' images, both renderings are equally plausible for their respective prompts and would fit well in a photorealistic context. The `train' depictions also show similarities: DeepStableYolo presents a modern version, while StableYolo opts for a classical approach. Notably, both images remain quite similar overall. The last comparison involves the image of a `person'. DeepStableYolo generates an older individual with distinct facial expressions, whereas StableYolo's output features a younger person depicted from a distance. DeepStableYolo's rendition is significantly more detailed in this case.

Examining the prompts used by DeepStableYolo for these images (see \autoref{lst:banana}, \autoref{lst:bird}, \autoref{lst:person}, and \autoref{lst:zebra}), we observe variations from the original prompts created by StableYolo. Although the language model enhances narrative depth, this does not significantly impact the overall quality of the images. The most notable difference is in the prompt for the person's image, where DeepStableYolo specifies ``detailed facial expressions,'' directing the generation process to focus on the individual's face.

\begin{lstlisting}[caption=DeepStableYolo prompt for the best image of `banana'., label={lst:banana}]
Positive: a vibrant banana laid on a rustic wooden table under soft golden light, with the warm glow casting long shadows across the surface, creating a lively and inviting scene
Negative: a highly stylized detailed illustration that is distorted, cartoonish, overly exaggerated, poorly drawn, pixelated, low-quality, blurry, with a lot of artifacts or errors in the image layout, cropped in error mode, and lacks sharpness.
\end{lstlisting}

\begin{lstlisting}[caption=DeepStableYolo prompt for the best image of `bird'., label={lst:bird}]
Positive: bird, photograph, color, Kodak portra 800, Depth of field 100mm, overlapping compositions, blended visuals
Negative: high quality; illustration; drawing; art; vector art; lowres; error; cropped; sharp focus; detailed; clean; clear; high contrast; well-composed; artistic style; minimal distractions; precise details; natural lighting; modern aesthetic; elegant; professional artwork; fine details; intricate design; refined; polished
\end{lstlisting}

\begin{lstlisting}[caption=DeepStableYolo prompt for the best image of `person'., label={lst:person}]
Positive: a vibrant color palette with warm tones, person, photograph, Kodak portra 800 film, depth of field 100mm, soft lighting, detailed facial expressions, cinematic blur, overlapping compositions with smooth transitions between layers, ethereal quality, and a sense of depth and detail in the shadows
Negative: txt:illustration,drawing,art,fake,synthetic,blurry,cropped,error,negative image,lowres,highly unrealistic,bad quality
\end{lstlisting}


\begin{lstlisting}[caption=DeepStableYolo prompt for the best image of `zebra'., label={lst:zebra}]
Positive: zebra, photograph, Ultra Real, Depth of field 100mm, trending on artstation, award winning
Negative: High-quality photograph illustration, painting style with sketches, realistic artistic vision
\end{lstlisting}


\begin{figure}
    \centering
    \subfloat[Prompts Enhanced with DeepSeek.]{
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBBanana.png}
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBBird.png}
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBCat.png}
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBElephant.png}

    }\\
    \subfloat[Original StableYolo Prompts.]{
        \includegraphics[width=0.24\linewidth]{images/photos/SSBSEBanana.png}
        \includegraphics[width=0.24\linewidth]{images/photos/SSBSEBird.png}
        \includegraphics[width=0.24\linewidth]{images/photos/SSBSECat.png}
        \includegraphics[width=0.24\linewidth]{images/photos/SSBSEElephant.png}
    }\\
\subfloat[Prompts Enhanced with DeepSeek.]{
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBBear.png}
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBZebra.png}
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBTrain.png}
        \includegraphics[width=0.24\linewidth]{images/photos/MAEBPerson.png}
    }\\
  \subfloat[Original StableYolo Prompts.]{       \includegraphics[width=0.24\linewidth]{images/photos/SSBSEBear.png}
        \includegraphics[width=0.24\linewidth]{images/photos/SSBSEZebra.png}
        \includegraphics[width=0.24\linewidth]{images/photos/SSBSETrain.png}
        \includegraphics[width=0.24\linewidth]{images/photos/SSBSEPerson.png}
    }
    \caption{Comparison of image differences between StableYolo and DeepStableYolo.}
    \label{fig:imagencomparision}
\end{figure}

\subsection{RQ4: Parameters Comparison}


The remaining parameters found during the search processes of DeepStableYolo and StableYolo were largely similar: two of the three showed no statistically significant difference.

The first parameter is `Inference Steps'. For DeepStableYolo, the mean value was 50.5 $\pm$ 22.95, while for StableYolo, it was 47.2 $\pm$ 25.54. Applying the Wilcoxon test to both distributions gave a p-value of 0.92, indicating no statistically significant difference between them.

For the `Guidance Scale', the values were also similar. For DeepStableYolo, the mean value was 10.7 $\pm$ 2.47, and for StableYolo, it was 9.5 $\pm$ 2.18. The p-value for the Wilcoxon test was 0.32, indicating no statistically significant difference between the two distributions.  

For the `Guidance Rescale', the mean values were 0.43 $\pm$ 0.21 for DeepStableYolo and 0.68 $\pm$ 0.17 for StableYolo. The p-value for the Wilcoxon test was 0.04, indicating a statistically significant difference between the two groups. However, in practice this parameter does not affect generation performance in the way that `Inference Steps' or the `Guidance Scale' do.
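
For reference, the Wilcoxon comparisons reported above can be read as follows. This is only a sketch: it assumes the paired (signed-rank) form of the test with one pair of values per object category, which the text does not state, and the symbols $x_i$, $y_i$, $d_i$, $R_i$ and $n$ are introduced here purely for illustration. Let $x_i$ and $y_i$ denote the value of a parameter found by DeepStableYolo and StableYolo for category $i$, let $d_i = x_i - y_i$, and let $R_i$ be the rank of $|d_i|$ among the $n$ non-zero differences. The rank sums are then
\[
  W^{+} = \sum_{i\,:\,d_i > 0} R_i, \qquad
  W^{-} = \sum_{i\,:\,d_i < 0} R_i, \qquad
  W^{+} + W^{-} = \frac{n(n+1)}{2}.
\]
Under the null hypothesis that the differences are symmetrically distributed around zero, and in the absence of ties,
\[
  \mathbb{E}\left[W^{+}\right] = \frac{n(n+1)}{4}, \qquad
  \operatorname{Var}\left(W^{+}\right) = \frac{n(n+1)(2n+1)}{24},
\]
so a value of $W^{+}$ far from $n(n+1)/4$ yields a small p-value, while a value close to it yields a large one. If the unpaired (rank-sum) variant was used instead, the statistic differs but the p-values are interpreted in the same way.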


\section{Discussion and Conclusions}
Overall, the study shows that while LLM-enhanced prompts can enrich the narrative and expand the search space for image generation, they do not significantly improve image quality compared to methods such as StableYolo, where prompts are chosen manually.

Nevertheless, the findings of this study have implications for the field of prompt engineering and image generation. While models such as DeepSeek can enhance prompts and expand the search space, this does not necessarily result in improved image quality, suggesting that the effectiveness of prompt enhancement is context-dependent and requires further investigation. Moreover, the similarity in word frequency between StableYolo and DeepStableYolo indicates that certain core elements, such as ``photograph'' and ``depth'', are important for high-quality image generation, highlighting the potential for their optimization in future research. Additionally, although the creative modifications introduced by DeepStableYolo enhance visual appeal, they do not lead to measurable improvements in image quality, raising important questions about the trade-offs between creativity and technical performance in generative models.

Based on the findings of this study, several avenues for future research can be identified. Further exploration of alternative prompt enhancement techniques could be valuable, including domain-specific fine-tuning of LLMs or integrating user feedback into the prompt generation process. Additionally, while DeepStableYolo produces longer prompts, the relationship between prompt length and image quality remains uncertain, warranting further investigation into whether an optimal length exists for different image generation tasks. Moreover, as this study focused on a specific set of image categories, future research could extend the evaluation to a wider range of domains to assess the broader applicability of these findings.

\section*{Acknowledgments}
The support of the UKRI Trustworthy Autonomous Systems Hub (reference EP/V00784X/1), Trustworthy Autonomous Systems Node in Verifiability (reference EP/V026801/2) and the Comunidad Aut\'{o}noma de Madrid under ALENTAR-J-CM project (reference TEC-2024/COM-224) is gratefully acknowledged. 


\bibliographystyle{plain}
\bibliography{bibliography}



\end{document}