\documentclass[runningheads]{llncs}

\usepackage[T1]{fontenc}


\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{bbm}
\usepackage[colorlinks=true, linkcolor=blue, citecolor=blue]{hyperref}
\usepackage{booktabs}
\usepackage{bm}

\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{comment}

\usepackage{amsfonts}

\usepackage{xcolor}

\usepackage{tikz}

\usepackage{array}
\usepackage{lipsum}
\usepackage{orcidlink}





\begin{document}

\title{GRAM: Graph Regularizable Assessment Metric}




\author{Mariem Touihri\textsuperscript{1}\orcidlink{0009-0005-2283-0237}  \and Ahmed Nebli\textsuperscript{2}\orcidlink{0000-0003-4565-4502} } 
%index{Touihri, Mariem}
%index{Nebli, Ahmed}



\authorrunning{M. Touihri, A. Nebli}


\institute{Higher School of Digital Economy (ESEN), University of Manouba, Tunisia \\
\email{mariem.touihri@esen.tn} \and 
Independent researcher \\
\email{mr.ahmednebli@gmail.com}}


\maketitle              

\begin{abstract}



Here, we propose the Graph Regularizable Assessment Metric ($GRAM$), a customizable tool for evaluating the quality of generated brain graphs. Current geometric deep learning methods often lack robust quantification techniques for assessing the integrity of synthetic brain graphs. $GRAM$ addresses this gap by proportionally combining a set of existing graph metrics to establish a linear correlation between distortion levels and the metric values of ground-truth graphs. To evaluate the performance of our model, we generated a synthetic dataset of structural brain connectomes, derived from an existing dataset, to simulate the connectomes predicted by a generative model under controlled levels of distortion. Our results show that $GRAM$ outperforms single metrics in quantifying the distortion between generated and original graphs. This approach is a significant step towards establishing a universal graph quality index for graph-based predictive studies.

\keywords{Predicted brain graphs  \and Quality metrics \and Customized metrics.}
\end{abstract}



\section{Introduction}

Brain connectomes are crucial for exploring the connectivity patterns underlying cognitive processes \cite{sporns2004small}. These connectomes provide a framework for predicting the progression of neurodegenerative diseases by integrating connectomic analyses with established neuroscientific knowledge \cite{bullmore2009complex}. For instance,  \cite{lee2019,nebli2020} introduced geometric deep learning approaches to forecast Alzheimer's disease progression using brain connectome data. Despite their potential, obtaining connectomic data poses significant challenges. One major impediment is the extensive processing required for neuroimages acquired through modalities such as Magnetic Resonance Imaging (MRI). This process is both time-consuming and computationally intensive. Another challenge is the limited availability of sufficient MRI data, which can hinder comprehensive analyses. 


One approach to address the challenge of limited MRI data involves the use of generative models to produce synthetic neuroimages. Generative Adversarial Networks (GANs) \cite{goodfellow2020generative} have shown significant capability in generating realistic brain scans. For instance, \cite{nie2018} proposed a GAN model based on a fully convolutional network and an Auto-Context Model to enhance the realism and accuracy of synthetic images. Similarly, \cite{shin2018} developed a GAN model that produces high-quality, realistic images simulating the ground-truth brain images in order to improve the performance of medical diagnostic models. Despite the potential benefits, using GAN-generated MRI data to study brain connectomes introduces two key challenges. First, the generated MRI data must be indistinguishable from real data both quantitatively and qualitatively. Second, the synthetic data requires additional processing to extract connectivity matrices.


To address the above-mentioned issues, \cite{flat-net} proposed predicting brain connectivity matrices using a graph GAN-based approach. The authors created representative templates from clustered brain graphs to train models that predict the evolution of connectivities for a given brain disease over time. Their novel few-shot learning framework uses minimal training data and employs clustering and Connectional Brain Templates (CBTs) to handle the diversity within brain connectomic data. This ensures robust model training despite limited data. However, unlike images, brain connectomes are virtually impossible to evaluate qualitatively. Instead, quantitative metrics (e.g., centrality measures \cite{freeman2002centrality}, Average Neighbor Degree \cite{yao2017average} and Diversity Index \cite{shannonformula}) are, \emph{thus far}, the only way to evaluate the quality of the generated graphs.

In this paper, we highlight the limitations of existing metrics for graph quality assessment and propose a novel, universal, and customizable metric to quantify the quality of generated graphs, with an application to a simulated prediction of brain connectivities based on an existing dataset. In particular, we propose the Graph Regularizable Assessment Metric ($GRAM$), a customizable framework designed to learn to proportionally combine a set of existing graph metrics in order to evaluate the quality of a generated graph. Drawing inspiration from the universal image quality index by \cite{wang2002}, $GRAM$ could be considered a first step towards a more universal graph quality index. Our contributions are listed as follows:

\begin{enumerate}
    \item We propose a new general assumption for quantitatively interpreting the quality of a generated graph based on the linearity between the amount of distortion and the value of the reported metric.
    \item We propose a novel general metric based on the weighted combination of existing metrics.
    \item Our proposed metric is adjustable depending on the type of graph as well as the chosen metrics to report.
\end{enumerate}



 


\section{Methods}



In this section, we present in detail the proposed metric $GRAM$ for quantifying the quality of generated graphs. 



\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{final_fig.pdf}
\caption{\label{fig:method}\textit{Pipeline of the proposed $GRAM$ metric for assessing the quality of directed weighted graphs.} \textbf{(1) Simulate graph distortion.} For an input graph $G$ and a distortion level $d$, we replace the weights of randomly selected edges with randomly generated values in $[w_\text{min}, w_\text{max}]$, producing $m$ distorted graphs. \textbf{(2) Compute different correlations.} First, we apply the single metrics to the ground-truth graph and to each distorted graph, then calculate the Pearson, Spearman and Kendall's Tau correlations between them. Second, for each graph and each correlation coefficient, we generate a matrix $A$ of size $(p\times q)$ containing the correlation values, organized by distortion level vertically and by single metric horizontally. \textbf{(3) Optimize weights by minimizing surface loss.} We train $GRAM$ using an MLP to optimize the metrics' weights $\alpha_j$ forming the vector $B(q)$ by minimizing the loss between the predicted surface created by the MLP output $A \times B$ and the reference surface created by the vector $C(p)$ across $n$ surfaces.}
\end{figure}


\subsection{Simulation of Generated Brain Graphs}


Let \( G = (V, E) \) be a directed weighted graph, where \( V \) denotes the vertices and \( E \) denotes the weighted edges, with weights given by \( w: E \rightarrow \mathbb{R} \). Let $\hat{G}$ be the simulated output of a given generative model $F$ aiming to predict a target brain connectome $G$ such that \(\hat{G} \approx G\). The goal of the simulation is to bypass the problem of finding and training the optimal $F$, as well as to control the amount of distortion \( d \) between \(\hat{G}\) and \( G \), where $d \in (0, 1]$ is increased in increments of a distortion step $s$.

The distortion level between \(\hat{G}\) and \( G \) is measured by the proportion of edges in \( E \) whose weights differ. Specifically, for an edge $e \in E$ with weight \( w \) in \( G \) and \( \hat{w} \) in \(\hat{G}\), the edge counts as distorted whenever \( w \neq \hat{w} \), regardless of the magnitude of the difference \( \lvert w - \hat{w} \rvert \). {The objective is to detect any distortion in the generated graph, treating any alteration in edge weights as significant.}


As shown in Fig. \ref{fig:method}, we define a set of $m$ distorted graphs, each characterized by a distortion level $d$, denoted as $\mathcal{\hat{G}}_d = \{ \hat{G}_1, \dots, \hat{G}_m \}$. The process of generating distorted graphs across all distortion levels is detailed in Algorithm \ref{algorithm1}, resulting in the set $\mathcal{\hat{G}} = \{ \mathcal{\hat{G}}_{s}, \dots, \mathcal{\hat{G}}_1 \}$. Initially, we set a predefined distortion step (e.g., 0.1). For each distortion level \( d \), we randomly select \( |\hat{E}| \) edges, where \( |\hat{E}| = d \times (|E| - |V|) \). Here, \( |E| \) is the total number of edges in the graph and \( |V| \) is the total number of vertices, so \( |E| - |V| \) is the number of non-diagonal edges that remain once the \( |V| \) diagonal edges (self-loops) are excluded. For each selected edge \(e\), we replace its weight with a randomly generated value within the range \([w_{\text{min}}, w_{\text{max}}]\), where \( w_{\text{min}} \) and \( w_{\text{max}} \) are the minimum and maximum weight values in the graph \( G \), respectively. We repeat this process $m$ times for each distortion level so that, across the $m$ samples, the distortion is spread over different random subsets of edges.


\begin{algorithm}
\caption{Generate Distorted Graphs \label{algorithm1}}
\begin{algorithmic}[1]
\Require Directed weighted graph $G = (V, E)$, distortion step $s$, number of iterations $m$
\Ensure Set of distorted graphs $\mathcal{\hat{G}}$
\State Initialize $\mathcal{\hat{G}} \leftarrow \emptyset$
\State $w_{\text{min}} \leftarrow \min\{ w(e) \mid e \in E \}$
\State $w_{\text{max}} \leftarrow \max\{ w(e) \mid e \in E \}$

\For{$d \in \{s, 2s, \dots, 1\}$}
    \State Initialize $\mathcal{\hat{G}}_d \leftarrow \emptyset$
    \State $|\hat{E}| \leftarrow d \times (|E| - |V|)$
    
    \For{$i$ from $1$ to $m$}
        \State $\hat{G} \leftarrow G$
        \State Select $|\hat{E}|$ random non-diagonal edges from $E$
        
        \For{each selected edge $e$}
            \State $\hat{w}(e) \leftarrow \text{random}(w_{\text{min}}, w_{\text{max}})$
            \State Update edge weight in $\hat{G}$ to $\hat{w}(e)$
        \EndFor
        
        \State Add $\hat{G}$ to $\mathcal{\hat{G}}_d$
    \EndFor
    
    \State Add $\mathcal{\hat{G}}_d$ to $\mathcal{\hat{G}}$
\EndFor

\State \Return $\mathcal{\hat{G}}$
\end{algorithmic}
\end{algorithm}
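
As a worked illustration of the edge count in Algorithm \ref{algorithm1}, assume a connectome stored as a dense $90 \times 90$ weighted adjacency matrix, so that every ordered pair of regions, including each self-pair, counts as an edge. The dense storage is an assumption made here for illustration; the number of regions matches the atlas used in our experiments. Then
\begin{align*}
|E| &= 90 \times 90 = 8100, \qquad |V| = 90, \qquad |E| - |V| = 8010, \\
|\hat{E}| &= d \times (|E| - |V|) = 0.1 \times 8010 = 801 \quad \text{for } d = 0.1, \\
|\hat{E}| &= 0.5 \times 8010 = 4005 \quad \text{for } d = 0.5.
\end{align*}
Under this assumption, $d = 1$ re-draws the weights of all $8010$ non-diagonal edges.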






\subsection{Graph Regularizable Assessment Metric (GRAM)}



\textbf{Assumption 1\label{assumption1}:} Let $G = (V, E)$ represent a brain graph, where $V$ denotes the vertices and $E$ denotes the edges, with $w: E \rightarrow \mathbb{R}$ giving the weights of the edges in $E$. For a metric $\mathcal{M}$ that assesses graph quality, we postulate that the quantity
\begin{equation}
y = \rho(\mathcal{M}(G), \mathcal{M}(\hat{G})),
\end{equation}
where $\hat{G}$ is the distorted graph and \(\rho\) is the correlation function, is \textbf{linearly correlated} with the distortion $d$ applied to $G$. We express this relationship as follows:

\begin{equation}
     y = 1 - k \times d,
\end{equation}

where $d$ is the distortion level expressed as a ratio (e.g., $d = 0.1$ for 10\% distortion), and $k$ is a constant scaling factor.





We introduce $GRAM$, an adjustable and learnable measure for evaluating the quality of generated graphs. Unlike existing metrics, such as centrality measures that separately assess different graph aspects, $GRAM$ provides a linear approximation of the relationship between the distortion level and its output value, taking multiple aspects of the graph into consideration. For instance, a $GRAM$ value of 0.8 indicates that the graph is 80\% similar to the original data. The result of our metric is represented by the $y$ value, which indicates the degree of similarity between the generated and original graphs.
{We opt for a linear model due to its ease of interpretation and analytical benefits \cite{hastie2009elements}. In the context of graph distortion, the linear relationship clarifies how changes in edge weights impact the overall metric, which makes results easier to communicate.}



To do so, as shown in Fig. \ref{fig:method}, for each graph we define a matrix $A \in \mathbb{R}^{p \times q}$, where $p$ is the number of distortion levels and $q$ is the number of existing metrics. Each element $A_{i,j}$ represents the correlation between the outputs ${\xi}_j ({G})$ and ${\xi}_j (\hat{G})$ of a given metric applied to $G$ and $\hat{G}$, respectively, at a particular distortion level $d$. Here, $i$ indexes the distortion level and $j$ indexes the metric (e.g., at a distortion increment $s= 0.1$, $A_{1,3}$ corresponds to the correlation of the third metric at a 10\% distortion level). {We define $C$ as the reference output of the metric ensuring adherence to Assumption 1. $GRAM$ aims to find the values $\alpha_j$ forming a vector $B$ such that: $A \times B = C$.}
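
To make the system $A \times B = C$ concrete, consider the setting used later in our experiments, with $s = 0.1$ (so $p = 10$ distortion levels $d_i = i \times s$) and $q = 10$ metrics. Assuming, for illustration only, a scaling factor $k = 1$ in Assumption 1, the reference vector would be
\begin{equation*}
C = (1 - d_1, \dots, 1 - d_{10})^{\top} = (0.9,\, 0.8,\, \dots,\, 0.1,\, 0)^{\top},
\end{equation*}
and the $i$-th row of the system reads
\begin{equation*}
\sum_{j=1}^{q} \alpha_j \, A_{i,j} = C_i = 1 - k \times d_i .
\end{equation*}
The value $k = 1$ is a hypothetical choice; other values only rescale the slope of the reference line.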

We define \(GRAM(G)\) as a weighted sum of the metrics' correlations between the ground truth and distorted graphs. Specifically, let ${\xi}_j ({G})$ and ${\xi}_j (\hat{G})$ denote the \(j\)-th metric evaluated on \(G\) and the distorted graph \(\hat{G}\), respectively. Then:
\begin{equation}
    GRAM(G) =  \sum_{j = 1}^{q} \alpha_{j} \times \rho({{\xi}_j(G)}, {\xi}_j (\hat{G}))
\end{equation}
where \(\rho\) is the correlation function and \(\alpha_j\) are the learnable weights for each metric's correlation.


To solve for the vector $B$, we leverage the universal approximation theorem demonstrated by \cite{hornik1989multilayer}, which establishes that a feedforward neural network with a single hidden layer can approximate any continuous function given sufficient neurons and appropriate parameters (weights and biases). Consequently, our approach involves training a Multi-Layer Perceptron (MLP) to determine the parameters within $B$, taking $A$ as input and $C$ as the target output.



To train the MLP, we minimize the loss between two surfaces (Fig. \ref{fig:method}, step 3): \( S_\text{r} \), the area enclosed by the reference line defined by \( C \) and the \( X \) and \( Y \) axes, and \( S_\text{p} \), the area enclosed by the curve of \( A \times B \), computed from the predicted weights in \( B \), and the same axes. {The goal is to optimize the predicted weights in \( B \) so that the vector \( A \times B \) closely approximates vector \( C \), aligning surfaces \( S_\text{r} \) and \( S_\text{p} \).}




To do so, we use least squares regression \cite{levie2000curve} to fit a polynomial function \[ P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \] to the $A \times B$ output data. This involves finding a polynomial that minimizes the sum of the squared differences between the MLP output data points and the polynomial's predicted values. This method creates a continuous curve that closely follows the data pattern formed by $A \times B$, which we then use to approximate the integral within a specified range.
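
In matrix form, this fit can be sketched as follows; the symbols $x_i$, $y_i$, $X$ and $a$ are introduced here only for illustration. Let $y_i$ be the $i$-th entry of $A \times B$, observed at distortion level $x_i$, and let $X \in \mathbb{R}^{p \times (n+1)}$ be the Vandermonde matrix with entries $X_{i,l} = x_i^{\,l}$ for $l = 0, \dots, n$. The coefficient vector $a = (a_0, \dots, a_n)^{\top}$ then solves
\begin{equation*}
\min_{a} \sum_{i=1}^{p} \bigl( y_i - P(x_i) \bigr)^2 = \min_{a} \lVert X a - y \rVert_2^2 ,
\qquad
a = (X^{\top} X)^{-1} X^{\top} y ,
\end{equation*}
where the closed form holds when the $x_i$ are distinct and $p \geq n + 1$, so that $X^{\top} X$ is invertible.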

In our study, the loss is minimized by the following process: first, the total surface created by the MLP values is divided into $n$ distinct parts. Each part is optimized independently to simplify parameter convergence. Finally, we average all the results from the optimizations.

Our proposed surface loss can be defined as follows:

\begin{equation}
\text{Surface Loss} = \frac{1}{n} \sum_{i=1}^{n} \int_{x_{\text{min}}^{(i)}}^{x_{\text{max}}^{(i)}} \left| f_{\text{C}}(x) - f_{\text{MLP}}(x) \right| \, dx 
\end{equation}

where $n$ represents the number of distinct surface parts, and \( x_{\text{min}}^{(i)} \) and \( x_{\text{max}}^{(i)} \) denote the minimum and maximum values of the input range of the $i$-th part, respectively. {The function \( f_{\text{C}}(x) \) denotes the line defined by the values of \( C \), while \( f_{\text{MLP}}(x) \) denotes the polynomial approximation of $A \times B$, where $B$ is the MLP output.}
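
As a small worked example, suppose the surface is split into $n = 2$ parts over the ranges $[0.1, 0.5]$ and $[0.5, 1]$, with the $i$-th integral taken over the $i$-th range, and suppose the fitted curve differs from the reference line by a constant offset, $f_{\text{MLP}}(x) = f_{\text{C}}(x) + \varepsilon$. The constant offset is a hypothetical case chosen only to make the integral easy to evaluate. Then
\begin{equation*}
\text{Surface Loss} = \frac{1}{2} \left( \int_{0.1}^{0.5} |\varepsilon| \, dx + \int_{0.5}^{1} |\varepsilon| \, dx \right) = \frac{1}{2} \left( 0.4\,|\varepsilon| + 0.5\,|\varepsilon| \right) = 0.45\,|\varepsilon| ,
\end{equation*}
so the loss vanishes exactly when the two curves coincide ($\varepsilon = 0$) and grows linearly with the offset.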

\textbf{Training details.} We train our model for 250 epochs on Google Colab. For optimization, we use the Adam optimizer \cite{kingma_adam_2017} with a learning rate of 0.01. {We used an 80/20 split, resulting in a training set of 7,040 samples and a testing set of 1,760 samples. The training of $GRAM$ took 1 hour and 14 minutes.}




\section{Results and discussion}

In this section, we evaluate 10 selected individual graph metrics as well as our proposed $GRAM$. Additionally, we discuss each of the findings. 



\subsection{Dataset}


We used a dataset from \cite{vskoch2022human} that contains 88 subjects (48 females and 40 males, aged between 18 and 48 years). All subjects are right-handed and healthy. The dataset contains structural connectomes, each comprising 90 brain regions of interest from the Automated Anatomical Labeling Atlas (AALA) \cite{tzourio2002automated}. {We emphasize that our simulations resulted in a total of 8,800 graphs.}
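
The total of 8,800 graphs is consistent with the following arithmetic, assuming ten distortion levels per subject (as obtained with a step of $s = 0.1$) and the same number $m$ of distorted graphs per level for every subject; the value of $m$ below is inferred from the totals rather than stated independently:
\begin{equation*}
88 \times 10 \times m = 8800 \;\Rightarrow\; m = 10, \qquad 0.8 \times 8800 = 7040, \qquad 0.2 \times 8800 = 1760 .
\end{equation*}
The last two values match the sizes of the 80/20 training and testing sets.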



\subsection{Single Metric Evaluation}



For this study, we select a step of $s = 0.1$ and a set of ten graph metrics widely used in the literature \cite{rubinov2010complex}: Betweenness Centrality \cite{freeman1977set}, Closeness Centrality \cite{sabidussi1966centrality}, Weighted Degree Centrality \cite{barrat2004architecture}, Eigenvector Centrality \cite{bonacich1972factoring}, PageRank Centrality \cite{page1999pagerank}, Katz Centrality \cite{bonacich2001eigenvector}, Hub-Authority \cite{chakrabarti1999mining}, Harmony \cite{Marchiori2000HarmonyIT}, Average Neighbor Degree \cite{yao2017average}, and Diversity Index \cite{shannonformula}. As a baseline for evaluating our proposed $GRAM$, Fig. \ref{fig:metrics_corr} displays the correlation between the ground-truth graphs and the generated ones for each individual metric across various levels of distortion.










\begin{figure}[h!]
\centering
\includegraphics[width=12cm]{single_m.pdf}
\caption{\label{fig:metrics_corr}  \textit{Correlations for single metrics}. We plot different correlations (Pearson, Spearman and Kendall's Tau correlations) between the ground truth graphs and the distorted ones, for each individual metric across various levels of distortion.}
\end{figure}



Fig. {\ref{fig:metrics_corr}} shows that the evolution of the individual metrics differs across the studied correlation coefficients. These evolutions are non-linear and can be visually grouped into two distinct patterns. The first pattern includes metrics such as Betweenness, Closeness, Harmony, and Weighted Degree Centralities, which exhibit a moderate progression for $d < 0.8$ followed by a rapid decline towards a correlation value of 0. In contrast, the second pattern includes metrics such as Eigenvector, PageRank, Katz, and Diversity Index, which show a rapid decline in correlation for distortion levels below $d=0.3$, followed by a gradual stabilization for $d > 0.3$. This observation highlights the insufficiency of relying on single metrics to comprehensively assess generated graph quality. All metrics exhibit non-linear correlations compared to the reference line, which renders them unreliable due to their \emph{under-estimation} or \emph{over-estimation} of distortion levels.




\begin{figure}[h]
\centering
\includegraphics[width=12cm]{gram.pdf}
\caption{\label{fig:gram} \textit{$GRAM$ testing results}. The figure illustrates the intersection of $A \times B$ (blue) and {reference} vector $C$ (orange) as surfaces intersecting the $x$ and $y$ axes, shown for Pearson, Spearman, and Kendall's Tau correlation coefficients.}


\end{figure}


\subsection{GRAM evaluation}



We generated distorted graphs using a step value of \( s = 0.1 \) and trained $GRAM$ using the ten previously listed metrics. The training process optimizes two separate surfaces. The loss for the first surface is calculated over the range \([0.1, 0.5]\), while the loss for the second surface is calculated over the range \([0.5, 1]\). Fig. \ref{fig:gram} shows the $GRAM$ testing results for the previously mentioned correlations (Pearson, Spearman, and Kendall's Tau). Visually, the outputs of $GRAM$ closely approximate the target triangular shape formed by the reference line and its intersections with the $x$ and $y$ axes.



Table \ref{tab:alpha_j_values} shows the weight of each metric as produced by $GRAM$ across the correlation coefficients. The average neighbor degree consistently exhibits high weight values across the Pearson, Spearman, and Kendall's Tau correlation coefficients. Yet, several of the other metrics' weights are close to zero (within $\pm 0.1$). This disparity in the weight values may be due to the significant overlap between the information captured by the average neighbor degree and closeness centrality and that captured by other metrics such as betweenness or diversity index. The computational redundancy among some metrics could lead to one of them being over-represented compared to similar metrics.




\begin{table}[h!]
\centering
\setlength{\tabcolsep}{4.23mm}
\small 
\begin{tabular}{ccccl}
    \toprule
    $\bm{\alpha_j}$ & $\bm{\rho_{p}}$ & $\bm{\rho_{s}}$ & $\bm{\rho_{k}}$ & \textbf{Single metrics} \\
    \midrule
    $\bm{\alpha_1}$ & 0.230 & 0.250 & \colorbox{cyan!20}{0.643} & \textcolor[RGB]{55, 126, 184}{Betweenness} \\
    $\bm{\alpha_2}$ & \colorbox{orange!50} {0.473} & 0.159 & 0.166 & \textcolor[RGB]{255, 127, 0}{Closeness} \\
    $\bm{\alpha_3}$ & -0.087 & 0.223 & 0.106 & \textcolor[RGB]{77, 175, 74}{Weighted Degree} \\
    $\bm{\alpha_4}$ & -0.067 & 0.056 & 0.033 & \textcolor[RGB]{247, 129, 191}{Eigenvector} \\
    $\bm{\alpha_5}$ & 0.133 & 0.024 & 0.129 & \textcolor[RGB]{166, 86, 40}{PageRank} \\
    $\bm{\alpha_6}$ & 0.022 & 0.069 & 0.006 & \textcolor[RGB]{152, 78, 163}{Katz} \\
    $\bm{\alpha_7}$ & 0.001 & 0.006 & -0.059 & \textcolor[RGB]{153, 153, 153}{Hub-Authority} \\
    $\bm{\alpha_8}$ & 0.022 & -0.159 & -0.156 & \textcolor[RGB]{228, 26, 28}{Harmony} \\
    $\bm{\alpha_9}$ & \colorbox{yellow!50}{0.309} & \colorbox{yellow!50}{0.614} & \colorbox{yellow!50}{0.578} & \textcolor[RGB]{222, 222, 0}{Average Neighbor Degree} \\
    $\bm{\alpha_{10}}$ & 0.260 & \colorbox{blue!20}{0.501} & 0.349 & \textcolor[RGB]{77, 1, 177}{Diversity Index} \\
    \bottomrule
\end{tabular}

\caption{Optimized $\alpha_j$ values for the Pearson ($\rho_{p}$), Spearman ($\rho_{s}$), and Kendall's Tau ($\rho_{k}$) correlations.}
\label{tab:alpha_j_values}
\end{table}



\textbf{Limitations and Future Directions.} This study marks an initial effort to establish a universal metric for assessing the quality of generated brain graphs. One notable limitation is the selection of ten metrics that share similar calculation methods. Another limitation is the evaluation based on a single dataset. {Future research should explore a broader range of metrics (e.g., that capture global network properties) and evaluate the model across various graph datasets (e.g., functional vs structural connectomes) for different applications.
Additionally, future work should include a comparison between GRAM and other methods for assessing graph quality. Future studies will also incorporate the use of GAN-generated data to further validate our approach.}



\section{Conclusion}

This paper introduced the Graph Regularizable Assessment Metric ($GRAM$) for evaluating the quality of generated brain graphs, which could serve as a universal method for reporting the quality of generated graphs in future predictive studies. It combines multiple metrics in a weighted framework, addressing the limitations of existing graph quality metrics. Our proposed method establishes a general assumption for graph quality based on the linearity between distortion and metric values, and uses a multi-layer perceptron to optimize the metric weights. We tested $GRAM$ on a set of simulated structural connectome data, on which it demonstrated reasonable reliability in quantifying graph quality. {This approach is a significant step towards establishing a universal graph quality index for graph-based predictive studies (e.g., predicting disease progression in Alzheimer's, analyzing brain network development in infants).} In future work, we aim to extend $GRAM$ to diverse graph types and datasets.
\\ \\
\begin{credits}
\textbf{Code Availability.} All code used in this study is available at: \url{https://github.com/mariemtouihri/GRAM-Metric}
\subsubsection{\discintname}
The authors have no competing interests to declare that are relevant to the content of this article.
\end{credits}
%
% ---- Bibliography ----
%
% BibTeX users should specify bibliography style 'splncs04'.
% References will then be sorted and formatted in the correct style.
%
\bibliographystyle{splncs04}
\bibliography{Paper-06}
%

\end{document}
