\newpage
\appendix
\begin{center}
\LARGE{\textbf{Appendix}}
\end{center}
\section{Problems Considered and their SDP}
\label{sec:sdp-formulations}
In this appendix, we describe the problems considered in this paper, together with their integer programming formulations and the SDP relaxations that we use.

\textbf{Max-Cut}: In the Max-Cut problem, we are given a weighted graph $G = (V, E)$ in which the edge $(u,v) \in E$ has weight $w_{uv}$, and the goal is to partition the vertex set $V$ into two sets $S \subset V$ and $V \setminus S$ so as to maximize the total weight of the edges in $E$ that have one endpoint in each set. This problem is NP-hard, and we can use the following integer programming formulation for it.
\begin{align}
\max \sum_{(u,v) \in E} w_{uv} \left( \frac{1-x_u \cdot x_v}{2} \right) \text{  s.t. } x_u \in \{-1,1\}\ \ \forall\ u \in V
\end{align}
Here, $x_u = 1$ if $u \in S$ and $x_u = -1$ otherwise. The term $\frac{1-x_u \cdot x_v}{2}$ equals $1$ if $u$ and $v$ lie on opposite sides of the cut and $0$ otherwise, so the value of the cut equals $\sum_{(u,v) \in E:\, u \in S,\, v \in V \setminus S} w_{uv} = \sum_{(u,v) \in E} w_{uv} \left( \frac{1-x_u \cdot x_v}{2} \right)$. The integer constraint $x_u \in \{-1,1\}$ can also be written as $x_u^2 = 1$.

The best known approximation ratio for this problem is obtained by a celebrated algorithm due to Goemans and Williamson~\cite{GW95} using semi-definite programming.
The SDP relaxation used in their paper is as follows: 
\begin{align}
\max \sum_{(u,v) \in E}  w_{uv} \left( \frac{1-\bm \sigma_u \cdot \bm \sigma_v}{2} \right)
\ \ \text{s.t. } \bm \sigma_u \in \mathbb{R}^n \text{ and }
\bm \sigma_u \cdot \bm \sigma_u  = 1\ \ \forall\ u \in V.
\end{align}
Here $\bm \sigma_v$ is an $n$-dimensional unit vector associated with the vertex $v$. Note that the quantity we are maximizing depends only on the inner products of the vectors and is not affected by rotations in $\mathbb{R}^n$. In this paper, we only use the pairwise dot products of the vectors associated with the vertices and do not really need the vectors $\bm \sigma_v$.
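
As a small illustration of the gap between the two formulations (our own example, not one of the instances used in the experiments), consider a triangle on the vertices $1, 2, 3$ with unit weights. Any assignment $x \in \{-1,1\}^3$ cuts at most two of the three edges, so the integer optimum is $2$. For the relaxation, take three unit vectors at pairwise angles of $120^\circ$:
\begin{align*}
\bm \sigma_1 = (1, 0, 0), \quad \bm \sigma_2 = \left(-\frac{1}{2}, \frac{\sqrt{3}}{2}, 0\right), \quad \bm \sigma_3 = \left(-\frac{1}{2}, -\frac{\sqrt{3}}{2}, 0\right),
\end{align*}
so that $\bm \sigma_u \cdot \bm \sigma_v = -\frac{1}{2}$ for every edge. The objective then evaluates to
\begin{align*}
\sum_{(u,v) \in E} \frac{1 - \bm \sigma_u \cdot \bm \sigma_v}{2} = 3 \cdot \frac{1 + \frac{1}{2}}{2} = \frac{9}{4} > 2,
\end{align*}
which shows that the SDP value can strictly exceed the value of the best cut, and that only the pairwise dot products enter the computation.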

\textbf{Max-Clique}:
In the Max-Clique problem, we are given a graph $G=(V,E)$ and the goal is to find the largest subset $S \subseteq V$ that forms a clique, i.e., every pair of vertices in $S$ is joined by an edge in the graph. The following integer programming formulation can be used for solving this problem:
\begin{align}
 \max\, & \sum_{v\in V} x_v\\
\text{s.t. } & x_u \cdot x_v = 0 \ \ \ \ \forall\ u \neq v \text{ with } (u,v) \notin E\\
& x_v \in \{0,1\}\ \ \ \forall\ v \in V
\end{align}

Here is the SDP relaxation we use for this problem: 
\begin{align}
 \max\, & \sum_{v\in V} \bm \sigma_v \cdot \bm \sigma_v\\
\text{s.t. } & \bm \sigma_0 \cdot \bm \sigma_0 = 1,\\
&\bm \sigma_v  \cdot \bm \sigma_v = \bm \sigma_v \cdot \bm \sigma_0 \ \ \forall\ v \in V,\\
&\bm \sigma_u \cdot \bm \sigma_v = 0 \ \ \forall\ u \neq v \text{ with } (u,v) \notin E
\end{align}
As in the previous problem, $\bm \sigma_v$ is an $n$-dimensional vector (not necessarily of length $1$) associated with the vertex $v$. In addition, we have a vector $\bm \sigma_0$. 
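
To see why this program relaxes the integer program above, the following short check (a sketch added for illustration) maps any clique $S$ to a feasible solution of the same value. Fix an arbitrary unit vector $\bm \sigma_0$ and set
\begin{align*}
\bm \sigma_v = x_v\, \bm \sigma_0, \qquad x_v = \begin{cases} 1 & \text{if } v \in S,\\ 0 & \text{otherwise.} \end{cases}
\end{align*}
Then $\bm \sigma_v \cdot \bm \sigma_v = x_v^2 = x_v = \bm \sigma_v \cdot \bm \sigma_0$ for every $v \in V$. For distinct non-adjacent vertices $u$ and $v$, at most one of them lies in $S$, so $\bm \sigma_u \cdot \bm \sigma_v = x_u x_v = 0$. The objective equals $\sum_{v \in V} x_v = |S|$, and hence the SDP optimum is at least the size of the maximum clique.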

\textbf{Graph Coloring}:
In the graph colouring problem, we are given a graph $G=(V,E)$, and the goal is to colour the vertices with the minimum number of distinct colours so that any pair of vertices joined by an edge $e \in E$ have different colours. The ILP formulation for the problem is as follows:
\begin{align*}
    \min\, & \sum_{k=1}^n y_k\\
    \text{ s.t. }&  \sum_{k=1}^n x_{uk} = 1 \ \ \forall u \in V\\
    & x_{uk} \leq y_k \ \ \ \forall\ k = 1, \ldots, n\ \ \text{ and } \forall u \in V\\
    &x_{uk} + x_{vk} \leq 1 \ \ \forall\ k = 1, \ldots, n\ \ \text{ and }\ \forall\ (u,v)\in E\\
    & 0 \leq x_{uk}, y_k \leq 1 \ \ \ \forall\ k = 1, \ldots, n\ \ \text{ and } \forall u \in V\\
    & x_{uk}, y_k \in \mathbb{Z}
\end{align*}

Here is the SDP relaxation we use:
\begin{align*}
    \min\, \  & t\\
    \text{ s.t. }&  \bm \sigma_u \cdot \bm \sigma_v \le t\ \forall\ (u,v) \in E\\
    & \bm \sigma_v \cdot \bm \sigma_v = 1\ \forall\ v\in V\\
    &\bm \sigma_v \in \mathbb{R}^n\ \forall\ v\in V
\end{align*}
Here too, for every vertex $v \in V$, we have an associated vector $\bm \sigma_v$ of unit length.
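
The following standard construction, included as an illustrative sketch rather than as part of our method, relates the value $t$ to the number of colours. Suppose $G$ admits a proper colouring with $k \geq 2$ colours, and let $\bm e_1, \ldots, \bm e_k$ be unit vectors pointing to the vertices of a regular simplex centred at the origin, so that
\begin{align*}
\bm e_i \cdot \bm e_i = 1, \qquad \bm e_i \cdot \bm e_j = -\frac{1}{k-1} \quad \text{for } i \neq j.
\end{align*}
Setting $\bm \sigma_v = \bm e_{c(v)}$, where $c(v)$ denotes the colour of $v$ (notation introduced only for this example), gives a feasible solution with $t = -\frac{1}{k-1}$, because the endpoints of every edge receive different colours. For $k = 3$, these are three unit vectors at pairwise angles of $120^\circ$ and $t = -\frac{1}{2}$. Consequently, the optimum of the SDP on a $k$-colourable graph is at most $-\frac{1}{k-1}$.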


\section{Hopfield Network for Max-Clique}
\label{sec:hop-clique}
For this problem, we treat a neuron's output as the probability that it belongs to the Max-Clique. 
As the SDP formulation of the Max-Clique problem indicates, for any vertex $v$, $\bm \sigma_v \cdot \bm \sigma_v$ can be thought of as the ``probability that $v$ belongs to the clique''. We therefore set $\bm \sigma_v \cdot \bm \sigma_v$ as the bias for the neuron corresponding to $v$.
In addition, since we want to avoid picking non-adjacent vertices in the clique, we set the interaction between the neurons corresponding to the vertices $u$ and $v$ as $-1$ if $u$ and $v$ are non-adjacent and $1$ otherwise. One difference with the Max-Cut problem is that whereas any cut is a feasible solution to the Max-Cut problem, arbitrary subsets of the vertices do not generally form a clique and, therefore, are not feasible solutions. Unlike for Max-Cut, we initialize the outputs for each of the neurons to $0$, so that after the first round of updates, the neuron corresponding to vertex $v$ has a pre-activation output of $\bm \sigma_v \cdot \bm \sigma_v$. 
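
The construction above can be summarised in symbols as follows. The names $b_v$, $W_{uv}$, $s_u$ and $z_v$ are notation introduced for this sketch, and we assume the usual affine pre-activation of a Hopfield-style network:
\begin{align*}
b_v = \bm \sigma_v \cdot \bm \sigma_v, \qquad
W_{uv} = \begin{cases} 1 & \text{if } (u,v) \in E,\\ -1 & \text{if } u \neq v \text{ and } (u,v) \notin E, \end{cases} \qquad
z_v = b_v + \sum_{u \neq v} W_{uv}\, s_u.
\end{align*}
With the initial outputs $s_u = 0$ for all $u \in V$, the first update gives $z_v = b_v = \bm \sigma_v \cdot \bm \sigma_v$, which matches the behaviour described above.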

\textbf{Experimental results.} We tested the performance of this architecture on the IMDB-Binary (1000 instances with 19 nodes and 96 edges on average), Google Collab~\cite{Yanardag15} (5000 instances with 74 nodes and 2457 edges on average) and Twitter~\cite{snap14} graph datasets, along with a Custom Dataset consisting of 1000 Erdős-Rényi graphs with an edge probability of 0.5 and a planted hidden clique that is twice the size of the Max-Clique within the graph. We carefully ensure that nodes within the planted clique and nodes outside it have a similar degree distribution. We compare the performance with the Erdős Goes Neural~\cite{KL20} fast GNN architecture as well as a greedy heuristic. The greedy heuristic builds a clique by starting with the highest-degree node and iteratively adding the next highest-degree node that is connected to all current members of the clique, until no further nodes can be added. Table~\ref{tab:clique} shows that our Hopfield network-based approach returns optimal or near-optimal maximum cliques on these graph collections. The mean optimality ratio of our approach is higher than that of the greedy heuristic on all four collections, and higher than that of the Erdős Goes Neural GNN technique on all collections except IMDB-Binary, where the latter attains $1.0$. The gap is largest on the Custom Clique dataset ($0.994$ versus $0.810$ and $0.740$).



\begin{table}[!ht]
    \centering
    \begin{tabular}{|| c | c | c | c | c ||}
    \hline
         &  IMDB-Binary &  Collab &  Twitter & Custom Clique \\
    \hline
        Hopfield Network &  0.993($\pm$0.068) & 0.996($\pm$0.058)  & 0.978($\pm$0.115) & 0.994($\pm$0.062)  \\
    \hline
        Erdős Goes Neural~\cite{KL20} &  1.0($\pm$0.0) & 0.982($\pm$0.063)  & 0.924($\pm$0.133) & 0.810($\pm$0.226)  \\
    \hline
        Greedy Heuristic &  0.954($\pm$0.133) & 0.886($\pm$0.195)  & 0.848($\pm$0.154) & 0.740($\pm$0.238)  \\
    \hline
      \hline
        \% of Invalid Cliques &  0.4\% & 0.3\%  & 0\% & 0.3\%  \\
    \hline
    \hline
     SDP Runtime & 0.194($\pm$0.258) & 2.589($\pm$6.046)& 21.219($\pm$38.190)& 4.385($\pm$0.536)\\
    \hline
        Hopfield Runtime &  0.0006($\pm$0.0007) & 0.006($\pm$0.0130)  & 0.0132($\pm$0.0094) & 0.0117($\pm$0.0007)  \\
    \hline

    \end{tabular}
    \caption{Mean optimality ratio and the standard deviation for Max-Clique over graphs in different collections, runtimes in seconds and percentage of infeasible cliques found.}
    \label{tab:clique}
\end{table}

The Erdős Goes Neural approach always produces feasible solutions, whereas our approach based on Hopfield networks can sometimes return infeasible ones. However, as shown in Table~\ref{tab:clique}, this happens rarely: 0.4\% on IMDB-Binary, 0.3\% on Collab, 0\% on Twitter, and 0.3\% on Custom Clique instances.

Note that for this problem, one can always ensure feasibility by returning a subset of the vertices in the returned solution that forms a maximal clique. Similarly, for the coloring problem, one can modify the algorithm so that it colors the vertices one by one and only allows a vertex to take colors that differ from those of its already colored neighbors. Such postprocessing of the solution is possible for many problems.
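
One simple way to realise this postprocessing for Max-Clique is sketched below. The processing order is an assumption made for illustration; any order yields a valid clique.
\begin{enumerate}
    \item Let $S$ be the vertex set returned by the network and initialise $C = \emptyset$.
    \item For each $v \in S$ (for example, in order of decreasing neuron output), add $v$ to $C$ if $(u,v) \in E$ for every $u \in C$.
    \item Return $C$.
\end{enumerate}
By construction, $C$ is a clique, and no remaining vertex of $S$ can be added to it, so $C$ is maximal within $S$.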


\section{Hopfield Network for Graph Coloring}
\label{sec:hop-coloring}
\begin{table}[!ht]
    \centering
    \begin{tabular}{|| c | c | c || c | c||}
    \hline
         Graph Name&  $\chi(G)$ & Hopfield $\chi(G)$ & SDP Runtime & Hopfield Runtime \\ \hline
        1-Insertions-4 & 5 & 5 & 3.968 & 0.043 \\ \hline
        2-Insertions-4 & 5 & 5 & 63.319 & 0.156\\ \hline
        Anna & 11 & 11 & 3.878 & 0.576\\ \hline
        David & 11 & 11 & 1.624 & 0.354\\ \hline
        Games120 & 9 & 9 & 4.563 & 0.462\\ \hline
        Huck & 11 & 11 & 1.121 & 0.279\\ \hline
        Mugg88-1 & 4 & 4 & 28.658 & 0.067\\ \hline
        Myciel5 & 6 & 6 & 1.059 & 0.057\\ \hline
        Myciel6 & 7 & 7 & 5.53 & 0.319\\ \hline
        Myciel7 & 8 & 8 & 46.526 & 0.817\\ \hline
        Queen5-5 & 5 & 5 & 0.303  & 0.071\\ \hline
        Queen6-6 & 7 & 8 & 1.197  & 0.016\\ \hline
        Queen7-7 & 7 & 9 & 1.191 & 0.032\\ \hline
        Queen8-8 & 9 & 10 & 1.975 & 0.063\\ \hline
        Queen8-12 & 12 & 12 & 3.590 & 0.113\\ \hline
        Queen9-9 & 10 & 11 & 2.628 & 0.076\\ \hline
        Queen11-11 & 11 & 14 & 6.931 & 0.307\\ \hline
        Queen13-13 & 13 & 15 & 20.522 & 0.351\\ \hline
    \end{tabular}
    \caption{Optimal chromatic number $\chi(G)$, number of colours used by the Hopfield network, and runtimes in seconds for several graphs.}
    \label{tab:colour}
\end{table}


For the graph colouring problem, instead of directly minimizing the number of colours via a Hopfield network, we fix a parameter $k$ and use a Hopfield network to try to find a $k$-colouring. If the network fails to find such a colouring after several attempts, we increase $k$ and retry. For the remainder of this section, we assume that $k$ is fixed and that our goal is to find a $k$-colouring.
In this case, we want the output of each layer to give, for each vertex, a probability vector whose $i^{th}$ entry corresponds to the probability that the vertex has colour $i$.
This is done by having $k$ neurons $v_1, \ldots, v_k$ for each vertex $v$, one for each of the colours, and using the softmax function as the activation function.
Let $t^*$ be the objective value of the SDP solution. By definition, if vertices $u$ and $v$ are neighbours in the graph, then $\bm \sigma_u \cdot \bm \sigma_v \leq t^*$. In this case, we do not want $u$ and $v$ to have the same colour; to ensure this, we set the interaction between $u_i$ and $v_i$ for every $i \in\{1,\ldots, k\}$ to $-2n$, where $n$ is the number of vertices in the graph.
Otherwise, if $\bm \sigma_u \cdot \bm \sigma_v > t^*$, we set the interaction between $u_i$ and $v_i$ to $\bm \sigma_u \cdot \bm \sigma_v - t^*$, so that the larger this quantity, the more we ``encourage'' $u$ and $v$ to have the same colour. We choose the initial output of the neurons of each vertex to be the uniform distribution over the $k$ colours. Once the output probability distribution (on the $k$ colours) for each vertex is fixed, we choose a colour for the vertex from this probability distribution.
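
In symbols, the interaction weights and the activation described above can be written as follows. The notation $W_{u_i v_i}$, $z_{v,i}$ and $p_{v,i}$ is introduced here for illustration, and the softmax is the standard one:
\begin{align*}
W_{u_i v_i} &= \begin{cases} -2n & \text{if } (u,v) \in E,\\ \bm \sigma_u \cdot \bm \sigma_v - t^* & \text{if } (u,v) \notin E \text{ and } \bm \sigma_u \cdot \bm \sigma_v > t^*, \end{cases}\\
p_{v,i} &= \frac{\exp(z_{v,i})}{\sum_{j=1}^{k} \exp(z_{v,j})}, \qquad p_{v,i}^{(0)} = \frac{1}{k},
\end{align*}
where $z_{v,i}$ denotes the pre-activation of the neuron $v_i$ and $p_{v,i}^{(0)}$ its initial output. The case of non-adjacent vertices with $\bm \sigma_u \cdot \bm \sigma_v \leq t^*$ is not covered by the description above and is left open in this sketch. One possible reading of the choice $-2n$ is the following: since the vectors have unit length and $t^* \geq -1$, every positive weight is at most $1 - t^* \leq 2$, so the total encouragement that a neuron can receive from the at most $n-1$ other vertices is smaller than $2n$.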



 
\textbf{Experimental results.} To evaluate this scheme, we measured its performance on several graphs from the COLOR02/03/04 dataset\footnote{\url{https://mat.tepper.cmu.edu/COLOR02/}}. We selected a subset of graphs similar to that used in a recent paper on GNNs for graph colouring~\cite{LLMCPY22}. Table~\ref{tab:colour} shows that our approach finds the optimal chromatic number on 12 of the 18 graphs and uses between one and three additional colours on the remaining six graphs, all of which are Queen graphs.

\section{Generalization of learnt Hopfield network to different graph classes}
\label{app:generalization}
\begin{table}[!ht]
    \centering
    \begin{tabular}{|| c | c | c | c||}
    \hline
         &  SF-295 Test &  Twitter &  Custom Cut\\
    \hline
        Hopfield Network & 0.979($\pm$0.105) & 0.986($\pm$0.013)  & 0.998($\pm$0.002) \\
    \hline
    \hline
        Runtime  & 0.443($\pm$0.478) & 9.512($\pm$7.546)  & 7.364($\pm$0.306)\\
    \hline
    \end{tabular}
    \caption{Mean optimality ratio and runtimes in seconds for Max-Cut with Learnt Models}
    \label{tab:cutlearn}
\end{table}

\begin{table}[!ht]
    \centering
    \begin{tabular}{|| c | c | c | c | c||}
    \hline
         &  IMDB-Binary Test &  Collab &  Twitter & Custom Clique \\
    \hline
        \% of Invalid Cliques &  2\% & 2.3\%  & 35.1\% & 17\%  \\
    \hline
        Hopfield Network  &  1.0($\pm$0.000) & 0.996($\pm$0.004)  & 0.991($\pm$0.024)  & 1.0($\pm$0.000)  \\
    \hline
    \hline
        Runtime &  0.222($\pm$0.298) & 4.940($\pm$11.486)  & 10.291($\pm$8.122) & 8.101($\pm$0.083)  \\
    \hline
    \end{tabular}
    \caption{Percentage of invalid cliques found, mean optimality ratio of valid cliques and runtimes in seconds for Max-Clique with learnt models.}
    \label{tab:cliquelearn}
\end{table}

To test how well the models that learn the interaction weights of the Hopfield network generalize, we evaluated the model learnt for Max-Cut on the Twitter dataset and on the Custom Cut dataset consisting of 128-node Erdős-Rényi graphs. Our training dataset consisted of SF-295 graphs with 10--50 nodes (31 nodes on average, with density 0.08), which are very different from the Twitter graphs (130 nodes on average, with density 0.18) and the 128-node Custom Cut instances (average density of 0.48). Nevertheless, Table~\ref{tab:cutlearn} shows that the model still produces edge weights that the Hopfield network can decode into cuts of high quality. While the optimality ratio obtained on the test part of the SF-295 dataset is $0.979(\pm0.105)$, the optimality ratios for the Twitter and Custom Cut datasets are $0.986(\pm0.013)$ and $0.998(\pm0.002)$, respectively, which is comparable to the ratio on instances from the training distribution.

Likewise, for Max-Clique, we evaluated the learnt model on the Google Collab, Twitter and Custom Clique datasets (Table~\ref{tab:cliquelearn}). The distribution of our training dataset (20 nodes on average, with density 0.52) again differs from that of the Google Collab graphs (74 nodes on average, with density 0.51), the 128-node Custom Clique instances (average density 0.49) and the Twitter graphs (130 nodes on average, with density 0.18). The model still produces edge weights that result in high-quality cliques whenever the returned solution is valid, although the fraction of invalid cliques rises to 35.1\% on Twitter and 17\% on Custom Clique. This suggests that the model has learnt a function that can be used on graphs outside of its training data and graph size, and that it generalizes across distributions and to larger instances.

\section{Sensitivity of Hopfield network to the noise and precision of SDP Gram matrix}
\label{app:sensitivity}
A potential criticism of our approach is that it relies on a computationally expensive step of calculating SDP vectors. In this section, we show that we do not actually need the exact computation of SDP vectors. In fact, a coarse approximation of SDP vectors can already yield near-optimal solutions. 
Such low-accuracy SDP vectors can be computed efficiently and scalably. For instance, a recent pre-print by Yau et al.~\cite{YLKXJ23} shows that GNNs can be used to learn a low-rank SDP. Yurtsever et al.~\cite{YTF21} solved the Max-Cut SDP (to moderate accuracy) for a sparse graph with over 20 million vertices, where the matrix variable has over $10^{14}$ entries, on a laptop-equivalent machine. In this context, we also refer the reader to a review article by Majumdar et al.~\cite{MHA20}, a recent paper by Durante et al.~\cite{DKS22} and the book chapter on ``Approximately Solving Semidefinite Programs'' by G\"{a}rtner and Matou\v{s}ek~\cite{GM12book}.

\begin{figure}[!ht]
    \begin{center}
      \includegraphics[width=.49\linewidth]{images/SF295NewNoise.png} 
      \includegraphics[width=.49\linewidth]{images/TwitterNewNoise.png} 
      \includegraphics[width=.49\linewidth]{images/CustomNewNoise.png}  
    \end{center}
    \caption{The plot shows how the mean optimality ratio decreases with increasing noise for our approach based on Hopfield network/SDP vectors and for the Goemans-Williamson approximation algorithm. The results are shown for the SF-295 graphs (top-left), the collection of Twitter graphs (top-right) and our collection of Custom Cut Graphs with 128 nodes (bottom). \label{fig:noise}}
\end{figure}

Figure~\ref{fig:noise} shows that our approach based on the Hopfield network is significantly more robust to the addition of noise than the Goemans-Williamson approximation algorithm~\cite{GW95}. In this experiment, we perturbed each entry of the Gram matrix by adding a value drawn from a uniform distribution: a magnitude of $M$ in Figure~\ref{fig:noise} refers to adding uniform random noise in the range $[-M,M]$ to each element of the Gram matrix. Note that both our approach and the Goemans-Williamson algorithm rely on SDP vectors. However, the optimality ratio of the Goemans-Williamson algorithm goes down considerably as we insert noise in the Gram matrix, whereas our approach returns optimal or near-optimal solutions even when a large amount of noise is added. As expected, too much noise degrades the performance of our approach as well, indicating that the SDP solutions do contain useful information for our approach.
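
Writing $G_{uv} = \bm \sigma_u \cdot \bm \sigma_v$ for the entries of the Gram matrix (notation introduced for this sketch), the perturbation of magnitude $M$ can be written as
\begin{align*}
\tilde{G}_{uv} = G_{uv} + \varepsilon_{uv}, \qquad \varepsilon_{uv} \sim \mathrm{Uniform}[-M, M].
\end{align*}
We assume here that the noise terms are drawn independently for each entry. For example, with $M = 0.1$, an entry $G_{uv} = -0.5$ is replaced by a value in $[-0.6, -0.4]$. Note that the perturbed matrix $\tilde{G}$ need not remain positive semidefinite.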

\begin{figure}[!ht]
    \begin{center}
      \includegraphics[width=.48\linewidth]{images/Precision-SF.png}  
      \includegraphics[width=.48\linewidth]{images/Precision-twitter.png} 
      \includegraphics[width=.48\linewidth]{images/Precision-Custom.png}  
    \end{center}
    \caption{The plot shows how the mean optimality ratio decreases with reduced precision for our approach based on Hopfield network/SDP vectors and for the Goemans-Williamson approximation algorithm. The results are shown for the SF-295 graphs (top-left), the collection of Twitter graphs (top-right) and our collection of Custom Cut Graphs with 128 nodes (bottom). \label{fig:precision}}
\end{figure}

Next, we explore the effect of reducing the precision by rounding the values in the Gram matrix to the nearest decimal place. Figure~\ref{fig:precision} shows that when we reduce the precision of the Gram matrix instead of adding random noise, we still obtain optimal or near-optimal solutions on most problem instances. Again, this is in contrast with the Goemans-Williamson algorithm, for which the optimality ratio degrades considerably. We obtained similar noise/precision results for the graphs from the IMDB and Google Collab collections as well.
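
Under the assumption that the precision is controlled by the number $d$ of retained decimal places (the symbol $d$ is introduced only for this sketch), the rounding of a Gram matrix entry $G_{uv} = \bm \sigma_u \cdot \bm \sigma_v$ can be written as
\begin{align*}
\tilde{G}_{uv} = \frac{\mathrm{round}\left(10^{d}\, G_{uv}\right)}{10^{d}},
\end{align*}
where $\mathrm{round}$ maps a real number to the nearest integer. For instance, an entry $G_{uv} = 0.4637$ becomes $0.46$ for $d = 2$ and $0.5$ for $d = 1$.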

We obtained similar results for the Max-Clique problem instances when we reduced the precision by rounding the values in the Gram matrix to the nearest decimal place.
From the above discussion, we conclude that our rounding approach is tolerant to noise and can potentially leverage a low-precision SDP Gram matrix (e.g., using an approximation) to extract good-quality solutions.

