\section{Introduction}
Semi-definite programming (SDP) is one of the most powerful techniques used in the design of approximation algorithms for combinatorial optimization problems. 
These approximation algorithms typically solve an SDP relaxation of the problem and round its solution to obtain a feasible solution (see the classical textbooks~\cite{GM12book,V01book,WS11} for several examples of these approximation algorithms). The rounding step is usually the non-trivial part, and while very clever rounding algorithms have been designed for different optimization problems, no general techniques exist. 
This implies that every new variant of even well-studied problems typically requires designing a rounding algorithm from scratch. This, along with the fact that these algorithms are usually geared towards worst-case performance guarantees and do not exploit real-world instance distributions, has prevented the widespread adoption of SDP-based techniques to solve practical problems. 
One fact that stands out clearly from the existence of so many non-trivial approximation algorithms based on SDPs (see e.g.,~\cite{KMS94, Blum92, C07, GK00, Karp72, KLS93, Wig83}) is that the SDP solution does contain useful information that captures the global structure of the problem. It is therefore desirable to find a generic rounding technique which can use the SDP solution to find good solutions for real-world instances. Such a rounding technique may not come with theoretical guarantees but it should be easy to use and work well for practical instances of a large number of problems. 

In this paper, we present a simple rounding technique based on Hopfield networks~\cite{Hop82}, whose edge weights are determined from the SDP Gram matrix for the corresponding problem. The dependence of the edge weights on the SDP Gram matrix is problem-specific and can either be hand-designed or learnt from data. The latter approach allows for the rounding technique to be dependent on and exploit the instance distribution. Our technique is sufficiently generic that it can be used with any problem for which there is an SDP relaxation. We chose to use a Hopfield network for rounding since it is a simple way to make the outputs binary while respecting pairwise correlations among them. We demonstrate the efficacy of our approach on three well-studied problems: Max-Cut, Max-Clique, and Graph Coloring. In all three cases, our heuristic rounding algorithm finds solutions close to the optimal solution that are significantly better than the solutions found by the corresponding approximation algorithms. 
Our work indicates that there is potential to design heuristics for SDP rounding that work well for practical instances. 

Our experiments also indicate that our rounding algorithm is not very sensitive to noise in the SDP solution. This allows us to replace the exact computation of the SDP solution with an approximate computation, thus speeding up the algorithm while incurring very little loss in the quality of the solution. 
To give an example, our algorithm for Max-Cut finds a cut of size within 0.6\% of the optimal solution in a graph with 10,000 nodes obtained from a benchmarking dataset in about one minute. 

From a machine learning (ML) point of view, existing approaches for solving combinatorial optimization problems typically rely on end-to-end deep learning (e.g., Graph Neural Networks). Such techniques throw away the algorithmic techniques that have already proven useful and, as a result, also require a lot of time and data for training. These techniques also struggle to generalize well to instances of larger size (compared to training instances) and instances from a different input distribution. In light of these issues, Bengio et al.~\cite{BLP21} opined in a survey of learning algorithms for combinatorial optimization that {\it``We believe end-to-end machine learning approaches to combinatorial optimization are not enough and advocate for using machine learning in combination with current combinatorial optimization algorithms to benefit from the theoretical guarantees and state-of-the-art algorithms already available.''}
In line with this, we directly integrate the SDP solution (which is theoretically known to be very useful) into the learning technique. Due to the simplicity of our rounding algorithm, it suffices to use shallow neural networks to compute the Hopfield network weights; this means that their parameters can be learnt efficiently with very little data. 
Our empirical results show that the learning generalizes across instances of different sizes, allowing us to learn from small instances and apply the model to larger instances. It also generalizes quite well across different instance distributions. 

Embeddings obtained from Graph Neural Networks (GNNs) are known to be effective in capturing local topological features, but they often struggle to capture the global combinatorial structure in a problem instance (see e.g.~\cite{SLM21,DKSH23}). In contrast, linear programming (LP) and semi-definite programming (SDP) based relaxations have been very successful in the design of approximation algorithms, suggesting that the ``embeddings'' obtained from them are good at capturing global combinatorial information, and these can be computed efficiently. It is therefore natural to consider machine learning architectures that leverage SDP-based embeddings. This is what motivates our current work.

Information from an SDP solver comes in the form of a Gram matrix which indicates pairwise correlations. Hopfield networks are a simple way to do a rounding based on pairwise correlations. While one could certainly use more sophisticated techniques like GNNs, our main goal in the paper is to demonstrate the usefulness of SDP-based embeddings and for this, Hopfield networks suffice.

While we only considered three well-studied classical problems in this paper, SDPs have been successfully used in a wide variety of problems. Many combinatorial optimization problems of interest can be formulated as quadratic integer programs with binary variables and admit an SDP relaxation. Our approach is general enough to be applied to any such problem. 

Note that heuristics like local search, which are known to work well in practical settings (especially $1$-local search), can always be used on top of our framework. We do not study such additional tricks in this paper so that we can evaluate our framework in isolation.

\noindent
\textbf{Our Contribution.} 
To summarize, our contribution in this work is as follows:
\begin{itemize}
\item 
We first show that there is a simple heuristic based on Hopfield networks which is able to round SDP solutions for practical instances of well-known combinatorial optimization problems, namely Max-Cut, Max-Clique, and Graph Coloring.

\item Next, we show that the functional relation between the SDP solution and the Hopfield network can be learnt from data. This allows the rounding technique to be adapted to new problems and also to the target input distributions for those problems. Furthermore, shallow neural networks suffice for this purpose, which means that the training process does not require much data or processing time.

\item Finally, we show that our learnt heuristic is robust to noise in the SDP solution. This allows the replacement of an exact computation of the SDP solution by a fast approximate computation with little loss in the solution quality. 

\end{itemize}

\section{Preliminaries}

In this section, we briefly describe the basic concepts of Semi-definite programming and Hopfield networks that we use in the rest of the paper.

\subsection{Semi-definite Programming.}

Let $S_{n}$ denote the set of $n \times n$ real symmetric matrices and $Tr(X)$ denote the trace of a matrix $X$. Further, let $C,A_{1},\ldots,A_{m} \in S_{n}$ and $B_{1},\ldots,B_{m} \in \mathbb{R}$ be the input data for a given problem, and let $X\succeq0$ denote that the matrix $X$ is positive semi-definite. A semi-definite program (SDP) is a convex optimization problem of the form:
\begin{mini*}|s|
{X\in S_{n}}{Tr(CX)}
{}{}
\addConstraint{Tr(A_{i}X)=B_{i},\ i= 1,\ldots,m}
\addConstraint{X\succeq0}{}
\end{mini*}
Semi-definite programs are a special case of cone programming. Any integer quadratic program can be relaxed into the form of an SDP, which can then be solved to any fixed precision in polynomial time using algorithms such as interior-point methods~\cite{VB96}. Throughout this paper, we relax several NP-hard problems, stated in their integer quadratic form, into SDPs. We consider the classical optimization problems of Max-Cut, Max-Clique and Graph Coloring. We describe these problems and the SDP formulations that we use in Appendix~\ref{sec:sdp-formulations}.
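
As a concrete illustration of this template (a sketch of the standard Goemans-Williamson relaxation~\cite{GW95}; the exact formulation used in our experiments is the one in Appendix~\ref{sec:sdp-formulations} and may differ in normalization), consider Max-Cut on a graph with a symmetric weight matrix $W$ that has zero diagonal. Encoding the two sides of a cut by $x_v \in \{-1,+1\}$ and relaxing the rank-one matrix $xx^{\top}$ to an arbitrary positive semi-definite $X$ with unit diagonal gives
\[
\max_{X \in S_n} \; \sum_{u<v} w_{uv}\,\frac{1 - X_{uv}}{2}
\quad \text{subject to} \quad X_{vv} = 1 \;\; (v = 1,\ldots,n), \quad X \succeq 0 .
\]
Since $Tr(WX) = 2\sum_{u<v} w_{uv} X_{uv}$, maximizing this objective is the same as minimizing $Tr(CX)$ with $C = W/4$, up to the additive constant $\frac{1}{2}\sum_{u<v} w_{uv}$. Each constraint $X_{vv}=1$ is a single linear equation in the entries of $X$, so the relaxation has $m = n$ linear constraints.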


\subsection{Hopfield Networks.}
A Hopfield network~\cite{Hop82} is a fully connected network with $n$
neurons which we can number from $1$ to $n$. Each neuron has one output which is initially set to some value (often randomly). Then we update the outputs in several rounds. 
Each neuron updates its output to a new value which is obtained by taking a linear combination of the outputs of all the other neurons, adding a bias, and applying a non-linear activation function. Specifically, for each pair of neurons $i$ and $j$, the corresponding edge in the complete graph has a weight $\bm{\mathit{a}}_{ij}$ called the {\em interaction} between the neurons $i$ and $j$. In addition, for each neuron $i$ there is an associated bias $b_i$. If we denote the output of neuron $i$ at any point in time by $z_i$, then it is updated as follows:
\[ z_i  := A\left( \sum_{j\ne i} \bm{\mathit{a}}_{ij} z_j + b_i \right) \]
where $A$ is a nonlinear activation function (typically the sigmoid or the tanh function). 
The outputs of the neurons can either be updated one by one (i.e., asynchronously) or together (synchronously). In the latter case, when updating $z_i$ for any $i$, we use the old values of $z_j$ for all $j\ne i$. Typically we continue updating the outputs until they have converged (i.e., they do not change significantly from one round to the next) or a certain threshold number of updates has been performed. 
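
In matrix form (introduced here only for illustration; $W$, $\bm z^{(t)}$, $\bm b$ and $\varepsilon$ are notation not used elsewhere in the paper), let $W$ be the $n\times n$ matrix with $W_{ij} = \bm{\mathit{a}}_{ij}$ for $i \neq j$ and $W_{ii} = 0$, let $\bm b$ be the vector of biases, and let $\bm z^{(t)}$ be the vector of outputs after round $t$. A synchronous round can then be written as
\[
\bm z^{(t+1)} = A\!\left( W \bm z^{(t)} + \bm b \right),
\]
with $A$ applied entrywise, whereas an asynchronous round visits the neurons in some order and applies $z_i := A\big(\sum_{j \neq i} W_{ij} z_j + b_i\big)$ using the most recent value of every $z_j$. One possible stopping rule is $\max_i |z_i^{(t+1)} - z_i^{(t)}| \le \varepsilon$ for a small tolerance $\varepsilon$, combined with a fixed cap on $t$.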

\section{Related Work}
\subsection{Semi-definite Programs and Approximation Algorithms}
Semidefinite programming is among the most powerful tools used in the design of approximation algorithms~\cite{GM12book,WS11,V01book}.  Goemans and Williamson's algorithm~\cite{GW95} for the Max-Cut problem, from 1995, was the first approximation algorithm based on semi-definite programming, and it is still considered to be among the simplest and most impressive results in this area. It is also known that under the Unique Games Conjecture, this algorithm provides the best approximation possible in polynomial time~\cite{KKMO07}. 
SDP-based approximation algorithms are known (e.g.,~\cite{KMS94, Blum92, C07, GK00, Karp72, KLS93, Wig83}) for a range of combinatorial optimization problems, such as graph colouring and maximum clique.

\subsection{Rounding Algorithms for LP and SDP}
Linear Programming and Semidefinite Programming relaxations are central to the design of many approximation algorithms. Several broad as well as problem-specific techniques have been devised for rounding the solutions to such relaxations. See for instance \cite{B19,Harris24,AZ05,BRS11,YLCT23,SGSX16} and the references therein. 

\subsection{Machine learning for Combinatorial Optimization}
In the last decade, a large number of machine learning techniques have been developed to solve combinatorial optimization problems (see \cite{BLP21} for a recent survey and the citations therein). These include graph neural networks (see \cite{CCK00V23} for a very recent survey and the citations therein), reinforcement learning (see \cite{MSIB21} for a survey), neural symbolic computing (see \cite{LGGPAV20} for a survey) and graph representation learning (see \cite{PCX21} for a survey). We focus on the last technique as it is closest to our work. In graph representation learning, the first stage embeds the graphs into low-dimensional vectors, and the second stage uses machine learning to solve the combinatorial optimization problems using the embeddings of the graphs learned in the first stage. In contrast, we use embeddings derived from SDP formulations and use Hopfield networks to solve combinatorial optimization problems using the SDP embeddings. Furthermore, in graph embedding methods, the learning of the embeddings of the graphs has its own objective, which may not depend on the optimization problems to be solved. In contrast, the SDP embeddings are problem-dependent and capture global combinatorial information about the problem being solved.

\subsection{Machine Learning and SDP Gram matrix}
In recent years, neural networks have been used to approximate SDP Gram matrix computation (see e.g.,~\cite{YLKXJ23,Kriv2021,WDWK19,Baltean19,Cheng09}). In contrast, there is little work on using SDPs for learning to solve problems. Some examples in this direction are the use of SDPs for designing semi-supervised SVMs~\cite{YW14}, the use of an SDP as a lower bound in a branch and bound algorithm for unsupervised minimum sum-of-squares clustering~\cite{PRS22}, and the use of a low-rank SDP for probabilistic inference in pairwise Markov Random Fields~\cite{PWK20} and for community detection~\cite{WK20}. There are even fewer examples of work that uses SDPs in a learning framework for effectively solving combinatorial optimization problems. One example is the use of an SDP to learn the Lov\'asz-$\Theta$ function and then use that to find planted cliques in random graphs~\cite{JMBD12}. In contrast, we investigate a general machine-learning framework that can use SDPs to solve combinatorial optimization problems.

\section{Rounding SDP solutions using Hopfield Networks}
In each of the three problems we study in this paper, the corresponding SDP returns an $n$-dimensional embedding $\bm \sigma_v$ for each vertex $v$ in the graph. Most algorithms based on SDPs do not directly use these vectors\footnote{The solutions are often rotation-invariant, i.e., applying the same rotation to all the vectors in a solution yields an equally good solution to the SDP.} and instead use the Gram matrix of pairwise dot products of these vectors. This is what we also do in this paper. The pairwise nature of the information extracted from the SDP naturally suggests the use of Hopfield networks in which the interaction between any pair of neurons is a function of the corresponding Gram matrix entry. 
The actual function used depends on the problem. 
Similarly, the bias and initial output used for each neuron, as well as the activation function, are problem-dependent. 
We use the Hopfield network to find a rounded solution as follows. We start by setting the output of each neuron to its initial value and update the outputs of each of the neurons (synchronously/asynchronously) until either a threshold number of rounds is exceeded or the outputs of all of the neurons have converged. At this point, we round each output by setting it either to the closest rounded value or to one of the values probabilistically (the closer the rounded value, the higher the probability of rounding to that value).     
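
For outputs in $[0,1]$ and target values $\{0,1\}$, one natural way to instantiate the two rounding rules (an illustrative choice, not the only one consistent with the description above) is
\[
x_v = \mathbf{1}[\, z_v > 1/2 \,] \quad \text{(deterministic)}, \qquad \Pr[x_v = 1] = z_v \quad \text{(probabilistic)},
\]
where $z_v$ is the final output of the neuron for vertex $v$ and $x_v$ is its rounded value (the symbol $x_v$ is introduced here only for illustration). Under the probabilistic rule, an output of $z_v = 0.9$ is rounded to $1$ with probability $0.9$ and to $0$ with probability $0.1$, so the closer value is the more likely one.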

To reiterate, we first obtain the correlation between two variables as a function of the corresponding Gram matrix entry in a problem-dependent way. After this, we use the Hopfield network as a mechanism to round the variables while trying to respect the pairwise correlations. This is somewhat akin to how graph neural networks are used for node classification except that in a Hopfield network, the underlying network is the complete graph. 
Our pipeline is illustrated in Figure~\ref{fig:schematic}.

\begin{figure}[!ht]
\begin{center}
\includegraphics[width=.9\linewidth]{images/Untitled presentation-4.png}
\caption{A schematic diagram showing our framework. \label{fig:schematic}}
\end{center}
\end{figure}

In the following subsections, we describe simple functions for each of the three problems and show that they yield good empirical results.  The particular hand-designed functions in this section are not the main point. The primary objective is to show that simple functions suffice and, therefore, an appropriate function can be efficiently learned from data. This is particularly useful for new problems where it may not be easy to hand-design good functions. 
We discuss how the function can be learnt from data in the next section. 

{\em Remark.} Note that updating a Hopfield network with all pairwise edges takes $\Theta(n^2)$ time in every round. This can be improved by doing approximate computations using matrix sketching~\cite{DKM06}. We do not discuss this in this paper since our focus is on evaluating the quality of solutions obtained using our framework. For the problem sizes we consider, this is not the bottleneck since SDP computations dominate the running time. 
However, for scaling to larger instances, matrix sketching-based optimizations along with fast approximate SDP solvers will be necessary. 

\subsection{Hopfield Networks for Max-Cut.}
\label{sec:hop_cut}
In this problem, we would like each neuron of the Hopfield network to output a number in the range $[0,1]$ from which we create a cut by taking one side to be the vertices whose corresponding neurons have output at most $0.5$ and taking the other side to be the remaining vertices. Given this, we use the sigmoid function as the activation function for the neurons. The outputs of the Hopfield network are rounded to the nearest binary output to obtain a binary solution. Given the symmetry between the sides in the cut (flipping the sides yields the same cut), we set the bias for each neuron to $0$.

In the formulation of the SDP for Max-Cut, the quantity $c_{uv} := (1-\bm \sigma_u \cdot \bm \sigma_v)/2$ is the coefficient of $w_{uv}$ in the objective function. If $c_{uv}$ is large, we would like $u$ and $v$ to be separated by the cut, and otherwise, we would like them to be on the same side of the cut. It is thus natural to use a decreasing function of $c_{uv}$, i.e., an increasing function of $\bm \sigma_u \cdot \bm \sigma_v$, as the interaction between the neurons corresponding to the vertices $u$ and $v$. One obvious option is to use $\bm \sigma_u \cdot \bm \sigma_v$ as the interaction. Another option, motivated by the Goemans-Williamson algorithm~\cite{GW95}, is to use $1-2p_{uv}$ as the interaction, where $p_{uv} = \arccos ( \bm \sigma_u \cdot \bm \sigma_v)/\pi$ is the probability that $\bm \sigma_u$ and $\bm \sigma_v$ are separated by a random hyperplane through the origin. The reason for choosing $1-2p_{uv}$ is that it is increasing in $\bm \sigma_u \cdot \bm \sigma_v$ and has range $[-1, +1]$.
The initial outputs of the neurons are chosen from $[0, 1]$ uniformly at random. Denoting the output of the neuron corresponding to vertex $u$ by $z_u$, we update $z_u$ to $A \left(\sum_{v \ne u} \bm{\mathit{a}}_{uv} z_v\right)$, where $\bm{\mathit{a}}_{uv}$ denotes the interaction between the neurons corresponding to vertices $u$ and $v$ and $A$ is the activation function, in this case the sigmoid function. As mentioned before, we repeat the updates until the outputs converge or the number of rounds of updates exceeds a threshold. 
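
To make these quantities concrete, the following values follow directly from the definitions of $c_{uv}$ and $p_{uv}$, assuming unit vectors as in the standard Max-Cut relaxation:
\[
\begin{array}{c|c|c}
\bm \sigma_u \cdot \bm \sigma_v & c_{uv} & p_{uv} \\ \hline
1 & 0 & 0 \\
1/2 & 1/4 & 1/3 \\
0 & 1/2 & 1/2 \\
-1/2 & 3/4 & 2/3 \\
-1 & 1 & 1
\end{array}
\]
Both quantities grow from $0$ to $1$ as the vectors move from aligned to antipodal, and they agree at the three values $\bm \sigma_u \cdot \bm \sigma_v \in \{1, 0, -1\}$. Since $p_{uv} \in [0,1]$, rescaling $p_{uv}$ by a factor of $2$ and shifting by $1$ covers exactly $[-1,+1]$, the same range as the dot product itself.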


{\bf Experimental results.} We compared the performance of our approach with that of the Goemans-Williamson algorithm on SF-295~\cite{Yan08}, a collection of graphs representing small molecules with cancer screening records (4026 instances with 31 nodes and 33 edges on average), and on the Twitter Snap~\cite{snap14} graph dataset (97 instances with 130 nodes and 1421 edges on average), along with a custom dataset. The custom dataset consists of 1000 Erdős-Rényi random graphs, each containing 128 nodes, in which each pair of nodes is joined by an edge with probability 0.05. In addition, we planted a large cut across a random balanced partition of the nodes in each graph. To do this, we consider all node pairs with one node on either side of the partition and insert an edge between them with probability 0.15. We then remove random edges from within both sides of the partition to ensure that the degrees of the nodes have a similar distribution after the cut is planted.

Table~\ref{tab:cutresults} presents the results of this comparison. For the Goemans-Williamson algorithm (referred to as the GW Algorithm in the table), we use 5 random hyperplanes to generate cuts and take the best of the 5 cuts. We also run our Hopfield network based approach 5 times and take the best cut. Table~\ref{tab:cutresults} shows the mean optimality ratio (i.e., size of generated cut/optimal cut) and the standard deviation over all graphs in the different datasets. Throughout these experiments, the Hopfield network always converged to a stable state in a handful of iterations and did not need to be terminated after a fixed number of steps. As can be seen in Table~\ref{tab:cutresults}, our approach based on the Hopfield network produces cuts that are significantly better than random cuts for each graph, implying that Hopfield networks can decode the cut information contained in SDP vectors. Surprisingly, we find that it even returns better cuts than the classical Goemans-Williamson approximation algorithm, which has provable guarantees on the cut size. Outperforming the well-studied Goemans-Williamson approximation algorithm on various graph datasets highlights the potential of our Hopfield network-based approach. Table~\ref{tab:cutresults} also shows that the mean time needed to pre-process the Hopfield network edges, compute the weights (excluding the SDP computation time) and run the Hopfield network until it converges to a solution is very small (less than a second).

In addition, we investigate how our framework performs as the size of the instances increases. To do this, we generated an additional 25 Erdős-Rényi graphs in the same manner as before, except that we generated them for sizes varying from 16 nodes to 512 nodes. We refer to them as our custom-cut instances. The same experiment as before was carried out, with the results displayed in Table~\ref{tab:cutsize}. Once again, we observe that the cuts produced by the Hopfield network have a better mean optimality ratio than those of the Goemans-Williamson approximation algorithm. We also compare against taking a random cut and note that random cuts have a considerably poorer mean optimality ratio, so the use of SDP vectors is indeed crucial to obtaining good cuts on these problem instances. 

\begin{table}[!ht]
    \centering
    \begin{tabular}{||c|c|c|c||}
    \hline
         &  SF-295 &  Twitter &  Custom Cut \\
    \hline
        Hopfield Network & 0.998($\pm$0.005) & 0.993($\pm$0.006)  & 0.998($\pm$0.002) \\
    \hline
        GW Algorithm & 0.943($\pm$0.073)& 0.937($\pm$0.041) & 0.897($\pm$0.051) \\
    \hline
        Random & 0.621($\pm$0.071)& 0.797($\pm$0.044)  &  0.686($\pm$0.019)\\
    \hline
    \hline
        SDP Runtime & 0.745($\pm$2.173) & 9.001($\pm$8.709) & 3.961($\pm$0.823) \\
    \hline
        Hopfield Network Runtime & 0.014($\pm$0.015) & 0.256($\pm$0.192)  & 0.203($\pm$0.006) \\
    \hline
        GW Algorithm Runtime & 0.0002($\pm$0.0001) & 0.002($\pm$0.001)  & 0.0017($\pm$0.0028) \\
    \hline
    \end{tabular}
    \caption{Mean optimality ratio for Max-Cut and runtimes in seconds}

    \label{tab:cutresults}
\end{table}

\begin{table}[!ht]
    \centering
    \begin{tabular}{|| c | c | c| c| c||}
    \hline
        \# Nodes  &  Hopfield & Hopfield Runtime (s) &   GW &  Random \\
    \hline
        16 & 0.998($\pm$0.006) &  0.002($\pm$0.001) & 0.951($\pm$0.045) & 0.782($\pm$0.060) \\
    \hline
        32 & 0.998($\pm$0.005) & 0.008($\pm$0.0006) & 0.935($\pm$0.047) & 0.679($\pm$0.065) \\
    \hline
        64 & 0.993($\pm$0.007)&  0.036($\pm$0.001) & 0.908($\pm$0.049) & 0.678($\pm$0.029)  \\
    \hline
        128 & 0.998($\pm$0.002)&  0.203($\pm$0.006) & 0.894($\pm$0.047) & 0.683($\pm$0.020)  \\
    \hline
        256 & 1.000($\pm$0.000)&  0.587($\pm$0.032) & 0.923($\pm$0.060) & 0.674($\pm$0.009)  \\
    \hline
        512 & 1.000($\pm$0.000)& 2.356($\pm$0.112) & 0.918($\pm$0.098) & 0.672($\pm$0.003)  \\
    \hline
    \end{tabular}
    \caption{Mean optimality ratio for varying size Erdős-Rényi Max-Cut instances}

    \label{tab:cutsize}
\end{table}

Table~\ref{tab:cutsize} also shows that for these graph sizes, the running time remains quite small (around 2.4 seconds for 512-node Erdős-Rényi graphs). The running time scales roughly quadratically with the input size (an increase of around 1024 times as the number of nodes increases by a multiplicative factor of 32, from 16 to 512). This is in line with what we would expect theoretically: Hopfield networks with all pairwise edges take $\Theta(n^2)$ time in every round. As discussed before, this can be improved by doing approximate computations using matrix sketching~\cite{DKM06}.
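
As a rough check of this scaling using the Hopfield runtimes in Table~\ref{tab:cutsize}, the empirical exponent between the smallest and largest sizes is
\[
\frac{\log(2.356 / 0.002)}{\log(512/16)} = \frac{\log 1178}{\log 32} \approx 2.04 ,
\]
which is close to the exponent $2$ expected for $\Theta(n^2)$ work per round. This is only an estimate: the $16$-node runtime is reported to a single significant digit, and the number of rounds until convergence may also vary with $n$.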

\subsection{Hopfield Networks for Max-Clique and Graph Coloring.}
We present the details of the Hopfield networks for the Max-Clique and Graph Coloring problems in Appendices~\ref{sec:hop-clique} and~\ref{sec:hop-coloring}, respectively. Our experimental results on various benchmark instances show that SDP rounding based on Hopfield networks finds near-optimal solutions in very little runtime. Our approach compares favourably to the Erdős Goes Neural~\cite{KL20} GNN architecture as well as a greedy heuristic for the Max-Clique problem.


\section{Learning the Hopfield Network parameters}
In this section, we show that instead of hand-crafting the weights of the Hopfield network, we can express the interaction between a pair of neurons as a parameterized function of the corresponding Gram matrix entry and learn the parameters from the data (as illustrated in Figure~\ref{fig:schematic2}). For instance, we can train a neural network that learns this function separately for each combinatorial optimization problem.
We experimentally demonstrate the efficacy of the idea for the Max-Cut problem. A similar approach can be used for other problems. 

\begin{figure}[!ht]
\begin{center}
\includegraphics[width=.9\linewidth]{images/Untitled presentation-5.png}
\caption{A schematic diagram showing our framework where a small neural network is used to learn the interaction weights of the Hopfield network. \label{fig:schematic2}}
\end{center}
\end{figure}

While the input graph $G=(V,E)$ can be sparse, the Hopfield network works on a complete graph $G' = (V,E')$ with $E \subseteq E'$. We need to learn the function governing the interaction weights $\bm{\mathit{a}}_{uv}$ for all edges $(u,v)$ in the Hopfield network. The input to the neural network that learns the weight function consists of the corresponding Gram matrix value $X_{uv}$ together with some polynomial terms $X_{uv}^2, X_{uv}^3$, etc. In addition, we also input an indicator variable specifying whether or not the edge of the Hopfield network is present in the input graph.

We train a small, dense neural network to compute an interaction weight for the Hopfield network edges. For this, we need to design an appropriate loss function. For each vertex $u$, let $z_u$ be the output corresponding to $u$. If the $z_u$'s are restricted to $\{0,1\}$, finding the max-cut is equivalent to maximizing $\sum_{(u,v) \in E} w_{uv} (z_u - z_v)^2$. 
Accordingly, we use the negative of the above sum as the loss function. Note that here we are using the ``raw'' outputs of the nodes in the Hopfield network and not rounding them since we want the loss function to be differentiable. 
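
To spell out this equivalence, note that for $z_u, z_v \in \{0,1\}$ the term $(z_u - z_v)^2$ equals $1$ exactly when $u$ and $v$ lie on opposite sides of the cut and $0$ otherwise, so the sum is the weight of the cut. Writing the loss as
\[
\mathcal{L}(\bm z) = - \sum_{(u,v) \in E} w_{uv}\,(z_u - z_v)^2 ,
\]
and assuming each edge appears once in the sum, its partial derivative with respect to a single output is
\[
\frac{\partial \mathcal{L}}{\partial z_u} = -2 \sum_{v \,:\, (u,v) \in E} w_{uv}\,(z_u - z_v).
\]
Decreasing the loss therefore corresponds to pushing $z_u$ away from the outputs of its neighbours, weighted by $w_{uv}$; this is the signal that is propagated back to the interaction weights.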

{\em Remark.} The max-cut problem is particularly simple since all solutions are feasible. In problems with constraints, one needs to construct the loss function carefully. This is, in general, problem-dependent, but a generic method is to use a Lagrangian relaxation. For instance, in the max-clique problem, the loss function that we used is $-\sum_{v \in V} z_v + \lambda \sum_{(u,v) \notin E} z_u z_v$, i.e., the negative of the clique size penalized for selected non-adjacent pairs, where $\lambda$ is a parameter which in our experiments was set to $1$.

We backpropagate the loss through each iteration of the Hopfield network (this is similar to backpropagation through time in Recurrent Neural Networks), yielding the gradient with respect to each edge weight in our Hopfield network. Since these edge weights are themselves the output of a neural network, we further backpropagate through that network to obtain the gradient of the loss with respect to the parameters in that network. 
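
Under the simplifying assumptions of synchronous updates, zero bias and a fixed number $T$ of rounds, this computation can be written out explicitly (the symbols $W$, $\bm s^{(t)}$, $\bm \delta^{(t)}$, $\mathcal{L}$ and $\theta$ are introduced only for this sketch). Let $W$ collect the interactions, $W_{uv} = \bm{\mathit{a}}_{uv}$ with $W_{uu}=0$, and let $\bm s^{(t)} = W \bm z^{(t-1)}$ and $\bm z^{(t)} = A(\bm s^{(t)})$ for $t = 1,\ldots,T$, with the loss $\mathcal{L}$ evaluated on $\bm z^{(T)}$. Defining $\bm \delta^{(t)} = \partial \mathcal{L} / \partial \bm s^{(t)}$, the backward recursion is
\[
\bm \delta^{(T)} = A'(\bm s^{(T)}) \odot \frac{\partial \mathcal{L}}{\partial \bm z^{(T)}}, \qquad
\bm \delta^{(t)} = A'(\bm s^{(t)}) \odot \left( W^{\top} \bm \delta^{(t+1)} \right) \quad (t = T-1,\ldots,1),
\]
where $\odot$ denotes the entrywise product. The gradients with respect to an interaction and with respect to the parameters $\theta$ of the weight network are then
\[
\frac{\partial \mathcal{L}}{\partial W_{uv}} = \sum_{t=1}^{T} \delta^{(t)}_u \, z^{(t-1)}_v, \qquad
\frac{\partial \mathcal{L}}{\partial \theta} = \sum_{u \neq v} \frac{\partial \mathcal{L}}{\partial W_{uv}} \, \frac{\partial \bm{\mathit{a}}_{uv}}{\partial \theta}.
\]
In practice an automatic differentiation framework carries out these steps; the recursion is shown only to make the analogy with recurrent networks explicit.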

\textbf{Experimental Results.} To evaluate this approach, we consider the Max-Cut problem and select a subset of 1000 graphs from the SF-295 dataset. We perform a 70/30 train/test split on this dataset and train a small neural network for 50 epochs with a learning rate of $10^{-3}$ using the Adam optimiser with the default parameters in PyTorch. The model consists of three dense layers, including a single hidden layer with a width of 6 neurons, and uses $\tanh$ activation functions. 
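
One way to write such a model down is sketched below. It rests on assumptions that are not fixed by the description above: the input features are taken to be $X_{uv}$, $X_{uv}^2$, $X_{uv}^3$ and the edge indicator (so the input dimension $4$ is assumed), no nonlinearity is applied at the output, and the symbols $\bm \phi_{uv}$, $M_1$, $\bm c_1$, $\bm m_2$ and $c_2$ are hypothetical names:
\[
\bm \phi_{uv} = \left( X_{uv},\; X_{uv}^2,\; X_{uv}^3,\; \mathbf{1}[(u,v) \in E] \right)^{\top}, \qquad
\bm{\mathit{a}}_{uv} = \bm m_2^{\top} \tanh\!\left( M_1 \bm \phi_{uv} + \bm c_1 \right) + c_2 ,
\]
with $M_1 \in \mathbb{R}^{6 \times 4}$, $\bm c_1, \bm m_2 \in \mathbb{R}^{6}$ and $c_2 \in \mathbb{R}$. Under these assumptions the model has $6 \cdot 4 + 6 + 6 + 1 = 37$ trainable parameters, shared across all vertex pairs and all graphs, which is consistent with the small amount of data and training time required.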

Similarly, for the Max-Clique problem, we perform a 70/30 train/test split on a subset of 1000 graphs from the IMDB-Binary graph dataset and train a small neural network for 50 epochs. The only difference is that a learning rate of $10^{-4}$ is used in the Adam optimiser instead of $10^{-3}$. The learning rate needs to be carefully tuned as there is a trade-off between the number of instances for which the Hopfield network returns cliques of sub-optimal size and the number of instances where the Hopfield network produces a superset of the optimal clique (which itself is not a clique and therefore an invalid solution). A large learning rate tends to cause the optimizer to jump between the two extremes. The hidden layer, in this case, had 10 neurons.  

Figure~\ref{fig:learn_ratio} shows that for both these problems, after only a few epochs, the optimality ratio associated with the learnt weight function on the Hopfield network edges reaches above 0.9. Thus, a good weight function for this instance distribution could be learnt in only a few iterations. 

\begin{figure}[!ht]
    \begin{center}
        \includesvg[width=.48\linewidth,
  height=4cm]{images/LearntLossCut.svg}
        \includesvg[width=.48\linewidth,
  height=4cm]{images/LearntLossClique.svg} 
    \end{center}
\caption{Optimality ratio associated with learning the weight function of the Hopfield network using SDP during training. The left plot shows the progress for the Max-Cut problem on the SF-295 training dataset, and the right plot shows the progress for the Max-Clique problem on the IMDB-Binary training dataset. \label{fig:learn_ratio}}
\end{figure}


After training the network, we compute the mean optimality ratio and standard deviation in the same manner as in our experiments in Section~\ref{sec:hop_cut}. Even with learnt edge weights (instead of the manually selected edge weights in Section~\ref{sec:hop_cut}) using a small neural network, for Max-Cut our approach returned a cut with a mean optimality ratio of $0.979(\pm0.105)$ on the 300 test graphs of the SF-295 dataset. For Max-Clique, our approach returned a clique with a mean optimality ratio of $1.0(\pm0.0)$ over the instances with valid cliques, and the network failed to produce a valid clique for only two instances. This shows that the neural network converged towards a function that produces edge weights with performance comparable to our manually selected edge weight function. This suggests that the design of the transformation function can be automated through the use of a small dense neural network. 

In Appendix~\ref{app:generalization}, we show that our learnt models for Max-Cut and Max-Clique generalize well across distributions and to instances of larger size. Our models achieve near-optimal solutions on graph classes that are very different in size and density from the classes on which they were trained.

We observed that the learnt model for Max-Clique is quite different from the carefully selected function presented in Appendix~\ref{sec:hop-clique}. Specifically, it puts less emphasis on the connection to the dummy vertex. The fact that the learning technique has converged to a different function for Max-Clique on this graph class, and that this function was learnt from a space that generalizes the carefully designed weight function, suggests that the learnt function achieves a lower loss on the training instances than the handcrafted weight function.

\section{Using Approximate Low-Rank SDP Solutions}
A potential criticism of our approach is that it relies on the computationally expensive step of calculating SDP vectors. We ran experiments to check the sensitivity of the rounding algorithm to the accuracy of the SDP solutions and found that the algorithm is quite robust to noise. The results are presented in Appendix~\ref{app:sensitivity}. Motivated by these results, we tried our approach on large instances of Max-Cut in which, instead of computing the optimal SDP solution (which would be prohibitively expensive), we used low-rank SDP solutions obtained using the mixing method~\cite{WANG18}, a simple and fast algorithm based on coordinate descent. Since we cannot compute the optimal solution for such large instances in a reasonable amount of time, we compare our results with Breakout Local Search (BLS)~\cite{BEN13}, one of the top heuristics for this problem. The rank of the solutions is a tunable parameter that provides a trade-off between the quality of the solution and the speed of computation. We used $\sqrt{2n}$ as the rank, in line with the recommendation by Wang et al.~\cite{WANG18}. We use the default hyperparameters for BLS from Benlic and Hao~\cite{BEN13}, since they used the same benchmarking dataset with those hyperparameters.
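
As a rough illustration of what this rank choice means in practice, assume (the text above does not specify this) that the rank $k$ is rounded up to the next integer. For the smallest and largest Gset instance sizes considered here, this would give
\[
k = \left\lceil \sqrt{2n}\, \right\rceil : \qquad n = 800 \;\Rightarrow\; k = 40, \qquad n = 10{,}000 \;\Rightarrow\; k = \lceil 141.42\ldots \rceil = 142 .
\]
Under this assumption, the low-rank solution for $n = 10{,}000$ consists of $nk = 1.42 \times 10^{6}$ numbers, compared with $n^2 = 10^{8}$ numbers for $n$ vectors of dimension $n$.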

Our Max-Cut instances were from the Gset graph dataset~\cite{GSET}, with graphs having between 800 and 10,000 nodes. Table~\ref{tab:cut_approx_sdp} shows that our approach based on Hopfield networks obtains solutions that are nearly as good as those obtained by BLS~\cite{BEN13} but is significantly faster, especially for large instances. For instance, for the graph G70 with 10,000 nodes, our approach finishes in around a minute with a Max-Cut size that is only 0.6\% less than that of BLS, whereas BLS takes over 3 hours. 
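
To make the G70 comparison concrete, the entries of Table~\ref{tab:cut_approx_sdp} give the following back-of-the-envelope calculation. It assumes, purely for illustration, that the total running time of our approach is the sum of the SDP and Hopfield time columns:
\[
\frac{9541 - 9485}{9541} \approx 0.59\%, \qquad 4.572 + 61.805 \approx 66.4~\mathrm{s}, \qquad \frac{11365}{66.4} \approx 171 .
\]
Under this assumption, the cut is about $0.6\%$ smaller than that of BLS, while the running time is roughly $171$ times shorter than the $11365$ seconds (about $3.2$ hours) taken by BLS.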

To avoid the $O(n^2)$ storage requirement of the Gram matrix, we can store its entries approximately via a low-rank approximation. We know that there exist vectors $\bm v_1, \ldots, \bm v_n \in \mathbb{R}^n$ such that the entry $G_{ij}$ of the Gram matrix is $\bm v_i \cdot \bm v_j$. Given these vectors, we could use dimension reduction to reduce their dimension to $O(\log n)$ while approximately preserving dot products. We could also use SDP solvers that return approximate solutions in which the vectors are of dimension $d \ll n$.
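
One standard way to realize such a dimension reduction is a Gaussian random projection in the style of the Johnson--Lindenstrauss lemma. It is given here only as an illustrative sketch, since the text above does not fix a particular method, and the symbols $\Pi$, $k$ and $\epsilon$ are introduced for this illustration only. Let $\Pi \in \mathbb{R}^{k \times n}$ have i.i.d.\ $N(0, 1/k)$ entries with $k = O(\epsilon^{-2} \log n)$. Then, with high probability, $\|\Pi \bm x\|^2 \in (1 \pm \epsilon)\|\bm x\|^2$ holds simultaneously for all $\bm x$ of the form $\bm v_i \pm \bm v_j$. Applying the polarization identity $\bm u \cdot \bm w = \frac{1}{4}\left(\|\bm u + \bm w\|^2 - \|\bm u - \bm w\|^2\right)$ to both $(\Pi \bm v_i, \Pi \bm v_j)$ and $(\bm v_i, \bm v_j)$ gives
\[
\left| \Pi \bm v_i \cdot \Pi \bm v_j - \bm v_i \cdot \bm v_j \right|
\;\le\; \frac{\epsilon}{4}\left(\|\bm v_i + \bm v_j\|^2 + \|\bm v_i - \bm v_j\|^2\right)
\;=\; \frac{\epsilon}{2}\left(\|\bm v_i\|^2 + \|\bm v_j\|^2\right).
\]
For unit vectors, as in the standard Max-Cut relaxation, the right-hand side equals $\epsilon$. Each entry $G_{ij}$ could therefore be recovered to within additive error $\epsilon$ from the projected vectors $\Pi \bm v_1, \ldots, \Pi \bm v_n \in \mathbb{R}^k$, which require only $O(\epsilon^{-2}\, n \log n)$ numbers of storage.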

\begin{table}[!ht]
    \centering
    \begin{tabular}{|| c | c | c | c | c | c | c||}
    \hline
        &  $|V|$ &  SDP  & Hopfield  & Hopfield & BLS & BLS \\
        &  & Time & Obj.  & Time & Obj. & Time \\
    \hline
        G1 & 800 & 1.601 & 11450 & 1.553 & 11624 & 13 \\
    \hline
        G14 & 800 & 0.641 & 2970 & 0.91 & 3064 & 119 \\
    \hline
        G15 & 800 & 0.610 & 2977 & 0.92 & 3050 & 43 \\
    \hline
        G22 & 2000 & 2.203 & 13000 & 4.605 & 13359 & 560 \\
    \hline
        G23 & 2000 & 2.376 & 12973 & 4.698 & 13354 & 278 \\
    \hline
        G24 & 2000 & 2.161 & 13024 & 4.669 & 13337 & 311 \\
    \hline
        G35 & 2000 & 1.641 & 7403 & 4.276 & 7684 & 442 \\
    \hline
        G36 & 2000 & 1.561 & 7401 & 4.254 & 7677 & 604 \\
    \hline
        G37 & 2000 & 1.582 & 7407 & 4.21 & 7689 & 444 \\
    \hline
        G45 & 1000 & 1.016 & 6482 & 1.486 & 6554 & 104 \\
    \hline
        G53 & 1000 &  0.829 & 3732 & 1.308 & 3850 & 117 \\
    \hline
        G54 & 1000 & 0.813 & 3740 & 1.285 & 3852 & 131 \\
    \hline
        G55 & 5000 & 3.082 & 9903 & 4.34 & 10294 & 842 \\
    \hline
        G58 & 5000 & 4.317 & 18539 & 22.577 & 19263 & 1354 \\
    \hline
        G60 & 7000 & 4.471 & 13631 & 40.916 & 14176 & 2822 \\
    \hline
        G70 & 10000 & 4.572 & 9485 & 61.805 & 9541 & 11365 \\
    \hline
    \end{tabular}
    \caption{Comparison of the Max-Cut size and runtime (in seconds) of the low-rank SDP solution rounded using our Hopfield network approach with those of Breakout Local Search (BLS).}
    \label{tab:cut_approx_sdp}
\end{table}

%\vspace*{-0.75cm}
\section{Discussion}
We have shown that a simple Hopfield network with appropriate interaction weights based on SDP embeddings can obtain near-optimal solutions for practical instances, drawn from various benchmark datasets, of the classical combinatorial optimization problems of Max-Cut, Max-Clique and Graph Coloring. Furthermore, we have shown that the appropriate problem-dependent interaction weights can be learnt efficiently using a small neural network. 

Our approach has two parts: computing a low-rank SDP solution and then rounding the solution using a Hopfield network. For the first part, Yu et al.~\cite{YLKXJ23} have recently shown that a low-rank SDP solution can be computed using a graph neural network (GNN). Similarly, for the second part, while we have used a Hopfield network, a more scalable and general approach would be to use GNNs initialized with the SDP solution vectors of a fixed rank. 
The final embedding obtained by the GNN can then be processed by another neural network to output a decision for each node (e.g., whether it is part of the solution, the colour of the node, etc.). While this approach sounds natural and appealing, it does not seem easy to make it work in practice. An interesting open problem is to design a flexible architecture based on GNNs that can integrate existing algorithmic tools like SDPs. 

\textbf{Code availability:} Our code is available at \url{https://anonymous.4open.science/r/SDP-Hopfield-645D/}


