# Datasets
Data are available at the following link: https://drive.google.com/file/d/1no3iE-T1KPsFlyC0EaThWgiQZ992bHdX/view?usp=share_link

We evaluate our methods on two primary datasets:

1.  **TSPLIB ($\mathcal{T}$):** Real-world Euclidean 2D TSP benchmark instances from [TSPLIB](http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/) with up to 1,300 nodes.
2.  **Uniform Random ($\mathcal{U}$):** Synthetic Euclidean 2D instances with node counts $n \in \{20, 50, 100, 200, 300, 500, 1000\}$. For each instance size $n$, we generate node coordinates by sampling integers uniformly from the range $[0, 2n]$ until $n$ unique positions are obtained.
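
The sampling procedure for the uniform instances can be sketched as follows. This is a minimal illustration and not the script used to build the released data: the function name `sample_uniform_instance`, the `seed` argument, and the choice to treat both endpoints of $[0, 2n]$ as inclusive are assumptions.

```python
import random


def sample_uniform_instance(n, seed=None):
    """Draw n distinct integer points with both coordinates in [0, 2n].

    Hypothetical helper: endpoints are assumed inclusive.
    """
    rng = random.Random(seed)
    seen = set()
    coords = []
    while len(coords) < n:
        point = (rng.randint(0, 2 * n), rng.randint(0, 2 * n))
        if point not in seen:  # reject duplicates and draw again
            seen.add(point)
            coords.append(point)
    return coords


coords = sample_uniform_instance(20, seed=0)
```

Rejecting duplicates guarantees that no two nodes coincide, so every edge of the resulting complete graph has a strictly positive length.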

All instances are solved to optimality using the [Concorde TSP Solver](https://www.math.uwaterloo.ca/tsp/concorde.html) to provide ground-truth tours and optimal lengths.

## Data Format

Instances are stored as pickled `networkx.Graph` objects in the following directory structure:
- `data/tsp_uniform/`
- `data/tsplib/`

Files are named `{n}_{s}.pkl`, where `n` is the number of nodes and `s` is the sample index (e.g., `50_001.pkl`). TSPLIB files are named according to their original TSPLIB identifiers (e.g., `berlin52.pkl`).
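
As a rough sketch of how the naming scheme can be used, the snippet below parses a file name and loads the pickled graph. The helper names `parse_instance_name` and `load_instance` are hypothetical and not part of this repository; unpickling the real files additionally requires `networkx` to be installed.

```python
import pickle
import re
from pathlib import Path

# Matches uniform instance names such as "50_001" ({n}_{s}).
_UNIFORM_NAME = re.compile(r"^(\d+)_(\d+)$")


def parse_instance_name(path):
    """Return metadata inferred from the file name (hypothetical helper)."""
    stem = Path(path).stem
    match = _UNIFORM_NAME.match(stem)
    if match:
        return {"kind": "uniform", "n": int(match.group(1)), "sample": int(match.group(2))}
    # Anything else is assumed to be a TSPLIB identifier, e.g. "berlin52".
    return {"kind": "tsplib", "name": stem}


def load_instance(path):
    """Unpickle one instance; only load pickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


info = parse_instance_name("data/tsp_uniform/50_001.pkl")
```

Here `info` describes a uniform instance with 50 nodes and sample index 1, while a path such as `data/tsplib/berlin52.pkl` is reported as a TSPLIB instance.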

## Graph Attributes

Each `nx.Graph` is a complete undirected graph with the following attributes:

```text
[NODE] coord : the node coordinates
[NODE] opt_tour : an integer giving the position of the node in an optimal tour. Two separate flags are stored because the functions of Hudson et al. require them
[EDGE] weight : C[i, j] is the Euclidean (L2) distance between the coordinates of node i and node j, rounded as done by the Concorde TSP solver: http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp95.pdf
[EDGE] weight_norm: C[i, j] / max_{i, j} C[i, j]
[EDGE] rounded_weigh: round(C[i, j]), used to solve the instances with Concorde
[EDGE] opt_tour/in_solution : a flag indicating whether the edge is in the optimal tour
[EDGE] GNNAR : edge probabilities obtained with the code of Joshi et al., stored as a tuple (P(i, j), P(j, i)) with i < j
[EDGE] GNNGLS : probability obtained by transforming the regret of Hudson et al. via max(1 - n * regret, 0)
[EDGE] soft_dist : probability matrix as obtained using https://arxiv.org/pdf/2406.03503. As weights, we used the *normalized* version
[EDGE] LP: value of the LP relaxation. Available only when n <= 300.
[EDGE] regret_pred: The regret of the edge as predicted by Hudson et al. 
[EDGE] features: Feature vector as needed by Hudson et al. 
[EDGE] DIFUSCO: probability obtained with the method of Sun et al., stored as a tuple (P(i, j), P(j, i)) with i < j
```
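
The snippet below illustrates how a few of these attributes relate to each other, using plain dictionaries on a toy 4-node instance. It is a sketch under assumptions: all function names are hypothetical, and the dictionaries stand in for what `nx.get_node_attributes(G, "opt_tour")` and `nx.get_edge_attributes(G, "weight")` would return for a loaded graph `G`.

```python
def tour_from_positions(positions):
    """Order nodes by their `opt_tour` position to recover the tour."""
    return sorted(positions, key=positions.get)


def tour_length(tour, weights):
    """Sum edge weights along the closed tour; keys may be (u, v) or (v, u)."""
    total = 0
    for u, v in zip(tour, tour[1:] + tour[:1]):
        total += weights[(u, v)] if (u, v) in weights else weights[(v, u)]
    return total


def normalize_weights(weights):
    """Compute weight_norm: C[i, j] / max_{i, j} C[i, j]."""
    c_max = max(weights.values())
    return {edge: w / c_max for edge, w in weights.items()}


def regret_to_probability(regret, n):
    """Apply the GNNGLS transformation max(1 - n * regret, 0)."""
    return max(1 - n * regret, 0)


# Toy data: corners of a 3 x 4 rectangle (not taken from the dataset).
positions = {0: 0, 1: 2, 2: 1, 3: 3}
weights = {(0, 1): 5, (0, 2): 3, (0, 3): 4, (1, 2): 4, (1, 3): 3, (2, 3): 5}

tour = tour_from_positions(positions)   # [0, 2, 1, 3]
length = tour_length(tour, weights)     # 3 + 4 + 3 + 4 = 14
weight_norm = normalize_weights(weights)
```

Sorting by position works whether the stored positions start at 0 or at 1, since only their relative order matters.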