Supplemental material for the paper "Bridging ML and algorithms: comparison of hyperbolic embeddings"

Projects included:
==================

hyperbolic-embedder: from https://bitbucket.org/HaiZhung/hyperbolic-embedder/overview (commit 3ade6d7d67188b0ab82949397ea5da62e4d9c845, 2018-05-02)
poincare-embeddings: from https://github.com/facebookresearch/poincare-embeddings (commit ff1d846db3a64a759e56173d7846c164a37654f9, 2021-09-16)
hyperrogue/DHRG: from https://github.com/zenorogue/hyperrogue (commit 5a33967711b017c1453d108ffeeb18d1cf912c6d, 2023-04-01)
mercator: from https://github.com/networkgeometry/mercator (commit a5dd4a05f4d77f92c32ee7750efd450cee0d3014, 2022-06-21)
TreeRep: from https://github.com/rsonthal/TreeRep (commit 8ed4d830b5d0da41aeecf786d5be650ed75b8d59, 2023-06-22)
HyperbolicTiling_Learning: from https://github.com/ydtydr/HyperbolicTiling_Learning (commit c77f0d1a1b32ed5437a59d7cdeb8426ff03ea70b, 2020-03-19)
hypviewer: from https://graphics.stanford.edu/~munzner/h3/download.html (not git -- last modified in 2003)

See `diffs` for the changes relative to the original commits listed above. The changes are summarized below. (Some files included in the original repos, such as datasets and helper tools, have not been removed.)

In hyperbolic-embedder:
- fix a compilation error on newer C++

In poincare-embeddings:
- add a CLI option `-initial` (not actually used in the final paper)
- create kx-evaluate.py, mostly to evaluate embeddings using mAP
  (including the BFKL embedding, although, as mentioned in the paper, this did not work due to numerical precision errors) and to export embeddings to a format recognized by DHRG
- create wordnets/transitive_closure_verb.py to export the verb hierarchy by analogy to wordnets/transitive_closure.py
- add an option to change the seed via the `SEED` environment variable (not discussed in the paper)
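
For orientation, evaluating or exporting an embedding comes down to distances between embedded points. The sketch below assumes the Poincaré ball model with curvature -1 and uses the standard distance formula; `poincare_distance` and `distance_table` are illustrative names chosen here, not functions taken from kx-evaluate.py.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points of the open unit ball (curvature -1)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # (1 - |u|^2) * (1 - |v|^2); both factors are positive inside the ball.
    denom = (1.0 - sum(a * a for a in u)) * (1.0 - sum(b * b for b in v))
    return math.acosh(1.0 + 2.0 * sq_diff / denom)

def distance_table(points):
    """Full pairwise distance table for a list of points (2D or 3D tuples)."""
    return [[poincare_distance(p, q) for q in points] for p in points]
```

Points very close to the boundary of the ball make the denominator tiny, so floating-point precision can degrade there.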

In hyperrogue/dhrg:
- remove some irrelevant files (e.g., music)
- create maprank.cpp, which computes mAP and MeanRank
- create compare.cpp for various analyses (load distance tables, Poincare 2D and 3D embeddings)
- various minor changes to make the necessary tools accessible (access compute-map.cpp and dhrg/routing via the command line, access landscape from dhrg, etc.)
- code to simplify the visualization output, and some fixes to visualization
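
As a rough illustration of the two quality measures mentioned above, here is a minimal Python sketch of mAP and MeanRank. It is not the code in maprank.cpp. It assumes the common convention in which each neighbour is ranked against the non-neighbours of the same node, with ties resolved in favour of the neighbour, and it assumes the graph has at least one edge. The function name and the input layout (`dist` as a square table, `adj` as a list of neighbour sets) are choices made for this example.

```python
import bisect

def map_and_meanrank(dist, adj):
    """dist[u][v]: embedded distance; adj[u]: set of neighbours of u."""
    n = len(dist)
    ap_sum, ap_count = 0.0, 0
    rank_sum, rank_count = 0, 0
    for u in range(n):
        nbrs = adj[u]
        if not nbrs:
            continue  # nodes without neighbours do not contribute
        # Sorted distances from u to every non-neighbour (excluding u itself).
        non_nbr_d = sorted(dist[u][w] for w in range(n)
                           if w != u and w not in nbrs)
        nbr_d = sorted(dist[u][v] for v in nbrs)
        ap = 0.0
        for i, d in enumerate(nbr_d, start=1):
            # Non-neighbours strictly closer to u than this neighbour.
            closer = bisect.bisect_left(non_nbr_d, d)
            rank_sum += closer + 1      # rank among the non-neighbours
            rank_count += 1
            ap += i / (i + closer)      # precision at this neighbour's position
        ap_sum += ap / len(nbr_d)
        ap_count += 1
    return ap_sum / ap_count, rank_sum / rank_count
```

A perfect embedding gives mAP = 1 and MeanRank = 1; lower mAP and higher MeanRank indicate that non-neighbours are placed closer than true neighbours.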

In HyperbolicTiling_Learning:
- remove HalfspaceManifold, which is referred to but does not seem to be actually present in the repo; train-grqc.py also refers to `group_rie`, which is not available
- add an option to produce a table of distances that can be analyzed using other tools (specifically, we use dhrg)

In TreeRep:
- implemented `experiment.jl` which runs the experiments and outputs 

Datasets included: (graphs/*/graph-orig.txt)
astroph, condmat, grqc, hepph, facebook: from http://snap.stanford.edu/data/
brain maps data: https://github.com/networkgeometry/navigable_brain_maps_data

Our setup and compute
=====================

Hardware:
[1] Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz, NVIDIA GeForce GTX 1060 6GB/PCIe/SSE2
[2] 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz, OpenGL renderer string: NVIDIA RTX A3000 Laptop GPU/PCIe/SSE2

Software: Arch Linux, g++ 12.2.1

The times reported in the paper have been obtained on [1]. Some experiments have been run on [2].

How to reproduce:
=================

Note: scripts are designed to be called from the main directory (e.g., `bash scripts/compile-all.sh`, not `cd scripts; bash compile-all.sh`).

- create and activate the poincare environment, as described in poincare-embeddings/README.org
- compile all included projects (`bash scripts/compile-all.sh`)
- convert graphs/*/graph-orig.txt to the correct formats graphs/*/graph.txt and graphs/*/graph.csv (`bash scripts/read-networks.sh`)
- create WordNet hierarchies and convert them to the correct formats (`bash scripts/hierarchies.sh`)
- perform the experiments (for the full experiments: `bash scripts/process.sh abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS name`, replacing `name` with each graph in graphs in turn; you may want to run fewer steps)
- build the tables `tables/*.tex` using `bash scripts/generate-rw-table.sh`
- compute the experiments on simulated networks: `bash scripts/simulate.sh`
- build the CSV data `tables/for-table.csv` using `bash scripts/generate-for-table.sh`
- compute the precise BFKL time data `tables/precise-times.csv` using `bash scripts/compute-precise-simulated-times.sh`
- scripts/analysis.R was used to create the graphs and statistical analysis
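
Running `scripts/process.sh` once per graph can be automated. The following is a minimal sketch, assuming that every subdirectory of `graphs` is one graph name (as suggested by the `graphs/*/graph-orig.txt` layout). The helper names `process_commands` and `run_all` are introduced here for illustration and are not part of the repository.

```python
import string
import subprocess
from pathlib import Path

# Step letters as in the full-experiment command above: a-z followed by A-S.
ALL_STEPS = string.ascii_lowercase + string.ascii_uppercase[:19]

def process_commands(graphs_dir="graphs", steps=ALL_STEPS):
    """Return one `bash scripts/process.sh STEPS NAME` command per graph directory."""
    names = sorted(p.name for p in Path(graphs_dir).iterdir() if p.is_dir())
    return [["bash", "scripts/process.sh", steps, name] for name in names]

def run_all(graphs_dir="graphs", steps=ALL_STEPS):
    """Run the commands one after another; stop at the first failure."""
    for cmd in process_commands(graphs_dir, steps):
        subprocess.run(cmd, check=True)
```

Call `run_all()` from the main directory; pass a shorter `steps` string to run only some of the steps.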

Explanation of files:
=====================

In graphs/[graph name]:
- `graph-orig.txt`: original data from the source
- `graph.txt`: data in the BFKL format
- `graph.csv`: data in the poincare-embeddings format
- `notrans.txt`: for hierarchies, edges without transitive closure (created by `scripts/create-notrans.sh`, used by the visualizer `scripts/visualize.sh`)
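
To illustrate the difference between a hierarchy with and without transitive closure, the sketch below computes the closure of a set of direct (child, parent) edges. It is a generic illustration, not the code of wordnets/transitive_closure.py or `scripts/create-notrans.sh`, and it makes no assumption about the on-disk file format; the function name and the edge representation are chosen for this example.

```python
from collections import defaultdict

def transitive_closure(edges):
    """edges: iterable of (child, parent) pairs of a hierarchy.
    Returns the set of all (descendant, ancestor) pairs."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    closure = set()
    for start in list(parents):
        # Depth-first walk upwards from `start`, collecting every ancestor.
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure
```

For example, the direct edges (dog, canine) and (canine, mammal) yield the additional pair (dog, mammal) in the closure.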

In results/[graph name]:
- `log.txt` logs all times
- `log-*.txt` contain the output of various steps
- `*.bin` and `*.bin.best` are embeddings obtained from poincare-embeddings
- `*-coordinates.txt` and `*/*-links.txt` are various embeddings
- `*-dhrg.txt` are these embeddings improved by BFKL
