# Hitting Time Estimation

This repository contains the code for the paper "Estimating Hitting Times Locally at Scale", accepted for presentation at NeurIPS 2025.
[View the paper on the NeurIPS site](https://neurips.cc/virtual/2025/poster/115502)

## Overview

This project implements multiple algorithms for estimating hitting times locally in undirected graphs. The **hitting time** from node `u` to node `v` is the expected number of steps a random walker starting at `u` takes to reach node `v` for the first time.

### Implemented Algorithms

#### Hitting Time Algorithms

1. **Exact Algorithm** (`exact`/`exact-sparse`)
   - Solves a linear system to compute exact hitting times
   - Uses sparse matrix operations for efficiency on large graphs
   - Best for: Small to medium graphs where exact values are needed

2. **Meeting Time Algorithm** (`local-delete`)
   - Efficiently estimates hitting times using collision detection between random walks
   - Deletes walks that collide to reduce computation
   - Best for: Large graphs where approximate values are acceptable
   - Based on the local estimation approach

3. **Cutoff Algorithm** (`cutoff`)
   - Estimates hitting times using random walks up to a maximum length
   - Counts visits to target nodes
   - Best for: Graphs with relatively short hitting times

4. **Sampling Algorithm** (`sampling`)
   - Simple Monte Carlo approach: run random walks until they hit the target
   - Best for: Baseline comparisons, simple use cases
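
To make the "solves a linear system" idea concrete, here is a minimal standalone sketch. It is not the repository's `exact_ht()`; the function name `exact_hitting_times`, the adjacency-dict input, and the use of exact rational arithmetic are assumptions made for illustration. It removes the target node and solves `h(u) = 1 + mean(h(w) for neighbors w of u)` for the remaining nodes.

```python
from fractions import Fraction


def exact_hitting_times(adj, target):
    """Hitting times to `target` from every node of a connected undirected graph.

    `adj` maps each node to a list of its neighbors (hypothetical input format).
    """
    nodes = [u for u in adj if u != target]
    index = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)

    # Augmented matrix for (I - P) h = 1, where P is the transition matrix
    # of the walk with the target's row and column removed.
    rows = [[Fraction(0)] * n + [Fraction(1)] for _ in range(n)]
    for u in nodes:
        i = index[u]
        rows[i][i] += 1
        for w in adj[u]:
            if w != target:
                rows[i][index[w]] -= Fraction(1, len(adj[u]))

    # Gauss-Jordan elimination with exact fractions.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [x * inv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]

    result = {u: rows[index[u]][n] for u in nodes}
    result[target] = Fraction(0)
    return result


# Path graph 0 - 1 - 2 - 3: the walk from one end to the other takes 9 steps on average.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = exact_hitting_times(path, target=3)
print(h[0])  # 9
```

Dense elimination like this is only practical for small graphs, which is why a sparse solver matters at scale.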

#### Effective Resistance Algorithms

- **Exact Effective Resistance**: Computed using the pseudoinverse of the Laplacian matrix
- **Local Effective Resistance**: Estimated using random walks with optional collision deletion
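
As a small illustration of effective resistance, the sketch below computes it exactly on a toy graph. It deliberately swaps the Laplacian pseudoinverse for an equivalent textbook technique: ground node `v`, inject one unit of current at `u`, and read off the potential at `u`. The function name `effective_resistance` and the adjacency-dict input are assumptions for this example, and every edge is assumed to have unit resistance.

```python
from fractions import Fraction


def effective_resistance(adj, u, v):
    """Effective resistance between u and v in a connected undirected graph."""
    if u == v:
        return Fraction(0)
    nodes = [x for x in adj if x != v]  # grounding v removes its row and column
    index = {x: i for i, x in enumerate(nodes)}
    n = len(nodes)

    # Augmented system [L_reduced | e_u], with L the graph Laplacian.
    rows = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for x in nodes:
        i = index[x]
        rows[i][i] = Fraction(len(adj[x]))
        for w in adj[x]:
            if w != v:
                rows[i][index[w]] -= 1
    rows[index[u]][n] = Fraction(1)

    # Gauss-Jordan elimination with exact fractions.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [a * inv for a in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]

    return rows[index[u]][n]  # potential at u equals the effective resistance


# 4-cycle: adjacent nodes see a 1-ohm edge in parallel with a 3-ohm path.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(effective_resistance(cycle, 0, 1))  # 3/4
```

Effective resistance connects to hitting times through the commute time identity: on a graph with `m` edges, the hitting time from `u` to `v` plus the hitting time from `v` to `u` equals `2 * m * R(u, v)`.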

## Installation

```bash
pip install -r requirements.txt
```

### Requirements
- Python 3.7+
- networkx
- numpy
- pandas
- matplotlib
- scipy
- numba
- joblib
- datasets (Hugging Face)

## Quick Start

### Basic Usage

```python
import networkx as nx
import ht

# Create a graph
G = nx.karate_club_graph()

# Compute exact hitting time from node 0 to node 33
u, v = 0, 33
hitting_time = ht.exact_ht(G, u, v, sparse=True)
print(f"Hitting time from {u} to {v}: {hitting_time:.2f}")

# Estimate hitting time using the meeting time algorithm
h_est, num_samples, walltime = ht.estimate_local_ht(
    G, u, v, 
    num_random_walks=1000, 
    max_len=10000
)
print(f"Estimated hitting time: {h_est:.2f}")
print(f"Computation time: {walltime:.4f}s")
```

### Running Experiments

The project includes several pre-configured experiment scripts:

```bash
# Run hitting time experiments on different graph types
python run_hitting_times.py er    # Erdős-Rényi graphs
python run_hitting_times.py ba    # Barabási-Albert graphs  
python run_hitting_times.py com   # Community structure graphs
python run_hitting_times.py fb-sm # Facebook (small)
python run_hitting_times.py fb    # Facebook (large)
```

### Custom Experiments

```python
from experiments_hitting_time import test_hitting_time
import networks

# Create a graph
G = networks.get_graph(
    graph_type="barabasi-albert",
    n=1000,
    barabasi_albert_m=10,
    seed=42
)

# Run a single test with the chosen algorithm
result = test_hitting_time(
    graph_type="barabasi-albert",
    n=1000,
    barabasi_albert_m=10,
    node_strategy="uniform",
    algorithm="local-delete",
    num_random_walks=1000,
    max_len=10000,
    seed=42
)

print(f"True hitting time: {result['true_hitting_time']:.2f}")
print(f"Estimated hitting time: {result['hitting_time']:.2f}")
print(f"Relative error: {result['relative_error']:.4f}")
print(f"Computation time: {result['time']:.4f}s")
```

## File Structure

### Core Modules

- **`ht.py`**: Main hitting time algorithms
  - `exact_ht()`: Exact computation via linear system
  - `estimate_local_ht()`: Meeting time algorithm with collision deletion
  - `ht_via_cutoff()`: Cutoff-based estimation
  - `sampling_ht()`: Simple sampling approach

- **`eff_res.py`**: Effective resistance algorithms
  - `exact_eff_res()`: Exact computation via Laplacian pseudoinverse
  - `estimate_local_eff_res()`: Local estimation with random walks

- **`networks.py`**: Graph generation utilities
  - Supports various graph types: Erdős-Rényi, Barabási-Albert, communities, grid, real-world networks
  - Loads real-world datasets: football, Facebook, Twitter

- **`random_walk.py`**: Random walk generator utility

- **`utils.py`**: Utility functions for plotting, caching, and dataset management

### Experiment Modules

- **`experiments_hitting_time.py`**: Core functions for running and visualizing hitting time experiments
  - `test_hitting_time()`: Run a single hitting time test
  - `plot_hitting_time()`, `plot_hitting_time2()`: Plotting functions with customizable metrics
  - `plot_hitting_time_distr()`: Distribution analysis across node pairs
  - `print_hitting_time_info()`: Statistical summaries and LaTeX output

- **`experiments_eff_res.py`**: Functions for effective resistance experiments
  - `test_eff_res()`: Run effective resistance tests
  - `plot_eff_res()`: Visualization of effective resistance results

### Executable Scripts

- **`run_hitting_times.py`**: Main CLI script for running hitting time experiments on different graph types (Erdős-Rényi, Barabási-Albert, Communities, Facebook networks)
- **`run_eff_res.py`**: Script for running effective resistance experiments

**Note:** You can write custom experiment scripts by importing from the `experiments_hitting_time` and `experiments_eff_res` modules and using the utility functions they provide.

## Supported Graph Types

### Synthetic Graphs
- **Erdős-Rényi** (`erdos-renyi`): Random graphs with edge probability `p`
- **Barabási-Albert** (`barabasi-albert`): Scale-free graphs with preferential attachment
- **Communities** (`communities`): Stochastic block model with community structure
- **Grid** (`grid`): 2D grid graphs
- **Lollipop** (`lollipop`): Lollipop graphs (complete graph + path)
- **Karate Club** (`karate-club`): Zachary's karate club network (a small real-world network bundled with NetworkX)

### Real-World Networks
- **Football** (`football`): American college football network
- **Facebook** (`facebook`, `facebook-small`): Facebook social networks
- **Twitter** (`twitter`): Twitter social network

## Algorithm Parameters

### `estimate_local_ht(G, u, v, num_random_walks=10000, max_len=100000, verbose=False)`

- `G`: NetworkX graph
- `u`: Source node
- `v`: Target node  
- `num_random_walks`: Number of random walks to simulate (more walks reduce the variance of the estimate at the cost of runtime)
- `max_len`: Maximum length of each random walk
- `verbose`: Print progress information

**Returns:** `(hitting_time, num_samples, walltime)`

### `exact_ht(G, u, v, sparse=False)`

- `G`: NetworkX graph
- `u`: Source node
- `v`: Target node
- `sparse`: Use sparse matrix operations (recommended for large graphs)

**Returns:** Exact hitting time value

### `sampling_ht(G, u, v, num_random_walks=10000)`

- `G`: NetworkX graph
- `u`: Source node
- `v`: Target node
- `num_random_walks`: Number of random walks to average

**Returns:** `(hitting_time, num_samples, walltime)`
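
The sampling idea can be sketched in a few lines of standard-library Python. This is an illustrative stand-in, not the repository's `sampling_ht()`: the name `sample_hitting_time`, the adjacency-dict input, and the `seed` parameter are assumptions, and it returns only the mean rather than the `(hitting_time, num_samples, walltime)` tuple.

```python
import random


def sample_hitting_time(adj, u, v, num_random_walks=10000, seed=None):
    """Average number of steps over walks from u that run until they first reach v."""
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(num_random_walks):
        node, steps = u, 0
        while node != v:
            node = rng.choice(adj[node])  # move to a uniformly random neighbor
            steps += 1
        total_steps += steps
    return total_steps / num_random_walks


# Complete graph on 5 nodes: the exact hitting time between distinct nodes is 4.
k5 = {i: [j for j in range(5) if j != i] for i in range(5)}
estimate = sample_hitting_time(k5, 0, 1, num_random_walks=20000, seed=0)
print(f"Estimated hitting time: {estimate:.2f}")
```

Because each walk runs until it hits the target, the cost per walk grows with the hitting time itself, and the graph must be connected for the loop to terminate.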

## Node Selection Strategies

When running experiments, you can specify how to select node pairs:

- `first-last`: Select first and last nodes
- `first-second`: Select first two nodes
- `uniform`: Uniformly random node pairs
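- `deg-prod-prop`: Sample pairs with probability proportional to the product of their degrees
- `deg-prod-invprop`: Sample pairs with probability inversely proportional to the product of their degrees
- `pagerank-prod-prop`: Sample pairs with probability proportional to the product of their PageRank scores
- `pagerank-prod-invprop`: Sample pairs with probability inversely proportional to the product of their PageRank scores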
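
One plausible reading of the degree-product strategy is sketched below. The function `sample_pairs_by_degree_product` is hypothetical and the repository's implementation may differ, for example in whether pairs are ordered or may repeat. Drawing both endpoints independently with weight equal to degree, then rejecting pairs with `u == v`, yields ordered pairs of distinct nodes with probability proportional to `deg(u) * deg(v)`.

```python
import random


def sample_pairs_by_degree_product(adj, num_pairs, seed=None):
    """Sample ordered pairs (u, v), u != v, with probability proportional to deg(u) * deg(v).

    Assumes at least two nodes have positive degree; otherwise the loop never ends.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    degrees = [len(adj[x]) for x in nodes]
    pairs = []
    while len(pairs) < num_pairs:
        # Two independent degree-weighted draws give the product weighting.
        u, v = rng.choices(nodes, weights=degrees, k=2)
        if u != v:
            pairs.append((u, v))
    return pairs


# Star graph: the hub (degree 4) should appear in most sampled pairs.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
pairs = sample_pairs_by_degree_product(star, 1000, seed=1)
hub_share = sum(1 for u, v in pairs if 0 in (u, v)) / len(pairs)
print(f"Share of pairs containing the hub: {hub_share:.2f}")
```

The inverse-proportional variants would follow the same pattern with weights `1 / deg(x)` instead.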

## Performance Considerations

- **Exact algorithms** scale poorly with graph size (a dense linear solve takes O(n³) time on an n-node graph)
- **Meeting time algorithm** is efficient for large graphs with moderate hitting times
- **Cutoff algorithm** works well when hitting times are short relative to graph size
- **Sampling algorithm** can be slow for large hitting times
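
To see why a cap on walk length interacts with the size of the hitting time, the sketch below computes the expected value of `min(T, max_len)` exactly, where `T` is the hitting time from `u` to `v`. This is independent of how `ht_via_cutoff()` is implemented; the function `truncated_hitting_time` is a hypothetical helper that only illustrates the truncation effect. It uses the identity `E[min(T, L)] = P(T > 0) + P(T > 1) + ... + P(T > L - 1)`.

```python
def truncated_hitting_time(adj, u, v, max_len):
    """Exact E[min(T, max_len)] for the hitting time T from u to v."""
    # `alive` holds the probability of sitting at each node without having hit v yet.
    alive = {u: 1.0} if u != v else {}
    expected = 0.0
    for _ in range(max_len):
        expected += sum(alive.values())  # adds P(T > t) for the current step t
        nxt = {}
        for node, prob in alive.items():
            share = prob / len(adj[node])
            for w in adj[node]:
                if w != v:  # mass that steps onto v is absorbed
                    nxt[w] = nxt.get(w, 0.0) + share
        alive = nxt
    return expected


# Complete graph on 5 nodes: the true hitting time is 4.
k5 = {i: [j for j in range(5) if j != i] for i in range(5)}
for cap in (2, 10, 200):
    print(cap, round(truncated_hitting_time(k5, 0, 1, cap), 4))
```

With a small cap the truncated value falls well short of the true hitting time, and it approaches the true value only once the cap is several times larger than the hitting time.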

For large graphs (>10,000 nodes), use:
1. `exact_ht()` with `sparse=True` for exact values
2. `estimate_local_ht()` for fast approximations

## Datasets

The `datasets/` directory contains several network datasets:
- `football.gml`: American college football network
- `facebook_combined.txt`: Facebook social network
- `twitter_combined.txt`: Twitter social network  
- `1912.edges`: Smaller Facebook network

## Citation

If you use this code in your research, please cite our paper, "Estimating Hitting Times Locally at Scale" (NeurIPS 2025).
