================================================================================
README.txt - Matrix Approximation Algorithms and Bounds Analysis (Figure 4)
================================================================================

### PROJECT TITLE
Matrix Approximation Algorithms and Bounds Analysis: Investigating Performance and Impact of Matrix Structure (rho_G)

### OBJECTIVES
This project implements and analyzes algorithms for approximating the matrix product A*B^T, typically by a product A_S * B_S^T of rank at most k, where A_S and B_S are constructed by selecting k columns from A and B. The primary goals are:

*   **Implement Core Algorithms**: Provide Python implementations for several common matrix sketching and column selection techniques, including Leverage Score Sampling, CountSketch, Subsampled Randomized Hadamard Transform (SRHT), Gaussian Projection, and a Greedy Orthogonal Matching Pursuit (OMP) style selection.
*   **Compute Theoretical Bounds**: Implement functions to calculate theoretical error bounds for these approximations, including novel bounds developed by the user and standard literature bounds.
*   **Empirical Evaluation**: Conduct experiments to compare the actual performance (approximation error) of these algorithms against each other and against their respective theoretical bounds.
*   **Analyze Impact of Matrix Structure**: Investigate how structural properties of the matrices, specifically characterized by the rho_G metric, influence the approximation error and the tightness of bounds.
*   **Visualization**: Generate plots to visually represent experimental results, facilitating the comparison of different methods and bounds across various conditions.
*   **Reproducibility**: Structure the code to allow for easy reproduction of results and straightforward modification of experiment parameters.
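
As a minimal sketch of the quantity being studied, the snippet below forms A_S * B_S^T from k selected columns and measures the relative squared Frobenius error against A*B^T. The selection rule (largest column-norm products), the matrix sizes, and the helper name `relative_squared_error` are illustrative assumptions and are not taken from the project's library.

```python
import numpy as np

def relative_squared_error(A, B, cols):
    """Relative squared Frobenius error of A[:, cols] @ B[:, cols].T vs. A @ B.T."""
    exact = A @ B.T
    approx = A[:, cols] @ B[:, cols].T
    num = np.linalg.norm(exact - approx, "fro") ** 2
    return num / np.linalg.norm(exact, "fro") ** 2

rng = np.random.default_rng(0)
m, p, n, k = 30, 20, 50, 10          # illustrative sizes; n is the common dimension
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, n))

# Illustrative selection rule (an assumption, not one of the project's algorithms):
# keep the k columns with the largest ||a_i|| * ||b_i||.
scores = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
top_k = np.argsort(scores)[-k:]

err_k = relative_squared_error(A, B, top_k)
err_all = relative_squared_error(A, B, np.arange(n))  # all columns: exact product
print(f"k={k}: {err_k:.3f}   k=n: {err_all:.3e}")
```

Selecting all n columns reproduces the exact product, which is a convenient sanity check for any selection-based method.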

### FILE STRUCTURE
The project is organized into two main Python files and two directories for outputs, which are created automatically:

1.  `matrix_product_approximations_exp4.py` (Core Library):
    *   This file is the core library containing all essential functions.
    *   Includes:
        *   Matrix generation functions, including methods to generate matrices (A, B) aiming for specific target rho_G values.
        *   Implementations of the matrix approximation algorithms (Leverage Score, CountSketch, SRHT, Gaussian Projection, Greedy OMP).
        *   Functions for calculating theoretical error bounds (user-defined and standard).
        *   Helper utilities (e.g., Frobenius norm calculation, plotting configurations like `IMPROVED_STYLES`).
        *   Functions for orchestrating comprehensive experiments (e.g., `run_experiments_flexible`, `run_experiment_rho_vs_error`) and plotting results.

2.  `run_experiment4.py` (Main Executable Script):
    *   This script imports `matrix_product_approximations_exp4.py` to define, configure, and execute specific experiments.
    *   It is currently set up to run "Experiment 4: Impact of rho_G" but can be adapted for other experimental setups.
    *   Handles running the experiments, saving numerical results (typically to JSON files), and generating/saving plots.

3.  `plots/` (Output Directory):
    *   This directory is automatically created by the library upon its first import (if it doesn't already exist).
    *   All generated plots from the experiments (e.g., .png, .pdf files) are saved here.

4.  `results/` (Output Directory):
    *   This directory is also automatically created by the library.
    *   Numerical results from the experiments, typically in JSON format, are saved here, allowing for later analysis or re-plotting.
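
The automatic directory creation described above can be sketched as follows. The helper name `ensure_output_dirs` is hypothetical; the library may create the directories differently (for example, with relative paths at import time).

```python
import tempfile
from pathlib import Path

def ensure_output_dirs(base_dir="."):
    """Create plots/ and results/ under base_dir if they are missing."""
    dirs = {name: Path(base_dir) / name for name in ("plots", "results")}
    for path in dirs.values():
        # exist_ok=True makes repeated calls (or repeated imports) harmless.
        path.mkdir(parents=True, exist_ok=True)
    return dirs

# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_output_dirs(tmp)
    both_exist = all(p.is_dir() for p in created.values())
    ensure_output_dirs(tmp)  # second call does nothing and raises no error
```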

### SETUP / REQUIREMENTS
1.  **Python Version**:
    *   The code is developed and tested with Python 3.9+ and has been updated for compatibility with NumPy 2.0 (which itself requires Python 3.9 or newer).
2.  **Required Libraries**:
    *   NumPy: For numerical operations, especially matrix manipulations.
    *   Matplotlib: For generating plots.
    *   SciPy: For scientific and technical computing (e.g., linear algebra, sparse matrices if extended).
    *   CVXPY: For convex optimization, used in calculating one of the theoretical bounds.
    *   Pandas: Included in the install command below; potentially useful for data handling if the project is extended.
    *   tqdm: For displaying progress bars during long computations.
3.  **Installation**:
    *   Ensure Python 3 and pip are installed.
    *   It's recommended to use a Python virtual environment.
    *   Install the necessary libraries using pip:
        `pip install numpy matplotlib scipy cvxpy pandas tqdm`
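
To confirm the environment after installation, a small check like the one below may help. It is a sketch that uses only the standard library; the helper name `installed_versions` is an assumption and is not part of the project.

```python
from importlib import metadata

REQUIRED = ["numpy", "matplotlib", "scipy", "cvxpy", "pandas", "tqdm"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(REQUIRED)
for name, version in report.items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```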

### EXECUTION INSTRUCTIONS
1.  **Save Files**: Ensure both `matrix_product_approximations_exp4.py` and `run_experiment4.py` are saved in the same directory.
2.  **Run Script**: Open a terminal or command prompt, navigate to the directory where you saved the files, and execute the main script:
    `python run_experiment4.py`

The script will then run the experiments configured within `run_experiment4.py` (e.g., "Experiment 4").

### CONFIGURATION (within `run_experiment4.py`)
Experiment parameters are primarily configured within the `if __name__ == "__main__":` block of `run_experiment4.py`:

*   **Experiment Selection**: The script is typically set up to run a specific experiment (e.g., "Experiment 4"). You can switch between or add new experiment calls here.
*   **Matrix Dimensions**: Parameters like `n_exp4`, `m_exp4`, `p_exp4` define the dimensions of matrices A and B for Experiment 4.
*   **Sparsity/Sketch Size (k)**: `k_fractions_exp4` (or a similar variable) defines the k values to be tested, often as fractions of the common dimension `n`.
*   **Matrix Characteristics**: `target_rho_values_exp4` specifies the rho_G values to target during matrix generation for Experiment 4.
*   **Experiment Control**:
    *   `n_trials_exp4`: Number of repetitions of each randomized algorithm; results are averaged over these trials.
    *   `base_seed_exp4`: Master seed for random number generation to ensure reproducibility.
*   **Output Control**: Flags or settings to control saving of plots and numerical results.
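
A configuration block along these lines might look like the sketch below. The variable names come from the list above, but every value is a placeholder, and the way k values and per-trial seeds are derived (rounding fractions of `n`, adding the trial index to the base seed) is an assumption about the script, not a description of it.

```python
# Placeholder values for illustration only; adjust to the actual experiment.
n_exp4, m_exp4, p_exp4 = 100, 50, 50        # n is the common dimension of A and B
k_fractions_exp4 = [0.1, 0.25, 0.5]         # k as fractions of n
target_rho_values_exp4 = [0.1, 0.5, 0.9]    # placeholders; valid rho_G range not stated here
n_trials_exp4 = 5
base_seed_exp4 = 2024

# Assumed derivation: round each fraction of n, clamp to [1, n], drop duplicates.
k_values = sorted({max(1, min(n_exp4, round(f * n_exp4))) for f in k_fractions_exp4})

# Assumed derivation: one deterministic seed per trial for reproducibility.
trial_seeds = [base_seed_exp4 + t for t in range(n_trials_exp4)]

print("k values:", k_values)
print("trial seeds:", trial_seeds)
```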

### OUTPUT
The script generates several types of output:

1.  **Console Output**:
    *   Progress information, including the current experiment being run.
    *   Status updates on matrix generation attempts (especially for target rho_G).
    *   Completion messages and paths to saved files.
    *   Warnings if any issues occur (e.g., failure to achieve target rho_G).

2.  **Plots**:
    *   Visualizations are saved in the `plots/` directory.
    *   For "Experiment 4: Impact of rho_G", example plots include:
        *   `experiment4_rho_vs_error_n<value>.png` (and `.pdf`): A multi-panel plot showing approximation error versus rho_G for different k values. Each panel might represent a different k, or lines within a panel might represent different k values.
        *   `experiment4_rho_vs_error_legend.png` (and `.pdf`): A separate legend file for clarity, corresponding to the main plot.
    *   Plots typically show relative squared errors of algorithms and normalized bound values.

3.  **Numerical Results**:
    *   Detailed numerical data from each experiment run are saved in JSON format in the `results/` directory.
    *   Example for Experiment 4: `experiment4_rho_impact_n<value>_k_values_<k_list>.json`.
    *   These files contain data like squared relative errors and bound values for each algorithm and bound type, across the tested range of k values and rho_G values.
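
Saved results can be reloaded for later analysis or re-plotting. The sketch below shows a JSON round trip with a hypothetical layout; the keys, the file name, and the numbers are made-up placeholders (not experiment output), and the real files may be organized differently.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout with placeholder numbers: errors[algorithm][rho index][k index].
results = {
    "k_values": [10, 25],
    "rho_values": [0.2, 0.8],
    "errors": {"Leverage Score": [[0.41, 0.37], [0.22, 0.19]]},
}

def save_results(data, path):
    """Write a results dictionary to a JSON file, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)

def load_results(path):
    """Read a results dictionary back from a JSON file."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "results" / "example_results.json"
    save_results(results, out)
    reloaded = load_results(out)
```

Note that `json.dump` cannot serialize NumPy arrays directly, so arrays would need converting (for example with `.tolist()`) before saving.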

### CODE OVERVIEW / CUSTOMIZATION

*   **`matrix_product_approximations_exp4.py` (Core Library)**:
    *   **Matrix Generation**:
        *   `generate_matrices()`: Creates basic random matrices.
        *   `generate_matrices_for_rho()`: Key function that attempts to generate matrices A and B such that their product characteristics match a target `rho_G` value.
    *   **Algorithm Implementations**: Functions like `run_leverage_score_sampling()`, `run_countsketch()`, `run_srht_new()`, `run_gaussian_projection()`, `run_greedy_selection_omp()`.
    *   **Bound Calculations**:
        *   `compute_theoretical_bounds()`: Calculates user-defined bounds (Binary, QP Analytical, QP CVXPY Best).
        *   `compute_standard_bounds()`: Calculates standard literature bounds.
        *   `compute_optimal_vk_star()`: Computes the true optimal error (computationally intensive).
    *   **Experiment Orchestration**:
        *   `run_experiments_flexible()`: A versatile function to run a suite of algorithms and bounds.
        *   `run_experiment_rho_vs_error()`: Specific function designed for "Experiment 4" to study the impact of rho_G.
    *   **Plotting**: `plot_rho_vs_error_multi_k()` generates plots for Experiment 4. `IMPROVED_STYLES` dictionary defines consistent plotting aesthetics.

*   **`run_experiment4.py` (Main Script)**:
    *   This script acts as a driver, demonstrating how to use the library. It's currently focused on "Experiment 4".
    *   **Adding New Experiments**:
        1.  Define new experiment-driving logic, potentially by adding new functions in `matrix_product_approximations_exp4.py` if the setup is complex or intended for reuse.
        2.  Call these new functions from `run_experiment4.py` with the desired parameters.
        3.  Ensure appropriate plotting and result-saving calls are made for the new experiment.

*   **Modifying Algorithms or Bounds**:
    *   **New Algorithm**: Implement the algorithm as a function in `matrix_product_approximations_exp4.py` (typically taking A, B, k as input and returning the approximated product or relevant components). Integrate this new function into the `run_experiments_flexible` workflow or a similar experiment orchestration function.
    *   **New Bound**: Implement the bound calculation function in `matrix_product_approximations_exp4.py`. Add it to the relevant bound computation aggregator (e.g., `compute_theoretical_bounds` or `compute_standard_bounds`) and ensure its results are processed and collected.
    *   Remember to add a style definition to `IMPROVED_STYLES` if the new algorithm/bound is to be plotted.
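
As a rough illustration of the "New Algorithm" steps, the sketch below defines a function with the (A, B, k) interface described above. The function `run_uniform_sampling`, its `rng` argument, and the `example_styles` entry are hypothetical; the actual return convention of the library's algorithms and the keys expected by `IMPROVED_STYLES` should be checked in `matrix_product_approximations_exp4.py`.

```python
import numpy as np

def run_uniform_sampling(A, B, k, rng=None):
    """Hypothetical algorithm: estimate A @ B.T from k uniformly sampled columns.

    Columns are drawn without replacement and the result is scaled by n/k,
    which makes the estimate unbiased for the full product.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    cols = rng.choice(n, size=k, replace=False)
    return (n / k) * (A[:, cols] @ B[:, cols].T)

# Stand-in for a plotting style entry; the real IMPROVED_STYLES keys are assumed.
example_styles = {
    "Uniform Sampling": {"color": "tab:gray", "linestyle": "--", "marker": "x"},
}

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 12))
B = rng.standard_normal((6, 12))
approx = run_uniform_sampling(A, B, k=4, rng=rng)
full = run_uniform_sampling(A, B, k=12, rng=rng)  # k = n recovers the exact product
```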

### NOTES AND POTENTIAL ISSUES
*   **Computational Cost**: Some calculations are slow. `compute_optimal_vk_star` is expensive because of its combinatorial nature, and run time also grows with large matrix dimensions (`n`), numerous `k` values, and many trials.
*   **CVXPY Solvers**: The CVXPY library, used for one of the theoretical bounds, relies on underlying numerical solvers (e.g., SCS, ECOS). Ensure these are correctly installed and accessible. If solver issues arise, you might need to specify a different solver (via the `solver` argument of the problem's `solve()` call) or adjust the solver options.
*   **Numerical Stability**: While the code aims for numerical robustness (e.g., using `np.float64`, clipping small values), operations on matrices with extreme properties or very specific parameter choices might occasionally lead to numerical precision issues.
*   **rho_G Target Generation**: The `generate_matrices_for_rho` function attempts to iteratively find matrices A and B that match a target rho_G value. This process might not always succeed within the predefined number of attempts, especially for extreme rho_G values or very tight tolerance levels. The script is designed to issue a warning if it fails to converge to the target rho_G.
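
To get a feel for the combinatorial cost mentioned above, the sketch below counts the k-column subsets of n columns. It assumes an exhaustive search over all subsets, which is an assumption: the notes only describe `compute_optimal_vk_star` as combinatorial. The helper name `subset_count` is hypothetical.

```python
import math

def subset_count(n, k):
    """Number of ways to choose k columns out of n."""
    return math.comb(n, k)

for n, k in [(20, 5), (50, 10), (100, 10)]:
    print(f"n={n:3d}, k={k:2d}: {subset_count(n, k):,} candidate subsets")
```

Under that assumption, the optimal error is practical to compute only for small `n` and `k`.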

================================================================================