# Repository Anonymization Summary

This document summarizes all changes made to anonymize this repository for double-blind review submission to ICLR.

## Overview

The repository has been scrubbed of identifying information: author names, personal email addresses, institutional affiliations, system-specific paths, machine names, repository URLs, and other potentially identifying metadata.

## Files Modified for Anonymization

### 1. Main Entry Points
- **`pdebench/__main__.py`**
  - Removed system-specific hostname detection and paths
  - Replaced system-specific paths with generic `os.path.join(PROJDIR, 'data')`
  - Added anonymization notice in print statement

- **`am/__main__.py`**
  - Removed machine-specific paths for Eagle, GPU nodes, and PSC Bridges
  - Replaced personal paths with generic `os.path.join(PROJDIR, 'data')`
  - Anonymized test extraction directory paths
  - Added anonymization notice in print statement
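
The path replacement described above can be sketched as follows. `PROJDIR` and the `data` subdirectory come from the changes listed here; deriving the project directory from the entry file's location, and the helper names `project_dir` and `resolve_data_dir`, are assumptions for illustration rather than the repository's actual code.

```python
import os


def project_dir(entry_file: str) -> str:
    """Return the directory one level above the package that holds entry_file.

    Assumption: the repository root is the parent of the package directory
    (e.g. the parent of `pdebench/`), so no hostname lookup is needed.
    """
    return os.path.dirname(os.path.dirname(os.path.abspath(entry_file)))


def resolve_data_dir(projdir: str) -> str:
    """Build the generic data path that replaces machine-specific paths."""
    return os.path.join(projdir, 'data')


if __name__ == "__main__":
    PROJDIR = project_dir(__file__)
    print("Data directory (anonymized for review):", resolve_data_dir(PROJDIR))
```

Resolving the data directory relative to the checkout keeps the entry points independent of any particular machine or user account.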

### 2. Dataset Configuration
- **`pdebench/dataset/utils.py`**
  - Changed HuggingFace repo ID to `'ANON/LPBF_DATASET'`
  - Added comment indicating anonymization

- **`pdebench/dataset/presample_dml.py`**
  - Replaced user-specific absolute paths with generic placeholder paths:
    - `/path/to/drivaerml_data/train_test_splits.json`
    - `/path/to/DrivAerML`
  - Added anonymization comment

### 3. Documentation
- **`README.md`**
  - Changed title to include "(Anonymized)" suffix
  - Removed all external paper links and badges (arXiv, HuggingFace)
  - Removed repository URLs
  - Anonymized framework attribution (removed `mlutils.py` GitHub link)
  - Replaced git clone commands with anonymized placeholders
  - Completely removed author citation section
  - Added anonymization comments throughout

- **`LICENSE`**
  - Changed copyright from personal name to "Anonymous Authors"

- **`am/dataset/readme_hf.md`**
  - Removed all author names from citation
  - Anonymized paper reference links
  - Replaced HuggingFace CDN image URLs with placeholder names
  - Removed contact information and GitHub repository links
  - Replaced citation section with anonymization notice

### 4. Code Comments and References
- **`am/dataset/extraction.py`**
  - Replaced specific author attributions with anonymized reference:
    - "Based on [author names]'s code" → "Anonymized: based on prior public Netfabb extraction scripts"

- **`pdebench/utils.py`**
  - Replaced the Lion optimizer citation with "reference anonymized for review"
  - Replaced arXiv URL with "[arXiv link removed for anonymity]"

- **`scripts/install.sh`**
  - Generalized cluster names:
    - "Eagle" → "Cluster A"
    - "GCloud H100" → "Cloud GPU Provider"
    - Added "Local" prefix to remaining cluster reference

- **`scripts/download_data.sh`**
  - Anonymized HuggingFace settings URL reference in comment
  - Replaced personal HuggingFace repo IDs with `ANON/PDE_DATASET`

## Content Removed/Anonymized

### Author Information
- **Names removed:** All author names from papers and citations
- **Usernames removed:** All personal usernames and handles

### System Information
- **Machine names:** eagle, orchard, PSC, Bridges, GCloud
- **Paths:** All user-specific absolute system paths removed
- **Institutional domains:** References to `.edu` or specific institutions

### External References
- **Repository URLs:** All GitHub links to personal repositories
- **Paper links:** arXiv and HuggingFace paper page URLs
- **HuggingFace:** Repo IDs, CDN image URLs, settings page URLs
- **Email addresses:** Any personal or institutional email addresses

## Files Examined for Safety

### Media Files (19 total)
- **14 PDF files** in `figs/` and `am_dataset_stats/` - ✅ Clean, no identifying metadata
- **4 PNG files** - ✅ Clean, only generic matplotlib version info
- **1 JPG file** (`figs/lpbf_gallery.jpg`) - ✅ Clean, no identifying metadata
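
A check of this kind on media files can be approximated with a small `strings`-style scan. The sketch below is an assumption about how such a check might look, not the procedure actually used; the helper names and the list of metadata keys are hypothetical. It only sees uncompressed bytes, so metadata stored inside compressed streams would not be detected.

```python
import re

# Hypothetical list of metadata markers: PDF info-dictionary keys plus
# keywords that commonly appear in PNG text chunks and EXIF data.
METADATA_KEYS = ("/Author", "/Creator", "/Producer", "/Title", "Artist", "Software")


def printable_strings(data: bytes, min_len: int = 6) -> list:
    """Return runs of printable ASCII at least min_len long, like Unix `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]


def metadata_strings(data: bytes, keys=METADATA_KEYS) -> list:
    """Return the printable strings that mention any of the metadata keys."""
    return [s for s in printable_strings(data) if any(k in s for k in keys)]


def scan_file(path: str) -> list:
    """Read a media file as raw bytes and list candidate metadata strings."""
    with open(path, "rb") as fh:
        return metadata_strings(fh.read())
```

Each returned string still needs a manual look: a hit such as a matplotlib version string is harmless, while an author name is not.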

### Data Files
- **3 text files** with dataset statistics and case IDs - ✅ Clean, only anonymous data

## Verification

### Final Security Checks
- Ran regex searches across the repository for identifying information (names, email addresses, paths, machine names, URLs)
- Ran string analysis on all PDF, PNG, and JPG files to check for embedded metadata
- Found no identifying information in the final verification
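
As an illustration of the regex search step, here is a minimal sketch. The patterns are assumptions chosen to match the categories listed in this document (emails, user home paths, `.edu` domains, and the machine names named above); they are not the exact expressions used for the verification, and the function names are hypothetical.

```python
import os
import re

# Assumed patterns; extend with author names or usernames as needed.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "user_path": re.compile(r"/(?:home|Users)/[A-Za-z0-9._-]+"),
    "edu_domain": re.compile(r"\b[A-Za-z0-9.-]+\.edu\b"),
    "machine_name": re.compile(r"\b(?:eagle|orchard|PSC|Bridges|GCloud)\b", re.IGNORECASE),
}


def scan_text(text: str) -> list:
    """Return (line_number, label, matched_text) for every pattern hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, label, match.group(0)))
    return findings


def scan_tree(root: str, suffixes=(".py", ".sh", ".md", ".txt")) -> dict:
    """Scan text files under root and map each path with hits to its findings."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    hits = scan_text(fh.read())
                if hits:
                    report[path] = hits
    return report
```

A scan like this will also flag intentional placeholders such as the anonymous git email, so the output is a list to review rather than a pass/fail result.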

### Preserved Functionality
- All anonymization preserves the scientific content and code functionality
- Generic placeholders maintain code structure
- Research figures and datasets remain intact with only metadata anonymized

### 5. Git Configuration (September 12, 2025)
- **Git user configuration**
  - Removed personal name and email from git config
  - Set local repository config to anonymous values:
    - `user.name = "Anonymous Author"`
    - `user.email = "anonymous@anon.org"`
  - This prevents any future commits from containing identifying information
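
The local configuration above can be applied and checked from Python with `git config --local`. This is a sketch: the helper names are hypothetical, and only the two config values come from this document.

```python
import subprocess

# Values taken from the configuration listed above.
ANON_GIT_CONFIG = {
    "user.name": "Anonymous Author",
    "user.email": "anonymous@anon.org",
}


def git_config_commands(settings=ANON_GIT_CONFIG) -> list:
    """Build one `git config --local <key> <value>` command per setting."""
    return [["git", "config", "--local", key, value] for key, value in settings.items()]


def apply_git_config(repo_dir: str, settings=ANON_GIT_CONFIG) -> None:
    """Run the commands inside repo_dir; raises CalledProcessError on failure."""
    for cmd in git_config_commands(settings):
        subprocess.run(cmd, cwd=repo_dir, check=True)


def read_git_config(repo_dir: str, key: str) -> str:
    """Return the repository-local value of key, for verification."""
    result = subprocess.run(
        ["git", "config", "--local", "--get", key],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

These settings only affect commits created afterwards; author fields on commits that already exist are not changed by `git config`.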

### 6. Cache File Cleanup (September 12, 2025)
- **System cache files removed**
  - Removed `.DS_Store` files from root and `figs/` directories
  - Verified no `__pycache__` directories or `.pyc` files present
  - No other cache files (`.pytest_cache`, `.mypy_cache`, etc.) found
- **Enhanced `.gitignore`**
  - Added comprehensive cache file patterns for macOS, Windows, Linux
  - Added editor temporary files (`.swp`, `.swo`, `*~`)
  - Added user-specific IDE configuration files that might contain identifying info
  - Prevents future cache file commits
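
The cache cleanup can be sketched as a small tree walk. The file and directory names below are the ones mentioned in this section; the helper names are hypothetical and this is not the script that was actually run.

```python
import os
import shutil

CACHE_FILE_NAMES = {".DS_Store"}
CACHE_FILE_SUFFIXES = (".pyc",)
CACHE_DIR_NAMES = {"__pycache__", ".pytest_cache", ".mypy_cache"}


def find_cache_entries(root: str) -> list:
    """Return sorted paths of cache files and cache directories under root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames):
            if name in CACHE_DIR_NAMES:
                hits.append(os.path.join(dirpath, name))
                dirnames.remove(name)  # report the directory once, do not descend
        for name in filenames:
            if name in CACHE_FILE_NAMES or name.endswith(CACHE_FILE_SUFFIXES):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def remove_cache_entries(root: str) -> list:
    """Delete everything find_cache_entries reports and return the removed paths."""
    removed = find_cache_entries(root)
    for path in removed:
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return removed
```

Running `find_cache_entries` again after cleanup and confirming it returns an empty list is a quick way to re-verify before packaging the submission.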

## Status: ✅ COMPLETE

The repository is now fully anonymized and ready for submission as supplementary material to ICLR. All identifying information has been removed while preserving the scientific content and code functionality.

**Last verification date:** September 12, 2025
