# AGENTS.md

This file guides agentic tools working in this repo.
Keep advice grounded in the current codebase.

## Quick repo map
- `main.py`: entrypoint for pruning methods via argparse.
- `methods/`: pruning method implementations; `methods_call` dispatch.
- `utils/`: shared utilities (logging, model, eval, latency).
- `TALE/`: additional analysis and iterative pruning scripts.
- `tools/`: plotting and analysis helpers.
- `scripts/`: runnable `.sh` pipelines for common experiments.
- `models_unit/`: local model definitions (llama, mistral, qwen).

## Environment and setup
- Python project; install deps from `requirements.txt`.
- GPU is assumed in most evaluation/latency scripts.
- No lockfile or virtualenv config is checked in; create your own environment.

Suggested setup:
```
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

## Build, lint, test
No formal build/lint/test tooling is configured.
There is no `pyproject.toml`, `setup.cfg`, or `pytest.ini`.
There is also no `tests/` directory and no pytest/unittest usage.

If you add tooling, document it here and in scripts.

### Build
- Not applicable; this is a script-driven research repo.

### Lint / format
- Not configured. Follow existing style manually.

### Test
- No unit tests found.
- For quick sanity checks, run the entry scripts listed under "Common commands".

### Single test
- Not applicable (no test runner configured).

## Common commands (from code/scripts)
Run pruning pipeline:
```
python main.py --method sleb --model-name <hf-model> --target-layers <N> --save-path <dir>
```

Perplexity / zero-shot eval:
```
python eval.py --model_name <hf-model> --removal_list "[1,2,3]" --save_results True
```

Latency measurement:
```
python latency.py --model_name <hf-model> --removal_ratio 0.2
```

Batch experiment scripts (Linux shell):
```
bash scripts/llama3-8b-instruct/shortgpt_evaluate.sh
```

Plotting / analysis tools:
```
python tools/plot_margin_phase_transition.py
```

## Code style guidelines
These reflect existing patterns across `main.py`, `utils/`, and `methods/`.

### Imports
- Group imports: standard library, third-party, local.
- Prefer explicit imports (avoid `import *`).
- Keep unused imports out.

### Formatting
- 4-space indentation; no tabs.
- Keep lines readable; wrap long calls with parentheses.
- Use blank lines to separate logical blocks.

### Types and annotations
- Type hints appear in utility functions and CLIs.
- Add hints for public functions and complex data structures.
- Use `List`, `Dict`, etc. from `typing` when helpful.
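
As an illustration of the hinting style described above, here is a minimal sketch. The helper names `layers_to_keep` and `summarize` are hypothetical and do not come from the repo.

```python
from typing import Dict, List


def layers_to_keep(num_layers: int, removal_list: List[int]) -> List[int]:
    """Return the indices of layers that remain after pruning (hypothetical helper)."""
    removed = set(removal_list)
    return [i for i in range(num_layers) if i not in removed]


def summarize(num_layers: int, removal_list: List[int]) -> Dict[str, int]:
    """Report how many layers were kept and how many were removed."""
    kept = layers_to_keep(num_layers, removal_list)
    return {"kept": len(kept), "removed": num_layers - len(kept)}
```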

### Naming conventions
- Functions/variables: `snake_case`.
- Constants: `UPPER_SNAKE_CASE`.
- Modules/packages: lowercase with underscores.
- Class names: `CamelCase`.

### CLI patterns
- `main.py` uses `argparse` with modular argument groups.
- `eval.py` and `latency.py` use `fire` for simple CLIs.
- When adding new CLI args, follow existing argument grouping.
- Keep defaults explicit and documented in help text.
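
The grouping pattern above might look roughly like the sketch below. The flag names are taken from the `main.py` command in this file; the group names, helper functions, default values, and the `choices` list are assumptions for illustration, not the repo's actual definitions.

```python
import argparse


def add_model_args(parser: argparse.ArgumentParser) -> None:
    """Register model-related flags in their own group (hypothetical helper)."""
    group = parser.add_argument_group("model")
    group.add_argument("--model-name", type=str, required=True,
                       help="Hugging Face model id or local path (required).")


def add_pruning_args(parser: argparse.ArgumentParser) -> None:
    """Register pruning-related flags in their own group (hypothetical helper)."""
    group = parser.add_argument_group("pruning")
    # The choices and defaults below are illustrative only.
    group.add_argument("--method", type=str, default="sleb",
                       choices=["sleb", "shortgpt"],
                       help="Pruning method to run (default: sleb).")
    group.add_argument("--target-layers", type=int, default=8,
                       help="Target layer count passed to the method (default: 8).")
    group.add_argument("--save-path", type=str, default="outputs",
                       help="Directory for results (default: outputs).")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Pruning entrypoint sketch")
    add_model_args(parser)
    add_pruning_args(parser)
    return parser
```

Keeping each group in its own function means a new group of flags can be added without touching the existing ones.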

### Error handling
- Prefer early validation of inputs and arguments.
- Use `assert` for internal invariants (as in argument parsing).
- Raise specific exceptions for recoverable errors.
- Avoid silent failures; log important conditions.
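
A minimal sketch of how these rules can combine in one function. The name `num_layers_to_remove` and its rounding behavior are assumptions for illustration, not code from the repo.

```python
import logging

logger = logging.getLogger(__name__)


def num_layers_to_remove(num_layers: int, removal_ratio: float) -> int:
    """Convert a removal ratio into a layer count (hypothetical helper)."""
    # Early validation: bad user input is recoverable, so raise a specific exception.
    if num_layers <= 0:
        raise ValueError(f"num_layers must be positive, got {num_layers}")
    if not 0.0 <= removal_ratio < 1.0:
        raise ValueError(f"removal_ratio must be in [0, 1), got {removal_ratio}")

    count = int(num_layers * removal_ratio)

    # Internal invariant: guaranteed by the validation above, so assert is enough.
    assert 0 <= count < num_layers, "layer count out of range"

    # Do not fail silently: a no-op run is worth a warning.
    if count == 0:
        logger.warning("removal_ratio=%s removes no layers from a %d-layer model",
                       removal_ratio, num_layers)
    return count
```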

### Logging
- Use `utils.util.init_logging()` where possible.
- Standard log format is configured there.
- Prefer `logging.info/warning/error` over `print` for new code.
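
The signature of `utils.util.init_logging()` is not shown in this file, so the sketch below uses only the standard library. `setup_logging`, the logger name, and the format string are hypothetical stand-ins, not the repo's configuration.

```python
import logging


def setup_logging(level: int = logging.INFO) -> logging.Logger:
    """Hypothetical stand-in for a project-wide logging initializer."""
    # basicConfig is a no-op if the root logger already has handlers.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("pruning")


logger = setup_logging()
# Lazy %-style arguments avoid formatting work when the level is disabled.
logger.info("Starting evaluation for %d layers", 32)
logger.warning("No layers selected for removal")
```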

### Randomness and reproducibility
- Use `utils.util.set_seed(seed)` for deterministic runs.
- Avoid hidden global randomness in new code paths.
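
The body of `utils.util.set_seed` is not reproduced here. The sketch below shows one common way such a helper is written, plus a local generator that avoids global state. `seed_everything` and `sample_layers` are hypothetical names.

```python
import random
from typing import List


def seed_everything(seed: int) -> None:
    """Seed every RNG that is importable in this environment (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed.


def sample_layers(num_layers: int, k: int, seed: int) -> List[int]:
    """Pick k layer indices with a local generator instead of the global one."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_layers), k))
```

Passing the seed explicitly, as in `sample_layers`, keeps a new code path reproducible even if another module reseeds the global generator.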

### Torch / model handling
- Call `model.eval()` before evaluation.
- Clear GPU cache with `torch.cuda.empty_cache()` when done.
- Use model handler methods (see `utils.model_utils`).

### Data and artifacts
- Cache directories and results should be created if missing.
- Write results in append mode when running multiple experiments.
- Do not hardcode personal file paths; use args.
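
A minimal sketch of these three rules using only the standard library. The JSON Lines format, the `results.jsonl` filename, and the `append_result` helper are assumptions; the repo may store results differently.

```python
import json
from pathlib import Path
from typing import Any, Dict


def append_result(save_path: str, record: Dict[str, Any],
                  filename: str = "results.jsonl") -> Path:
    """Append one result record under save_path, creating the directory if needed."""
    out_dir = Path(save_path)  # comes from a CLI argument, never a hardcoded path
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / filename
    # Append mode keeps results from earlier experiments in the same file.
    with out_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return out_file
```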

### File organization
- Put new methods in `methods/` and register in `methods/__init__.py`.
- Put shared helpers in `utils/`.
- Add analysis/plotting scripts to `tools/`.
- Add experiment pipelines to `scripts/`.
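
The exact shape of `methods_call` is not shown in this file. The sketch below assumes it is a dict that maps method-name strings to callables, and shows one way to register a new method and dispatch to it. All names here (`METHODS`, `drop_last`, `drop_first`, `run_method`) are hypothetical placeholders, not real pruning methods.

```python
import argparse
from typing import Callable, Dict, List

# Signature assumed for illustration: (num_layers, num_remove) -> removed indices.
PruneFn = Callable[[int, int], List[int]]


def drop_last(num_layers: int, num_remove: int) -> List[int]:
    """Placeholder method: remove the last num_remove layers."""
    return list(range(num_layers - num_remove, num_layers))


def drop_first(num_layers: int, num_remove: int) -> List[int]:
    """Placeholder method: remove the first num_remove layers."""
    return list(range(num_remove))


# Registering a method means adding one entry here.
METHODS: Dict[str, PruneFn] = {
    "drop_last": drop_last,
    "drop_first": drop_first,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Deriving choices from the dict keeps the CLI and the dispatch table in sync.
    parser.add_argument("--method", choices=sorted(METHODS), default="drop_last",
                        help="Method name (default: drop_last).")
    return parser


def run_method(name: str, num_layers: int, num_remove: int) -> List[int]:
    if name not in METHODS:
        raise KeyError(f"unknown method {name!r}; expected one of {sorted(METHODS)}")
    return METHODS[name](num_layers, num_remove)
```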

## Existing script conventions
- `.sh` scripts assume Linux with environment modules or conda available.
- Scripts typically run model-specific experiments.
- Paths in scripts may be cluster-specific; keep them parameterized.

## Notes on this codebase
- `utils/argments_utils.py` uses dynamic `globals()` lookup; keep naming consistent.
- `methods_call` dispatches on method-name strings; when adding a method, update both the `--method` choices and the dispatch dict.
- Some files mix `logging` and `print`; prefer logging in new code.
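
To show why naming consistency matters for a `globals()`-based lookup, here is a generic sketch of the technique. It is not the code in `utils/argments_utils.py`; the `add_<name>_args` convention, the function names, and the `--batch-size` flag are assumptions.

```python
import argparse
from typing import List


def add_model_args(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("--model-name", type=str, default="",
                        help="Model id (default: empty).")


def add_eval_args(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("--batch-size", type=int, default=1,
                        help="Evaluation batch size (default: 1).")


def build_parser(groups: List[str]) -> argparse.ArgumentParser:
    """Resolve add_<group>_args functions by name from module globals."""
    parser = argparse.ArgumentParser()
    for name in groups:
        fn = globals().get(f"add_{name}_args")
        if fn is None:
            # A renamed or misspelled function would otherwise be skipped silently.
            raise ValueError(f"no function named add_{name}_args in this module")
        fn(parser)
    return parser
```

With this pattern, renaming a function without updating the strings that refer to it breaks the lookup at runtime rather than at import time.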

## Cursor/Copilot rules
- No `.cursor/rules/`, `.cursorrules`, or `.github/copilot-instructions.md` found.
- If you add any, mirror their guidance here.

## When adding new dependencies
- Update `requirements.txt`.
- Note any GPU/CUDA requirements in this file.

## Suggested minimal smoke checks
- Run `python -m py_compile <file.py>` on each edited module to catch syntax errors.
- Run one evaluation or latency script with a small model.
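
The compile check can also be scripted with the standard-library `py_compile` module. The helper name `failed_compiles` is hypothetical.

```python
import py_compile
import sys
from typing import List


def failed_compiles(paths: List[str]) -> List[str]:
    """Return the subset of paths that fail to byte-compile."""
    failed = []
    for path in paths:
        try:
            # doraise=True turns compile errors into PyCompileError.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            print(exc.msg, file=sys.stderr)
            failed.append(path)
    return failed
```

For example, `failed_compiles(["main.py", "methods/__init__.py"])` returns an empty list when both files parse cleanly.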

## Agent behavior expectations
- Keep changes minimal and consistent with existing code style.
- Prefer small, targeted edits over broad refactors.
- Avoid introducing heavy frameworks unless required.
- Document new commands or conventions here.
