{
    "Name": "resource_centric_ppm_agents",
    "Title": "Towards Resource-Centric Predictive Process Monitoring for Concurrent Business Processes",
    "Short Hypothesis": "Simple per-resource decision policies learned from local XES logs (BPI 2012, BPI 2017, Road Traffic Fine Management Process), combined with a lightweight discrete-event simulator, improve next-event and suffix predictions over a case-centric LSTM. They also yield resource-level forecasts that a case-centric model does not produce.",
    "Related Work": "Case-centric PPM models each case in isolation and therefore ignores competition among concurrent cases for shared resources. We instead learn per-resource behavior and simulate this concurrency explicitly.",
    "Abstract": "We present a reproducible evaluation protocol for resource-aware predictive process monitoring (PPM), together with a compact, transparent case-centric LSTM baseline evaluated on the BPI 2012, BPI 2017, and Road Traffic Fine Management logs using chronological splits. The protocol specifies leakage controls, train-only normalization, figure and metric exports, and fixed random seeds for determinism. We also provide a fully specified, modular blueprint for a resource-centric agent (per-resource multinomial policies plus a lightweight discrete-event simulator), including metrics for global next-event accuracy and resource-level workload, although end-to-end simulator results are not reported in this version. Baseline next-activity results are strong (Top-3 0.987–0.994; Top-1 0.757–0.833) and expose systematic confusions that motivate resource-aware context. We release code, splits, and plot artifacts to enable one-click replication and future comparisons. The paper is intended as a protocol, baseline, and pitfalls report to accelerate trustworthy experiments on resource-aware PPM, not as a claim of state-of-the-art accuracy.",
    "Experiments": [
        "Data (offline): from the input/ folder, load BPI_Challenge_2012.xes first and then BPI_Challenge_2017.xes, plus the Road Traffic Fine Management Process log if present (plain .xes or .xes.gz); missing logs are skipped. Parse with PM4Py and keep the attributes activity, timestamp, resource, and lifecycle.",
        "Preprocess: map lifecycle transitions to start/complete; pair each start with its matching complete to compute durations; if a log has only complete events, treat activities as instantaneous.",
        "Split: chronological 70/10/20 train/val/test by case start.",
        "State (replay): track each case's next activity, each resource's busy/idle status, and, per activity, the queue length and the waiting time of the oldest queued case.",
        "Policies: per-resource multinomial logistic regression predicting the next activity from {previous activity ID, 4-bin time-of-day, queue counts/oldest wait}. Back off to a global model when a resource has too few training samples.",
        "Durations: fit a log-normal distribution per activity; fall back to the empirical median when an activity has too few observations.",
        "Simulator: when a resource becomes free, sample its next activity from its policy, assign the longest-waiting (FIFO) case that needs that activity, sample a duration, and advance the clock to the next completion event.",
        "Rollouts: N=30 Monte Carlo trajectories per prefix.",
        "Baseline: 1-layer LSTM (hidden size \u224864) over case sequences of (activity ID, resource ID, \u0394t bin) tokens; predict suffixes by iterative decoding. Train for 5 epochs as a first pass.",
        "Metrics: next-activity Top-1/Top-3, suffix normalized Damerau-Levenshtein similarity, remaining-time MAE, global next-event accuracy, per-resource next-task precision & workload MAPE.",
        "Ablation: replace the learned policies with a FIFO rule in the identical simulator, so that a free resource always selects the eligible activity whose case has waited longest; recompute all metrics."
    ],
    "Risk Factors and Limitations": [
        "Hidden priorities and case attributes that are not modeled may limit simulation fidelity.",
        "Replay-based per-case next-activity tracking may miss complex branching.",
        "Long-horizon simulations can drift; the small rollout count (N=30) keeps runtime low at the cost of higher variance."
    ],
    "Code": "# ai_scientist/ideas/my_research_topic.py\n# -----------------------------------------------------------------------------\n# Robust XES discovery & loading for AI-Scientist/BPM-Scientist\n# -----------------------------------------------------------------------------\n# HOW THIS WORKS:\n# - Local event logs are copied to ./input, so the agent sees them as input/*.xes (or input/*.xes.gz).\n# - Directory search prefers ./input (and CWD/input), then data/ and input/ in the CWD and its parents, then /workspace fallbacks.\n# - Supports .xes and .xes.gz and loads whichever of BPI_Challenge_2012, BPI_Challenge_2017 and Road_Traffic_Fine_Management_Process are present.\n# -----------------------------------------------------------------------------\n\nfrom __future__ import annotations\nfrom pathlib import Path\nimport pandas as pd\nfrom typing import Dict, List, Optional, Tuple\n\n# ---------- helpers: discovery ----------\n\ndef _has_xes(dirpath: Path) -> bool:\n    \"\"\"True if directory contains any .xes or .xes.gz files.\"\"\"\n    try:\n        return dirpath.is_dir() and (any(dirpath.glob(\"*.xes\")) or any(dirpath.glob(\"*.xes.gz\")))\n    except Exception:\n        return False\n\ndef _resolve_data_dir() -> Path:\n    \"\"\"\n    Resolution order (first hit wins):\n      1) ./input (workspace copy of data) and CWD/input\n      2) data/ and input/ in the CWD and each parent directory\n      3) Common absolute fallbacks under /workspace\n    \"\"\"\n    candidates: List[Path] = []\n\n    # 1) workspace-local input/\n    candidates += [Path(\"input\").resolve(), (Path.cwd() / \"input\").resolve()]\n\n    # 2) data/ in CWD and parent-walk (also try input/ in parents)\n    cwd = Path.cwd().resolve()\n    for base in [cwd, *cwd.parents]:\n        candidates.append((base / \"data\").resolve())\n        candidates.append((base / \"input\").resolve())\n\n    # 3) common absolute places in Sakana-style layouts\n    candidates += [\n        Path(\"/workspace/input\"),\n        Path(\"/workspace/data\"),\n        Path(\"/workspace/ai_scientist/data\"),\n        Path(\"/workspace/AI-Scientist-v2/data\"),\n        Path(\"/workspace/experiments/data\"),\n        Path(\"/workspace/ai_scientist/input\"),\n        Path(\"/workspace/experiments/input\"),\n    ]\n\n    # Drop duplicates while preserving priority order.\n    candidates = list(dict.fromkeys(candidates))\n    for p in candidates:\n        if _has_xes(p):\n            print(f\"[data] Using discovered data dir: {p}\")\n            return p\n\n    tried = \"\\n  - \" + \"\\n  - \".join(str(c) for c in candidates)\n    raise FileNotFoundError(\n        \"Could not locate a directory containing .xes or .xes.gz files.\\n\"\n        f\"Checked:{tried}\\n\"\n        \"Tips:\\n\"\n        \"  \u2022 Place the logs in ./input (preferred) or ./data.\\n\"\n        \"  \u2022 Keep 'BPI_Challenge_2012', 'BPI_Challenge_2017' or 'Road_Traffic_Fine_Management_Process' in the filenames so load_datasets() can match them.\"\n    )\n\ndef _first_match(d: Path, patterns: List[str]) -> Optional[Path]:\n    \"\"\"\n    Return the first existing file in d matching any pattern (supports globs).\n    Patterns like '*.xes*' allow both .xes and .xes.gz.\n    Matches are sorted so the choice does not depend on filesystem order.\n    \"\"\"\n    for pat in patterns:\n        for p in sorted(d.glob(pat)):\n            if p.is_file():\n                return p\n    return None\n\n# ---------- XES loading ----------\n\ndef xes_to_df(xes_path: Path) -> pd.DataFrame:\n    \"\"\"Load a .xes(.gz) with pm4py and return a tidy, time-ordered event DataFrame.\"\"\"\n    try:\n        from pm4py.objects.log.importer.xes import importer as xes_importer\n    except Exception as e:\n        raise ImportError(\"pm4py is required. Install inside your venv: `pip install pm4py`.\") from e\n\n    print(f\"[data] Loading XES: {xes_path}\")\n    log = xes_importer.apply(str(xes_path))\n    rows = []\n    for tr in log:\n        case_id = tr.attributes.get(\"concept:name\") or tr.attributes.get(\"case:concept:name\")\n        for e in tr:\n            rows.append({\n                \"case_id\": case_id,\n                \"activity\": e.get(\"concept:name\"),\n                # Missing lifecycle defaults to 'complete'; missing resource defaults to 'System'.\n                \"lifecycle\": e.get(\"lifecycle:transition\", \"complete\"),\n                \"timestamp\": e.get(\"time:timestamp\"),\n                \"resource\": e.get(\"org:resource\", \"System\"),\n            })\n    # Explicit columns keep the schema intact even if the log has no events.\n    df = pd.DataFrame(rows, columns=[\"case_id\", \"activity\", \"lifecycle\", \"timestamp\", \"resource\"])\n    df[\"timestamp\"] = pd.to_datetime(df[\"timestamp\"], utc=True, errors=\"coerce\")\n    df = df.dropna(subset=[\"timestamp\"])\n    df = df.sort_values([\"timestamp\", \"case_id\"]).reset_index(drop=True)\n    return df\n\n# ---------- public API used by the agent ----------\n\ndef load_datasets() -> Dict[str, pd.DataFrame]:\n    \"\"\"\n    Returns any subset of {'BPI2012','BPI2017','ROAD'} that exist.\n    Does NOT fail if one is missing; it loads what it finds and prints diagnostics.\n    \"\"\"\n    data_dir = _resolve_data_dir()\n    available = sorted([p.name for p in list(data_dir.glob(\"*.xes\")) + list(data_dir.glob(\"*.xes.gz\"))])\n    print(f\"[data] Available in {data_dir}: {available}\")\n\n    # Patterns accept both strict and fuzzy names; '*.xes*' allows .xes or .xes.gz.\n    patterns = {\n        \"BPI2012\": [\"BPI_Challenge_2012*.xes*\", \"BPI2012*.xes*\", \"*2012*.xes*\"],\n        \"BPI2017\": [\"BPI_Challenge_2017*.xes*\", \"BPI2017*.xes*\", \"*2017*.xes*\"],\n        \"ROAD\":    [\"Road_Traffic_Fine_Management_Process*.xes*\", \"*Traffic*Fine*.xes*\", \"*Traffic*.xes*\"],\n    }\n\n    loaded: Dict[str, pd.DataFrame] = {}\n    for key, pats in patterns.items():\n        path = _first_match(data_dir, pats)\n        if path is not None:\n            try:\n                loaded[key] = xes_to_df(path)\n            except Exception as e:\n                print(f\"[warn] Failed to load {key} from {path}: {e}\")\n        else:\n            print(f\"[data] Not found for {key} (patterns {pats})\")\n\n    if not loaded:\n        raise FileNotFoundError(\n            f\"No known XES log could be loaded from {data_dir}. \"\n            f\"Found: {available}\"\n        )\n\n    print(f\"[data] Loaded datasets: {list(loaded.keys())}\")\n    return loaded\n\ndef pick_default_dataset(datasets: Dict[str, pd.DataFrame],\n                         order: Tuple[str, ...] = (\"BPI2017\", \"BPI2012\", \"ROAD\")) -> Tuple[str, pd.DataFrame]:\n    \"\"\"Pick a default dataset by preference order.\"\"\"\n    for name in order:\n        if name in datasets:\n            return name, datasets[name]\n    # Fallback: first loaded dataset\n    name = next(iter(datasets.keys()))\n    return name, datasets[name]\n\n# Optional smoke test when run standalone\nif __name__ == \"__main__\":\n    ds = load_datasets()\n    name, df = pick_default_dataset(ds)\n    print(f\"[data] Using dataset: {name}, shape={df.shape}\")\n    print(df.head())\n"
}