**Article 15**

### Ensuring Accuracy, Robustness, and Lifecycle Performance

The Emergency Dispatch Prioritization Engine (EDPE) is designed and developed to sustain a high level of accuracy and robustness throughout its operational lifecycle. Accuracy is assessed quantitatively with standard metrics matched to the system’s two modalities. For the CNN that processes geographic and imagery data, the metrics are top-1 and top-5 classification accuracy on labeled spatial emergency-event datasets. For the LSTM sequence predictor, they are precision, recall, and F1-score on temporal event-pattern recognition. Initial training used a curated dataset of 250,000 geo-referenced emergency incident records and 1 million sensor images collected over three years from three mid-sized urban areas. Model validation yielded a combined overall prediction accuracy of 93.6%, and temporal predictions reached an F1-score of 0.89, indicating reliable event-sequence forecasting.
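As an illustration of how the metrics named above are typically computed, the following minimal sketch implements precision, recall, and F1 together with top-k accuracy in plain Python. The function names and the simplification to binary labels are assumptions made for this example; they are not taken from the EDPE evaluation code, which this documentation does not describe at that level.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 for labels in {0, 1} (illustrative helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def top_k_accuracy(y_true: Sequence[int], scores: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    if not y_true:
        return 0.0
    hits = 0
    for label, row in zip(y_true, scores):
        # Rank class indices by descending score and keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(y_true)
```

With `k=1` the second function gives top-1 accuracy and with `k=5` top-5 accuracy, matching the two CNN metrics cited above.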

Robustness was evaluated systematically with domain-specific stress tests that simulated typical environmental and operational variances, such as sensor noise, incomplete incident data, and temporally irregular event reporting. Performance degraded by less than 5% under these perturbations, which meets the predefined robustness thresholds. Monitoring tools integrated into the system track performance continuously and raise alerts when metrics fall below established bounds, so that maintenance can be carried out promptly.
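The degradation check and the alerting rule described above can be sketched as follows. This is a minimal example under stated assumptions: degradation is interpreted as a relative drop against the unperturbed baseline (the documentation does not specify whether the 5% figure is relative or absolute), and the function and metric names are hypothetical.

```python
from typing import Dict, List


def relative_degradation(baseline: float, perturbed: float) -> float:
    """Relative drop of a metric under perturbation (assumed interpretation)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - perturbed) / baseline


def breached_metrics(current: Dict[str, float], lower_bounds: Dict[str, float]) -> List[str]:
    """Names of monitored metrics that are missing or below their lower bound."""
    return sorted(
        name
        for name, bound in lower_bounds.items()
        if current.get(name, float("-inf")) < bound
    )


# Example: an F1-score falling from 0.89 to 0.86 under simulated sensor noise.
drop = relative_degradation(0.89, 0.86)
within_threshold = drop < 0.05  # the 5% threshold comes from the text above
```

A metric that is absent from the current measurements is treated as a breach, so a silent monitoring failure also triggers an alert.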

The technical architecture incorporates redundant data pipelines and model ensembling, which reduce susceptibility to individual component failures. Model updates follow strict version control and retraining protocols that are integrated into continuous integration/continuous deployment (CI/CD) pipelines, so that performance remains consistent after each update.

### Benchmarks and Measurement Methodologies

The provider collaborated with independent metrology entities and participated in benchmarking initiatives coordinated by the European AI Benchmark Consortium (EAIBC). Benchmarking used harmonized datasets and evaluation protocols that reflect real-world urban emergency scenarios, ensuring comparability with peer systems. The evaluation frameworks combine cross-validation on diverse urban datasets with adversarial robustness testing based on simulated attack vectors modeled on known AI vulnerabilities.

The metrics employed conform to evolving EU-endorsed standards outlining accuracy (e.g., average precision, temporal sequence resolution), robustness (e.g., resilience to input perturbations), and cybersecurity parameters. These standards informed both pre-market validation and ongoing performance monitoring methodologies implemented in the system’s deployment environment.

### Declaration of Accuracy and Related Performance Metrics

The EDPE’s accompanying Instructions for Use include a detailed, transparent declaration of its performance characteristics. Specifically, they state:

- Classification accuracy on spatial data: 92.8% (top-1)
- Temporal event sequence F1-score: 0.89
- Robustness margin under environmental variances: ≥ 95% of baseline performance retained
- Model update frequency and retraining protocol, which keep performance variability below 2% across update cycles

This documentation is presented alongside guidance for operational thresholds and caveats relating to deployment contexts, such as data quality assumptions and sensor configurations.
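One way a release pipeline could enforce the declared update-cycle variability is a gate that compares a candidate model’s metrics with those of the previous release. The sketch below is an assumption-laden illustration: it treats the 2% figure as a relative change per metric, and the function name, metric keys, and example values other than the declared 0.928 and 0.89 are hypothetical.

```python
from typing import Dict


def update_within_tolerance(
    previous: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,  # assumed reading of the "<2%" declaration
) -> bool:
    """True if every metric of the previous release changes by less than
    `tolerance` (relative) in the candidate; a missing metric fails the gate."""
    for name, old_value in previous.items():
        if name not in candidate or old_value == 0:
            return False
        if abs(candidate[name] - old_value) / abs(old_value) >= tolerance:
            return False
    return True


# Declared values from the list above, used here as the previous release.
declared = {"top1_accuracy": 0.928, "temporal_f1": 0.89}
```

A candidate scoring 0.920 and 0.885 would pass this gate, while one whose top-1 accuracy fell to 0.900 would be held back for review.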

### Measures to Enhance System Resilience and Mitigate Feedback Loops

To ensure resilience against operational faults, the system employs a layered technical redundancy framework. Redundant data acquisition subsystems capture parallel sensor streams, enabling failover capabilities. In the event of sensor or data feed failure, the system automatically switches to alternative trusted data sources without interrupting prioritization outputs. Real-time internal consistency checks compare outputs between the CNN and LSTM subsystems to detect anomalies, flagging inconsistent or implausible prioritization suggestions for operator review.
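The failover and cross-model consistency logic described above might look roughly like the following. This is a simplified sketch: the `Feed` type, the normalized priority scores in the range 0 to 1, and the tolerance of 0.3 are all assumptions introduced for illustration and do not come from the EDPE design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Feed:
    """Hypothetical descriptor of a data source, listed in order of preference."""
    name: str
    healthy: bool


def select_feed(feeds: Sequence[Feed]) -> Optional[Feed]:
    """Return the first healthy feed, or None if every feed has failed."""
    for feed in feeds:
        if feed.healthy:
            return feed
    return None


def needs_operator_review(cnn_priority: float, lstm_priority: float, tolerance: float = 0.3) -> bool:
    """Flag a suggestion when either score is implausible (outside [0, 1] or NaN)
    or when the two subsystems disagree by more than `tolerance`."""
    for score in (cnn_priority, lstm_priority):
        if not (0.0 <= score <= 1.0):  # also True for NaN
            return True
    return abs(cnn_priority - lstm_priority) > tolerance
```

Returning `None` from `select_feed` leaves the decision about what happens when all feeds are down to the caller, for example handing over to a fallback module.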

Organisationally, Urban Safety Analytics mandates periodic retraining cycles with updated, curated datasets that exclude the outputs of automated system decisions, in order to prevent self-reinforcing bias (feedback loops). Data pipelines include filters and human-in-the-loop validation steps that reduce the risk of biased or incorrectly labeled incident data feeding back into model updates. After deployment, models undergo adversarial simulation testing to detect emerging feedback-based distortions, and established rollback and fine-tuning procedures apply if degradation is detected.
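The exclusion of system-generated outputs from retraining data can be pictured as a simple provenance filter. The record fields below (`label_source`, `human_validated`) are hypothetical; the sketch only assumes that each record carries some marker of where its label came from and whether a human reviewer confirmed it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class IncidentRecord:
    """Hypothetical training record with label provenance."""
    incident_id: str
    label: str
    label_source: str      # assumed values: "human" or "system"
    human_validated: bool  # set by a human-in-the-loop review step


def select_for_retraining(records: Iterable[IncidentRecord]) -> List[IncidentRecord]:
    """Keep only records whose label did not originate from the automated
    system and that passed human validation."""
    return [
        r for r in records
        if r.label_source != "system" and r.human_validated
    ]
```

Filtering on provenance rather than on label content keeps the rule auditable: a reviewer can confirm which records were dropped and why.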

Fail-safe planning includes a secondary rule-based emergency prioritization module designed to activate automatically in case of major model or infrastructure failure, ensuring continuous baseline functionality.
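A fallback of this kind is commonly implemented as a wrapper that tries the model first and hands over to static rules on any failure or invalid output. The rules, field names, and the 1 to 5 priority scale below are purely hypothetical placeholders; the actual rule set of the secondary module is not given in this documentation.

```python
from typing import Callable, Dict, Tuple


def rule_based_priority(incident: Dict[str, object]) -> int:
    """Hypothetical static rules; 1 is the most urgent priority, 5 the least."""
    if incident.get("life_threatening"):
        return 1
    if incident.get("fire") or incident.get("persons_trapped"):
        return 2
    return 3


def prioritize(
    incident: Dict[str, object],
    model_predict: Callable[[Dict[str, object]], int],
) -> Tuple[int, str]:
    """Return (priority, source). Falls back to the rule-based module when the
    model raises or returns something other than an integer from 1 to 5."""
    try:
        priority = model_predict(incident)
        if type(priority) is int and 1 <= priority <= 5:
            return priority, "model"
    except Exception:
        pass  # any model or infrastructure error triggers the fallback
    return rule_based_priority(incident), "rule_based"
```

Returning the source alongside the priority lets operators and logs distinguish model-driven suggestions from baseline ones.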

### Cybersecurity and Protection Against AI-Specific Threats

The EDPE’s cybersecurity measures are designed to counter both conventional and AI-specific attack vectors. The system infrastructure uses industry-standard security controls, including end-to-end data encryption (AES-256), secure API gateways with OAuth 2.0 authentication, and network segmentation that isolates AI components from external access and so minimizes the attack surface.

Technical defenses against AI-specific vulnerabilities include:

- Data poisoning: Training datasets are cryptographically signed, and their integrity is verified before ingestion. An auditing process tracks dataset provenance and applies anomaly detection to identify suspicious input patterns that indicate poisoning attempts.
- Model poisoning: Model updates are signed and verified via digital certificates. Any retraining event is logged with immutable audit trails, enabling forensic review.
- Adversarial examples: The system incorporates adversarial training, augmenting training data with perturbed inputs generated via state-of-the-art attack methods (e.g., FGSM, PGD), improving model resilience to evasion tactics. Detection mechanisms flag suspicious input patterns at runtime, enabling dynamic rejection or operator alerting.
- Confidentiality attacks: Hardware-level security modules (Trusted Platform Modules) protect model parameters against extraction or reverse engineering. Access controls enforce the principle of least privilege, limiting internal access to model internals.
- Model flaws: Continuous integration pipelines run automated tests, including fuzzing and robustness analyses, to uncover latent model vulnerabilities before release.
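Two of the defenses listed above lend themselves to a short sketch: integrity verification of a dataset before ingestion, and generation of an FGSM-perturbed input for adversarial training. Both are simplified stand-ins. The integrity check uses a keyed HMAC-SHA-256 tag from the Python standard library in place of a certificate-based digital signature, and the FGSM step is shown for a small logistic-regression model rather than for the EDPE’s CNN or LSTM. All function names are hypothetical.

```python
import hashlib
import hmac
import math
from typing import List, Sequence


def dataset_tag(data: bytes, key: bytes) -> str:
    """HMAC-SHA-256 tag over the dataset bytes (stand-in for a digital signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_dataset(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Constant-time comparison of the recomputed tag with the expected one."""
    return hmac.compare_digest(dataset_tag(data, key), expected_tag)


def logit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear score w.x + b of the toy logistic-regression model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgsm_perturb(x: Sequence[float], w: Sequence[float], b: float,
                 y: int, epsilon: float) -> List[float]:
    """FGSM: x_adv = x + epsilon * sign(grad_x loss). For binary cross-entropy
    with a logistic model, grad_x loss = (sigmoid(w.x + b) - y) * w."""
    p = 1.0 / (1.0 + math.exp(-logit(x, w, b)))
    grad = [(p - y) * wi for wi in w]
    signs = [(g > 0) - (g < 0) for g in grad]  # -1, 0 or +1 per feature
    return [xi + epsilon * s for xi, s in zip(x, signs)]
```

In adversarial training, perturbed inputs such as those returned by `fgsm_perturb` would be added to the training set with their original labels; PGD repeats a step of this kind several times with projection back into an epsilon-ball.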

Incident response protocols integrate automated intrusion detection systems (IDS) and continuous security monitoring, with predefined escalation and containment workflows. Security patches and countermeasures are deployed routinely, no later than 72 hours after a vulnerability is identified.

Collectively, these technical and organisational controls support an integrated security posture calibrated to the risk profile of high-risk AI systems functioning within critical public safety infrastructures.