**Article 14**

### Design for Effective Human Oversight

The Academic Compliance Monitor uses a dual-model architecture that combines ensemble-based Random Forest classifiers with sequence-aware recurrent neural networks (RNNs). This hybrid design supports pattern recognition on both static tabular features (e.g., frequency of detected keyboard input anomalies) and temporal sequences (e.g., progression of detected audio patterns) within monitored exam sessions. The system is engineered to support continuous human oversight through real-time interpretability components exposed in a dedicated human-machine interface (HMI). This interface displays anomaly scores, confidence metrics, and event timelines so that exam supervisors can follow system decisions during examination periods.

The modular architecture separates detection outcomes by modality and feature source, offering layered transparency. For example, behavioral flags generated by the Random Forest model are accompanied by feature importance indicators, while the RNN outputs include temporal attention maps showing which segments contributed most strongly to an anomaly detection. These design choices give supervisors contextualized, digestible insights rather than opaque alerts. The approach is intended to reduce the operator fatigue associated with “black-box” systems and to allow natural persons to verify and evaluate system outputs continuously.
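The layered, per-modality explanation described above could be represented roughly as follows. This is a minimal sketch, not the provider's implementation: the `ModalityExplanation` type, the function names, and the example feature names are all hypothetical, and the importance and attention values are assumed to be supplied by the underlying models.

```python
from dataclasses import dataclass, field


@dataclass
class ModalityExplanation:
    """Hypothetical explanation record attached to one alert."""
    modality: str                 # "tabular" (Random Forest) or "temporal" (RNN)
    anomaly_score: float
    top_features: list = field(default_factory=list)   # (feature name, importance)
    top_segments: list = field(default_factory=list)   # (segment index, attention)


def top_k(weights: dict, k: int) -> list:
    """Return the k highest-weighted (key, weight) pairs, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


def explain_tabular(score: float, importances: dict, k: int = 3) -> ModalityExplanation:
    # importances: feature name -> importance, assumed to come from the forest
    return ModalityExplanation("tabular", score, top_features=top_k(importances, k))


def explain_temporal(score: float, attention: list, k: int = 2) -> ModalityExplanation:
    # attention: one weight per time segment, assumed to come from the RNN
    indexed = dict(enumerate(attention))
    return ModalityExplanation("temporal", score, top_segments=top_k(indexed, k))


if __name__ == "__main__":
    rf = explain_tabular(0.81, {"keyboard_anomaly_freq": 0.5, "window_switches": 0.3, "idle_time": 0.2}, k=2)
    rnn = explain_temporal(0.74, [0.1, 0.6, 0.3], k=1)
    print(rf.top_features)
    print(rnn.top_segments)
```

Keeping the two modalities in separate records mirrors the separation by feature source: a supervisor can see whether an alert rests on static indicators, on a specific time segment, or on both.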

### Risk Prevention Through Oversight Capabilities

Human oversight functionalities target the timely identification and mitigation of risks to academic fairness and student integrity, which correspond to potential adverse impacts on fundamental rights related to equal treatment and privacy. The system is engineered to provide early warnings of behavioral anomalies that may indicate exam misconduct or unauthorized assistance. To limit false positives, sensitivity and specificity are balanced through calibration during validation testing.

To reduce the risk of misclassification and consequent procedural injustice, the provider conducted benchmark evaluations on a dataset of approximately 25,000 anonymized exam session records with synthetic injections of known anomaly patterns. On hold-out tests, the Random Forest classifiers achieved 92% precision and 88% recall, while the RNN models achieved 90% precision and 85% recall in temporal anomaly detection. At these levels supervisors receive generally reliable signals, while the residual share of false alerts and missed events means that informed human judgment and intervention remain necessary.
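As a worked check on the reported figures, the sketch below derives the F1 score (the harmonic mean of precision and recall) and the share of alerts expected to be false at the stated precision. F1 is not reported in the evaluation itself; it is computed here only from the stated precision and recall, and the function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def expected_false_alerts(precision: float, n_alerts: int) -> float:
    """Expected number of false alerts among n_alerts raised alerts."""
    return n_alerts * (1 - precision)


# Precision and recall as stated for the hold-out tests.
reported = {"random_forest": (0.92, 0.88), "rnn": (0.90, 0.85)}

if __name__ == "__main__":
    for name, (p, r) in reported.items():
        print(f"{name}: F1 = {f1_score(p, r):.3f}, "
              f"false alerts per 100 = {expected_false_alerts(p, 100):.1f}")
    # random_forest: F1 = 0.900, false alerts per 100 = 8.0
    # rnn: F1 = 0.874, false alerts per 100 = 10.0
```

Precision depends on how frequent true anomalies are in the evaluated data, so the false-alert share in live sessions may differ from these benchmark values.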

Human oversight thus serves as the final arbiter, empowered to evaluate alerts in the broader context of exam conditions or known external factors, preventing unwarranted actions based solely on automated outputs.

### Proportionate Oversight Measures Embedded in the System

Recognizing the medium autonomy level of the AI system and its operational context within supervised examinations, multiple oversight mechanisms are embedded prior to deployment:

- **Interactive Alert Interface:** Supervisors receive ranked alerts with clear explanations of contributing behavioral indicators and temporal context, enabling prioritization and rapid response.

- **Anomaly Explanation Module:** Each alert is accompanied by model-internal interpretability features, including local feature importance scores and temporal attention heatmaps, so that supervisors can understand both the static and the temporal evidence behind it.

- **Override and Intervention Controls:** The interface provides supervisors with capabilities to flag false positives, confirm or dismiss alerts, and initiate immediate exam session reviews. An emergency “stop” button is available to temporarily suspend AI monitoring where anomalies are suspected to arise from technical or environmental failures, ensuring that the system can be halted safely without disrupting ongoing assessments.

- **Automatic Logs and Audit Trails:** Detailed records of AI decisions, user interactions, and manual overrides are generated continuously, supporting ongoing evaluation of the oversight process and enabling traceability.

These embedded measures were defined through provider-led risk assessments aligned with commonly used standards (e.g., ISO/IEC 25010 for system quality and IEC 61508 for safety functions) to match the system’s risk profile and autonomy, thereby appropriately calibrating the scope and granularity of human intervention capabilities.
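The interplay of ranked alerts, override controls, and the audit trail can be sketched as below. This is an illustration under stated assumptions rather than the system's actual interface: `OversightSession`, `Decision`, `rank_alerts`, and all field names are hypothetical, and the rule that every decision needs a non-empty justification is an assumption made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    CONFIRMED = "confirmed"
    DISMISSED = "dismissed"
    FALSE_POSITIVE = "false_positive"


@dataclass
class OversightSession:
    """Hypothetical container for one monitored exam session."""
    monitoring_active: bool = True
    audit_log: list = field(default_factory=list)

    def _record(self, event: str, supervisor: str, **details) -> None:
        # Append-only entry with a UTC timestamp for later audits.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "supervisor": supervisor,
            **details,
        })

    def review_alert(self, alert_id: str, decision: Decision,
                     supervisor: str, justification: str) -> None:
        # Assumption: every decision must carry a written justification.
        if not justification.strip():
            raise ValueError("a justification is required for every decision")
        self._record("alert_review", supervisor, alert_id=alert_id,
                     decision=decision.value, justification=justification)

    def suspend_monitoring(self, supervisor: str, reason: str) -> None:
        # "Stop" control: pauses AI monitoring only; the exam itself continues.
        self.monitoring_active = False
        self._record("monitoring_suspended", supervisor, reason=reason)

    def resume_monitoring(self, supervisor: str) -> None:
        self.monitoring_active = True
        self._record("monitoring_resumed", supervisor)


def rank_alerts(alerts: list) -> list:
    """Order alert dicts by anomaly score, highest first."""
    return sorted(alerts, key=lambda a: a["anomaly_score"], reverse=True)


if __name__ == "__main__":
    session = OversightSession()
    session.review_alert("A-17", Decision.DISMISSED, "supervisor-1", "ventilation noise")
    session.suspend_monitoring("supervisor-1", "camera feed failure")
    print([entry["event"] for entry in session.audit_log])
```

Routing every control through a single logging helper means that no override or suspension can occur without leaving a trace, which is the property the audit trail is meant to guarantee.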

### Enabling the Deployer for Effective Oversight

The system is delivered with comprehensive documentation and training materials tailored to the deployer’s operators, primarily examination supervisors and integrity officers. These include:

- **Technical User Manuals:** Detailed guides clarifying the algorithmic principles underlying the system’s detection methods, anticipated limitations such as false positive conditions, and instructions on interpreting system outputs. The manuals also explain residual risks related to automation bias, explicitly cautioning supervisors about reliance on automated alerts and urging adherence to verification protocols.

- **Interactive Training Modules:** E-learning scenarios simulate realistic exam monitoring cases, including false positives and emerging behaviors, to reinforce the independent judgment that effective oversight requires.

- **Contextual Awareness Tools:** The interface displays prompts and warnings about the potential for over-reliance on system outputs. Usage metrics dashboards let supervisors monitor trends in alert generation and in their own response patterns, which helps mitigate automation bias.

- **Override and Disregard Functionality:** Supervisors maintain full authority to disregard AI alerts after due consideration or to suspend monitoring temporarily, with all such decisions and justifications captured within the log system for post-exam audits.

- **Processing Records on Special Data:** The system processes certain special categories of data, such as audio cues potentially containing sensitive contextual information, strictly under a documented bias mitigation protocol. The provider applied a necessity assessment demonstrating that such data were indispensable for detecting systematic deviations indicative of cheating patterns that could not be reliably identified through non-sensitive data alone. This approach is transparently documented in the processing records, compliant with EU data protection requirements, and available for deployer review.

Together, these deployment-enabling elements ensure that natural persons assigned to human oversight understand the AI system’s capacities and restrictions, retain critical decision-making authority, and have access to tools designed to prevent automation bias and ineffective monitoring.
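One way a usage metrics dashboard might surface possible automation bias is to summarize how often, and how quickly, a supervisor confirms alerts. The sketch below is purely illustrative: the function name, the input shape, and both thresholds are hypothetical placeholders, not values taken from the system.

```python
from statistics import median


def response_pattern(reviews: list,
                     max_confirm_rate: float = 0.95,
                     min_median_seconds: float = 10.0) -> dict:
    """Summarize one supervisor's reviews and flag possible over-reliance.

    reviews: list of (confirmed: bool, seconds_to_decide: float) tuples.
    Both thresholds are illustrative assumptions.
    """
    if not reviews:
        return {"n": 0, "confirm_rate": None, "median_seconds": None,
                "possible_over_reliance": False}
    confirm_rate = sum(1 for confirmed, _ in reviews if confirmed) / len(reviews)
    median_seconds = median(seconds for _, seconds in reviews)
    # Flag only when near-universal confirmation coincides with very fast decisions.
    flagged = confirm_rate > max_confirm_rate and median_seconds < min_median_seconds
    return {"n": len(reviews), "confirm_rate": confirm_rate,
            "median_seconds": median_seconds, "possible_over_reliance": flagged}


if __name__ == "__main__":
    quick_confirms = [(True, 3.0)] * 20
    print(response_pattern(quick_confirms))
```

A flag of this kind is a prompt for reflection or retraining, not evidence of misconduct by the supervisor; a high confirmation rate can also simply mean the alerts were accurate.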