Abstract: In the context of disaster warning and monitoring within the Internet of Things (IoT), using unmanned aerial vehicles (UAVs) as relays to collect time-sensitive data from disaster-monitoring sensors and forward it to the base station (BS) has emerged as a highly promising application. In UAV-assisted Data Time-Sensitive IoT (DTIoT), the Age of Information (AoI) is a critical performance metric that quantifies the timeliness of data collection, defined as the elapsed time for data to travel from a sensor to the BS. Controlling the flight trajectories of multiple UAVs to minimize AoI remains challenging under energy constraints. Existing work typically applies deep reinforcement learning (DRL) algorithms to UAV trajectory control. However, controlling multi-agent continuous trajectories in complex DTIoT network states suffers from sparse rewards, which makes it difficult to train deep neural network-based control policies with standard DRL methods. In this paper, we propose a network-oriented hierarchical reinforcement learning (NO-HRL) algorithm that controls the UAVs' flight trajectories in DTIoT networks to minimize AoI. We design the control policy as a two-layer hierarchical DRL architecture, in which the upper layer selects the target and the lower layer executes it. We further propose a decoupled sequential training scheme to effectively train the mutually coupled two-layer DRL network of NO-HRL. Experimental results show that our algorithm outperforms baseline methods in AoI optimization for DTIoT.