Unmanned Aerial Vehicle (UAV) networks are increasingly deployed in military and civilian applications, where they serve as critical platforms for data collection. Users frequently require aggregated statistics derived from historical sensory data within specific spatial and temporal boundaries. To obtain them, users submit aggregation queries with spatial-temporal constraints to the target UAVs that store the relevant data. These UAVs process the query and return the results, which can be aggregated within the network during transmission to conserve energy and bandwidth, both of which are inherently limited in UAV networks. However, the dynamic topology caused by UAV mobility, coupled with these resource constraints, makes it challenging to aggregate efficiently in the network without increasing query delay. To the best of our knowledge, existing research has yet to adequately explore spatial-temporal range aggregation queries in the context of UAV networks. In this paper, we propose ESTA, an Efficient Spatial-Temporal range Aggregation query processing algorithm tailored for UAV networks. ESTA leverages pre-planned UAV trajectories to construct a topology change graph that models the network’s evolving connectivity. It then employs an efficient shortest path algorithm on this graph to determine the minimum query response delay. Subsequently, while adhering to the user-specified delay constraint, ESTA transforms the in-network aggregation process into a series of set cover problems, which are solved recursively to build a Spatial-Temporal Aggregation Tree (STAT). This tree identifies an energy-efficient routing path for aggregating and delivering query results. Extensive simulations demonstrate that ESTA reduces energy consumption by more than 50% compared to the baseline algorithm while satisfying the required query delay.
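The minimum-delay step can be illustrated with a small sketch. The abstract does not specify how the topology change graph or ESTA's shortest path algorithm is defined, so the code below is not ESTA itself. It is an earliest-arrival variant of Dijkstra's algorithm over a hypothetical contact schedule, under the assumption that pre-planned trajectories yield known time windows `(u, v, start, end)` during which two UAVs can communicate, and that each hop takes a fixed transmission time `tx`. The function name, the data layout, and `tx` are all illustrative assumptions.

```python
import heapq


def earliest_arrival(contacts, source, t0, tx=1.0):
    """Earliest time a message leaving `source` at `t0` can reach each UAV.

    contacts: list of (u, v, start, end) windows during which u and v are
              in range (hypothetical format, assumed known from trajectories).
    tx:       assumed fixed per-hop transmission time.
    """
    # Build an undirected adjacency list of contact windows.
    adj = {}
    for u, v, start, end in contacts:
        adj.setdefault(u, []).append((v, start, end))
        adj.setdefault(v, []).append((u, start, end))

    best = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, start, end in adj.get(u, []):
            depart = max(t, start)  # hold the message until the link is up
            arrive = depart + tx
            # The transfer must complete before the contact window closes.
            if arrive <= end and arrive < best.get(v, float("inf")):
                best[v] = arrive
                heapq.heappush(heap, (arrive, v))
    return best


# Hypothetical schedule: A-B are in range early, B-C later, A-C last.
contacts = [("A", "B", 0, 5), ("B", "C", 10, 15), ("A", "C", 20, 25)]
arrival = earliest_arrival(contacts, "A", t0=0)
print(arrival["C"])
```

In this toy schedule, relaying through B reaches C earlier than waiting for the direct A-C contact, which is the kind of trade-off a delay bound computed on a time-varying graph has to capture.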