© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

In this paper, we propose an improved version of the Pyramidal Predictive Network (PPNV2), a theoretical framework inspired by predictive coding, which addresses the limitations of its predecessor (PPNV1) in the task of future perception prediction. Although PPNV1 employed a temporal pyramid architecture and demonstrated promising results, its inherent signal processing introduced aliasing into its predictions, which restricted its application in robotic navigation. We analyze the signal dissemination and characteristic artifacts of PPNV1 and introduce architectural enhancements and training strategies to mitigate these issues. The improved architecture focuses on optimizing information dissemination and reducing aliasing within the network. We redesign the downsampling and upsampling components so that the network can construct images more effectively from low-frequency input Fourier features, replacing the simple concatenation of different inputs used in the previous version. Furthermore, we refine the training strategies to reduce the inconsistency between the inputs seen during training and those seen during testing. The enhanced model exhibits greater interpretability, higher prediction accuracy, and better prediction quality. The proposed PPNV2 offers a more robust and efficient approach to future video-frame prediction, overcoming the limitations of its predecessor and expanding its potential applications in various robotic domains, including pedestrian prediction, vehicle prediction, and navigation.
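The aliasing problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' redesigned downsampling module, whose details the abstract does not give; it is only a generic one-dimensional NumPy example, with hypothetical function names, of how naive strided subsampling folds high-frequency content into low frequencies while a simple low-pass filter applied before subsampling suppresses it.

```python
import numpy as np

def naive_downsample(x, factor=2):
    # Hypothetical helper: keep every `factor`-th sample with no filtering.
    return x[::factor]

def blur_downsample(x, factor=2):
    # Hypothetical helper: low-pass with a [1, 2, 1] / 4 binomial kernel,
    # then subsample. Reflect padding keeps the output length equal to len(x).
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    padded = np.pad(x, 1, mode="reflect")
    smoothed = np.convolve(padded, kernel, mode="valid")
    return smoothed[::factor]

# A signal at the Nyquist frequency: +1, -1, +1, -1, ...
signal = np.where(np.arange(16) % 2 == 0, 1.0, -1.0)

naive = naive_downsample(signal)      # aliases to a constant (DC) signal
filtered = blur_downsample(signal)    # high frequency is removed first

print("naive:   ", naive)
print("filtered:", filtered)
```

In this toy case the unfiltered result looks like a constant signal that was never present in the input, which is the kind of artifact anti-aliasing is meant to prevent; the filtered result is zero because the kernel removes the Nyquist component before subsampling.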

Details

Title
Pyramidal Predictive Network V2: An Improved Predictive Architecture and Training Strategies for Future Perception Prediction
Author
Ling, Chaofan 1; Zhong, Junpei 2; Li, Weihua 1; Dong, Ran 3; Dai, Mingjun 4

1 School of Shien-ming Wu Intelligent Engineering, South China University of Technology, Guangzhou 510641, China; [email protected] (C.L.); [email protected] (W.L.)
2 Faculty of Science and Technology, University of Wollongong (College Hong Kong), Hong Kong, China
3 School of Engineering, Chukyo University, Nagoya 466-0825, Japan; [email protected]
4 College of Information Engineering, Shenzhen University, Shenzhen 518060, China; [email protected]
First page
79
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
2504-2289
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3194488652