ABSTRACT

Autonomous underwater vehicle (AUV) path planning is an important problem. Conventional methods for solving it require an explicit environment model. However, the marine environment is complex and highly dynamic, which makes such a model very difficult to build and thus renders conventional methods inefficient or even infeasible. In this paper, we propose a novel deep reinforcement learning (DRL) based method for AUV path planning. Specifically, we focus on the setting where the environment map is unknown and the AUV must reach a target point in minimum time while obeying motion constraints, namely collision avoidance and steering angle minimization. We model the problem as a Markov decision process and propose a novel Hybrid policy based Policy-Constrained PPO (HyPoC-PPO) algorithm to solve it. HyPoC-PPO directly guides the learned policy to minimize the steering angle of the path through a new objective function that incorporates policy regularization, rather than relying on indirect guidance from a reward function. In addition, HyPoC-PPO generates a hybrid policy composed of DRL-based and non-DRL-based policies. The two kinds of policies are applied in different situations and are integrated by optimizing the proposed objective function. Simulation results demonstrate the capability and efficiency of the proposed method for AUV path planning in unknown environments.
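The exact HyPoC-PPO objective and switching rule are not given in this excerpt, so the following is only a minimal sketch under stated assumptions. It pairs the standard PPO clipped surrogate with a hypothetical policy regularizer: the expected squared steering angle under a Gaussian steering policy, E[a²] = μ² + σ², weighted by an assumed coefficient `lam`. It also shows a hypothetical hybrid rule (`hybrid_action`, `safe_distance`) that steers straight toward the target when no obstacle is near and otherwise defers to the learned DRL policy. None of these names or forms are taken from the paper.

```python
from typing import Callable, Sequence


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * a, clipped * a)
    return total / len(ratios)


def steering_regularizer(means: Sequence[float], stds: Sequence[float]) -> float:
    """ASSUMED regularizer: expected squared steering angle E[a^2] = mu^2 + sigma^2
    for a Gaussian steering policy, averaged over the batch."""
    return sum(m * m + s * s for m, s in zip(means, stds)) / len(means)


def regularized_objective(ratios, advantages, means, stds,
                          eps: float = 0.2, lam: float = 0.1) -> float:
    """Objective to MAXIMIZE: PPO surrogate minus lam * steering penalty (assumed form)."""
    return ppo_clip_objective(ratios, advantages, eps) - lam * steering_regularizer(means, stds)


def hybrid_action(obstacle_distances: Sequence[float], heading_to_target: float,
                  drl_policy: Callable[[Sequence[float], float], float],
                  safe_distance: float = 5.0) -> float:
    """HYPOTHETICAL switching rule: a non-DRL policy (head straight to the target)
    when every sensed obstacle is farther than safe_distance, else the DRL policy."""
    if min(obstacle_distances) > safe_distance:
        return heading_to_target
    return drl_policy(obstacle_distances, heading_to_target)


if __name__ == "__main__":
    obj = regularized_objective([1.0, 1.5], [2.0, 1.0], [0.0, 0.3], [0.1, 0.4])
    print(round(obj, 3))  # surrogate 1.6 minus 0.1 * 0.13
    print(hybrid_action([10.0, 12.0], 0.3, lambda d, h: -0.5))  # no obstacle nearby
```

Penalizing μ² + σ² in closed form, rather than sampling steering angles, keeps the regularizer differentiable with respect to both the policy mean and its spread; whether HyPoC-PPO uses this or another penalty is not stated here.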

INTRODUCTION

Autonomous underwater vehicle (AUV) path planning is an important problem in many applications, such as underwater exploration, ocean development, and underwater rescue. To solve this problem, conventional path planning methods require an explicit environment map on which a path planning algorithm can be executed online (Zhang et al., 2018; Yan et al., 2021). When the environment is unknown, these methods must gradually build a map from their perception of the environment and re-plan the path periodically, which is prohibitively time-consuming and unacceptable in urgent tasks (Chowdhury and Schwartz, 2020). Additionally, the marine environment is complex and dynamic, which makes an explicit environment map very difficult to build and thus renders conventional methods inefficient or even infeasible.
