Deep-sea mining systems are an active area of research in deep-sea resource development, and the path-following performance of the deep-sea mining vehicle (DSMV) directly affects the mining efficiency and operational safety of the whole mining system. In this paper, a deep reinforcement learning (DRL)-based controller for the DSMV is trained and verified. An actor-critic reinforcement learning algorithm for continuous action spaces is adapted to the task. A Markov decision process (MDP) model comprising the state, action, and reward is designed. The path-following and self-learning capabilities of the proposed method are validated in DSMV path-following simulation tests.
The ocean, which covers 71% of the earth's surface, is the largest untapped source of mineral resources on earth. In general, there are three main types of deep-sea mineral deposits on the sea floor (reviewed by Leng et al., 2021): polymetallic nodules (also termed manganese nodules, resting at water depths of 4000–6000 m), seafloor massive sulfides (SMS, located near hydrothermal vents on volcanic rocks), and cobalt-rich crusts (found on the slopes and summits of undersea mountains). If these rich mineral resources can be exploited effectively and economically, the shortage of mineral resources will be alleviated.
As the key equipment for future deep-sea ore mining, the deep-sea mining vehicle (DSMV) is the carrier of the mining equipment. The DSMV carries mining devices through the ore area to collect as much deep-sea ore as possible, and its ability to follow the predetermined mining path directly influences the effectiveness of the ore-collecting device and hence the mining efficiency of the whole system. To improve mining efficiency and ensure operational safety, researchers have been working to develop more efficient, accurate, and robust path-following controllers.
Rankin et al. (1996) proposed the follow-the-carrot method to solve the path-following problem for ground walking robots. Yeu et al. (2012) and Yoon et al. (2014) combined the following algorithm with DSMV positioning, proposed the traditional path-following algorithm, and designed a PI controller for speed control. Building on this foundation, subsequent researchers have optimized and improved the design so that the DSMV shows more stable speed and steering-ratio control during path following. In 2015, Yeu et al. improved the PI controller using the Approximate M-constrained Integral Gain Optimization (AMIGO) tuning rule and validated the resulting model. Dai et al. (2019) further established dynamic models for the DSMV and adopted a fuzzy control algorithm and a fuzzy adaptive PID control algorithm to control the track speed and the speed ratio, respectively. This work addresses some shortcomings of the PI controller.
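To make the classical baseline concrete, the following is a minimal sketch of follow-the-carrot steering combined with a PI speed loop. It is an illustration under stated assumptions, not the implementation used in any of the cited works: the function and class names, the waypoint representation of the path, the look-ahead distance, and the gains are all hypothetical choices.

```python
import math


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def carrot_point(path, x, y, lookahead):
    """Pick the carrot: starting from the waypoint nearest to the vehicle,
    return the first waypoint at least `lookahead` away from (x, y).
    Falls back to the final waypoint near the end of the path."""
    nearest = min(
        range(len(path)),
        key=lambda i: math.hypot(path[i][0] - x, path[i][1] - y),
    )
    for px, py in path[nearest:]:
        if math.hypot(px - x, py - y) >= lookahead:
            return px, py
    return path[-1]


def heading_error(path, x, y, heading, lookahead=2.0):
    """Signed angle from the vehicle heading to the carrot direction.
    Positive means the carrot lies to the left (counter-clockwise)."""
    cx, cy = carrot_point(path, x, y, lookahead)
    return wrap_angle(math.atan2(cy - y, cx - x) - heading)


class PIController:
    """Textbook PI law: u = kp * e + ki * integral(e dt). Gains are assumed."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


# Hypothetical usage: straight path along the x-axis, vehicle offset by 1 m.
path = [(float(i), 0.0) for i in range(11)]
err = heading_error(path, x=0.0, y=1.0, heading=0.0, lookahead=2.0)
speed_pi = PIController(kp=1.0, ki=0.5)
speed_cmd = speed_pi.update(error=2.0, dt=0.1)
```

In this sketch the heading error would drive the steering (for a tracked vehicle, the speed ratio between the two tracks), while the PI loop regulates forward speed. A controller of this kind needs hand-tuned gains and a fixed look-ahead distance, which is part of what motivates the learning-based approach.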