Abstract
Operators drilling a directional well may suffer the uncertainties from the downhole environment, the equipment, and the human decision-making process. We proposed a method to evaluate steering decisions of multiple reinforcement learning agents by calculating the estimated total future reward (ETR) through a validation agent using a simulator. The final decision would be chosen as the one with the maximum total reward among the multi-agent decisions. The multi-agent approach stabilizes the final decision-making for directional drilling and results in a higher success rate and better overall steering performance.
In our previous work (Yu et. al, 2021), a single agent was trained to make decisions for directional drilling through the deep reinforcement learning framework. To improve the stability of the decisions, we extend the single-agent approach to this multi-agent framework including a working environment and a validation environment. The system includes a working environment that includes a pool of working agents. The working environment represents the current physical drilling environment (e.g., rig state, geology, bit location, and planned trajectory). In addition, a validation agent is working in a validation environment using a hole propagation simulator for projecting the ETR.
When the system is in action, the working agents receive inputs such as surveys and planned information in the working environment and each of them makes their own decisions. Their decisions, as action proposals, are passed to the validation agent while the validation environment synchronizes the state from the working environment. The validation agent then carries out each action proposal in the virtual environment through the simulator and continues drilling to the target multiple times given a variety of formation priors. An average score of ETRs is then calculated for each action proposal. As a result, the action proposal with the maximum average score wins the competition and goes back to the working environment for execution.
The multi-agent framework could optimize and stabilize the sequence of decisions while minimizing uncertainties. Our validation results demonstrate that the total rewards were successfully improved than the single-agent decision-making.