The solution of the linear system of equations for a large scale reservoir simulation has several challenges. Preconditioners are used to speed up the convergence rate of the solution of such systems. In theory, a preconditioner defines a matrix M that can be inexpensively inverted and represents a good approximation of a given matrix A. In this work, two-stage preconditioners consisting of the approximated inverses M1 and M2 are investigated for multiphase flow in porous media. The first-stage preconditioner, M1, is approximated from Ausing four different solution methods: (1) constrained pressure residuals (CPR), (2) lower block Gauss-Seidel, (3) upper block Gauss-Seidel, and (4) one iteration of block Gauss-Seidel. The pressure block solution in each of these different schemes is calculated using the Algebraic Multi Grid (AMG) method. The inverse of the saturation (or more generally, the nonpressure) blocks are approximated using Line Successive Over Relaxation (LSOR). The second stage preconditioner, M2, is a global preconditioner based on LSOR iterations for the matrix A that captures part of the original interaction of different coefficient blocks. Several techniques are also employed to weaken the coupling between the pressure block and the nonpressure blocks. Effective decoupling is achievedby: (1) an IMPES-like approach designed to preserve the integrity of pressure coefficients, (2) Householder transformations, (3) the alternate block factorization (ABF), and (4) the balanced decoupling strategy (BDS) based on least squares. The fourth method is a new technique developed in this work. The aforementioned preconditioning techniques were implemented in a parallel reservoir simulation environment, and tested for large-scale two-phase and three-phase black oil simulation models. This study demonstrates that a two-stage preconditioner based on balanced decoupling strategy (BDS) or ABF combined with Gauss-Seidel sweeps, that also incorporate nonpressure solutions for M,delivers both the fastest convergence rate and the most robust option overall without compromising parallel scalability.