In this study, we investigate the use of deep learning-based and kernel-based proxy models for nonlinearly constrained production optimization. We compare them with direct optimization using a high-fidelity simulator (HFS) in terms of computational cost and the optimal results obtained. The first proxy is embed to control and observe (E2CO), a deep learning-based model. The second is a kernel-based proxy, least-squares support-vector regression (LS-SVR). Both proxies can predict well outputs. Sequential quadratic programming (SQP) is used to solve the nonlinearly constrained production optimization problem. The objective function is the net present value (NPV), and the nonlinear state constraints are imposed on the field liquid production rate (FLPR) and the field water production rate (FWPR). NPV, FLPR, and FWPR are evaluated with each of the two proxy models. For LS-SVR, the gradient of the objective function and the Jacobian matrix of the constraints are computed analytically. For optimization with E2CO and with the HFS, the stochastic simplex approximate gradient (StoSAG) is used instead. The reservoir model is a two-phase, three-dimensional reservoir with heterogeneous permeability, taken from the SPE10 benchmark case. Well controls are optimized to maximize NPV in an oil-water waterflooding scenario. Both proxy models find optimal NPVs comparable to the optimal NPV obtained with the HFS, at a much lower computational cost. Of the two proxies, LS-SVR requires less computational effort to train. Overall, both proxy models are orders of magnitude faster than the numerical simulator in prediction.
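Because LS-SVR has a closed-form dual solution, its gradient with respect to the inputs (here, the well controls) can be written down analytically. The following is a minimal sketch of that idea, not the study's implementation. It assumes a Gaussian (RBF) kernel, a scalar output such as NPV, illustrative hyperparameters `gamma` and `sigma`, and synthetic data standing in for simulator runs. The class `LSSVR` and its method names are hypothetical. Training solves the standard LS-SVR linear system for the bias `b` and dual weights `alpha`. The gradient then follows from differentiating the RBF kernel, d k(x, x_i)/dx = -(x - x_i) k(x, x_i) / sigma^2.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


class LSSVR:
    """Hypothetical LS-SVR regressor (dual form) with an RBF kernel.

    gamma: regularization weight; sigma: kernel width (both assumed values).
    """

    def __init__(self, gamma=100.0, sigma=1.0):
        self.gamma = gamma
        self.sigma = sigma

    def fit(self, X, y):
        n = X.shape[0]
        K = rbf_kernel(X, X, self.sigma)
        # LS-SVR linear system: [0, 1^T; 1, K + I/gamma] [b; alpha] = [0; y]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(n) / self.gamma
        rhs = np.concatenate(([0.0], y))
        sol = np.linalg.solve(A, rhs)
        self.b, self.alpha, self.X = sol[0], sol[1:], X.copy()
        return self

    def predict(self, Xq):
        """f(x) = sum_i alpha_i k(x, x_i) + b for each row of Xq."""
        return rbf_kernel(np.atleast_2d(Xq), self.X, self.sigma) @ self.alpha + self.b

    def gradient(self, x):
        """Analytical gradient of f at a single input vector x."""
        k = rbf_kernel(x[None, :], self.X, self.sigma)[0]  # shape (n,)
        diff = x[None, :] - self.X                         # shape (n, d)
        # grad f = sum_i alpha_i * k(x, x_i) * (-(x - x_i) / sigma^2)
        return -(self.alpha * k) @ diff / self.sigma**2


# Usage on synthetic data (a stand-in for sampled controls and simulated NPV).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 3))  # 40 hypothetical control vectors
y = np.sin(X).sum(axis=1)                 # hypothetical scalar response
model = LSSVR(gamma=1e3, sigma=0.8).fit(X, y)

x0 = np.zeros(3)
g = model.gradient(x0)

# Cross-check the analytical gradient against central finite differences.
eps = 1e-6
fd = np.array([
    (model.predict(x0 + eps * e)[0] - model.predict(x0 - eps * e)[0]) / (2 * eps)
    for e in np.eye(3)
])
```

Under these assumptions, fitting one such model per constrained quantity (e.g., FLPR and FWPR at each report time) and stacking their gradients row by row would give a constraint Jacobian of the form a gradient-based SQP solver expects. This is why no StoSAG-type gradient approximation is needed for the kernel proxy.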
We provide new insights into the accuracy and prediction performance of these machine learning-based proxy models for 3D oil-water systems, as well as into their efficiency in nonlinearly constrained production optimization for waterflooding applications.