The common practice of analyzing 3D seismic data for facies classification by stitching together 2D cross-sections can produce unrealistic discontinuities in geological features. Depending on the direction in which the 2D cross-sections are taken, features such as depositional geomorphology, channel boundaries, and faults may not be fully visible, resulting in misleading labeled data and incorrect interpretations. Hence, in this work, we propose the application of 3D machine learning models to the problem of seismic facies classification. This introduces a two-fold challenge: first, 3D models substantially increase the memory requirements of the computational framework; second, neural network design becomes more difficult due to the larger number of model parameters and the correspondingly longer training times. We employ distributed deep learning techniques to address these challenges and efficiently train 3D deep learning models for seismic facies classification on Microsoft Azure High-Performance Computing (HPC) clusters. Using these techniques, we were able to train 3D networks with millions of trainable parameters within 3 hours, enabling rapid hyperparameter tuning and evaluation of different network architectures. We found that the networks performed better when the 3D seismic input cuboids (and their corresponding labels) were elongated along the depth dimension relative to the X and Y axes. Data augmentation through non-uniform overlap of the training cuboids (with more overlap in areas of greater geological heterogeneity) was also shown to improve training performance. Overall, domain knowledge of the problem, combined with distributed computing techniques, improved both the efficiency and performance of deep learning-based 3D seismic facies classification.
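The non-uniform-overlap augmentation described above can be illustrated with a short sketch. This is not the authors' code: the function names, cuboid shape, stride values, and the use of local amplitude variance as a proxy for geological heterogeneity are all assumptions made for illustration. The idea shown is the one stated in the text: slide a 3D window over the volume, and use a smaller stride (hence more overlapping cuboids) wherever the data appear more heterogeneous.

```python
import numpy as np


def cuboid_starts(axis_len, cuboid_len, stride):
    """Start indices along one axis so that cuboids tile it with the
    given stride, including a final cuboid flush with the end."""
    starts = list(range(0, axis_len - cuboid_len + 1, stride))
    if starts[-1] != axis_len - cuboid_len:
        starts.append(axis_len - cuboid_len)
    return starts


def extract_cuboids(volume, shape=(32, 32, 64),
                    base_stride=32, dense_stride=16, var_thresh=1.0):
    """Extract overlapping training cuboids from a 3D seismic volume.

    The cuboid is elongated along depth (z), as the text reports worked
    best. Along z we switch to the denser stride whenever the local
    amplitude variance exceeds `var_thresh` -- a hypothetical stand-in
    for whatever heterogeneity measure one actually uses.
    """
    X, Y, Z = volume.shape
    cx, cy, cz = shape
    cuboids = []
    for x0 in cuboid_starts(X, cx, base_stride):
        for y0 in cuboid_starts(Y, cy, base_stride):
            column = volume[x0:x0 + cx, y0:y0 + cy, :]
            z_stride = dense_stride if column.var() > var_thresh else base_stride
            for z0 in cuboid_starts(Z, cz, z_stride):
                cuboids.append(volume[x0:x0 + cx, y0:y0 + cy, z0:z0 + cz])
    return np.stack(cuboids)
```

Under this scheme a homogeneous region contributes fewer, mostly non-overlapping cuboids, while a heterogeneous region is sampled more densely, which is the intended augmentation effect.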