Accelerating Model Training with Habana® Gaudi® Processors and SigOpt Hyperparameter Optimization
The Habana® Gaudi® processor’s compute efficiency and integration bring new levels of price-performance to both cloud and on-premises data center customers. SigOpt is a model development platform that makes it easy to track runs, visualize training, and scale hyperparameter optimization for any type of model built with any library on any infrastructure. Habana and SigOpt, both Intel companies, recently collaborated to shorten model training time and to reduce the computational resources needed to find the optimal hyperparameters for a model. The result was an additional 6% reduction in training time for an MLPerf model, on top of the gains already achieved with grid search, while using roughly 75% fewer Gaudi®-hours than the grid search approach. This post highlights how Habana and SigOpt achieved these results. We hope AI developers can take advantage of the cost efficiency of Habana® Gaudi® and leverage SigOpt’s hyperparameter optimization to accelerate model development on Gaudi®.
Habana Labs Gaudi Processor
The Gaudi® processor has been designed from the ground up to accelerate deep learning training workloads. Its heterogeneous architecture comprises a cluster of fully programmable Tensor Processing Cores (TPC) and a configurable Matrix Math Engine (MME). Gaudi® is the industry’s first AI training processor to integrate ten 100 Gb RDMA over Converged Ethernet (RoCE v2) engines on-chip. Habana’s SynapseAI® Software Suite enables efficient mapping of neural network topologies onto Gaudi® hardware. It includes Habana’s graph compiler and runtime, TPC kernel library, firmware and drivers, and developer tools such as the TPC SDK for custom kernel development and the SynapseAI Profiler. SynapseAI is integrated with the popular TensorFlow and PyTorch frameworks and is performance-optimized for Gaudi®. Gaudi® gives deep learning developers the flexibility to build and scale training workloads using data-parallel paradigms. Overall, this scalable performance lowers both the capital (CapEx) and operating (OpEx) costs of training.
For its MLPerf Training v1.1 submission, Habana chose to submit workload results at large scale (up to 256 Gaudi® processors). Optimizing a large-scale deep learning training workload is complex: it requires a large computational infrastructure designed for training, contributions from a multidisciplinary team across a variety of optimizations, and computationally intensive hyperparameter optimization (HPO). Habana developed a cost-efficient methodology for reducing the MLPerf training workload’s runtime, namely by reducing the number of training epochs required to reach the target accuracy. A “home-grown” grid search reduced the time to converge by 28%, but consumed a significant amount of computational resources: over 85,000 Gaudi®-hours.
To reduce training time further, Habana chose the SigOpt Intelligent Experimentation Platform. Integrating SigOpt into Habana’s training environment was quick and easy, taking one developer a few days, thanks to a clear API that communicates with the cloud-based SigOpt service and to excellent customer support. Parallel evaluations allowed multiple experiments to run concurrently, which improved utilization of the training cluster and therefore shortened HPO turnaround.
Habana ran HPO experiments on a training cluster with hundreds of Gaudi® processors. SigOpt allowed Habana to specify which metrics to optimize during HPO, as well as to limit how many runs could be performed. Habana chose to optimize convergence speed, that is, the number of epochs needed to reach the target accuracy, and limited the number of experiments in different ways. A diagram of the HPO execution with SigOpt is shown below:
Figure 1: HPO diagram with SigOpt and Evaluator
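The metric and run-limit settings described above, together with parallel evaluations, can be illustrated with a minimal Python sketch. It is not the SigOpt client API and not Habana’s code: it assumes a hypothetical two-parameter search space (`learning_rate`, `warmup_epochs`), a synthetic `converged_epochs` objective in place of real Gaudi® training runs, and plain random sampling as a stand-in for SigOpt’s optimizer. The parameter names `observation_budget` and `parallel_bandwidth` are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space: parameter names and bounds are illustrative only.
SEARCH_SPACE = {
    "learning_rate": (0.5, 12.0),
    "warmup_epochs": (0.0, 5.0),
}


def suggest(rng: random.Random) -> dict:
    """Propose one configuration by uniform random sampling.

    This is a stand-in for the optimizer; it is NOT SigOpt's algorithm.
    """
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def converged_epochs(params: dict) -> float:
    """Hypothetical synthetic objective standing in for a real training run.

    Returns the number of epochs needed to reach target accuracy (lower is better).
    """
    lr, warmup = params["learning_rate"], params["warmup_epochs"]
    return 35.0 + 0.8 * (lr - 8.0) ** 2 + 2.0 * (warmup - 2.0) ** 2


def run_hpo(observation_budget: int = 100, parallel_bandwidth: int = 4, seed: int = 0):
    """Evaluate exactly `observation_budget` suggestions, up to `parallel_bandwidth`
    at a time, and return the list of (params, converged_epochs) observations."""
    rng = random.Random(seed)
    observations = []
    with ThreadPoolExecutor(max_workers=parallel_bandwidth) as pool:
        while len(observations) < observation_budget:
            # Never exceed the budget, even if the last batch is smaller.
            batch_size = min(parallel_bandwidth, observation_budget - len(observations))
            batch = [suggest(rng) for _ in range(batch_size)]
            # Evaluate the batch concurrently, like parallel training runs on a cluster;
            # pool.map preserves input order, so results line up with `batch`.
            observations.extend(zip(batch, pool.map(converged_epochs, batch)))
    return observations


obs = run_hpo()
best_params, best_epochs = min(obs, key=lambda o: o[1])  # minimize the metric
print(best_params, round(best_epochs, 2))
```

For simplicity this sketch waits for each whole batch before requesting new suggestions; a production setup would more likely hand a new suggestion to each worker as soon as it becomes free, which keeps the cluster busier.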
Habana ran experiments until a predefined budget (e.g., 100 experiments) was reached. After the experiments concluded, Habana used its own evaluator building block to further explore and exploit the best hyperparameter values found by SigOpt. If the evaluator’s refined hyperparameters reduced the converged epochs, and thus the runtime, by more than 1% relative to the best hyperparameters found so far, a new batch of experiments was carried out starting from the refined hyperparameters. If no further reduction in converged epochs was found, the HPO loop stopped, and the best hyperparameters and their converged epochs were saved and reported.
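The outer loop described in the previous paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Habana’s implementation: `run_batch` and `evaluate` are hypothetical callables standing in for a budget-limited SigOpt batch and for Habana’s evaluator building block, each returning a (hyperparameters, converged epochs) pair; `max_rounds` is an assumed safety cap that the text does not mention; and `toy_epochs`, `toy_batch`, and `toy_evaluator` are made-up stand-ins that exist only so the sketch runs.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Result = Tuple[Params, float]  # (hyperparameters, converged epochs)


def hpo_loop(
    run_batch: Callable[[Params], Result],
    evaluate: Callable[[Params], Result],
    initial: Params,
    min_improvement: float = 0.01,  # the 1% threshold from the text
    max_rounds: int = 20,           # assumed safety cap, not from the text
) -> Result:
    """Alternate optimizer batches and evaluator refinement until a round no longer
    cuts converged epochs by more than `min_improvement`; return the best result."""
    best_params, best_epochs = initial, float("inf")
    start = initial
    for _ in range(max_rounds):
        # One budget-limited batch of experiments, seeded from `start`.
        batch_best, _ = run_batch(start)
        # The evaluator further explores/exploits around the batch's best values.
        refined_params, refined_epochs = evaluate(batch_best)
        # Check the >1% reduction against the previous best before updating it.
        improved_enough = refined_epochs < best_epochs * (1.0 - min_improvement)
        if refined_epochs < best_epochs:
            best_params, best_epochs = refined_params, refined_epochs
        if not improved_enough:
            break  # no reduction above the threshold: stop and report the best
        start = refined_params  # launch the next batch from the refined values
    return best_params, best_epochs


# Hypothetical toy stand-ins so the sketch runs; real versions would launch training.
def toy_epochs(p: Params) -> float:
    return 30.0 + (p["lr"] - 6.0) ** 2


def toy_batch(start: Params) -> Result:
    p = {"lr": start["lr"] + 0.5 * (6.0 - start["lr"])}  # step halfway to the optimum
    return p, toy_epochs(p)


def toy_evaluator(params: Params) -> Result:
    return params, toy_epochs(params)  # simply re-measures the candidate


best, epochs = hpo_loop(toy_batch, toy_evaluator, {"lr": 0.0})
print(best, epochs)  # {'lr': 5.8125} 30.03515625
```

With these toy stand-ins the loop runs five rounds: the fifth round still lowers the converged epochs, but by less than 1%, so the loop keeps that value as the best result and stops.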
Habana compared these results with its home-grown grid search, and SigOpt showed a clear advantage. SigOpt delivered an additional 6% reduction in the training time needed to reach the same target accuracy, on top of the 28% reduction already obtained with grid search. It did so at a much lower computational cost: 21,333 Gaudi®-hours, versus 85,413 Gaudi®-hours for the grid search HPO, or roughly 75% fewer Gaudi®-hours. Finally, SigOpt comes with a dashboard that provides the insight needed to identify relationships among the hyperparameters that affect convergence speed (sensitivity analysis).
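The reported Gaudi®-hour figures can be compared with a few lines of Python arithmetic. The sketch below uses only the two numbers given in the text.

```python
grid_search_hours = 85_413  # Gaudi-hours for the home-grown grid search (from the text)
sigopt_hours = 21_333       # Gaudi-hours for the SigOpt-driven HPO (from the text)

fraction_used = sigopt_hours / grid_search_hours  # share of grid-search compute used
savings = 1.0 - fraction_used                      # share of Gaudi-hours saved
ratio = grid_search_hours / sigopt_hours           # how many times more grid search used

print(f"SigOpt used {fraction_used:.1%} of the grid-search Gaudi-hours ({savings:.1%} fewer)")
print(f"Grid search used about {ratio:.1f}x as many Gaudi-hours")
```

It prints that SigOpt used 25.0% of the grid-search Gaudi-hours (75.0% fewer), and that grid search used about 4.0x as many.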
In summary, hyperparameter optimization is a key component of AI developer workflows for model training and optimization. The combination of Gaudi®’s price-performance and SigOpt’s Intelligent Experimentation Platform for HPO enables greater productivity and cost savings for AI developers. We hope developers will take advantage of Habana® Gaudi® processors and the SigOpt platform to accelerate their model development.
To try SigOpt today, you can access it for free at sigopt.com/signup.