NVIDIA Developing RAPIDS Rapidly
Most of you know of NVIDIA’s latest development direction: the so-called RAPIDS libraries, aimed at accelerating the workflow of data scientists. NVIDIA targets the three main phases of this workflow: data preparation, model training, and data visualization.
At present, many of these operations run outside the GPU. To extend its involvement across the entire process, the company introduced the cuDF and cuML libraries, with cuGraph awaiting further announcement.
cuDF is NVIDIA’s GPU-accelerated library for data manipulation and data preparation. With cuDF running on its specialized DGX nodes, the company claims to have achieved a nearly 100x speedup in loading and preparing a 200GB CSV dataset.
cuML builds on traditional, widely used algorithms and libraries: Kalman filters, k-means, KNN, DBSCAN, PCA, TSVD and, most prominently, XGBoost. According to NVIDIA’s analysis, cuML on DGX can accelerate machine learning by more than 10x while using 4x fewer physical nodes.
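For readers less familiar with the algorithms in that list, k-means is the simplest to show in a few lines. The block below is a minimal CPU sketch of the classic assignment/update iteration (Lloyd's algorithm) in plain Python. It is not cuML's implementation and does not use the cuML API; the function name `kmeans` and the sample points are ours, chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, centroids, iterations=10):
    """Illustrative k-means: alternate assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # empty cluster: keep centroid
        if new_centroids == centroids:
            break  # converged, assignments can no longer change
        centroids = new_centroids
    return centroids

# Hypothetical sample data: two well-separated groups of 2D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Every point is compared against every centroid on each pass, and those comparisons are independent of one another, which is why this kind of workload is a natural fit for a GPU.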
The RAPIDS libraries base their advantages on the company’s most advanced technologies (CUDA, NVLink and NVSwitch) as well as the memory architecture of its GPU products. From our view, the libraries require very little change to existing code, and there is no need to learn new tools. These are all open-source libraries and tools, supported by NVIDIA and built on Apache Arrow.
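To give a feel for how small the code change can be, here is a minimal sketch that assumes cuDF's DataFrame API mirrors the pandas calls used (DataFrame construction, `groupby`, `sum`). It imports cuDF when it is installed and falls back to pandas otherwise; the alias `xdf`, the helper `total_by_key` and the sample data are ours, not part of RAPIDS.

```python
# Prefer cuDF when a GPU build is installed; otherwise fall back to pandas.
try:
    import cudf as xdf  # GPU DataFrame library from RAPIDS
except ImportError:
    import pandas as xdf  # CPU fallback offering the same calls used below

def total_by_key(keys, values):
    """Group values by key and sum them, on the GPU if cuDF is available."""
    df = xdf.DataFrame({"key": keys, "value": values})
    totals = df.groupby("key")["value"].sum()
    # cuDF results live in GPU memory; copy them to the host before use.
    if hasattr(totals, "to_pandas"):
        totals = totals.to_pandas()
    return totals.to_dict()

# Hypothetical sample data for illustration only.
totals = total_by_key(["a", "b", "a", "c"], [1, 2, 3, 4])
```

Apart from the import line, the data preparation code is the same in both cases, which is the point of the "no new tools" claim.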
The new RAPIDS products are extremely beneficial and can show quick results for experts who rely heavily on XGBoost, as it is the leading machine learning algorithm for tabular data. XGBoost is easy to use and outperforms most ML algorithms for ranking, classification and regression, but it suffers from the usual weaknesses of CPU-processed algorithms: slow execution, slow hyperparameter search and reduced accuracy when scaling.
With the new 0.7 release, RAPIDS keeps improving rapidly:
- XGBoost is now easier to use on multiple GPUs
- Libraries are available as conda packages
- cuDF gains new functionality, such as cumulative sum, product, min, and max functions for series
- new methods are added to the cuML library as well, including a completely rewritten single-GPU version of k-means
- etc.
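To make the cumulative functions mentioned in the list concrete, here is a small sketch of what they compute, written with the Python standard library rather than cuDF so it runs without a GPU. In cuDF, as in pandas, we would expect these to be available as Series methods (`cumsum`, `cumprod`, `cummin`, `cummax`); the helper name `cumulative` and the sample values are ours.

```python
from itertools import accumulate
import operator

def cumulative(values, op):
    """Return the running result of applying op across values."""
    return list(accumulate(values, op))

# Hypothetical sample series for illustration only.
series = [3, 1, 4, 1, 5]

cum_sum = cumulative(series, operator.add)   # running total
cum_prod = cumulative(series, operator.mul)  # running product
cum_min = cumulative(series, min)            # smallest value seen so far
cum_max = cumulative(series, max)            # largest value seen so far
```

Each output has the same length as the input, with position i holding the result over the first i + 1 elements.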
If you are interested in testing NVIDIA’s RAPIDS package, please let us know. L3C is the only official DGX cloud provider in Europe, and we can answer your questions and provide you with an environment to test on.
You can learn more about our GPU Cloud Service here or contact us here.