NVIDIA Accelerates Apache Spark, World’s Leading Data Analytics Platform
“We’re seeing significantly faster performance with NVIDIA-accelerated Spark 3.0 compared to running Spark on CPUs,” said William Yan, senior director of Machine Learning at Adobe. “With these game-changing GPU performance gains, entirely new possibilities open up for enhancing AI-driven features in our full suite of Adobe Experience Cloud apps.”
Databricks and NVIDIA Bring More Speed to Spark
Apache Spark was originally created by the founders of Databricks, whose cloud-based Unified Data Analytics Platform runs on over 1 million virtual machines every day.
NVIDIA and Databricks have collaborated to optimize Spark with the RAPIDS software suite for Databricks, bringing GPU acceleration to data science and machine learning workloads running on Databricks across healthcare, finance, retail and many other industries.
“Our continued work with NVIDIA improves performance with RAPIDS optimizations for Apache Spark 3.0 and Databricks to benefit our joint customers like Adobe,” said Matei Zaharia, original creator of Apache Spark and chief technologist at Databricks. “These contributions lead to faster data pipelines, model training and scoring, that directly translate to more breakthroughs and insights for our community of data engineers and data scientists.”
Faster ETL and Data Transfers in Spark with NVIDIA GPUs
NVIDIA is contributing a new open source RAPIDS Accelerator for Apache Spark to help data scientists increase the performance of their pipelines from end to end. The accelerator intercepts operations previously executed on CPUs and runs them on GPUs instead, in order to:
- Accelerate ETL pipelines in Spark by dramatically improving the performance of Spark SQL and DataFrame operations without requiring any code changes.
- Accelerate data preparation and model training on the same infrastructure, so that a separate cluster is not required for machine learning and deep learning.
- Accelerate data transfer performance across nodes in a Spark distributed cluster. The libraries involved leverage the open source Unified Communication X (UCX) framework of the UCF Consortium and minimize latency by enabling data to move directly from one GPU's memory to another's.
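To illustrate what "without requiring any code changes" can look like in practice, here is a minimal sketch of how such an accelerator might be switched on purely through Spark configuration. The property names and the plugin class follow the Spark 3.0 and RAPIDS Accelerator documentation rather than this announcement, so treat them as assumptions to verify against the release you install. The helper functions `rapids_conf` and `to_submit_args`, and the default values, are hypothetical and exist only for this example.

```python
"""Sketch: building Spark settings that enable a GPU SQL plugin.

Assumptions (not stated in the article): the plugin class name and the
property keys below are taken from the Spark 3.0 and RAPIDS Accelerator
documentation; the helper names and default values are illustrative only.
"""


def rapids_conf(gpus_per_executor=1, tasks_per_gpu=4, enabled=True):
    """Return Spark properties that request GPUs and load the plugin."""
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the accelerator as a Spark plugin (assumed class name).
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Toggle GPU execution of SQL and DataFrame operations.
        "spark.rapids.sql.enabled": str(enabled).lower(),
        # Spark 3.0 resource scheduling: GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of one GPU assigned to each task.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def to_submit_args(conf):
    """Flatten a property dict into spark-submit style '--conf k=v' pairs."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Print the flags one pair per line for inspection.
    flags = to_submit_args(rapids_conf())
    for flag, value in zip(flags[::2], flags[1::2]):
        print(flag, value)
```

The same key/value pairs could instead be passed to `SparkSession.builder.config(key, value)` when a session is created. Under this approach the existing Spark SQL and DataFrame code stays as written, and only the configuration decides whether it runs on CPUs or GPUs.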
A preview release of Spark 3.0 is now available from the Apache Software Foundation, with general availability expected in the coming months. More information is available at www.nvidia.com/spark.