NVIDIA TensorRT 3 Dramatically Accelerates AI Inference for Hyperscale Data Centers
TensorRT 3 offers highly accurate INT8 and FP16 network execution, which can save data center operators tens of millions of dollars in acquisition and annual energy costs. A developer can use it to take a trained neural network and, in just one day, create a deployable inference solution that runs 3-5x faster than their training framework.
To further accelerate AI, NVIDIA introduced additional software, including:
- DeepStream SDK: NVIDIA DeepStream SDK delivers real-time, low-latency video analytics at scale. It helps developers integrate advanced video inference capabilities, including INT8 precision and GPU-accelerated transcoding, to support AI-powered services like object classification and scene understanding for up to 30 HD streams in real time on a single Tesla P4 GPU accelerator.
- CUDA 9: The latest version of CUDA®, NVIDIA's accelerated computing software platform, speeds up HPC and deep learning applications with support for NVIDIA Volta architecture-based GPUs, up to 5x faster libraries, a new programming model for thread management and updates to debugging and profiling tools. CUDA 9 is optimized to deliver maximum performance on Tesla V100 GPU accelerators.
Inference for the Data Center
Data center managers constantly balance performance and efficiency to keep their server fleets at maximum productivity. Tesla GPU-accelerated servers can replace over a hundred hyperscale CPU servers for deep learning inference applications and services, freeing up precious rack space, reducing energy and cooling requirements, and reducing cost by as much as 90 percent.
NVIDIA Tesla GPU accelerators provide the optimal inference solution -- combining the highest throughput, best efficiency and lowest latency on deep learning inference workloads to power new AI-driven experiences.
Inference for Self-Driving Cars and Embedded Applications
With NVIDIA's unified architecture, deep neural networks on every deep learning framework can be trained on NVIDIA DGX™ systems in the data center, and then deployed into all types of devices -- from robots to autonomous vehicles -- for real-time inferencing at the edge.
TuSimple, a startup developing autonomous trucking technology, increased inferencing performance by 30 percent after TensorRT optimization. In June, the company successfully completed a 170-mile Level 4 test drive from San Diego to Yuma, Arizona, using NVIDIA GPUs and cameras as the primary sensor. The performance gains from TensorRT allow TuSimple to analyze additional camera data and add new AI algorithms to its autonomous trucks without sacrificing response time.