Oracle and NVIDIA Meet the Demands for GenAI with Latest Infrastructure Releases

Aug 7, 2024

Oracle and NVIDIA are expanding their partnership to meet the increased demands for generative AI, large language models (LLMs), advanced graphics, and digital twins to hasten operational efficiencies, reduce costs, and drive innovation.

Oracle Cloud Infrastructure (OCI) announced the release of NVIDIA L40S GPU bare-metal instances, available to order and the upcoming availability of a new virtual machine accelerated by a single NVIDIA H100 Tensor Core GPU.

This new VM expands OCI’s existing H100 portfolio, which includes an NVIDIA HGX H100 8-GPU bare-metal instance.

Paired with NVIDIA networking and running the NVIDIA software stack, these platforms deliver powerful performance and efficiency, enabling enterprises to advance generative AI, according to the vendors.

The NVIDIA L40S is a universal data center GPU designed to deliver breakthrough multi-workload acceleration for generative AI, graphics, and video applications.

Equipped with fourth-generation Tensor Cores and support for the FP8 data format, the L40S GPU excels in training and fine-tuning small- to mid-size LLMs and in inference across a wide range of generative AI use cases.

The L40S GPU also has best-in-class graphics and media acceleration. Its third-generation NVIDIA Ray Tracing Cores (RT Cores) and multiple encode/decode engines make it ideal for advanced visualization and digital twin applications.

The L40S GPU delivers up to 3.8x the real-time ray-tracing performance of its predecessor, and supports NVIDIA DLSS 3 for faster rendering and smoother frame rates.

This makes the GPU ideal for developing applications on the NVIDIA Omniverse platform, enabling real-time, photorealistic 3D simulations and AI-enabled digital twins.

With Omniverse on the L40S GPU, enterprises can develop advanced 3D applications and workflows for industrial digitalization that will allow them to design, simulate and optimize products, processes and facilities in real time before going into production.

OCI will offer the L40S GPU in its BM.GPU.L40S.4 bare-metal compute shape, featuring four NVIDIA L40S GPUs, each with 48GB of GDDR6 memory. This shape includes local NVMe drives with 7.38TB capacity, 4th Generation Intel Xeon CPUs with 112 cores and 1TB of system memory.

These shapes eliminate the overhead of any virtualization for high-throughput and latency-sensitive AI or machine learning workloads with OCI’s bare-metal compute architecture. The accelerated compute shape features the NVIDIA BlueField-3 DPU for improved server efficiency, offloading data center tasks from CPUs to accelerate networking, storage and security workloads. The use of BlueField-3 DPUs furthers OCI’s strategy of off-box virtualization across its entire fleet.

OCI Supercluster with NVIDIA L40S enables ultra-high performance with 800Gbps of internode bandwidth and low latency for up to 3,840 GPUs. OCI’s cluster network uses NVIDIA ConnectX-7 NICs over RoCE v2 to support high-throughput and latency-sensitive workloads, including AI training.

The VM.GPU.H100.1 compute virtual machine shape, accelerated by a single NVIDIA H100 Tensor Core GPU, is coming soon to OCI. This will provide cost-effective, on-demand access for enterprises looking to use the power of NVIDIA H100 GPUs for their generative AI and HPC workloads.

A single H100 provides a good platform for smaller workloads and LLM inference. For example, one H100 GPU can generate more than 27,000 tokens per second for Llama 3 8B (up to 4x more throughput than a single A100 GPU at FP16 precision) with NVIDIA TensorRT-LLM at an input and output sequence length of 128 and FP8 precision.

The VM.GPU.H100.1 shape includes 2×3.4TB of NVMe drive capacity, 13 cores of 4th Gen Intel Xeon processors and 246GB of system memory, making it well-suited for a range of AI tasks.

OCI has also made available the BM.GPU.GH200 compute shape for customer testing. It features the NVIDIA Grace Hopper Superchip and NVLink-C2C, a high-bandwidth, cache-coherent 900GB/s connection between the NVIDIA Grace CPU and NVIDIA Hopper GPU. This provides over 600GB of accessible memory, enabling up to 10x higher performance for applications running terabytes of data compared to the NVIDIA A100 GPU.

Enterprises have a wide variety of NVIDIA GPUs to accelerate their AI, HPC, and data analytics workloads on OCI. However, maximizing the full potential of these GPU-accelerated compute instances requires an optimized software layer, according to the vendors.

NVIDIA NIM, part of the NVIDIA AI Enterprise software platform available on the OCI Marketplace, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inference to deploy world-class generative AI applications.

For more information about this news, visit www.oracle.com.

Newsletters

Oracle and NVIDIA Meet the Demands for GenAI with Latest Infrastructure Releases

White Papers

Sponsors