Snowflake, the AI Data Cloud company, is announcing that it will host Meta’s Llama 3.1—a collection of multilingual open source large language models (LLMs)—in Snowflake Cortex AI, its solution for instant access to industry-leading LLMs. With Snowflake’s AI research team having optimized Llama 3.1 405B for both inference and fine-tuning, the offering pairs Meta’s powerful open source LLM with Snowflake’s system stack for real-time, high-throughput inference.
The demands of modern AI models are difficult to reconcile: despite massive scale and memory requirements, users expect low-latency inference, high throughput, and long context support, all while maintaining cost efficiency.
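To make that memory pressure concrete, a back-of-the-envelope calculation shows why long-context serving of a frontier model is hard. This is a sketch, assuming Llama 3.1 405B’s published configuration (126 transformer layers, 8 grouped-query key/value heads, head dimension 128) and an 8-bit KV cache; the figures are illustrative, not Snowflake’s measurements:

```python
# Rough KV-cache sizing for long-context inference.
# Assumed model configuration (Llama 3.1 405B, per Meta's model card):
LAYERS = 126    # transformer layers
KV_HEADS = 8    # grouped-query attention key/value heads
HEAD_DIM = 128  # dimension per attention head
BYTES = 1       # 1 byte/value with an 8-bit (FP8) KV cache; 2 for FP16

def kv_cache_bytes(tokens: int) -> int:
    """Bytes of KV cache for one sequence: keys + values at every layer."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES * tokens

per_token = kv_cache_bytes(1)              # ~252 KiB per token
full_context = kv_cache_bytes(128 * 1024)  # ~34 GB for one 128K sequence
print(f"{per_token / 1024:.0f} KiB/token, {full_context / 1e9:.1f} GB at 128K")
```

Even at 8-bit precision, a single 128K-token sequence consumes tens of gigabytes of KV cache—on top of the hundreds of gigabytes needed for the weights themselves—which is why serving stacks lean on quantization and multi-node parallelism.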
Acknowledging this complexity, Snowflake has expanded its Cortex AI solution to include Meta’s Llama 3.1 405B, ensuring that Snowflake users can easily, efficiently, and confidently access, fine-tune, and deploy Meta’s latest models, with trust and security built in. Additionally, Snowflake’s Massive LLM Inference and Fine-Tuning System Optimization Stack addresses these competing demands, using advanced parallelism techniques and memory optimizations to deliver fast, efficient AI processing without requiring complex infrastructure, according to Snowflake.
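One such parallelism technique is pipeline parallelism: a model’s layers are partitioned into stages across devices, and each batch is split into micro-batches that flow through the stages so devices work concurrently. The toy schedule below (a sketch of the classic GPipe-style forward pipeline, not Snowflake’s implementation) reproduces the standard result that p stages and m micro-batches finish in m + p − 1 steps, with an idle “bubble” fraction of (p − 1)/(m + p − 1):

```python
# Toy GPipe-style forward-pass schedule: stage s processes micro-batch b
# at time step s + b (0-indexed), one unit of work per step.
def pipeline_schedule(num_stages: int, num_microbatches: int):
    """Return (total_steps, bubble_fraction) for a simple forward pipeline."""
    timeline = {}  # (stage, step) -> micro-batch id being processed
    for b in range(num_microbatches):
        for s in range(num_stages):
            # Each stage can start a micro-batch one step after the stage before it.
            timeline[(s, s + b)] = b
    total_steps = max(step for (_, step) in timeline) + 1
    busy = len(timeline)  # stage-steps doing useful work
    bubble = 1 - busy / (num_stages * total_steps)
    return total_steps, bubble

steps, bubble = pipeline_schedule(num_stages=4, num_microbatches=8)
print(steps, round(bubble, 3))  # 11 steps; bubble = (4-1)/(8+4-1) = 3/11
```

The takeaway is that more micro-batches shrink the idle bubble, which is how a deep model sharded across many devices can still keep every device close to fully utilized.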
“Snowflake’s world-class AI Research Team is blazing a trail for how enterprises and the open source community can harness state-of-the-art open models like Llama 3.1 405B for inference and fine-tuning in a way that maximizes efficiency,” said Vivek Raghunathan, VP of AI engineering, Snowflake. “We’re not just bringing Meta’s cutting-edge models directly to our customers through Snowflake Cortex AI. We’re arming enterprises and the AI community with new research and open source code that supports 128K context windows, multi-node inference, pipeline parallelism, 8-bit floating point quantization, and more to advance AI for the broader ecosystem.”
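The 8-bit floating point quantization mentioned above trades a small amount of precision for a 2x memory reduction versus FP16. The following rough pure-Python simulation of FP8 E4M3 rounding is a sketch under simplifying assumptions (per-tensor scaling, subnormals ignored, magnitudes clamped to E4M3’s ±448 range); it is not Snowflake’s actual kernel, only an illustration of the trade-off:

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def fake_quantize_e4m3(x: float, scale: float) -> float:
    """Quantize x/scale to a 4-significant-bit float, then dequantize.

    Simplified: ignores E4M3 subnormals and just clamps to +/-448.
    """
    y = max(-E4M3_MAX, min(E4M3_MAX, x / scale))
    if y == 0.0:
        return 0.0
    m, e = math.frexp(y)    # y = m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16  # keep 1 implicit + 3 mantissa bits
    return m * 2.0 ** e * scale

# Values are stored in 1 byte instead of 2, at the cost of at most ~6.25%
# relative rounding error (half of the 1/16 mantissa step at |m| ~ 0.5).
weights = [0.013, -2.7, 3.3, 0.5]
scale = max(abs(w) for w in weights) / E4M3_MAX  # per-tensor scale factor
deq = [fake_quantize_e4m3(w, scale) for w in weights]
errs = [abs(d - w) / abs(w) for d, w in zip(deq, weights)]
assert max(errs) <= 1 / 16
```

Production FP8 pipelines layer careful scale calibration and mixed-precision accumulation on top of this basic rounding step to keep accuracy loss negligible.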
“Safety and trust are a business imperative when it comes to harnessing generative AI, and Snowflake provides us with the assurances we need to innovate and leverage industry-leading large language models at scale,” said Ryan Klapper, an AI leader at Hakkoda. “The powerful combination of Meta’s Llama models within Snowflake Cortex AI unlocks even more opportunities for us to serve internal RAG-based applications. These applications empower our stakeholders to interact seamlessly with comprehensive internal knowledge bases, ensuring they have access to accurate and relevant information whenever needed.”
Snowflake’s team of AI researchers has optimized Llama 3.1 405B for both inference and fine-tuning, with support for model distillation, safety guardrails, retrieval-augmented generation (RAG), and synthetic data generation. Using just a single GPU node, Snowflake’s system stack can deliver real-time, high-throughput performance while supporting a massive 128K context window across multi-node setups, on both next-generation and legacy hardware.
“As a leader in the hospitality industry, we rely on generative AI to deeply understand and quantify key topics within our Voice of the Customer platform. Gaining access to Meta’s industry-leading Llama models within Snowflake Cortex AI empowers us to further talk to our data and glean the necessary insights we need to move the needle for our business,” said Dave Lindley, senior director of data products, E15 Group. “We’re looking forward to fine-tuning and testing Llama to drive real-time action in our operations based on live guest feedback.”
To learn more about Snowflake’s latest model support, please visit https://www.snowflake.com/en/.