Red Hat launches the llm-d community, driving distributed Gen AI inference at scale

llm-d, a new open-source project, has just been launched with support from companies like CoreWeave, Google Cloud, IBM Research, and NVIDIA. The initiative focuses on accelerating the most crucial need for the future of generative AI (gen AI): inference at scale. Built on a Kubernetes-native architecture, the project combines distributed inference with vLLM and AI-aware intelligent network routing, enabling the creation of robust inference clouds for large language models (LLMs) that meet the most demanding service-level objectives (SLOs) in production.

While training remains vital, the true impact of gen AI depends on more efficient and scalable inference—the mechanism that turns AI models into practical insights and user experiences. According to Gartner, by 2028, as the market matures, over 80% of workload accelerators in data centers will be deployed specifically for inference rather than training. This means the future of gen AI lies in execution capability. The growing resource demands of increasingly sophisticated and complex reasoning models limit the feasibility of centralized inference and threaten to create bottlenecks in AI innovation due to prohibitive costs and crippling latency.

Addressing the need for scalable inference

Red Hat and its industry partners are directly tackling this challenge with llm-d, a visionary project that extends the power of vLLM beyond single-server limitations and unlocks production-scale AI inference. Leveraging Kubernetes’ proven orchestration power, llm-d integrates advanced inference capabilities into existing corporate IT infrastructures. This unified platform empowers IT teams to meet the diverse service demands of business-critical workloads while implementing innovative techniques to maximize efficiency and drastically reduce the total cost of ownership (TCO) associated with high-performance AI accelerators.

llm-d offers a powerful set of innovations, including:

  • vLLM, which quickly became the de facto standard open-source inference server, providing day-zero support for emerging frontier models across a wide range of accelerators, now including Google Cloud’s Tensor Processing Units (TPUs) (a minimal serving sketch follows this list).
  • Disaggregated prefill and decoding to separate input context from AI token generation into distinct operations that can be distributed across multiple servers.
  • KV (key-value) cache offloading, based on LMCache, which shifts the memory burden of the KV cache from GPU memory onto more economical and abundant standard storage, such as CPU memory or network storage.
  • Kubernetes-based clusters and controllers for more efficient scheduling of computing and storage resources as workload demands fluctuate, ensuring optimal performance and minimal latency.
  • AI-aware network routing to schedule incoming requests to the servers and accelerators most likely to hold warm caches from previous inference calculations (a toy routing sketch also follows this list).
  • High-performance communication APIs for faster and more efficient data transfer between servers, with support for the NVIDIA Inference Xfer Library (NIXL).
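
To make the serving layer concrete, here is a minimal sketch using vLLM’s offline Python API. The model name is an illustrative placeholder, and running it requires an accelerator that vLLM supports.

```python
# Minimal sketch: text generation with vLLM's offline inference API.
# The model name is a placeholder; substitute any model vLLM supports.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain why inference at scale matters."], params)
for request_output in outputs:
    print(request_output.outputs[0].text)
```

In production deployments, the same engine is typically exposed through vLLM’s OpenAI-compatible HTTP server (for example, via the `vllm serve` command), which is the layer a system like llm-d can replicate and orchestrate across a Kubernetes cluster.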

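The AI-aware routing idea can also be illustrated with a toy sketch: requests that share a prompt prefix are sent to the same replica, so that replica’s warm KV cache can be reused. The replica names, prefix length, and hashing scheme below are hypothetical, meant only to convey the concept rather than llm-d’s actual scheduler.

```python
# Toy sketch of cache-aware routing: hash a prompt's leading characters so
# requests with a common prefix land on the same server's warm KV cache.
# Replica names and the prefix length are illustrative assumptions.
import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]  # hypothetical server pool

def route(prompt: str, prefix_chars: int = 64) -> str:
    """Pick a replica deterministically from the prompt prefix."""
    digest = hashlib.sha256(prompt[:prefix_chars].encode("utf-8")).digest()
    return REPLICAS[int.from_bytes(digest[:4], "big") % len(REPLICAS)]

print(route("You are a helpful assistant. Summarize the following report: ..."))
```

A production router would also weigh live signals such as queue depth and cache hit rates, but the core idea is the same: keep prefix-sharing requests together to avoid recomputing prefill work.
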
llm-d: Unanimous support from industry leaders

This new open-source project already has support from a formidable coalition of leading gen AI model providers, AI accelerator pioneers, and top AI-focused cloud platforms. CoreWeave, Google Cloud, IBM Research, and NVIDIA are founding contributors, with AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI as partners, highlighting strong industry collaboration to architect the future of large-scale LLM execution. The llm-d community is also supported by academic institutions like UC Berkeley’s Sky Computing Lab, creators of vLLM, and the University of Chicago’s LMCache Lab, creators of LMCache.

True to its unwavering commitment to open collaboration, Red Hat recognizes the critical importance of vibrant, accessible communities in the rapidly evolving gen AI inference landscape. Red Hat will actively support the growth of the llm-d community, fostering an inclusive environment for new members and driving its continuous evolution.

Red Hat’s vision: Any model, any accelerator, any cloud

The future of AI should be defined by limitless opportunities, not constrained by infrastructure silos. Red Hat envisions a horizon where organizations can deploy any model, on any accelerator, in any cloud, delivering exceptional and more consistent user experiences without exorbitant costs. To unlock the true potential of gen AI investments, businesses need a universal inference platform—a new standard for continuous, high-performance AI innovations, both now and in the coming years.

Just as Red Hat pioneered transforming Linux into the foundation of modern IT, the company is now poised to architect the future of AI inference. vLLM has the potential to become a cornerstone for standardized gen AI inference, and Red Hat is committed to building a thriving ecosystem not just around the vLLM community but also llm-d, focused on large-scale distributed inference. The vision is clear: regardless of the AI model, underlying accelerator, or deployment environment, Red Hat aims to make vLLM the definitive open standard for inference in the new hybrid cloud.

Red Hat Summit

Join the Red Hat Summit keynotes to hear the latest updates from Red Hat executives, customers, and partners.
