May 21, 2025:
Red Hat Unveils llm-d for Scalable AI Inference - Red Hat has launched llm-d, an open source project for scalable generative AI inference. Built on a Kubernetes-native architecture and the vLLM inference engine, llm-d supports distributed AI inference across hybrid clouds, reducing cost and latency.
Key features include prefill/decode disaggregation, KV cache offloading, and AI-aware network routing. Backed by NVIDIA and Google Cloud, llm-d aims to set the standard for AI model deployment on any cloud infrastructure, aligning with Red Hat's vision of a universal, high-performance AI inference platform.
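To make the Kubernetes-native angle concrete: the basic building block llm-d orchestrates is a vLLM server running as an ordinary Kubernetes workload. The manifest below is an illustrative sketch of such a Deployment using vLLM's OpenAI-compatible serving image; the names, model, and resource values are assumptions for illustration, not llm-d's actual manifests (llm-d layers its own components for disaggregation and routing on top).

```yaml
# Illustrative sketch only: a minimal vLLM serving Deployment, not an
# actual llm-d manifest. llm-d adds prefill/decode disaggregation and
# AI-aware routing via its own components above workloads like this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-decode              # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm-decode
  template:
    metadata:
      labels:
        app: vllm-decode
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # official vLLM serving image
          args:
            - --model
            - meta-llama/Llama-3.1-8B-Instruct   # example model (assumption)
          ports:
            - containerPort: 8000            # OpenAI-compatible API endpoint
          resources:
            limits:
              nvidia.com/gpu: "1"            # one GPU per replica (assumption)
```

Because each replica speaks the OpenAI-compatible API, an AI-aware router can direct requests to whichever replica already holds the relevant KV cache, which is the kind of scheduling llm-d's routing layer is designed to perform.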