May 21, 2025:
Red Hat Unveils llm-d for Scalable AI Inference - Red Hat has launched llm-d, an open source project for scalable generative AI inference. Built on a Kubernetes-native architecture and the vLLM inference engine, llm-d supports distributed AI inference across hybrid clouds, reducing cost and latency.
Key features include prefill/decode disaggregation, KV cache offloading, and AI-aware network routing. Backed by NVIDIA and Google Cloud, llm-d aims to set the standard for AI model deployment on any cloud infrastructure, aligning with Red Hat's vision of a universal, high-performance AI inference platform.
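To make the Kubernetes-native angle concrete: the basic building block llm-d orchestrates is a vLLM server running as an ordinary Kubernetes workload. The manifest below is an illustrative sketch of such a Deployment using vLLM's OpenAI-compatible serving image; the names, model, and resource values are assumptions for illustration, not llm-d's actual manifests (llm-d layers its own components for disaggregation and routing on top).

```yaml
# Illustrative sketch only: a minimal vLLM serving Deployment, not an
# actual llm-d manifest. llm-d adds prefill/decode disaggregation and
# AI-aware routing via its own components above workloads like this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-decode              # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm-decode
  template:
    metadata:
      labels:
        app: vllm-decode
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # official vLLM serving image
          args:
            - --model
            - meta-llama/Llama-3.1-8B-Instruct   # example model (assumption)
          ports:
            - containerPort: 8000            # OpenAI-compatible API endpoint
          resources:
            limits:
              nvidia.com/gpu: "1"            # one GPU per replica (assumption)
```

Because each replica speaks the OpenAI-compatible API, an AI-aware router can direct requests to whichever replica already holds the relevant KV cache, which is the kind of scheduling llm-d's routing layer is designed to perform.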