Dynamo 0.4 is here and delivers 4x inference performance on Blackwell with disaggregated serving. ⚡️

New features include:
• SLO-based disaggregated autoscaling
• New disaggregated sizing tool
• Real-time, LLM-specific observability metrics
• Fault tolerance with in-flight request re-routing
• GB200 NVL72 large-scale expert parallel developer guides

These features help AI Factories reduce inference serving costs, consistently meet service level objectives, remove the guesswork from setting up disaggregated serving environments, and improve the resilience of inference systems.

🔗 We are building NVIDIA Dynamo in the open and value your contributions.

👇 Check out our repo on GitHub and join the NVIDIA Dynamo community ➡️