Fleet Sim Overview

Fleet Sim is the maintained fleet simulator for vLLM Semantic Router. The vllm-sr-sim package is its CLI and service entrypoint. It helps you plan GPU fleets before deployment, compare routing and split strategies, and expose those workflows inside the dashboard without reviving a separate simulator frontend.

What Fleet Sim is for

sizing homogeneous, heterogeneous, or disaggregated fleets against a latency target
comparing annualized cost across GPU choices, routing policies, and threshold choices
validating planning assumptions with simulation runs, trace replay, and what-if analysis
surfacing those workflows in the dashboard through a maintained backend proxy
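
As a rough illustration of the sizing and cost-comparison questions listed above, the following Python sketch computes how many GPUs a homogeneous fleet would need for a given request rate and what that fleet would cost per year. It is not Fleet Sim's model or API: the `GpuOption` type, the `headroom` factor, and all throughput and price figures are hypothetical placeholders chosen for the example.

```python
import math
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


@dataclass(frozen=True)
class GpuOption:
    """Hypothetical GPU choice; the numbers are illustrative, not measured."""
    name: str
    max_rps_at_slo: float   # assumed requests/sec one GPU sustains within the latency target
    hourly_cost_usd: float  # assumed hourly price per GPU


def gpus_needed(demand_rps: float, option: GpuOption, headroom: float = 0.8) -> int:
    """GPUs required when each GPU is loaded to at most `headroom` of its capacity."""
    usable_rps = option.max_rps_at_slo * headroom
    return math.ceil(demand_rps / usable_rps)


def annualized_cost(demand_rps: float, option: GpuOption, headroom: float = 0.8) -> float:
    """Yearly cost of a homogeneous fleet sized for `demand_rps`."""
    return gpus_needed(demand_rps, option, headroom) * option.hourly_cost_usd * HOURS_PER_YEAR


def cheapest(demand_rps: float, options: list[GpuOption]) -> GpuOption:
    """Pick the option with the lowest annualized cost at this demand."""
    return min(options, key=lambda o: annualized_cost(demand_rps, o))


if __name__ == "__main__":
    options = [
        GpuOption("gpu-small", max_rps_at_slo=10.0, hourly_cost_usd=2.0),
        GpuOption("gpu-large", max_rps_at_slo=25.0, hourly_cost_usd=4.0),
    ]
    for opt in options:
        print(opt.name, gpus_needed(100.0, opt), annualized_cost(100.0, opt))
    print("cheapest:", cheapest(100.0, options).name)
```

A static calculation like this ignores queueing, burstiness, and routing effects, which is the gap that simulation runs and trace replay are meant to cover.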

What Fleet Sim is not for

it is not the router's live request path
it is not a runtime autoscaler or burst controller
it is not a per-kernel profiler for one deployment replica
it is not a replacement for the router configuration docs

Deployment modes

vllm-sr-sim can run as:

a standalone Python CLI for local sizing and what-if analysis
an HTTP service with vllm-sr-sim serve
a sidecar container that vllm-sr serve starts by default on the shared vllm-sr-network

Read this section in order

Getting started for local sidecar, standalone CLI, and external service setup
Dashboard integration for the proxy path and UI surfaces
Capacity planning scenarios for example-driven decision workflows
Simulation model reference and power model reference when you need the underlying mechanics
Guide PDF and guide assets when you want the printable version or source files
