Overview
SGLang exposes Prometheus metrics and ships Grafana dashboards for monitoring. Together they let you track performance, resource usage, and request patterns in real time.
Quick Start
Enable Metrics
To enable metrics collection, start your SGLang server with the --enable-metrics flag.
Metrics are then served at the server's /metrics endpoint, which is http://localhost:30000/metrics with the default port.
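The endpoint returns the Prometheus text exposition format. As a minimal sketch, assuming the server listens on http://localhost:30000 and that SGLang metric names start with sglang: (as sglang:process_cpu_seconds_total does), the following helpers download the page and list the metric names it contains. The function names are illustrative and not part of SGLang.

```python
import urllib.request


def metric_names(exposition_text, prefix="sglang:"):
    """Return the sorted metric names in Prometheus text format that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="v"} 1.0` or `name 1.0`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch_metrics(url="http://localhost:30000/metrics", timeout=5.0):
    """Download the raw metrics text from a running server (URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Against a running server, `print(metric_names(fetch_metrics()))` should show a non-empty list once metrics are enabled. An empty list suggests the flag was not set or the prefix assumption does not hold for your version.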
Verify Metrics
You can verify that metrics are being collected by querying the metrics endpoint.
Docker-based Monitoring Stack
SGLang includes a pre-configured monitoring stack with Prometheus and Grafana in the examples/monitoring directory.
Prerequisites
- Docker and Docker Compose installed
- SGLang server running with metrics enabled
Setup Steps
- Start your SGLang server with metrics enabled.
- Navigate to the monitoring directory (examples/monitoring).
- Start the monitoring stack with Docker Compose.
- Access the interfaces:
  - Grafana: http://localhost:3000
  - Prometheus: http://localhost:9090
- Log in to Grafana:
  - Default Username: admin
  - Default Password: admin
  - You will be prompted to change the password on first login
- View the Dashboard: navigate to Dashboards → Browse → SGLang Monitoring → SGLang Dashboard
Configuration Files
The monitoring setup is defined by these files in examples/monitoring:
- docker-compose.yaml: Defines Prometheus and Grafana services
- prometheus.yaml: Prometheus configuration, including scrape targets
- grafana/datasources/datasource.yaml: Configures Prometheus as a data source
- grafana/dashboards/config/dashboard.yaml: Tells Grafana where to load dashboards
- grafana/dashboards/json/sglang-dashboard.json: The Grafana dashboard definition
Customizing Prometheus Scrape Configuration
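If your SGLang server runs on a different host or port, update the scrape target in the prometheus.yaml file.
Because Prometheus runs inside a container, localhost there refers to the container itself. Use host.docker.internal (Docker Desktop) or your machine’s network IP instead of localhost.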
Troubleshooting
Port Conflicts
If port 3000 or 9090 is already in use:
Option 1: Change the Grafana port with an environment variable.
No Data on Dashboard
- Generate traffic to produce metrics.
- Verify Prometheus is scraping the SGLang endpoint:
  - Go to Prometheus UI: http://localhost:9090
  - Check Status → Targets
  - Ensure the SGLang endpoint shows as “UP”
- Check label matching:
  - Verify model_name and instance labels in Prometheus match dashboard variables
  - You may need to adjust Grafana dashboard variables
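The Prometheus checks above can also be scripted. The sketch below assumes Prometheus is reachable at http://localhost:9090, as in the default stack, and uses the Prometheus HTTP API endpoints /api/v1/targets and /api/v1/label/<name>/values. The function names are illustrative.

```python
import json
import urllib.request


def unhealthy_targets(targets_response):
    """Return (scrapeUrl, health, lastError) for every active target that is not "up"."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"), t.get("lastError", ""))
        for t in active
        if t.get("health") != "up"
    ]


def fetch_targets(prometheus_url="http://localhost:9090", timeout=5.0):
    """Ask Prometheus for its current scrape targets."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def fetch_label_values(label, prometheus_url="http://localhost:9090", timeout=5.0):
    """List the values Prometheus has seen for a label such as model_name or instance."""
    url = f"{prometheus_url}/api/v1/label/{label}/values"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("data", [])
```

With the stack running, `unhealthy_targets(fetch_targets())` returning an empty list means every target is up. Comparing `fetch_label_values("model_name")` with the dashboard variables shows whether the labels match.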
Connection Issues
- Verify containers are running.
- Check Prometheus data source in Grafana:
  - Go to Connections → Data sources → Prometheus
  - URL should be http://prometheus:9090
- Test metrics endpoint accessibility from inside the Prometheus container.
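When it is unclear which hop is failing, a plain TCP check narrows it down. This is a minimal sketch using only the standard library. It assumes you run it from a machine or container that has Python and should be able to reach the service, and the helper name is illustrative.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, and timeouts.
        return False
```

For the default setup, `can_connect("localhost", 30000)`, `can_connect("localhost", 9090)`, and `can_connect("localhost", 3000)` probe the SGLang server, Prometheus, and Grafana respectively. A False result points to a port, firewall, or container networking problem rather than a dashboard problem.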
Advanced Configuration
Extra Metric Labels
Add custom labels to all metrics using the --extra-metric-labels flag.
Multiprocess Metrics
For multi-GPU or distributed setups, SGLang automatically handles multiprocess metrics collection. Each process exports metrics with appropriate labels:
- tp_rank: Tensor parallel rank
- pp_rank: Pipeline parallel rank
- dp_rank: Data parallel rank (if applicable)
- moe_ep_rank: MoE expert parallel rank
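These labels make it possible to break a metric down per process. As an illustrative sketch, not SGLang code, the function below parses text fetched from the /metrics endpoint and sums one metric's samples per rank label. The regular expressions cover the common `name{labels} value` form of the Prometheus text format and are an assumption about what your server emits.

```python
import re
from collections import defaultdict

# name, optional {labels}, then the sample value.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair: key="value", allowing escaped characters inside the value.
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def samples_by_rank(exposition_text, metric, rank_label="tp_rank"):
    """Sum the values of `metric` grouped by `rank_label`.

    Samples without that label are grouped under the empty string.
    """
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        totals[labels.get(rank_label, "")] += float(match.group("value"))
    return dict(totals)
```

Passing `rank_label="dp_rank"` or `rank_label="pp_rank"` groups by a different parallelism dimension. Which metrics carry which rank labels depends on your deployment, so inspect the raw output first.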
CPU Monitoring
SGLang includes CPU usage monitoring via the sglang:process_cpu_seconds_total metric, which tracks total CPU time (user + system) consumed by each process component.
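Because this metric is a cumulative counter, a single reading says little on its own. Utilization comes from the difference between two readings divided by the elapsed wall-clock time. The sketch below shows that arithmetic with a hypothetical helper name, and is roughly what a PromQL rate() over the counter computes.

```python
def cpu_utilization(cpu_seconds_start, cpu_seconds_end, wall_seconds):
    """Average number of CPU cores used between two readings of a CPU-seconds counter."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    delta = cpu_seconds_end - cpu_seconds_start
    if delta < 0:
        # The counter went backwards, so the process restarted and began again from zero.
        # Count only the time accumulated since the restart.
        delta = cpu_seconds_end
    return delta / wall_seconds
```

For example, readings of 10.0 and 25.0 CPU-seconds taken 10 seconds apart give 1.5, meaning the process kept one and a half cores busy on average over that window.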
Grafana Dashboard
The pre-configured dashboard provides visualization for:
- Request Metrics: Throughput, latency distributions (TTFT, TPOT, E2E)
- Token Metrics: Prompt tokens, generation tokens, cache hit rates
- Resource Utilization: Token usage, queue sizes, running requests
- Performance: Generation throughput, inter-token latency
- Speculative Decoding: Acceptance rates and lengths (if enabled)
- PD Disaggregation: KV transfer speeds, queue depths (if using prefill-decode separation)
Next Steps
- Learn about available Prometheus metrics
- Set up request tracing with OpenTelemetry
- Run benchmarks to test performance
