This guide helps you migrate between major versions of SGLang and understand breaking changes.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/sgl-project/sglang/llms.txt
Use this file to discover all available pages before exploring further.
Overview
SGLang follows semantic versioning (MAJOR.MINOR.PATCH):- Major versions: Breaking changes that require code modifications
- Minor versions: New features with backward compatibility
- Patch versions: Bug fixes with backward compatibility
Migrating to v0.5.x
Environment Variables
Several environment variables have been deprecated in favor of CLI flags:| Deprecated Env Var | Replacement CLI Flag |
|---|---|
SGLANG_ENABLE_FLASHINFER_FP8_GEMM | --fp8-gemm-backend=flashinfer_trtllm |
SGLANG_ENABLE_FLASHINFER_GEMM | --fp8-gemm-backend=flashinfer_trtllm |
SGLANG_SUPPORT_CUTLASS_BLOCK_FP8 | --fp8-gemm-backend=cutlass |
SGLANG_FLASHINFER_FP4_GEMM_BACKEND | --fp4-gemm-backend |
SGLANG_SCHEDULER_DECREASE_PREFILL_IDLE | --enable-prefill-delayer |
SGLANG_PREFILL_DELAYER_MAX_DELAY_PASSES | --prefill-delayer-max-delay-passes |
SGLANG_PREFILL_DELAYER_TOKEN_USAGE_LOW_WATERMARK | --prefill-delayer-token-usage-low-watermark |
Timeout Configuration
Timeout environment variables have changed from milliseconds to seconds:| Old (milliseconds) | New (seconds) |
|---|---|
SGLANG_QUEUED_TIMEOUT_MS | SGLANG_REQ_WAITING_TIMEOUT |
SGLANG_FORWARD_TIMEOUT_MS | SGLANG_REQ_RUNNING_TIMEOUT |
Prefix Migration: SGL_ to SGLANG_
AllSGL_ prefixed environment variables are deprecated in favor of SGLANG_:
Before:
The old
SGL_ prefix still works but will show deprecation warnings.Migrating to v0.4.x
Deterministic Inference
A new deterministic inference mode was introduced. If you need reproducible results: Before (v0.3.x):MoE Backend Changes
TheSGLANG_CUTLASS_MOE environment variable is deprecated:
Before:
Migrating from Other Frameworks
From vLLM
SGLang provides a similar API to vLLM with enhanced performance: vLLM:Key Differences from vLLM
- Prefix Caching: SGLang uses RadixAttention by default (more efficient)
- Chunked Prefill: Different default chunk sizes
- Memory Management: Different memory fraction defaults
- API Compatibility: SGLang is OpenAI-compatible but has additional features
From Text Generation Inference (TGI)
TGI uses a Docker-based approach, while SGLang can run directly: TGI:From LiteLLM
LiteLLM is a proxy/router, while SGLang is an inference engine. You can use LiteLLM with SGLang:Breaking Changes by Version
v0.5.0
- Environment variable prefix changes (
SGL_→SGLANG_) - Timeout units changed from milliseconds to seconds
- Several FP8/quantization env vars deprecated for CLI flags
- Memory pool configuration changes
v0.4.0
- Introduction of deterministic inference mode
- MoE backend configuration moved to CLI flags
- FlashInfer becomes the default attention backend
- Changes to RadixAttention cache behavior
v0.3.0
- Initial support for DeepSeek V3
- New multi-node deployment options
- Changes to expert parallelism configuration
Best Practices for Migration
1. Test in Staging First
Always test new versions in a staging environment before production deployment.2. Review Deprecation Warnings
Pay attention to deprecation warnings in logs:3. Pin Versions in Production
Use specific versions in your requirements:4. Check Release Notes
Always review release notes before upgrading.5. Update Configuration Files
If you use configuration files, update them according to the new format:6. Monitor Performance
After migration, monitor key metrics:- Throughput (requests/second)
- Latency (p50, p95, p99)
- GPU memory usage
- Error rates
Backward Compatibility
SGLang maintains backward compatibility within minor versions:- 0.5.0 → 0.5.6: Fully compatible
- 0.4.x → 0.5.x: Deprecation warnings, but works
- 0.3.x → 0.5.x: May require configuration updates
Getting Help with Migration
If you encounter issues during migration:- Check migration issues: Search GitHub Issues with label
migration - Ask in Slack: Join https://slack.sglang.io/ and ask in #general or #help
- Consult documentation: Check version-specific docs
- Report problems: File an issue with your migration scenario
See Also
- Environment Variables - Full configuration reference
- Server Arguments - CLI options
- Troubleshooting - Common issues and solutions
- Release Notes - Detailed changelogs
