Documentation Index
Fetch the complete documentation index at: https://mintlify.com/sgl-project/sglang/llms.txt
Use this file to discover all available pages before exploring further.
Models
The models endpoint provides information about available models. This endpoint is compatible with OpenAI’s/v1/models API.
List Models
Retrieve a list of all available models.Request
Response
Always
"list".Array of model objects.
Model identifier (e.g.,
"meta-llama/Llama-3.1-8B-Instruct").Always
"model".Unix timestamp when the model was added.
Organization that owns the model (always
"sglang").Root model identifier.
Parent model identifier.
Maximum context length supported by the model.
Example Response
Retrieve Model
Get information about a specific model.Request
Response
Model identifier.
Always
"model".Unix timestamp when the model was added.
Organization that owns the model.
Root model identifier.
Parent model identifier.
Maximum context length.
Example Response
LoRA Adapters
When using LoRA adapters, you can reference them using the syntaxbase-model:adapter-name:
Multi-Model Serving
SGLang supports serving multiple models simultaneously using different methods:Data Parallelism (DP)
Multiple replicas of the same model for higher throughput:Multiple LoRA Adapters
Serve a base model with multiple LoRA adapters:Examples
List All Models
Check Model Capabilities
Verify Model Before Request
Error Handling
Model Not Found
If you request a model that doesn’t exist:Supported Models
SGLang supports a wide range of models including:Language Models
- Llama: Llama 2, Llama 3, Llama 3.1, Llama 3.2
- Qwen: Qwen, Qwen2, Qwen2.5
- Mistral: Mistral 7B, Mixtral 8x7B, Mixtral 8x22B
- DeepSeek: DeepSeek V2, DeepSeek V3
- Gemma: Gemma 2B, Gemma 7B, Gemma 2
Vision-Language Models
- Llama 3.2 Vision: 11B, 90B
- Qwen2-VL: 2B, 7B, 72B
- InternVL: 2, 2.5
- LLaVA: 1.5, 1.6, OneVision
Other Models
- Embedding Models: BGE, E5, etc.
- Reasoning Models: GPT-OSS models with reasoning support
See Also
- Chat Completions - Generate chat responses
- Completions - Generate text completions
- Embeddings - Generate embeddings
- Server Args - Server configuration options
