SGLang provides a frontend language for programming LLM applications with Python primitives and native control flow. It is designed to simplify complex prompting workflows, support features such as parallel sampling and streaming, and work with multiple backend providers.
Key Features
Intuitive Programming Model
The SGLang frontend uses Python decorators and a state-based programming model that feels natural for Python developers:
import sglang as sgl

@sgl.function
def multi_turn_question(s, question_1, question_2):
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=256))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=256))
Advanced Control Flow
Parallel Sampling: Fork execution to generate multiple responses in parallel
@sgl.function
def tip_suggestion(s):
    s += "Here are two tips for staying healthy: "
    s += "1. Balanced Diet. 2. Regular Exercise.\n\n"

    # Fork the state into two branches that share the prompt so far.
    forks = s.fork(2)
    for i, f in enumerate(forks):
        f += f"Now, expand tip {i+1} into a paragraph:\n"
        f += sgl.gen("detailed_tip", max_tokens=256, stop="\n\n")

    # Read each branch's result and continue on the original state.
    s += "Tip 1:" + forks[0]["detailed_tip"] + "\n"
    s += "Tip 2:" + forks[1]["detailed_tip"] + "\n"
    s += "In summary" + sgl.gen("summary")
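The fork-and-join pattern can be sketched in plain Python without a model. This is a conceptual illustration only, not how SGLang implements `fork`: `fake_generate` and `fork_and_expand` are hypothetical names, and the "completion" is a canned string.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call; echoes the last prompt line."""
    return f"[completion for: {prompt.splitlines()[-1]}]"


def fork_and_expand(prefix: str, num_branches: int) -> list:
    # Every branch starts from the same prefix and appends its own instruction.
    prompts = [
        f"{prefix}Now, expand tip {i + 1} into a paragraph:\n"
        for i in range(num_branches)
    ]
    with ThreadPoolExecutor(max_workers=num_branches) as pool:
        # map() preserves input order, so results[i] belongs to branch i.
        return list(pool.map(fake_generate, prompts))


results = fork_and_expand("Here are two tips for staying healthy.\n\n", 2)
for i, text in enumerate(results, start=1):
    print(f"Tip {i}: {text}")
```

The point of the sketch is the shape of the computation: branches diverge from a shared prefix, run independently, and are joined back in a known order.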
Conditional Logic: Use Python’s native control flow with generated outputs
@sgl.function
def tool_use(s, question):
    s += "To answer this question: " + question + ". "
    s += "I need to use a " + sgl.gen("tool", choices=["calculator", "search engine"]) + ". "

    if s["tool"] == "calculator":
        s += "The math expression is" + sgl.gen("expression")
    elif s["tool"] == "search engine":
        s += "The key word to search is" + sgl.gen("word")
Execution Modes
Single Execution: Run a single request and get results
state = multi_turn_question.run(
    question_1="What is the capital of the United States?",
    question_2="List two local attractions.",
)
print(state["answer_1"])
Batch Processing: Process multiple inputs efficiently
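# text_qa is not defined elsewhere on this page; a minimal definition is
# assumed here for illustration.
@sgl.function
def text_qa(s, question):
    s += "Q: " + question + "\n"
    s += "A:" + sgl.gen("answer", stop="\n")

states = text_qa.run_batch(
    [
        {"question": "What is the capital of the United Kingdom?"},
        {"question": "What is the capital of France?"},
        {"question": "What is the capital of Japan?"},
    ],
    progress_bar=True,
)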
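Once a batch finishes, each returned state can be indexed by variable name in the same way as a single-run state. The helper below is a small sketch, assuming the function stores its output under a variable named `answer`; `collect_answers` is a hypothetical helper, and plain dicts stand in for real states so the snippet runs without a backend.

```python
def collect_answers(states, var_name="answer"):
    """Return the value of one generated variable from every state."""
    return [state[var_name] for state in states]


# Plain dicts stand in for the states returned by run_batch.
demo_states = [{"answer": "London"}, {"answer": "Paris"}, {"answer": "Tokyo"}]
answers = collect_answers(demo_states)
print(answers)
```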
Streaming: Stream outputs in real-time
state = text_qa.run(
    question="What is the capital of France?",
    stream=True,
)

for out in state.text_iter():
    print(out, end="", flush=True)
Async Streaming: Asynchronous iteration for concurrent applications
import asyncio

async def async_stream():
    state = multi_turn_question.run(
        question_1="What is the capital of the United States?",
        question_2="List two local attractions.",
        stream=True,
    )

    async for out in state.text_async_iter(var_name="answer_2"):
        print(out, end="", flush=True)

asyncio.run(async_stream())
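Async iteration is useful because several streams can be consumed concurrently on one event loop. The sketch below shows that pattern with `asyncio.gather`. It does not call SGLang: `fake_text_stream` is a hypothetical stand-in for an async text iterator, and the chunks are canned.

```python
import asyncio


async def fake_text_stream(chunks):
    """Hypothetical stand-in for an async text iterator."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield chunk


async def collect(stream) -> str:
    # Consume one stream chunk by chunk and join the pieces.
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


async def main():
    # Both streams advance concurrently; gather returns results in call order.
    return await asyncio.gather(
        collect(fake_text_stream(["Wash", "ington"])),
        collect(fake_text_stream(["Par", "is"])),
    )


results = asyncio.run(main())
print(results)
```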
Core Concepts
State Object
The state object (s) is the central construct in SGLang functions. It maintains:
- The conversation history
- Generated variables and their values
- Role context (system, user, assistant)
- Images and video data for multimodal models
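To make the first two items concrete, here is a toy model of such a state object in plain Python. It is not SGLang's implementation: `ToyState` and `ToyGen` are hypothetical names, roles and multimodal data are left out, and the "generation" is a canned string rather than a model call.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGen:
    """Stand-in for a generation call: a variable name plus canned output."""
    name: str
    canned_text: str


@dataclass
class ToyState:
    text: str = ""                                  # accumulated prompt/history
    variables: dict = field(default_factory=dict)   # named generated values

    def __iadd__(self, other):
        if isinstance(other, ToyGen):
            # A generation both extends the text and records a variable.
            self.variables[other.name] = other.canned_text
            self.text += other.canned_text
        else:
            self.text += str(other)
        return self

    def __getitem__(self, name):
        return self.variables[name]


s = ToyState()
s += "Q: What is 2 + 2?\nA: "
s += ToyGen("answer", "4")
print(s["answer"])
```

The design mirrors the usage shown on this page: `+=` appends to the running text, and indexing by name retrieves a generated value.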
Variables
Generated text is automatically stored in named variables:
s += sgl.gen("answer", max_tokens=100)
print(s["answer"]) # Access the generated text
Composition
SGLang functions can be composed and reused:
@sgl.function
def inner_function(s, topic):
    s += f"Tell me about {topic}: "
    s += sgl.gen("description", max_tokens=50)

@sgl.function
def outer_function(s, topic1, topic2):
    s += inner_function(topic=topic1)
    s += inner_function(topic=topic2)
Constrained Generation
SGLang supports various forms of constrained generation:
Choice Selection: Choose from predefined options
s += sgl.gen("tool", choices=["calculator", "search engine"])
Regular Expressions: Constrain output format with regex
s += sgl.gen(
    "ip_address",
    regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.)" +
          r"{3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)
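It can help to check a constraint pattern against a few strings with Python's `re` module before passing it to `sgl.gen`. The sketch below tests the same IPv4 pattern offline; `is_ipv4` is a hypothetical helper, and the check says nothing about how the backend enforces the regex during decoding.

```python
import re

# Same pattern as above: three "octet." groups followed by a final octet.
IP_PATTERN = (
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.)"
    + r"{3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)


def is_ipv4(text: str) -> bool:
    """True if the whole string matches the pattern."""
    return re.fullmatch(IP_PATTERN, text) is not None


for candidate in ["192.168.0.1", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))
```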
JSON Schema: Generate structured JSON output by constraining generation with a regex built from a Pydantic model
from pydantic import BaseModel
from sglang.srt.constrained.outlines_backend import build_regex_from_object

class Character(BaseModel):
    name: str
    age: int
    role: str

s += sgl.gen("character", regex=build_regex_from_object(Character))
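Downstream code usually parses the generated string back into structured data. The following is a minimal sketch using only the standard library, with field names and types mirroring the `Character` model; `parse_character` is a hypothetical helper and the sample string is made up, standing in for a real `s["character"]` value.

```python
import json

# Field names and types mirror the Character model.
EXPECTED_FIELDS = {"name": str, "age": int, "role": str}


def parse_character(raw: str) -> dict:
    """Parse a JSON string and check that it has exactly the expected fields."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        raise ValueError(f"unexpected structure: {data!r}")
    for key, expected_type in EXPECTED_FIELDS.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {key!r} has wrong type: {value!r}")
    return data


# Hypothetical sample output; a real value would come from s["character"].
sample = '{"name": "Ada", "age": 36, "role": "engineer"}'
character = parse_character(sample)
print(character["name"], character["age"])
```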
Next Steps