Overview

The sglang generate command runs inference on multimodal diffusion models. It generates images, videos, or other outputs directly from the command line, without starting a server. It currently supports diffusion models only.

Basic Usage

```shell
sglang generate --model-path <model-name-or-path> --prompt "your prompt" [options]
```

Alternatively, pass a configuration file:

```shell
sglang generate --config config.json
```

Required Arguments

--model-path (string, required unless --config is used)
Path or name of the diffusion model to use. Can be:
  • HuggingFace model ID (e.g., stabilityai/stable-diffusion-xl-base-1.0)
  • Local path to a model directory
  • ModelScope model ID (when SGLANG_USE_MODELSCOPE=1 is set)

--prompt (string, required unless --config is used)
Text prompt describing what to generate.
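To resolve --model-path against ModelScope instead of HuggingFace, set SGLANG_USE_MODELSCOPE=1 in the environment before running the command. The sketch below assumes a POSIX shell. MODEL_ID is a hypothetical placeholder, not a real model ID, so replace it with the ModelScope ID of the model you want. The sglang call is skipped when the CLI is not installed.

```shell
# Hypothetical placeholder: substitute a real ModelScope model ID.
MODEL_ID="your-org/your-diffusion-model"

# Look up --model-path on ModelScope rather than HuggingFace.
export SGLANG_USE_MODELSCOPE=1

# Only invoke the CLI when it is installed.
if command -v sglang >/dev/null 2>&1; then
  sglang generate \
    --model-path "$MODEL_ID" \
    --prompt "A lighthouse on a cliff at dawn"
fi
```

Prefixing a single command with SGLANG_USE_MODELSCOPE=1 instead of exporting it limits the setting to that one invocation.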

Model Configuration

--config (string)
Path to a JSON or YAML configuration file containing model and generation parameters. When provided, --model-path and --prompt become optional.

--model-id (string)
Explicit model ID override (e.g., "Qwen-Image").

--backend (string, default: auto)
Model backend to use. Options:
  • auto: Automatically select a backend (prefer the sglang native implementation, fall back to diffusers)
  • sglang: Use sglang's native optimized implementation
  • diffusers: Use the vanilla diffusers pipeline (supports all diffusers models)

--trust-remote-code (boolean, default: false)
Trust and run custom model code from HuggingFace repositories.

--revision (string)
Model revision (branch name, tag name, or commit ID).
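Because --backend chooses between implementations of the same model, one way to compare them is to render the same prompt and seed with each backend into separate directories. This is a sketch that uses only flags documented on this page; the per-backend directory names are an arbitrary choice. Forcing --backend sglang presumably works only for models that have a native implementation, and the same seed need not produce identical images across backends.

```shell
MODEL=stabilityai/stable-diffusion-xl-base-1.0
PROMPT="A serene landscape with mountains and a lake at sunset"

for backend in sglang diffusers; do
  # One output directory per backend so the results are not mixed together.
  out="outputs/backend-${backend}/"
  if command -v sglang >/dev/null 2>&1; then
    sglang generate \
      --model-path "$MODEL" \
      --prompt "$PROMPT" \
      --backend "$backend" \
      --seed 42 \
      --output-path "$out"
  else
    echo "sglang not installed; skipping backend=${backend}"
  fi
done
```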

Sampling Parameters

Generation Settings

--negative-prompt (string)
Negative prompt describing what the output should avoid.

--num-inference-steps (integer, default: 50)
Number of denoising steps. More steps generally produce higher quality but take longer.

--guidance-scale (float, default: 7.5)
Guidance scale for classifier-free guidance. Higher values make the output follow the prompt more closely.

--height (integer)
Output height in pixels.

--width (integer)
Output width in pixels.

--seed (integer)
Random seed for reproducibility.

Batch Generation

--num-samples (integer, default: 1)
Number of samples to generate.

Parallelism Options

--num-gpus (integer, default: 1)
Number of GPUs to use for inference.

--tp-size (integer)
Tensor parallelism size.

--sp-degree (integer)
Sequence parallelism degree.

--ulysses-degree (integer)
Ulysses sequence parallelism degree, for long sequences.

--ring-degree (integer)
Ring sequence parallelism degree.

--dp-size (integer, default: 1)
Data parallelism size (number of data parallel groups).

--dp-degree (integer, default: 1)
Number of GPUs in each data parallel group.

--enable-cfg-parallel (boolean, default: false)
Enable classifier-free guidance parallelism.

Attention Backend

--attention-backend (string)
Attention backend to use for the model.

--attention-backend-config (string)
Additional configuration for the attention backend, as a JSON string.

--cache-dit-config (string)
Cache-DiT configuration for the diffusers backend.

CPU Offloading

--dit-cpu-offload (boolean)
Offload the DiT (Diffusion Transformer) to CPU to reduce GPU memory usage.

--dit-layerwise-offload (boolean)
Enable layer-wise offloading for the DiT.

--text-encoder-cpu-offload (boolean)
Offload the text encoder to CPU.

--image-encoder-cpu-offload (boolean)
Offload the image encoder to CPU.

--vae-cpu-offload (boolean)
Offload the VAE (Variational Autoencoder) to CPU.

LoRA Adapters

--lora-path (string)
Path to LoRA adapter weights.

--lora-nickname (string, default: "default")
Nickname for the LoRA adapter, used to swap adapters in the pipeline.

--lora-scale (float, default: 1.0)
LoRA scale used when merging the adapter (e.g., 0.125 for Hyper-SD).

--lora-target-modules (string)
Comma-separated list of module names to apply LoRA to (e.g., "q_proj,k_proj").
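The LoRA options can be combined in one run. The sketch below collects the arguments with set -- so the command can be logged before it executes. path/to/anime-lora is the placeholder path from the LoRA example on this page, the nickname anime is arbitrary, and q_proj,k_proj is the illustrative value from the option description; the correct module names depend on the model architecture, so treat them as an assumption.

```shell
# Build the argument list once; "$@" keeps each argument intact.
set -- generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "Anime style character portrait" \
  --lora-path path/to/anime-lora \
  --lora-nickname anime \
  --lora-scale 0.8 \
  --lora-target-modules "q_proj,k_proj"

# Log the arguments (quotes are not re-added in this output).
printf 'sglang'
printf ' %s' "$@"
printf '\n'

# Execute only when the CLI is installed.
if command -v sglang >/dev/null 2>&1; then
  sglang "$@"
fi
```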

Quantization

--transformer-weights-path (string)
Path to pre-quantized transformer weights (a single .safetensors file or a directory).

--nunchaku-config (string)
Nunchaku SVDQuant configuration for model quantization.

Performance Options

--enable-torch-compile (boolean, default: false)
Enable PyTorch compilation (torch.compile) for faster inference.

--warmup (boolean, default: false)
Run warmup iterations before generation.

--warmup-steps (integer, default: 1)
Number of warmup steps to run.

--disable-autocast (boolean)
Disable automatic mixed precision (autocast).

Output Options

--output-path (string, default: outputs/)
Directory in which generated outputs are saved.

--perf-dump-path (string)
File path to which performance metrics for the run are written as JSON.

Advanced Options

--diffusers-kwargs (string)
Additional keyword arguments to pass to the diffusers pipeline, as a JSON string. Example:
  --diffusers-kwargs '{"eta": 0.5, "use_karras_sigmas": true}'

--component-paths (string)
Override paths for specific pipeline components, as a JSON string. Example:
  --component-paths '{"vae": "path/to/custom/vae"}'

--pipeline-class-name (string)
Override the pipeline class specified in model_index.json.
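JSON-valued options such as --diffusers-kwargs and --component-paths are easy to break with shell quoting. One defensive pattern, sketched below on the assumption that python3 is available, is to keep each JSON value in a single-quoted shell variable and check that it parses with python3 -m json.tool before starting a long run. The JSON values are the examples from the option descriptions, path/to/custom/vae is a placeholder, and the run forces --backend diffusers on the assumption that --diffusers-kwargs applies only to the diffusers pipeline.

```shell
# Single quotes keep the inner double quotes intact.
DIFFUSERS_KWARGS='{"eta": 0.5, "use_karras_sigmas": true}'
COMPONENT_PATHS='{"vae": "path/to/custom/vae"}'

# Check that every value parses as JSON before launching the model.
json_ok=yes
for value in "$DIFFUSERS_KWARGS" "$COMPONENT_PATHS"; do
  if ! printf '%s' "$value" | python3 -m json.tool >/dev/null; then
    echo "invalid JSON: $value" >&2
    json_ok=no
  fi
done

if [ "$json_ok" = yes ] && command -v sglang >/dev/null 2>&1; then
  sglang generate \
    --model-path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompt "A futuristic city with flying cars" \
    --backend diffusers \
    --diffusers-kwargs "$DIFFUSERS_KWARGS" \
    --component-paths "$COMPONENT_PATHS"
fi
```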

Examples

Basic Image Generation

```shell
sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A serene landscape with mountains and a lake at sunset"
```

High-Quality Generation with Custom Settings

```shell
sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A futuristic city with flying cars" \
  --negative-prompt "blurry, low quality, distorted" \
  --num-inference-steps 100 \
  --guidance-scale 9.0 \
  --height 1024 \
  --width 1024 \
  --seed 42
```

Multi-GPU Inference

```shell
sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A beautiful forest scene" \
  --num-gpus 4 \
  --sp-degree 2 \
  --tp-size 2
```

Batch Generation

```shell
sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "Abstract art with vibrant colors" \
  --num-samples 4 \
  --seed 42
```
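The run above produces several images from one process. When you want to control the seed of each image explicitly, an alternative is to loop over seeds and write each one to its own directory. This is a sketch, not a documented workflow: each sglang generate invocation is a separate process that loads the model again, so it is slower than a single --num-samples run, and the per-seed directory layout is an arbitrary choice.

```shell
MODEL=stabilityai/stable-diffusion-xl-base-1.0
PROMPT="Abstract art with vibrant colors"

for seed in 1 2 3 4; do
  # Separate directory per seed so each image can be traced back to its seed.
  out="outputs/seed-${seed}/"
  if command -v sglang >/dev/null 2>&1; then
    sglang generate \
      --model-path "$MODEL" \
      --prompt "$PROMPT" \
      --seed "$seed" \
      --output-path "$out"
  else
    echo "sglang not installed; would write seed ${seed} to ${out}"
  fi
done
```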

Using LoRA Adapters

```shell
sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "Anime style character portrait" \
  --lora-path path/to/anime-lora \
  --lora-scale 0.8
```

CPU Offloading for Large Models

```shell
sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A detailed photograph of nature" \
  --dit-cpu-offload \
  --vae-cpu-offload \
  --text-encoder-cpu-offload
```

Using a Configuration File

Create a config file named generation_config.json:

```json
{
  "model_path": "stabilityai/stable-diffusion-xl-base-1.0",
  "prompt": "A majestic dragon flying over mountains",
  "negative_prompt": "blurry, low quality",
  "num_inference_steps": 50,
  "guidance_scale": 7.5,
  "height": 1024,
  "width": 1024,
  "seed": 12345,
  "num_samples": 2
}
```

Then run:

```shell
sglang generate --config generation_config.json
```
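--config also accepts YAML. The sketch below writes an equivalent config as YAML with a shell heredoc and then runs it. It assumes the YAML keys mirror the JSON keys in the example (model_path, prompt, and so on); that mapping is an assumption, not something this page specifies.

```shell
# Assumption: YAML keys are the same as the JSON keys in generation_config.json.
# The quoted 'EOF' delimiter writes the lines verbatim, with no shell expansion.
cat > generation_config.yaml <<'EOF'
model_path: stabilityai/stable-diffusion-xl-base-1.0
prompt: "A majestic dragon flying over mountains"
negative_prompt: "blurry, low quality"
num_inference_steps: 50
guidance_scale: 7.5
height: 1024
width: 1024
seed: 12345
num_samples: 2
EOF

# Only invoke the CLI when it is installed.
if command -v sglang >/dev/null 2>&1; then
  sglang generate --config generation_config.yaml
fi
```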

Performance Benchmarking

```shell
sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "Test image" \
  --perf-dump-path performance_metrics.json \
  --warmup \
  --warmup-steps 3
```

Output

Generated outputs are saved to the output directory set by --output-path (default: outputs/). The command displays generation progress and saves:
  • Generated images or videos, in the output directory
  • Performance metrics as JSON, at the path given by --perf-dump-path (if specified)

Example output:

```
INFO: Loading model: stabilityai/stable-diffusion-xl-base-1.0
INFO: Model loaded successfully
INFO: Generating with prompt: "A serene landscape..."
INFO: Progress: 100% [50/50 steps]
INFO: Generated image saved to: outputs/generated_image_0.png
INFO: Total generation time: 3.45s
```

Limitations

The generate command currently supports diffusion models only. For language models, use the sglang serve command to start a server and send API requests to it.

Help

To see all available options:

```shell
sglang generate --help
```