- 环境与设备配置:H20*8(96G)
MODEL_ID=Qwen/Qwen3-VL-30B-A3B-Instruct
MODEL_NAME=Qwen3-VL-30B-A3B-Instruct
python3 -m vllm.entrypoints.openai.api_server \--model $MODEL_ID \--served-model-name $MODEL_NAME \--tensor-parallel-size 8 \--mm-encoder-tp-mode data \--limit-mm-per-prompt.video 0 \--mm-processor-cache-type shm \--enable-expert-parallel \--host 0.0.0.0 \--port 22002 \--dtype bfloat16 \--gpu-memory-utilization 0.75 \--quantization fp8 \--distributed-executor-backend mp
请求推理