About the Intel B70 Pro.
-
Just merge the two code snippets into one and that's it.
{ "id": 26, "type": "SaveImage", "pos": [ 1145.1501729206636, 195.07454751045992 ], "size": [ 267.9266338657344, 433.279270302052 ], "flags": {}, "order": 21, "mode": 0, "inputs": [ { "label": "图像组", "name": "images", "type": "IMAGE", "link": 29 }, { "label": "文件名前缀", "name": "filename_prefix", "type": "STRING", "widget": { "name": "filename_prefix" }, "link": null } ], "outputs": [], "properties": { "cnr_id": "comfy-core", "ver": "0.6.0", "Node name for S&R": "SaveImage", "ue_properties": { "widget_ue_connectable": {}, "input_ue_unconnectable": {}, "version": "7.5.2" }, "ttNbgOverride": { "color": "#332922", "bgcolor": "#593930", "groupcolor": "#b06634" } }, "widgets_values": [ "ComfyUI" ], "color": "#332922", "bgcolor": "#593930" }, { "id": 11, "type": "PreviewImage", "pos": [ 1487.4226159090695, -382.86645688700804 ], "size": [ 861.0544444444449, 1199.7606666666666 ], "flags": {}, "order": 20, "mode": 0, "inputs": [ { "label": "图像组", "name": "images", "type": "IMAGE", "link": 17 } ], "outputs": [], "properties": { "cnr_id": "comfy-core", "ver": "0.6.0", "Node name for S&R": "PreviewImage", "ue_properties": { "widget_ue_connectable": {}, "version": "7.5.2", "input_ue_unconnectable": {} }, "ttNbgOverride": { "color": "#332922", "bgcolor": "#593930", "groupcolor": "#b06634" } }, "widgets_values": [], "color": "#332922", "bgcolor": "#593930" }, { "id": 25, "type": "Note", "pos": [ 2890.282933565041, -663.9119545246833 ], "size": [ 421.42547299979583, 1472.8439775316037 ], "flags": {}, "order": 3, "mode": 0, "inputs": [], "outputs": [], "properties": { "ue_properties": { "widget_ue_connectable": {}, "version": "7.5.2", "input_ue_unconnectable": {} } }, "widgets_values": [ "正面视图低角度特写:<sks> front view low-angle shot close-up\n\n右前侧视图低角度特写:<sks> front-right quarter view low-angle shot close-up\n\n右侧视图低角度特写:<sks> right side view low-angle shot close-up\n\n右后侧视图低角度特写:<sks> back-right quarter view low-angle shot close-up\n\n背面视图低角度特写:<sks> back view low-angle shot 
close-up\n\n左后侧视图低角度特写:<sks> back-left quarter view low-angle shot close-up\n\n左侧视图低角度特写:<sks> left side view low-angle shot close-up\n\n左前侧视图低角度特写:<sks> front-left quarter view low-angle shot close-up\n\n正面视图平视特写:<sks> front view eye-level shot close-up\n\n右前侧视图平视特写:<sks> front-right quarter view eye-level shot close-up\n\n右侧视图平视特写:<sks> right side view eye-level shot close-up\n\n右后侧视图平视特写:<sks> back-right quarter view eye-level shot close-up\n\n背面视图平视特写:<sks> back view eye-level shot close-up\n\n左后侧视图平视特写:<sks> back-left quarter view eye-level shot close-up\n\n左侧视图平视特写:<sks> left side view eye-level shot close-up\n\n左前侧视图平视特写:<sks> front-left quarter view eye-level shot close-up\n\n正面视图高位拍摄特写:<sks> front view elevated shot close-up\n\n右前侧视图高位拍摄特写:<sks> front-right quarter view elevated shot close-up\n\n右侧视图高位拍摄特写:<sks> right side view elevated shot close-up\n\n右后侧视图高位拍摄特写:<sks> back-right quarter view elevated shot close-up\n\n背面视图高位拍摄特写:<sks> back view elevated shot close-up\n\n左后侧视图高位拍摄特写:<sks> back-left quarter view elevated shot close-up\n\n左侧视图高位拍摄特写:<sks> left side view elevated shot close-up\n\n左前侧视图高位拍摄特写:<sks> front-left quarter view elevated shot close-up\n\n正面视图高角度特写:<sks> front view high-angle shot close-up\n\n右前侧视图高角度特写:<sks> front-right quarter view high-angle shot close-up\n\n右侧视图高角度特写:<sks> right side view high-angle shot close-up\n\n右后侧视图高角度特写:<sks> back-right quarter view high-angle shot close-up\n\n背面视图高角度特写:<sks> back view high-angle shot close-up\n\n左后侧视图高角度特写:<sks> back-left quarter view high-angle shot close-up\n\n左侧视图高角度特写:<sks> left side view high-angle shot close-up\n\n左前侧视图高角度特写:<sks> front-left quarter view high-angle shot close-up\n\n正面视图低角度中景:<sks> front view low-angle shot medium shot\n\n右前侧视图低角度中景:<sks> front-right quarter view low-angle shot medium shot\n\n右侧视图低角度中景:<sks> right side view low-angle shot medium shot\n\n右后侧视图低角度中景:<sks> back-right quarter view low-angle shot medium shot\n\n背面视图低角度中景:<sks> back view low-angle shot 
medium shot\n\n左后侧视图低角度中景:<sks> back-left quarter view low-angle shot medium shot\n\n左侧视图低角度中景:<sks> left side view low-angle shot medium shot\n\n左前侧视图低角度中景:<sks> front-left quarter view low-angle shot medium shot\n\n正面视图平视中景:<sks> front view eye-level shot medium shot\n\n右前侧视图平视中景:<sks> front-right quarter view eye-level shot medium shot\n\n右侧视图平视中景:<sks> right side view eye-level shot medium shot\n\n右后侧视图平视中景:<sks> back-right quarter view eye-level shot medium shot\n\n背面视图平视中景:<sks> back view eye-level shot medium shot\n\n左后侧视图平视中景:<sks> back-left quarter view eye-level shot medium shot\n\n左侧视图平视中景:<sks> left side view eye-level shot medium shot\n\n左前侧视图平视中景:<sks> front-left quarter view eye-level shot medium shot\n\n正面视图高位拍摄中景:<sks> front view elevated shot medium shot\n\n右前侧视图高位拍摄中景:<sks> front-right quarter view elevated shot medium shot\n\n右侧视图高位拍摄中景:<sks> right side view elevated shot medium shot\n\n右后侧视图高位拍摄中景:<sks> back-right quarter view elevated shot medium shot\n\n背面视图高位拍摄中景:<sks> back view elevated shot medium shot\n\n左后侧视图高位拍摄中景:<sks> back-left quarter view elevated shot medium shot\n\n左侧视图高位拍摄中景:<sks> left side view elevated shot medium shot\n\n左前侧视图高位拍摄中景:<sks> front-left quarter view elevated shot medium shot\n\n正面视图高角度中景:<sks> front view high-angle shot medium shot\n\n右前侧视图高角度中景:<sks> front-right quarter view high-angle shot medium shot\n\n右侧视图高角度中景:<sks> right side view high-angle shot medium shot\n\n右后侧视图高角度中景:<sks> back-right quarter view high-angle shot medium shot\n\n背面视图高角度中景:<sks> back view high-angle shot medium shot\n\n左后侧视图高角度中景:<sks> back-left quarter view high-angle shot medium shot\n\n左侧视图高角度中景:<sks> left side view high-angle shot medium shot\n\n左前侧视图高角度中景:<sks> front-left quarter view high-angle shot medium shot\n\n正面视图低角度广角:<sks> front view low-angle shot wide shot\n\n右前侧视图低角度广角:<sks> front-right quarter view low-angle shot wide shot\n\n右侧视图低角度广角:<sks> right side view low-angle shot wide shot\n\n右后侧视图低角度广角:<sks> back-right quarter 
view low-angle shot wide shot\n\n背面视图低角度广角:<sks> back view low-angle shot wide shot\n\n左后侧视图低角度广角:<sks> back-left quarter view low-angle shot wide shot\n\n左侧视图低角度广角:<sks> left side view low-angle shot wide shot\n\n左前侧视图低角度广角:<sks> front-left quarter view low-angle shot wide shot" ], "color": "#223", "bgcolor": "#335" }, { "id": 27, "type": "Note", "pos": [ -1009.7430949133999, -667.218084639018 ], "size": [ 3344.867777777771, 172.79096969696934 ], "flags": {}, "order": 4, "mode": 0, "inputs": [], "outputs": [], "properties": { "ue_properties": { "widget_ue_connectable": {}, "version": "7.8", "input_ue_unconnectable": {} } }, "widgets_values": [ "可以使用comfuyi-lumi-batcher 来跑各个角度的,一下子跑出去90条。\n在这个流里,直接换位置,也就是把位置选成节点19,之后把参数选成左边复制的提示信息就行. william" ], "color": "#232", "bgcolor": "#353" }, { "id": 20, "type": "Note", "pos": [ 2392.2425113410113, -661.9303860055464 ], "size": [ 466.503515625, 1468.5173727560402 ], "flags": {}, "order": 5, "mode": 0, "inputs": [], "outputs": [], "title": "All prompt possible for the Lora Qwen image edit multiple angles", "properties": { "ue_properties": { "widget_ue_connectable": {}, "version": "7.5.2", "input_ue_unconnectable": {} } }, "widgets_values": [ "<sks> front view low-angle shot close-up\n<sks> front-right quarter view low-angle shot close-up\n<sks> right side view low-angle shot close-up\n<sks> back-right quarter view low-angle shot close-up\n<sks> back view low-angle shot close-up\n<sks> back-left quarter view low-angle shot close-up\n<sks> left side view low-angle shot close-up\n<sks> front-left quarter view low-angle shot close-up\n<sks> front view eye-level shot close-up\n<sks> front-right quarter view eye-level shot close-up\n<sks> right side view eye-level shot close-up\n<sks> back-right quarter view eye-level shot close-up\n<sks> back view eye-level shot close-up\n<sks> back-left quarter view eye-level shot close-up\n<sks> left side view eye-level shot close-up\n<sks> front-left quarter view eye-level shot close-up\n<sks> 
front view elevated shot close-up\n<sks> front-right quarter view elevated shot close-up\n<sks> right side view elevated shot close-up\n<sks> back-right quarter view elevated shot close-up\n<sks> back view elevated shot close-up\n<sks> back-left quarter view elevated shot close-up\n<sks> left side view elevated shot close-up\n<sks> front-left quarter view elevated shot close-up\n<sks> front view high-angle shot close-up\n<sks> front-right quarter view high-angle shot close-up\n<sks> right side view high-angle shot close-up\n<sks> back-right quarter view high-angle shot close-up\n<sks> back view high-angle shot close-up\n<sks> back-left quarter view high-angle shot close-up\n<sks> left side view high-angle shot close-up\n<sks> front-left quarter view high-angle shot close-up\n<sks> front view low-angle shot medium shot\n<sks> front-right quarter view low-angle shot medium shot\n<sks> right side view low-angle shot medium shot\n<sks> back-right quarter view low-angle shot medium shot\n<sks> back view low-angle shot medium shot\n<sks> back-left quarter view low-angle shot medium shot\n<sks> left side view low-angle shot medium shot\n<sks> front-left quarter view low-angle shot medium shot\n<sks> front view eye-level shot medium shot\n<sks> front-right quarter view eye-level shot medium shot\n<sks> right side view eye-level shot medium shot\n<sks> back-right quarter view eye-level shot medium shot\n<sks> back view eye-level shot medium shot\n<sks> back-left quarter view eye-level shot medium shot\n<sks> left side view eye-level shot medium shot\n<sks> front-left quarter view eye-level shot medium shot\n<sks> front view elevated shot medium shot\n<sks> front-right quarter view elevated shot medium shot\n<sks> right side view elevated shot medium shot\n<sks> back-right quarter view elevated shot medium shot\n<sks> back view elevated shot medium shot\n<sks> back-left quarter view elevated shot medium shot\n<sks> left side view elevated shot medium shot\n<sks> front-left 
quarter view elevated shot medium shot\n<sks> front view high-angle shot medium shot\n<sks> front-right quarter view high-angle shot medium shot\n<sks> right side view high-angle shot medium shot\n<sks> back-right quarter view high-angle shot medium shot\n<sks> back view high-angle shot medium shot\n<sks> back-left quarter view high-angle shot medium shot\n<sks> left side view high-angle shot medium shot\n<sks> front-left quarter view high-angle shot medium shot\n<sks> front view low-angle shot wide shot\n<sks> front-right quarter view low-angle shot wide shot\n<sks> right side view low-angle shot wide shot\n<sks> back-right quarter view low-angle shot wide shot\n<sks> back view low-angle shot wide shot\n<sks> back-left quarter view low-angle shot wide shot\n<sks> left side view low-angle shot wide shot\n<sks> front-left quarter view low-angle shot wide shot\n<sks> front view eye-level shot wide shot\n<sks> front-right quarter view eye-level shot wide shot\n<sks> right side view eye-level shot wide shot\n<sks> back-right quarter view eye-level shot wide shot\n<sks> back view eye-level shot wide shot\n<sks> back-left quarter view eye-level shot wide shot\n<sks> left side view eye-level shot wide shot\n<sks> front-left quarter view eye-level shot wide shot\n<sks> front view elevated shot wide shot\n<sks> front-right quarter view elevated shot wide shot\n<sks> right side view elevated shot wide shot\n<sks> back-right quarter view elevated shot wide shot\n<sks> back view elevated shot wide shot\n<sks> back-left quarter view elevated shot wide shot\n<sks> left side view elevated shot wide shot\n<sks> front-left quarter view elevated shot wide shot\n<sks> front view high-angle shot wide shot\n<sks> front-right quarter view high-angle shot wide shot\n<sks> right side view high-angle shot wide shot\n<sks> back-right quarter view high-angle shot wide shot\n<sks> back view high-angle shot wide shot\n<sks> back-left quarter view high-angle shot wide shot\n<sks> left side view 
high-angle shot wide shot\n<sks> front-left quarter view high-angle shot wide shot" ], "color": "#232", "bgcolor": "#353" }, { "id": 13, "type": "TextEncodeQwenImageEditPlus", "pos": [ 346.3903358490907, -56.08507473664949 ], "size": [ 400.4109260819468, 258.660770021565 ], "flags": {}, "order": 13, "mode": 0, "inputs": [ { "label": "CLIP", "name": "clip", "type": "CLIP", "link": 18 }, { "label": "VAE", "name": "vae", "shape": 7, "type": "VAE", "link": 19 }, { "label": "图像1", "name": "image1", "shape": 7, "type": "IMAGE", "link": 20 }, { "label": "图像2", "name": "image2", "shape": 7, "type": "IMAGE", "link": null }, { "label": "图像3", "name": "image3", "shape": 7, "type": "IMAGE", "link": null }, { "label": "提示词", "name": "prompt", "type": "STRING", "widget": { "name": "prompt" }, "link": 30 } ], "outputs": [ { "label": "条件", "name": "CONDITIONING", "type": "CONDITIONING", "links": [ 3 ] } ], "title": "TextEncodeQwenImageEditPlus (Positive)", "properties": { "cnr_id": "comfy-core", "ver": "0.5.1", "Node name for S&R": "TextEncodeQwenImageEditPlus", "ue_properties": { "widget_ue_connectable": { "prompt": true }, "version": "7.5.2", "input_ue_unconnectable": {} }, "enableTabs": false, "tabWidth": 65, "tabXOffset": 10, "hasSecondTab": false, "secondTabText": "Send Back", "secondTabOffset": 80, "secondTabWidth": 65 }, "widgets_values": [ "<sks> front view low-angle shot close-up" ], "color": "#232", "bgcolor": "#353" }, { "id": 19, "type": "VNCCS_VisualPositionControl", "pos": [ -135.64643889289317, 113.93573315518134 ], "size": [ 377.25527938354924, 400.69737743530504 ], "flags": {}, "order": 6, "mode": 0, "inputs": [], "outputs": [ { "name": "prompt", "type": "STRING", "links": [ 30 ] } ], "properties": { "cnr_id": "vnccs-utils", "ver": "e8899e8fda5e72744198efecdc6f74f7d88a3b6a", "Node name for S&R": "VNCCS_VisualPositionControl", "ue_properties": { "widget_ue_connectable": { "camera_data": true }, "version": "7.5.2", "input_ue_unconnectable": {} }, "ttNbgOverride": { 
"color": "#332922", "bgcolor": "#593930", "groupcolor": "#b06634" } }, "widgets_values": [ "{\"azimuth\":225,\"elevation\":-30,\"distance\":\"close-up\",\"include_trigger\":true}", "" ], "color": "#332922", "bgcolor": "#593930" }, { "id": 12, "type": "LoadImage", "pos": [ -1045.7126666666695, -374.68955555555544 ], "size": [ 850, 1220 ], "flags": {}, "order": 7, "mode": 0, "inputs": [ { "label": "图像", "name": "image", "type": "COMBO", "widget": { "name": "image" }, "link": null }, { "label": "上传", "name": "upload", "type": "IMAGEUPLOAD", "widget": { "name": "upload" }, "link": null } ], "outputs": [ { "label": "图像", "name": "IMAGE", "type": "IMAGE", "links": [ 10 ] }, { "label": "遮罩", "name": "MASK", "type": "MASK", "links": null } ], "properties": { "cnr_id": "comfy-core", "ver": "0.5.1", "Node name for S&R": "LoadImage", "ue_properties": { "widget_ue_connectable": { "image": true, "upload": true }, "version": "7.5.2", "input_ue_unconnectable": {} }, "enableTabs": false, "tabWidth": 65, "tabXOffset": 10, "hasSecondTab": false, "secondTabText": "Send Back", "secondTabOffset": 80, "secondTabWidth": 65, "ttNbgOverride": { "color": "#332922", "bgcolor": "#593930", "groupcolor": "#b06634" }, "#sdppp_variant": "default", "#sdppp_simple_content": "canvas", "#sdppp_simple_mask": "canvas", "#sdppp_simple_boundary": "canvas", "#sdppp_label": "" }, "widgets_values": [ "微信图片_20260515114607_5418_3.png", "image" ], "color": "#332922", "bgcolor": "#593930" } ], "links": [ [ 1, 15, 0, 1, 0, "MODEL" ], [ 2, 5, 0, 2, 0, "CONDITIONING" ], [ 3, 13, 0, 3, 0, "CONDITIONING" ], [ 4, 1, 0, 4, 0, "MODEL" ], [ 5, 18, 0, 5, 0, "CLIP" ], [ 6, 14, 0, 5, 1, "VAE" ], [ 7, 8, 0, 5, 2, "IMAGE" ], [ 8, 8, 0, 7, 0, "IMAGE" ], [ 9, 14, 0, 7, 1, "VAE" ], [ 10, 12, 0, 8, 0, "IMAGE" ], [ 11, 4, 0, 9, 0, "MODEL" ], [ 12, 3, 0, 9, 1, "CONDITIONING" ], [ 13, 2, 0, 9, 2, "CONDITIONING" ], [ 14, 7, 0, 9, 3, "LATENT" ], [ 15, 9, 0, 10, 0, "LATENT" ], [ 16, 14, 0, 10, 1, "VAE" ], [ 17, 10, 0, 11, 0, "IMAGE" 
], [ 18, 18, 0, 13, 0, "CLIP" ], [ 19, 14, 0, 13, 1, "VAE" ], [ 20, 8, 0, 13, 2, "IMAGE" ], [ 22, 16, 0, 15, 0, "MODEL" ], [ 23, 17, 0, 16, 0, "MODEL" ], [ 29, 10, 0, 26, 0, "IMAGE" ], [ 30, 19, 0, 13, 5, "STRING" ] ], "groups": [], "config": {}, "extra": { "workflowRendererVersion": "LG", "ue_links": [], "links_added_by_ue": [], "ds": { "scale": 0.40909090909091006, "offset": [ 1435.9227173859654, 918.2021927921285 ] }, "frontendVersion": "1.43.18", "VHS_latentpreview": false, "VHS_latentpreviewrate": 0, "VHS_MetadataImage": true, "VHS_KeepIntermediate": true }, "version": 0.4 } -
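The two prompt notes in the workflow above follow one pattern: the `<sks>` trigger, one of eight views, one of four camera heights, and one of three shot distances. A minimal Python sketch, assuming that pattern holds for every entry, regenerates the full list for batch runs. `build_prompts` and the constant names are illustrative and not part of ComfyUI or any custom node.

```python
from itertools import product

# Vocabulary copied from the prompt list in the workflow notes.
VIEWS = [
    "front view",
    "front-right quarter view",
    "right side view",
    "back-right quarter view",
    "back view",
    "back-left quarter view",
    "left side view",
    "front-left quarter view",
]
ELEVATIONS = ["low-angle shot", "eye-level shot", "elevated shot", "high-angle shot"]
DISTANCES = ["close-up", "medium shot", "wide shot"]
TRIGGER = "<sks>"


def build_prompts():
    """Return every trigger + view + elevation + distance combination.

    Distance varies slowest and view fastest, matching the order of the note.
    """
    return [
        f"{TRIGGER} {view} {elevation} {distance}"
        for distance, elevation, view in product(DISTANCES, ELEVATIONS, VIEWS)
    ]


if __name__ == "__main__":
    prompts = build_prompts()
    print(len(prompts))   # 8 views x 4 elevations x 3 distances
    print(prompts[0])
```

The sketch yields 96 prompts. For a batch tool that takes one prompt per line, `"\n".join(build_prompts())` gives the text to paste.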

The model is the official original release, not quantized. Download: https://huggingface.co/Qwen/Qwen3.6-27B/tree/main (55.6 GB in total).
-
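The 4080S has fairly high memory bandwidth and better compute, so it should be the better pick.
The 4080 is faster than the 3090.
The B70 is slower than the 3090.
The 4080 also supports FP8.
Downside: no warranty. All of the above comes from AI.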
-
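I have an Intel B70 Pro card, newly released, with 32 GB of VRAM.
It runs ComfyUI, and for image generation with z-image it beats a 4090. On LTX/WAN, though, it cannot do 720p video, and support is a mess. I have almost lost the confidence to keep testing. ComfyUI cannot be updated either. I am still debugging and will post a report as soon as I am done.
-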
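This information is very valuable. If it beats a 4090 on image generation alone, there is already a lot that can be done locally. For LTX/WAN, how do the 480p numbers look? If 480p already works, I think that is enough for some mobile short videos.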
-
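Sharing some single-card llm-scaler numbers
Over the weekend I got Qwen3.6-27B into a state that is acceptable for an agentic loop, and ran a fairly systematic benchmark of single requests and parallel requests (5 reps). Prompt processing (pp) speed is decent, but token generation (tg) is still a bit slow. With vLLM's continuous batching, though, parallel token generation stays fairly stable overall. Right now I use it only as a helper for the Hermes agent's delegate task, collecting codebase context.
The one fairly big problem so far: the KV cache has to be BF16 to reach a usable token generation speed, but then the context is only 43,000. vLLM also has to be tricked into recognizing the layer architecture. I hope an optimized FP8 dequant kernel will eventually make an FP8 KV cache practical. FP8 dequant is much slower than Q8_0, and unfortunately the vLLM version in the official Docker image supports no KV cache dtype other than fp8 and bf16. Sadly neither this card nor the 7900 XTX has hardware FP8 support; the R9700 apparently does. Also, AutoRound quality is still slightly behind a Q4 GGUF.
The hardware is fairly old: 64 GB of DDR4, which is slow but still faster than PCIe 4.0 x16. Proxmox 9.1.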
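To put the BF16 limit in perspective, here is a rough sketch of how much context the same memory budget might hold with a 1-byte FP8 cache. It assumes KV cache memory scales linearly with both context length and bytes per element and ignores any fixed overhead. `context_for_dtype` is a hypothetical helper, not a vLLM API, and the real gain depends on the model's layer layout.

```python
# Bytes per stored element for the two KV cache dtypes discussed in the post.
BYTES_PER_ELEMENT = {"bf16": 2, "fp8": 1}


def context_for_dtype(known_ctx, known_dtype, target_dtype):
    """Scale a known context length to another dtype under the same memory budget.

    Assumption: cache size is proportional to context length times bytes per
    element, with no fixed-size component.
    """
    budget = known_ctx * BYTES_PER_ELEMENT[known_dtype]
    return budget // BYTES_PER_ELEMENT[target_dtype]


if __name__ == "__main__":
    # 43000 is the BF16 context length reported in the post.
    print(context_for_dtype(43000, "bf16", "fp8"))
```

Under these assumptions the FP8 cache would fit about twice the context, which is why a fast FP8 dequant kernel matters here.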
vLLM single request, qwen/qwen3.6-27b (int4 AutoRound):
PP TTFT: 1,685 ms
PP2048 TPS: 1,686 ± 66 tok/s
TG512: 13.7 ± 1.4 tok/s
Parallel test, pp2048 tg512:
Conc: 1
• TTFT(ms): 1,261
• Prefill(tok/s): 1,400
• Decode(tok/s): 13.3
• Output(tok/s): 12.9
Conc: 2
• TTFT(ms): 1,907
• Prefill(tok/s): 925
• Decode(tok/s): 12.9
• Output(tok/s): 24.7
Conc: 4
• TTFT(ms): 3,319
• Prefill(tok/s): 532
• Decode(tok/s): 12.7
• Output(tok/s): 46.7
Conc: 8
• TTFT(ms): 6,231
• Prefill(tok/s): 283
• Decode(tok/s): 11.9
• Output(tok/s): 82.7
docker run command:
docker run -it --rm --name vllmb70 --ipc=host --shm-size=32g \
  --device=/dev/dri:/dev/dri --privileged -p 1234:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e VLLM_TARGET_DEVICE=xpu \
  --entrypoint /bin/bash intel/llm-scaler-vllm:0.14.0-b8.2.1 -c "
    source /opt/intel/oneapi/setvars.sh --force &&
    sed -i 's/image_processor.max_pixels/getattr(image_processor, \"max_pixels\", 12845056)/g' \
      /usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_vl.py &&
    python3 -m vllm.entrypoints.openai.api_server \
      --model Intel/Qwen3.6-27B-int4-AutoRound \
      --tokenizer Qwen/Qwen3.6-27B \
      --served-model-name qwen/qwen3.6-27b \
      --kv-cache-dtype auto \
      --max-model-len 65536 \
      --gpu-memory-utilization 0.9 \
      --enable-auto-tool-choice \
      --tool-call-parser qwen3_xml \
      --allow-deprecated-quantization \
      --trust-remote-code \
      --port 8000 \
      --tensor-parallel-size 1 \
      --pipeline-parallel-size 1 \
      --enforce-eager
  "
I also ran LTX 2.3: fully offloaded to the GPU, it is about 10% faster than a 4070, which needs dynamic loading. Many custom nodes are unsupported, so it is not worth the effort for now.
-
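The concurrency figures above can be turned into a quick scaling check. The sketch below, using only the Output (tok/s) numbers from the post, computes the speedup over a single request and the per-slot efficiency. `scaling_table` is an illustrative helper name.

```python
# Aggregate output throughput (tok/s) per concurrency level, from the post.
OUTPUT_TPS = {1: 12.9, 2: 24.7, 4: 46.7, 8: 82.7}


def scaling_table(output_tps):
    """Return (concurrency, speedup, efficiency) rows.

    Speedup is measured against the per-request throughput at the lowest
    concurrency; efficiency is speedup divided by concurrency.
    """
    base_conc = min(output_tps)
    base = output_tps[base_conc] / base_conc
    rows = []
    for conc in sorted(output_tps):
        speedup = output_tps[conc] / base
        rows.append((conc, speedup, speedup / conc))
    return rows


if __name__ == "__main__":
    for conc, speedup, efficiency in scaling_table(OUTPUT_TPS):
        print(f"conc={conc}: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On these numbers, aggregate output reaches about 6.4x at concurrency 8, or roughly 80% of ideal linear scaling, which fits the observation that parallel generation stays fairly stable.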


