<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試]]></title><description><![CDATA[<p dir="auto">TLDR</p>
<p dir="auto">先上個實體圖</p>
<p dir="auto">Beelink Ser 8 8745HS 用Oculink連接 RTX Pro 4500</p>
<p dir="auto">跑在Ubuntu 26.04, Kernel 7.0</p>
<p dir="auto"><img src="https://upload.lcz.me/uploads/bed300a5-8333-4c33-b410-ed432f9860a9.jpg" alt="unamed.jpg" class=" img-fluid img-markdown" /></p>
<p dir="auto">啓動咒語, <em><strong>注意這個是我在vLLM cu130 nightly (0.20)設立的, cu129 0.22估計會有更多優化, 我會試試看其他版本</strong></em></p>
<pre><code>docker run -d \
  --name vllm-Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP \
  --restart unless-stopped \
  --ipc host \
  --gpus '"device=0"' \
  -p 0.0.0.0:7380:8000 \
  -v "~/vllm/models:/models:ro" \
  -v "~/vllm/.cache/huggingface:/root/.cache/huggingface" \
  -e GPU_MEMORY_UTILIZATION="0.95" \
  -e HF_HUB_OFFLINE="1" \
  -e KV_CACHE_DTYPE="fp8" \
  -e MAX_MODEL_LEN="230400" \
  -e MODEL_PATH="/models/sakamakismile/Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP" \
  -e PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" \
  -e SERVED_MODEL_NAME="Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP" \
  -e VLLM_ATTENTION_BACKEND="FLASHINFER" \
  -e VLLM_EXTRA_ARGS='--quantization modelopt --trust-remote-code --enable-chunked-prefill --reasoning-parser qwen3 --tool-call-parser qwen3_coder --enable-auto-tool-choice --max-num-seqs 1 --max-num-batched-tokens 4096 --speculative-config {"method":"qwen3_5_mtp","num_speculative_tokens":3} --language-model-only --performance-mode interactivity --attention-backend flashinfer --skip-mm-profiling --enable-prefix-caching --no-disable-hybrid-kv-cache-manager' \
  -e VLLM_LOGGING_LEVEL="INFO" \
  -e VLLM_NVFP4_GEMM_BACKEND="flashinfer-cutlass" \
  -e VLLM_USE_FLASHINFER_MOE_FP4="0" \
  -e VLLM_USE_FLASHINFER_SAMPLER="1" \
  --health-cmd 'curl -fsS http://localhost:8000/v1/models || exit 1' \
  --health-timeout 5s \
  --health-interval 30s \
  --health-retries 5 \
  --health-start-period 5m \
  --entrypoint /bin/bash \
  vllm/vllm-openai:cu130-nightly \
  -lc 'exec vllm serve "$MODEL_PATH" --served-model-name "$SERVED_MODEL_NAME" --host 0.0.0.0 --port 8000 --max-model-len "$MAX_MODEL_LEN" --gpu-memory-utilization "$GPU_MEMORY_UTILIZATION" --kv-cache-dtype "$KV_CACHE_DTYPE" $VLLM_EXTRA_ARGS'
</code></pre>
<p dir="auto">llama-benchy benchmark</p>
<pre><code>llama-benchy \
  --base-url "http://localhost:7380/v1" \
  --model "Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP" \
  --tokenizer "$HOME/vllm/models/sakamakismile/Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP" \
  --pp 2048 \
  --tg 480 \
  --depth 0 1000 5000 10000 20000 50000 100000 150000 200000 \    #(不同上下文長度)
  --latency-mode generation \
  --skip-coherence \
  --concurrency 1 \
</code></pre>
<p dir="auto">效果</p>
<pre><code>| model                                    |             test |               t/s |     peak t/s |         ttfr (ms) |      est_ppt (ms) |     e2e_ttft (ms) |
|:-----------------------------------------|-----------------:|------------------:|-------------:|------------------:|------------------:|------------------:|
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |           pp2048 | 7741.01 ± 1375.30 |              |    373.94 ± 54.49 |    274.26 ± 54.49 |    373.94 ± 54.49 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |            tg480 |      68.87 ± 6.65 | 81.33 ± 3.68 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   pp2048 @ d1000 |   8136.73 ± 32.84 |              |     474.32 ± 1.44 |     374.64 ± 1.44 |     474.32 ± 1.44 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |    tg480 @ d1000 |      67.73 ± 5.06 | 88.00 ± 5.72 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   pp2048 @ d5000 |   6615.23 ± 22.79 |              |    1165.21 ± 3.86 |    1065.53 ± 3.86 |    1165.21 ± 3.86 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |    tg480 @ d5000 |      72.92 ± 3.56 | 89.33 ± 3.77 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d10000 |   6008.73 ± 10.16 |              |    2104.88 ± 3.47 |    2005.20 ± 3.47 |    2104.88 ± 3.47 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d10000 |      65.25 ± 2.21 | 82.00 ± 4.32 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d20000 |    5152.21 ± 0.52 |              |    4379.13 ± 0.52 |    4279.45 ± 0.52 |    4380.19 ± 0.46 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d20000 |      70.45 ± 1.27 | 89.67 ± 0.47 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d50000 |    3690.36 ± 5.88 |              |  14203.66 ± 22.59 |  14103.98 ± 22.59 |  14205.86 ± 22.80 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d50000 |      67.03 ± 1.67 | 84.67 ± 0.47 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d100000 |    2528.58 ± 0.55 |              |   40457.51 ± 8.72 |   40357.83 ± 8.72 |   40461.50 ± 8.69 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d100000 |      60.96 ± 0.75 | 78.33 ± 3.68 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d150000 |    1922.36 ± 0.98 |              |  79194.84 ± 39.68 |  79095.17 ± 39.68 |  79201.49 ± 39.50 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d150000 |      62.53 ± 3.29 | 76.33 ± 1.89 |                   |                   |                   |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d200000 |    1556.00 ± 0.99 |              | 129951.65 ± 82.49 | 129851.97 ± 82.49 | 129959.72 ± 82.53 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d200000 |      59.58 ± 1.31 | 69.67 ± 1.70 |                   |                   |                   |
</code></pre>
<hr />
<h2>碎碎唸, 講一下參數選擇邏輯</h2>
<pre><code>GPU_MEMORY_UTILIZATION =&gt; 0.95, Headless伺服器, 顯示輸出由iGPU負責
KV_CACHE_DTYPE =&gt; FP8, Ada架構以後基本統一FP8
MAX_MODEL_LEN =&gt; 230K, 之前有嘗試試過極限拉到240K左右, 但是會在部分長上下文出現OOM, 穩定點用230K

PYTORCH_CUDA_ALLOC_CONF =&gt; Pytorch實驗性參數, 透過呼叫CUDA内核API整理VRAM碎塊, 降低OOM機會
VLLM_ATTENTION_BACKEND =&gt; FLASHINFER, 很奇怪的是vLLM是推薦用這個而不是Flash Attention, 理論上在NVFP4在sm 12X (Desktop Blackwell)還沒完善下的情況用FA估計會比較好, 在sm 10X (Datacenter Blackwell)則FLASHINFER比較好

quantization =&gt; modelopt, vllm會跑去讀hf_quant_config.json裏的quant_algo, 這個模型是nvfp4
enable-chunked-prefill =&gt; 必開不解釋, 優化VRAM避免Spike導致OOM
speculative-config =&gt; 2 或者 3 都可, 激進點就用了3
skip-mm-profiling =&gt; 因爲這個模型只支持Text, 所以不需要multi model設定,省點VRAM
enable-prefix-caching =&gt; 降低TTRT
no-disable-hybrid-kv-cache-manager =&gt; 避免因爲Qwen模型的混合Attention導致挂掉

VLLM_NVFP4_GEMM_BACKEND =&gt; 叫vLLM 使用 FlashInfer/Cutlass NVFP4 kernels進行矩陣計算, Blackwell特點
VLLM_USE_FLASHINFER_MOE_FP4 (0) + VLLM_USE_FLASHINFER_SAMPLER (1) =&gt; 優化CUDA内核
</code></pre>
]]></description><link>https://lcz.me/topic/441/論-迷你電腦-配合-rtx-pro-4500-的簡單測試-以及blackwell架構下的一些嘗試</link><generator>RSS for Node</generator><lastBuildDate>Fri, 05 Jun 2026 23:48:35 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/441.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 05 Jun 2026 16:13:05 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 23:16:14 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a>  可以許願 <a href="https://microsoft.github.io/TRELLIS.2/" rel="nofollow ugc">https://microsoft.github.io/TRELLIS.2/</a> 測試嗎？<br />
剛剛跑 ROCm版堪用，但踩雷不少，等下也丟上來<br />
<a href="https://lcz.me/post/5275">https://lcz.me/post/5275</a></p>
]]></description><link>https://lcz.me/post/5273</link><guid isPermaLink="true">https://lcz.me/post/5273</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 23:16:14 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 19:09:28 GMT]]></title><description><![CDATA[<p dir="auto">v0.22.1-cu129-ubuntu2404</p>
<p dir="auto"><strong><em>VLLM_NVFP4_GEMM_BACKEND 因爲deprecated, 將由linear-backend自動選擇</em></strong></p>
<p dir="auto"><strong><em>VLLM_USE_FLASHINFER_MOE_FP4 因爲deprecated, 將由moe-backend自動選擇</em></strong></p>
<p dir="auto">測試結果</p>
<pre><code>| model                                    |             test |               t/s |     peak t/s |          ttfr (ms) |       est_ppt (ms) |      e2e_ttft (ms) |
| :--------------------------------------- | ---------------: | ----------------: | -----------: | -----------------: | -----------------: | -----------------: |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |           pp2048 | 3815.72 ± 2638.08 |              |   1066.49 ± 675.13 |    946.43 ± 675.13 |   1066.49 ± 675.13 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |            tg480 |      71.54 ± 3.67 | 89.33 ± 1.70 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   pp2048 @ d1000 |  7097.86 ± 469.13 |              |     551.38 ± 27.36 |     431.33 ± 27.36 |     551.38 ± 27.36 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |    tg480 @ d1000 |      72.91 ± 1.96 | 86.67 ± 2.05 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   pp2048 @ d5000 |  6293.28 ± 200.29 |              |    1241.33 ± 35.85 |    1121.28 ± 35.85 |    1241.33 ± 35.85 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |    tg480 @ d5000 |      71.79 ± 1.34 | 90.00 ± 0.82 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d10000 |   5764.98 ± 66.54 |              |    2210.31 ± 24.36 |    2090.26 ± 24.36 |    2210.31 ± 24.36 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d10000 |      71.77 ± 5.24 | 86.00 ± 5.35 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d20000 |    5020.15 ± 9.69 |              |     4512.04 ± 8.31 |     4391.99 ± 8.31 |     4513.21 ± 8.16 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d20000 |      74.68 ± 1.77 | 94.00 ± 2.16 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d50000 |    3634.37 ± 3.95 |              |   14441.41 ± 15.57 |   14321.36 ± 15.57 |   14444.10 ± 15.13 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d50000 |      65.42 ± 5.26 | 83.33 ± 7.41 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d100000 |    2500.68 ± 0.47 |              |    40928.48 ± 7.63 |    40808.42 ± 7.63 |    40933.15 ± 7.29 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d100000 |      73.40 ± 4.21 | 85.00 ± 2.45 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d150000 |    1900.32 ± 1.39 |              |   80132.00 ± 58.27 |   80011.94 ± 58.27 |   80138.64 ± 57.60 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d150000 |      67.87 ± 1.65 | 79.67 ± 3.30 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d200000 |    1535.79 ± 1.74 |              | 131680.08 ± 149.90 | 131560.02 ± 149.90 | 131688.59 ± 149.41 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d200000 |      56.88 ± 2.29 | 73.33 ± 2.05 |                    |                    |                    |
</code></pre>
<p dir="auto">GPT結論</p>
<blockquote>
<p dir="auto"><strong>結論</strong></p>
<p dir="auto"><code>cu130-0.20</code> 的主要優勢在 prefill throughput 和 TTFT，特別是短到中等 context 的 prompt processing。</p>
<p dir="auto">更新後的 <code>cu129-0.22</code> 在 token generation / decode throughput 上比之前更強，平均 <code>tg480</code> generation t/s 約比 <code>cu130-0.20</code> 高 <code>4.6%</code>。</p>
<p dir="auto">整體而言，若 workload 偏 prompt-heavy、RAG、長 prompt prefill，<code>cu130-0.20</code> 較合適；若 workload 偏長時間生成 token，<code>cu129-0.22</code> 較合適。</p>
</blockquote>
]]></description><link>https://lcz.me/post/5268</link><guid isPermaLink="true">https://lcz.me/post/5268</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 19:09:28 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:32:00 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a></p>
<p dir="auto">如果像4090一樣應該可以...吧, 到時候我們就知道了</p>
]]></description><link>https://lcz.me/post/5265</link><guid isPermaLink="true">https://lcz.me/post/5265</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:32:00 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:27:02 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a>  5090D 能送去華強北魔改嗎？</p>
]]></description><link>https://lcz.me/post/5263</link><guid isPermaLink="true">https://lcz.me/post/5263</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:27:02 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:26:30 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a></p>
<p dir="auto">其實我有想過RTX Pro 4500混合5090D一起使用, 畢竟兩張卡都是32GB, vLLM跑TP2不會有VRAM浪費</p>
<p dir="auto"><a href="https://discuss.vllm.ai/t/combining-2-different-gpus/1609" rel="nofollow ugc">但是vLLM表明5090D會很大機會只有一半性能</a></p>
]]></description><link>https://lcz.me/post/5262</link><guid isPermaLink="true">https://lcz.me/post/5262</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:26:30 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:26:11 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a>  我是一開始就打算雙卡，挑的主板支援 PCIe 5.0 x8 兩個...<br />
你還是考慮單卡吧，不要重複消費</p>
<p dir="auto">我這次已經浪費錢多賣了一組 DDR5 32*2 ram ，成本暴增</p>
]]></description><link>https://lcz.me/post/5261</link><guid isPermaLink="true">https://lcz.me/post/5261</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:26:11 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:21:23 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a></p>
<p dir="auto">還沒調整好, INT 4估計還能更快</p>
<p dir="auto">2張R9700走TP 2用Oculink跟 PCIe 5.0 x8 混合使用估計會出事誒, Oculink只有PCIe 4.0 x4, PCIe 5.0 x8, 結果就是只能走PCIe 4.0 x4</p>
]]></description><link>https://lcz.me/post/5260</link><guid isPermaLink="true">https://lcz.me/post/5260</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:21:23 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:19:11 GMT]]></title><description><![CDATA[<p dir="auto">感謝，我太久沒關注 N卡，還停留在舊價格</p>
]]></description><link>https://lcz.me/post/5259</link><guid isPermaLink="true">https://lcz.me/post/5259</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:19:11 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:17:17 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a></p>
<p dir="auto">我的是5090D版 (住香港), 而且香港現在5090D貴到快要到2萬中, 非D都起碼要3萬頭港幣了</p>
<p dir="auto">差異的話我是沒特別留意, 畢竟5090D太多時候都是試驗品 + 日常使用</p>
<p dir="auto">4500的fp16 tflops卡在5070ti 跟 5080中間, Prefill的話你可以用5070ti作爲基準加個5%左右吧.</p>
<p dir="auto">至於CP嘛, 混合日常使用跟LLM肯定是5090更好, 怕功耗600w可以用afterburner降到最低400w左右, <a href="https://www.reddit.com/r/LocalLLaMA/comments/1tcvji7/benchmark_5090rtx_promt_parsing_token_generation/" rel="nofollow ugc">引用一下這個Reddit Post</a>, 性能損失如下:</p>
<p dir="auto"><img src="https://upload.lcz.me/uploads/b900e4b9-6511-44dc-ad52-6b0a4be7e69e.jpeg" alt="4024df3a-02d0-4254-a26d-c6e02b7ad156-image.jpeg" class=" img-fluid img-markdown" /></p>
<p dir="auto"><img src="https://upload.lcz.me/uploads/66351c95-f995-4671-b4df-bf2e9c64b478.jpeg" alt="3b7d6077-abd6-4df8-8043-eaeff7f8d96d-image.jpeg" class=" img-fluid img-markdown" /></p>
]]></description><link>https://lcz.me/post/5258</link><guid isPermaLink="true">https://lcz.me/post/5258</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:17:17 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:09:10 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a> 大神 看你用oclink 也有這個tks 我用底座 beelink pci 5.0 更加定了<br />
就買4500 吧<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f642.png?v=d348ca29232" class="not-responsive emoji emoji-android emoji--slightly_smiling_face" style="height:23px;width:auto;vertical-align:middle" title="🙂" alt="🙂" />‍<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/2195.png?v=d348ca29232" class="not-responsive emoji emoji-android emoji--arrow_up_down" style="height:23px;width:auto;vertical-align:middle" title="↕" alt="↕" />️<br />
數據實測結果十分好，都肯定兩張R9700也達不到</p>
<p dir="auto">身心錢包都要痛了<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f927.png?v=d348ca29232" class="not-responsive emoji emoji-android emoji--sneezing_face" style="height:23px;width:auto;vertical-align:middle" title="🤧" alt="🤧" /></p>
]]></description><link>https://lcz.me/post/5257</link><guid isPermaLink="true">https://lcz.me/post/5257</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:09:10 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:03:33 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a>  我記得你好像有張 5090 ，PRO 4500 價位也差不多，你有比較過差異跟 CP 值嗎？</p>
]]></description><link>https://lcz.me/post/5255</link><guid isPermaLink="true">https://lcz.me/post/5255</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:03:33 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 18:02:20 GMT]]></title><description><![CDATA[<p dir="auto"><img src="https://upload.lcz.me/uploads/60352466-bffe-4bab-a564-a5ffe09c29b3.jpeg" alt="93f0f237-99a2-4a6d-91c9-6474a2ec24a1-image.jpeg" class=" img-fluid img-markdown" /></p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/kop-wang" aria-label="Profile: kop-wang">@<bdi>kop-wang</bdi></a></p>
<p dir="auto">找到一個你可能感興趣的東西, <a href="https://kaitchup.substack.com/p/qwen35-quantization-similar-accuracy" rel="nofollow ugc">引用這位大神的文章</a></p>
<p dir="auto">沒有理解錯的話應該算是不同quantization下模型的精度, 原型BF16, 原型FP8,  AWQ量化的INT4, AWQ 4bit (類似GGUF Q4的概念), Autoround量化的INT4</p>
<p dir="auto">部分任務好像NVFP4的精度還滿吃虧的</p>
]]></description><link>https://lcz.me/post/5252</link><guid isPermaLink="true">https://lcz.me/post/5252</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:02:20 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 19:29:03 GMT]]></title><description><![CDATA[<p dir="auto">基準測試</p>
<p dir="auto"><em><strong>vLLM cu130 nightly (0.20) -&gt; v0.22.1 cu129, 其餘包括benchmark不變</strong></em></p>
<p dir="auto"><em><strong>之後測試如果沒再提及Docker Image變化請默認為 v0.22.1-cu129-ubuntu2404</strong></em></p>
<p dir="auto"><s>打了瞌睡, 發現原來參數沒刪乾淨, 只能帶著舊參數 + 新docker image 跑了</s></p>
<p dir="auto">測試如下</p>
<pre><code>| model                                    |             test |               t/s |     peak t/s |          ttfr (ms) |       est_ppt (ms) |      e2e_ttft (ms) |
| :--------------------------------------- | ---------------: | ----------------: | -----------: | -----------------: | -----------------: | -----------------: |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |           pp2048 | 4112.24 ± 2335.79 |              |   1000.79 ± 713.91 |    882.88 ± 713.91 |   1000.79 ± 713.91 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |            tg480 |      70.62 ± 0.93 | 90.67 ± 1.25 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   pp2048 @ d1000 |  6522.05 ± 180.65 |              |     585.81 ± 13.00 |     467.90 ± 13.00 |     585.81 ± 13.00 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |    tg480 @ d1000 |      72.00 ± 4.34 | 87.00 ± 0.82 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   pp2048 @ d5000 |  5716.09 ± 781.76 |              |   1377.22 ± 190.64 |   1259.31 ± 190.64 |   1377.22 ± 190.64 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |    tg480 @ d5000 |      71.20 ± 1.68 | 90.33 ± 3.40 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d10000 |   5791.35 ± 64.74 |              |    2198.74 ± 23.28 |    2080.84 ± 23.28 |    2198.74 ± 23.28 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d10000 |      70.74 ± 7.93 | 86.67 ± 4.19 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d20000 |    5015.72 ± 8.10 |              |     4513.90 ± 7.10 |     4395.99 ± 7.10 |     4515.13 ± 6.99 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d20000 |      68.54 ± 4.81 | 86.67 ± 3.68 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  pp2048 @ d50000 |    3643.75 ± 3.58 |              |   14402.48 ± 14.02 |   14284.58 ± 14.02 |   14404.87 ± 13.87 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |   tg480 @ d50000 |      71.21 ± 6.44 | 86.67 ± 1.25 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d100000 |    2495.95 ± 3.04 |              |   41003.94 ± 49.73 |   40886.04 ± 49.73 |   41008.28 ± 49.60 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d100000 |      61.24 ± 2.76 | 81.33 ± 3.86 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d150000 |    1898.18 ± 0.59 |              |   80220.31 ± 24.93 |   80102.40 ± 24.93 |   80226.48 ± 24.91 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d150000 |      63.09 ± 4.07 | 80.67 ± 4.92 |                    |                    |                    |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP | pp2048 @ d200000 |    1531.27 ± 1.25 |              | 132066.32 ± 107.58 | 131948.41 ± 107.58 | 132076.34 ± 108.43 |
| Huihui-Qwen3.6-27B-abliterated-NVFP4-MTP |  tg480 @ d200000 |      58.89 ± 1.49 | 76.67 ± 3.77 |                    |                    |                    |
</code></pre>
<p dir="auto">GPT分析</p>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>指標</th>
<th>結論</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>pp2048</code> / prefill t/s</td>
<td><code>cu130-0.20</code> 全面較快</td>
</tr>
<tr>
<td>短 context</td>
<td><code>cu130-0.20</code> 優勢最大，純 <code>pp2048</code> 約快 <code>88%</code>，<code>d1000</code> 約快 <code>25%</code></td>
</tr>
<tr>
<td>中長 context</td>
<td><code>cu130-0.20</code> 仍較快，但差距逐步縮小</td>
</tr>
<tr>
<td><code>d50000</code> 以上</td>
<td>prefill 差距只剩約 <code>1% - 2%</code></td>
</tr>
<tr>
<td><code>ttfr</code> / <code>e2e_ttft</code></td>
<td><code>cu130-0.20</code> 較低，代表首 token 等待時間較短</td>
</tr>
<tr>
<td><code>tg480</code> generation t/s</td>
<td><code>cu129-0.22</code> 平均略快，<code>cu130-0.20</code> 約慢 <code>1.8% - 1.9%</code></td>
</tr>
<tr>
<td>peak generation t/s</td>
<td><code>cu129-0.22</code> 多數情況較高</td>
</tr>
</tbody>
</table>
<p dir="auto">看起來cu130 nightly或者說整個cu130是有特別針對blackwell做優化, cu129估計是針對30跟40系優化</p>
]]></description><link>https://lcz.me/post/5243</link><guid isPermaLink="true">https://lcz.me/post/5243</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 19:29:03 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 17:23:31 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a></p>
<p dir="auto"><a href="https://lcz.me/topic/398/%E8%AB%96-a10g-3090-%E5%BA%95%E4%B8%8B%E7%9A%84gemma-4%E8%B7%9Fqwen-3.6%E6%B8%AC%E8%A9%A6%E5%BF%83%E5%BE%97/9">Vulkan</a>支持混合卡, 把sm86變成sm120應該就可以, 畢竟CS你應該也在用vulkan吧</p>
<p dir="auto">B70的話還是避開吧, 這張卡很多測試情景都是用Intel自己docker image, 適用性可能無限趨近0</p>
]]></description><link>https://lcz.me/post/5242</link><guid isPermaLink="true">https://lcz.me/post/5242</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 17:23:31 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 17:13:34 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a>  單卡25萬左右還是太硬了，我的微薄月薪還要先扣 ai 税，除非有額外的收入可以回本，我目前已經有一張 R9700 可以玩，目前是在考慮第二張可以選  R9700 或是 B70 或是捏一下上   Pro 4500</p>
]]></description><link>https://lcz.me/post/5241</link><guid isPermaLink="true">https://lcz.me/post/5241</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 17:13:34 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 17:02:31 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a></p>
<p dir="auto">笑死, 不過有預算的話上RTX Pro 5000吧, Pro 4500 其實比較冷門點</p>
]]></description><link>https://lcz.me/post/5239</link><guid isPermaLink="true">https://lcz.me/post/5239</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 17:02:31 GMT</pubDate></item><item><title><![CDATA[Reply to 論 迷你電腦 配合 RTX Pro 4500 的簡單測試, 以及Blackwell架構下的一些嘗試 on Fri, 05 Jun 2026 17:01:02 GMT]]></title><description><![CDATA[<p dir="auto">你讓我對 4500 心動了</p>
]]></description><link>https://lcz.me/post/5238</link><guid isPermaLink="true">https://lcz.me/post/5238</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 17:01:02 GMT</pubDate></item></channel></rss>