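<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Beginner jumping in: is the R9700 really viable?]]></title><description><![CDATA[<p dir="auto">As the title says.<br />
I have a mini PC with a PCIe 5.0 dock: 285H, 64GB, Arc 140T.<br />
I want to build a small experimental work machine to run Qwen 27B as the brain for Hermes.</p>
<p dir="auto">I did some homework, and the R9700 32GB is the only option with good price-performance,<br />
but its performance is not great 🫠 How far apart are the R9700 and the 7900 XTX in performance, VRAM aside?<br />
There does not seem to be a better option with 32GB of VRAM under 20,000.</p>
<p dir="auto">I currently have Hermes installed and have tried running a 9B model on the integrated 140T. It is just about usable, at roughly 8 to 14 tokens/s,<br />
but a long question means waiting 3 to 7 minutes, or even longer.</p>
<p dir="auto">After playing with it for a while, I feel my work really does need AI, so I am getting ready to jump in 🫠<br />
(not image or video generation, but long context, coding, reasoning, and logic-type tasks)</p>
<p dir="auto">Asking all the experienced folks here for advice.</p>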
]]></description><link>https://lcz.me/topic/426/新手入坑-r9700-真的行嗎</link><generator>RSS for Node</generator><lastBuildDate>Sat, 06 Jun 2026 02:29:32 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/426.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 04 Jun 2026 15:30:35 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 08:12:00 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a> My model is the GTI 15 with the 285H, and it is PCIe 5.0 x8.<br />
The dock has two 8-pin connectors on two cables, which can be adapted to 8+6.</p>
<p dir="auto"><a href="https://www.notebookcheck-cn.com/Beelink-eGPU-OCuLink.882436.0.html" rel="nofollow ugc">https://www.notebookcheck-cn.com/Beelink-eGPU-OCuLink.882436.0.html</a></p>
]]></description><link>https://lcz.me/post/5178</link><guid isPermaLink="true">https://lcz.me/post/5178</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Fri, 05 Jun 2026 08:12:00 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 07:50:48 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a></p>
<p dir="auto"><a href="https://lcz.me/topic/431/%E5%AF%B9-m5-max-%E8%B7%91%E6%9C%AC%E5%9C%B0%E5%A4%A7%E6%A8%A1%E5%9E%8B%E6%9C%89%E7%82%B9%E5%A4%B1%E6%9C%9B/28">https://lcz.me/topic/431/对-m5-max-跑本地大模型有点失望/28</a></p>
<p dir="auto">I ran a quick test with llama benchy there; take a look for reference.</p>
<p dir="auto">Cards like the 5000 Pro and 6000 Pro should only be faster, not slower.</p>
]]></description><link>https://lcz.me/post/5174</link><guid isPermaLink="true">https://lcz.me/post/5174</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 07:50:48 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 07:47:25 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a> Is the PCIe 5.0 slot on your dock x16 or x8?<br />
Note that the R9700 uses a different power connector from high-end NVIDIA cards!</p>
]]></description><link>https://lcz.me/post/5172</link><guid isPermaLink="true">https://lcz.me/post/5172</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 07:47:25 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 07:36:07 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a> I am looking forward to it too. Maybe we could both test against the same metric?</p>
]]></description><link>https://lcz.me/post/5170</link><guid isPermaLink="true">https://lcz.me/post/5170</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 07:36:07 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 05:05:03 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a> Thanks a lot. I would also like to know whether double the price gets you even 50% more than the R9700...<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f635.png?v=d348ca29232" class="not-responsive emoji emoji-android emoji--dizzy_face" style="height:23px;width:auto;vertical-align:middle" title=":dizzy_face:" alt="😵" /></p>
]]></description><link>https://lcz.me/post/5137</link><guid isPermaLink="true">https://lcz.me/post/5137</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Fri, 05 Jun 2026 05:05:03 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 04:37:09 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a></p>
<p dir="auto">I am using an RTX Pro 4500 right now. Maybe I will write up a post tonight?</p>
]]></description><link>https://lcz.me/post/5129</link><guid isPermaLink="true">https://lcz.me/post/5129</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 04:37:09 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 04:15:32 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a> I keep going back and forth. If I go for the Blackwell 4500 with 32GB VRAM, how big is the gap versus the R9700, price aside?</p>
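]]></description><link>https://lcz.me/post/5125</link><guid isPermaLink="true">https://lcz.me/post/5125</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Fri, 05 Jun 2026 04:15:32 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 04:15:18 GMT]]></title><description><![CDATA[<p dir="auto">The Pro 4500 32GB (Leadtek, NT$130K) is the larger-VRAM version of the RTX 5070 Ti 16GB (NT$35K): identical specs except for the 32GB at 896 GB/s. If you can grit your teeth, try the PTT HardwareSale board, where you have a chance of picking one up for 130K. Yesterday I saw someone sell a white overseas-import 5090 for around $12X K. Overseas versions usually carry only a 3-year warranty and may have to be sent to Europe or the US for service (??)</p>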
]]></description><link>https://lcz.me/post/5124</link><guid isPermaLink="true">https://lcz.me/post/5124</guid><dc:creator><![CDATA[kos or]]></dc:creator><pubDate>Fri, 05 Jun 2026 04:15:18 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 04:14:15 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a> A true expert indeed<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f64f.png?v=d348ca29232" class="not-responsive emoji emoji-android emoji--pray" style="height:23px;width:auto;vertical-align:middle" title="🙏" alt="🙏" /> Then the R9700 really is of marginal use to you. How large do you set your context?</p>
]]></description><link>https://lcz.me/post/5123</link><guid isPermaLink="true">https://lcz.me/post/5123</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Fri, 05 Jun 2026 04:14:15 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 03:47:49 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a><br />
I use opencode with LiteLLM to run Gemma 4 / Qwen 3.6 and 3.7.<br />
My main tools are codex max + claude code max 200. My work is mobile full-stack development + LLM devops.<br />
I routinely feed large amounts of on-device logs straight in for analysis, and I also let the AI run E2E tests directly,<br />
along with BDD for testing and development.</p>
]]></description><link>https://lcz.me/post/5111</link><guid isPermaLink="true">https://lcz.me/post/5111</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 03:47:49 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 03:40:38 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a> My work requires reasoning through workflows, approaches, and methods in order to track down problems, which involves reading logs, supplying scripts, and so on, so I need a fairly large context.</p>
<p dir="auto">So do you use it to write code?</p>
]]></description><link>https://lcz.me/post/5105</link><guid isPermaLink="true">https://lcz.me/post/5105</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Fri, 05 Jun 2026 03:40:38 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 03:37:29 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a> For coding, you are better off subscribing to codex or claude code! It is not more expensive, and my coding experience on a single R9700 is currently very poor.</p>
]]></description><link>https://lcz.me/post/5102</link><guid isPermaLink="true">https://lcz.me/post/5102</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Fri, 05 Jun 2026 03:37:29 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Fri, 05 Jun 2026 03:36:00 GMT]]></title><description><![CDATA[<p dir="auto">What if I go for the Blackwell 4500 with 32GB VRAM instead of the R9700?<br />
Is it worth it? Is the gap large?</p>
]]></description><link>https://lcz.me/post/5099</link><guid isPermaLink="true">https://lcz.me/post/5099</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Fri, 05 Jun 2026 03:36:00 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:38:09 GMT]]></title><description><![CDATA[<p dir="auto">Another option is the Mac Studio M3 Ultra. If you cannot find one second-hand, the official site still sells it, but with a five-month wait.<br />
Mac Studio M3 Ultra 96GB<br />
28-core CPU with 20 performance cores and 8 efficiency cores, 60-core GPU<br />
Hardware-accelerated ray tracing, 32-core Neural Engine, 819GB/s memory bandwidth</p>
]]></description><link>https://lcz.me/post/5049</link><guid isPermaLink="true">https://lcz.me/post/5049</guid><dc:creator><![CDATA[kos or]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:38:09 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:31:14 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.reddit.com/r/LocalLLaMA/comments/1swt19w/brief_ngrammod_test_results_r9700qwen36_27b/" rel="nofollow ugc">Just found some Vulkan numbers</a></p>
<pre><code>$: llama-bench-vulkan   -m 'Qwen3.6-27B-UD-Q4_K_XL.gguf' 
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium       |  16.39 GiB |    26.90 B | Vulkan     |  99 |           pp512 |       1050.13 ± 0.54 |
| qwen35 27B Q4_K - Medium       |  16.39 GiB |    26.90 B | Vulkan     |  99 |           tg128 |         31.26 ± 0.01 |

build: 97895129e (8863)
</code></pre>
<p dir="auto">Launch parameters</p>
<pre><code>llama-server-vulkan   -m '/Qwen3.6-27B-UD-Q4_K_XL.gguf'   --mmproj '/mmproj-BF16(3).gguf'  -np 1 -ngl 99   --temp 0.6   --top-p 0.95   --top-k 20   --min-p 0.00 --presence_penalty 0.00 --jinja  --chat-template-kwargs '{"preserve_thinking": true}' -ub 2048 -fa 1 --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 12 --draft-max 48 --host 0.0.0.0   --port 8180
</code></pre>
<pre><code>--- Prompt Processing (PPS) Statistics ---
Mean:       549.60 t/s
Median:     519.19 t/s
P95:        936.60 t/s
StdDev:     240.80 (Stability)
Range:    64.18 - 1015.91 t/s

--- Token Generation (Tok/s) Statistics ---
Mean:        28.80 t/s
Median:      28.20 t/s
P95:         45.34 t/s
StdDev:       6.78 (Stability)
Range:    16.49 - 53.63   t/s

Total Tokens Generated: 87840
$:~/Documents/llama_perf$ python3 parse_performance_stats_full.py

== Prompt Processing (PPS) Analysis ==
Effective Avg:     549.60 t/s (Token-Weighted)
Median (P50):      519.19 t/s
Tail (P99):        958.31 t/s
Stability(CV):       43.8% (JITTERY)
Skewness:            0.04 (Symmetric)

== Token Generation (Tok/s) Analysis ==
Effective Avg:    1697.20 t/s (Token-Weighted)
Median (P50):       28.20 t/s
Tail (P99):         51.39 t/s
Stability(CV):       23.5% (JITTERY)
Skewness:            1.40 (Burst Heavy)
</code></pre>
<p dir="auto">It looks at least better than vLLM, though really only by a little.</p>
]]></description><link>https://lcz.me/post/5047</link><guid isPermaLink="true">https://lcz.me/post/5047</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:31:14 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:27:17 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a></p>
<p dir="auto">You could put it that way: AMD works better on a native Linux kernel than on WSL 2. However close WSL 2 gets to a Linux kernel, it is still Hyper-V underneath, so there will always be some impact.</p>
]]></description><link>https://lcz.me/post/5045</link><guid isPermaLink="true">https://lcz.me/post/5045</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:27:17 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:26:44 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/kop-wang" aria-label="Profile: kop-wang">@<bdi>kop-wang</bdi></a> I did consider just going straight for an MBP 16 M5 Max....<br />
But from the benchmark numbers I have seen, it is still pretty mediocre....</p>
]]></description><link>https://lcz.me/post/5044</link><guid isPermaLink="true">https://lcz.me/post/5044</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:26:44 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:24:48 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a> My target is only 100K anyway 🥲 It is out of stock, though.<br />
My machine cannot take two cards. I did consider two 7900 XTXs, which come to under 20,000 with 960 GB/s of bandwidth and seem better value than two R9700s.<br />
But for a single card, nothing under 20,000 is faster than the R9700.</p>
<p dir="auto">Thanks all the same. The numbers you pulled are very useful.</p>
<p dir="auto">Right now I am on WSL + LM Studio... if I get the R9700, it looks like I will have to move everything to Ubuntu....</p>
]]></description><link>https://lcz.me/post/5043</link><guid isPermaLink="true">https://lcz.me/post/5043</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:24:48 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:20:02 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/kos-or" aria-label="Profile: kos-or">@<bdi>kos-or</bdi></a> True, it is better than the 5060 Ti and roughly at 5070 level. Looks like if I buy it, I really will have to tune it together with you all.<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f627.png?v=d348ca29232" class="not-responsive emoji emoji-android emoji--anguished" style="height:23px;width:auto;vertical-align:middle" title=":anguished:" alt="😧" /></p>
]]></description><link>https://lcz.me/post/5042</link><guid isPermaLink="true">https://lcz.me/post/5042</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:20:02 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:18:13 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.reddit.com/r/LocalLLM/comments/1swd1bk/r9700_qwen36_and_mtp_makes_it_worse/" rel="nofollow ugc">From this Reddit post</a></p>
<p dir="auto">This is a dual-R9700 setup running vllm-openai-rocm with FP8.</p>
<pre><code>| Model       | Test          | Tokens/sec      | Peak Tokens/sec | TTFR (ms)       | Est PPT (ms)    | E2E TTFT (ms)   |
|:------------|--------------:|----------------:|----------------:|----------------:|----------------:|----------------:|
| Qwen3.6-27B | pp2048 @ d4096 | 2508.92 ± 11.57 | —               | 2529.74 ± 11.19 | 2449.58 ± 11.19 | 2529.74 ± 11.19 |
| Qwen3.6-27B | tg32 @ d4096   | 72.94 ± 0.55    | 75.30 ± 0.57    | —               | —               | —               |
| Qwen3.6-27B | pp2048 @ d8132 | 2402.38 ± 1.13  | —               | 4318.05 ± 1.99  | 4237.88 ± 1.99  | 4318.05 ± 1.99  |
| Qwen3.6-27B | tg32 @ d8132   | 63.52 ± 3.35    | 65.58 ± 3.46    | —               | —               | —               |
| Qwen3.6-27B | pp2048 @ d16000| 2197.86 ± 7.44  | —               | 8292.49 ± 28.04 | 8212.32 ± 28.04 | 8293.70 ± 28.04 |
| Qwen3.6-27B | tg32 @ d16000  | 53.45 ± 2.63    | 55.18 ± 2.71    | —               | —               | —               |
| Qwen3.6-27B | pp2048 @ d30000| 1899.63 ± 1.41  | —               | 16951.73 ± 13.21| 16871.56 ± 13.21| 16952.54 ± 14.22|
| Qwen3.6-27B | tg32 @ d30000  | 53.23 ± 0.16    | 54.95 ± 0.17    | —               | —               | —               |
| Qwen3.6-27B | pp2048 @ d60000| 1459.41 ± 0.62  | —               | 42596.49 ± 18.16| 42516.32 ± 18.16| 42598.65 ± 18.72|
| Qwen3.6-27B | tg32 @ d60000  | 40.35 ± 0.04    | 41.66 ± 0.04    | —               | —               | —               |
| Qwen3.6-27B | pp2048 @ d90000| 1181.78 ± 0.27  | —               | 77970.53 ± 16.71| 77890.36 ± 16.71| 77970.53 ± 16.71|
| Qwen3.6-27B | tg32 @ d90000  | 28.89 ± 0.07    | 30.33 ± 0.47    | —               | —               | —               |
| Qwen3.6-27B | pp2048 @ d120000| 991.43 ± 0.47  | —               | 123185.76 ± 58.07| 123103.97 ± 58.07| 123187.93 ± 60.50|
| Qwen3.6-27B | tg32 @ d120000 | 25.20 ± 1.44    | 26.67 ± 0.94    | —               | —               | —               |
| Qwen3.6-27B | pp2048 @ d150000| 854.21 ± 0.17  | —               | 178081.59 ± 36.01| 177999.80 ± 36.01| 178088.15 ± 32.55|
| Qwen3.6-27B | tg32 @ d150000 | 21.86 ± 1.19    | 24.33 ± 0.94    | —               | —               | —               |
</code></pre>
<p dir="auto">Launch parameters</p>
<pre><code> --model /app/models

--served-model-name Qwen3.6-27B-FP8

--host 192.168.1.224

--port 5678

--tool-call-parser qwen3_coder

--enable-auto-tool-choice

--reasoning-parser qwen3

--language-model-only

--tensor-parallel-size 2

--max-num-seqs 4

--max-model-len 200k

--dtype auto

--gpu-memory-utilization 0.95

--attention-config.backend TRITON_ATTN

--quantization fp8

--enable-chunked-prefill

--enable-prefix-caching

--override-generation-config '{"temperature":0.6, "top_p":0.95, "top_k":20, "presence_penalty": 0.0 , "repetition_penalty":1.0}'

--speculative-config '{"method":"mtp","num_speculative_tokens":3}' 
</code></pre>
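<p dir="auto">Going by this, a single card would probably have to halve the context length to 100K, and TTFT will most likely get considerably worse too.</p>
<p dir="auto">If you want to play with this, llama.cpp + Vulkan is probably still the way to go.</p>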
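<p dir="auto">To put the TTFT column in perspective, here is a minimal sketch. It assumes the estimated prefill time is simply the whole prompt (the context depth plus the 2048 new tokens) divided by the measured prompt-processing rate; that reading is inferred from the numbers in the table, not from the benchmark tool's documentation, and the function name is made up for illustration.</p>

```python
def estimated_prefill_ms(depth_tokens, new_tokens, pp_tokens_per_s):
    """Time to process the whole prompt at a given prefill rate, in ms."""
    total_tokens = depth_tokens + new_tokens
    return total_tokens / pp_tokens_per_s * 1000.0


# Rows taken from the dual-R9700 table: (depth, measured pp rate in t/s)
for depth, rate in [(4096, 2508.92), (60000, 1459.41), (150000, 854.21)]:
    seconds = estimated_prefill_ms(depth, 2048, rate) / 1000.0
    print(f"d{depth}: about {seconds:.1f} s before the first token")
```

<p dir="auto">Under that assumption the results land within a couple of milliseconds of the Est PPT column, and at 150K of context the wait is close to three minutes even on two cards.</p>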
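]]></description><link>https://lcz.me/post/5038</link><guid isPermaLink="true">https://lcz.me/post/5038</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:18:13 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 15:57:43 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/rolex-lo" aria-label="Profile: rolex-lo">@<bdi>rolex-lo</bdi></a> Yes. The sweet spot for local LLMs (a 32GB card with high memory bandwidth) used to be the 5090, but its price has gone through the roof.<br />
The 5090 currently costs more than the RTX Pro 5000, which I find hard to understand...</p>
<p dir="auto">If you want to run LLMs comfortably, memory bandwidth above 1 TB/s is the baseline requirement. Only then do you get decent prefill numbers without cutting model precision too far and with a moderately large context. Now that agent tools are popular, system prompts easily exceed 20k tokens, and slow prefill makes the wait too long.</p>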
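]]></description><link>https://lcz.me/post/5035</link><guid isPermaLink="true">https://lcz.me/post/5035</guid><dc:creator><![CDATA[kop wang]]></dc:creator><pubDate>Thu, 04 Jun 2026 15:57:43 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 16:04:24 GMT]]></title><description><![CDATA[<p dir="auto">The Radeon AI PRO R9700 is essentially a 32GB version of the RX 9070 XT: all the specs are the same except that the VRAM is doubled. Its memory bandwidth of 644 GB/s sits between the RTX 5070 Ti at 896 GB/s and the RTX 5060 Ti at 448 GB/s. I find the speed acceptable; for anything faster you would need a modded card or a 5090.</p>
<p dir="auto">My current understanding is that memory bandwidth governs inference speed during token generation, while prompt processing (PP, prefill) depends on the number of Tensor Cores (for NVIDIA cards, that is).</p>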
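<p dir="auto">A rough sanity check of the bandwidth point, as a sketch only. It assumes token generation is purely memory-bound, so every generated token reads all the weights once and tokens/s can be at most bandwidth divided by model size. The function name is made up for illustration; the 644 GB/s figure is from this post and the 16.39 GiB model size is from the llama-bench output quoted elsewhere in this thread.</p>

```python
GIB = 1024 ** 3  # bytes in one GiB


def max_decode_tps(bandwidth_gb_s, model_size_gib):
    """Upper bound on tokens/s if each token reads every weight once."""
    bytes_per_second = bandwidth_gb_s * 1e9
    bytes_per_token = model_size_gib * GIB
    return bytes_per_second / bytes_per_token


# R9700 (644 GB/s) with the 16.39 GiB Q4_K quant of the 27B model
print(round(max_decode_tps(644, 16.39), 1))
```

<p dir="auto">This comes out at roughly 36 to 37 tokens/s. The measured tg128 of 31.26 t/s in the Vulkan benchmark sits a little under that bound, which fits the idea that generation is limited by bandwidth.</p>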
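<p dir="auto">The Hermes Agent system prompt alone is about 17.5K tokens, which has to be prefilled first.</p>
<p dir="auto">On the second request, though, the same 17.5K gets a KV cache hit, so that part does not need prefill processing again, unless something has modified part of it, in which case prefill restarts from the first changed token.</p>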
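<p dir="auto">The cache-hit behaviour can be sketched as a longest-common-prefix check. This is only an illustration under simplifying assumptions: the function name and token IDs are made up, and real servers do this internally and may match in blocks rather than token by token.</p>

```python
def tokens_to_prefill(cached_tokens, new_tokens):
    """Count tokens of new_tokens that still need prefill.

    Everything up to the first position where the two sequences
    differ is treated as a KV cache hit.
    """
    shared = 0
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        shared += 1
    return len(new_tokens) - shared


system_prompt = list(range(17500))          # stand-in for a 17.5K-token system prompt
first_request = system_prompt + [1, 2, 3]   # prompt plus a short user message
second_request = system_prompt + [4, 5, 6, 7]

print(tokens_to_prefill([], first_request))              # cold cache: all 17503 tokens
print(tokens_to_prefill(first_request, second_request))  # warm cache: only the 4 new tokens
```

<p dir="auto">If a single token near the start of the system prompt changes, the shared prefix ends there and almost the whole prompt is prefilled again, which is why agents that rewrite their own system prompt feel slow.</p>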
]]></description><link>https://lcz.me/post/5034</link><guid isPermaLink="true">https://lcz.me/post/5034</guid><dc:creator><![CDATA[kos or]]></dc:creator><pubDate>Thu, 04 Jun 2026 16:04:24 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 15:50:38 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/kop-wang" aria-label="Profile: kop-wang">@<bdi>kop-wang</bdi></a> I have looked at a lot of benchmarks and the R9700 really is mediocre, but there does not seem to be a better option.<br />
At 11,000 it is not great value either, but even raising the budget to 20,000 there is nothing good 🥲</p>
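]]></description><link>https://lcz.me/post/5033</link><guid isPermaLink="true">https://lcz.me/post/5033</guid><dc:creator><![CDATA[rolex lo]]></dc:creator><pubDate>Thu, 04 Jun 2026 15:50:38 GMT</pubDate></item><item><title><![CDATA[Reply to Beginner jumping in: is the R9700 really viable? on Thu, 04 Jun 2026 15:41:42 GMT]]></title><description><![CDATA[<p dir="auto">It depends on the runtime environment.<br />
If the main job is running a reasoning model to drive Hermes, the R9700 should handle it easily. My 9700 will not arrive until August.<br />
That said, I use a 16GB RTX A5000 to run qwen3 14b and gpt oss 20b locally to drive Hermes, and apart from being a bit slow and a bit dumb, there seem to be no major problems.</p>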
]]></description><link>https://lcz.me/post/5030</link><guid isPermaLink="true">https://lcz.me/post/5030</guid><dc:creator><![CDATA[lxbs]]></dc:creator><pubDate>Thu, 04 Jun 2026 15:41:42 GMT</pubDate></item></channel></rss>