<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal]]></title><description><![CDATA[<p dir="auto">Field notes on running a 35B model on second-hand RTX 3070 cards and a retired server.</p>
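<p dir="auto">Hardware:</p>
<ul>
<li>Dell R730 (2017), dual E5-2680 v4, 56 logical cores (28 cores / 56 threads), 512GB DDR4</li>
<li>2x second-hand RTX 3070 8GB</li>
<li>Inference framework: llama.cpp with CPU MoE offload</li>
<li>Model: Qwen3.6-35B-A3B (APEX quantization, 17.3GB, plus MTP speculative decoding)</li>
</ul>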
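<p dir="auto">Measured performance:</p>
<p dir="auto">Everyday chat: 36-40 tok/s<br />
Code generation: 60 tok/s (MTP acceptance rate 85%)<br />
Math reasoning: 52 tok/s<br />
Long context (200K): 45-50 tok/s<br />
Multimodal image understanding: 40-50 tok/s</p>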
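<p dir="auto">Context capacity:</p>
<ul>
<li>200K runs stably with multimodal and MTP both enabled</li>
<li>236K is the multimodal limit (found by sweeping from 128K to 256K in 12K steps)</li>
<li>256K works for text only; multimodal runs out of memory (OOM)</li>
</ul>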
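<p dir="auto">The 12K-step sweep mentioned above can be sketched as a short list of context sizes. This is only an illustration: the function name <code>sweep_points</code> is made up, and it assumes "K" means 1024 tokens (consistent with the <code>-c 204800</code> value used for 200K in this thread) and that the 256K endpoint was tested as a final extra step.</p>

```python
K = 1024  # assumption: 1K = 1024 tokens, matching -c 204800 for "200K"


def sweep_points(start_k=128, stop_k=256, step_k=12):
    """Context sizes (in tokens) for a fixed-step sweep, with the upper end appended."""
    points = [k * K for k in range(start_k, stop_k, step_k)]
    points.append(stop_k * K)  # assumption: the endpoint itself is also tested
    return points


points = sweep_points()
for p in points:
    print(f"-c {p}  ({p // K}K)")
```

<p dir="auto">Starting at 128K with a 12K step, the ninth step lands exactly on 236K, which matches the reported multimodal limit; the next step would be 248K.</p>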
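<p dir="auto">Resource usage (the most valuable part):</p>
<p dir="auto">GPU SM utilization peaks at 23%; the two cards have a combined TDP of 440W but actually draw only 137W.<br />
CPU: 13 of 56 logical cores in use (23%).<br />
Memory: 25GB of 512GB in use (5%).</p>
<p dir="auto">Compute is in surplus everywhere. The only thing in short supply is bandwidth.</p>
<p dir="auto">Eight-channel DDR4-2400 delivers 144GB/s (measured), while PCIe 3.0 x16 offers only 16GB/s. On every inference step the MoE path has to pull expert weights from memory to the CPU for computation, then pass the result back to the GPU over PCIe. Both the GPU and the CPU spend most of their time waiting for data.</p>
<p dir="auto">Conclusion: the bottleneck for local MoE inference is memory bandwidth, not compute. When choosing hardware, prioritize DDR5, PCIe 4.0, and large VRAM over core count and GPU compute.</p>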
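<p dir="auto">A rough back-of-the-envelope check of the bandwidth argument, as a sketch only. It assumes the model name "35B-A3B" means about 35B total and 3B active parameters, that the 17.3GB file is quantized uniformly, and that every active weight is read from system memory once per token. None of these is confirmed by the post, and the estimate ignores MTP speculative decoding and whatever stays resident in VRAM.</p>

```python
GB = 1e9


def bytes_per_token(model_bytes, total_params, active_params):
    """Approximate weight bytes read per token, assuming uniform bits per weight."""
    return model_bytes * active_params / total_params


def bandwidth_ceiling_tok_s(bandwidth_bytes_s, per_token_bytes):
    """Upper bound on tokens/s if weight reads were the only cost."""
    return bandwidth_bytes_s / per_token_bytes


# Assumed parameter counts, read off the model name (35B total, 3B active).
per_token = bytes_per_token(17.3 * GB, 35e9, 3e9)
ddr4_ceiling = bandwidth_ceiling_tok_s(144 * GB, per_token)

print(f"weights read per token: {per_token / GB:.2f} GB")
print(f"DDR4 ceiling at 144 GB/s: {ddr4_ceiling:.0f} tok/s")
```

<p dir="auto">Under these assumptions the ceiling comes out at roughly 97 tok/s, so the measured 36-60 tok/s is within the same order of magnitude as what memory bandwidth alone would allow.</p>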
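<p dir="auto">Tuning process (automated tests over 21 configurations across 4 dimensions):</p>
<p dir="auto">MoE CPU thread count: 32 is the only stable value; 16 and 56 caused OOM, and 24, 40, and 48 crashed outright.<br />
Batch threads: 28 is best, 42 gives no improvement, and 56 falls off a cliff.<br />
VRAM split GPU0:GPU1 = 4:1: the speed difference is negligible, but the difference in headroom is huge (1.5GB vs 126MB).<br />
NUMA binding: no noticeable effect, so left unbound.</p>
<p dir="auto">The single most important parameter is -np 1. llama.cpp defaults to n_parallel=4 and pre-allocates four copies of the KV cache, which immediately overflows the 16GB of VRAM. Only with a single copy does 200K become reachable.</p>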
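<p dir="auto">To see why the number of KV cache copies matters so much, here is a minimal sizing sketch using the standard formula (K and V tensors, per attention layer, per token). The layer count, KV head count, and head dimension below are placeholders chosen for illustration, not the real Qwen3.6-35B-A3B values, and the function name <code>kv_cache_bytes</code> is made up. It also assumes each copy is allocated at full context length, as the post describes.</p>

```python
def kv_cache_bytes(n_ctx, n_kv_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, n_copies=1):
    """Bytes for the K and V tensors over all attention layers, times the copy count."""
    per_token = 2 * n_kv_layers * n_kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * n_ctx * n_copies


# Placeholder dimensions for illustration only; NOT the real model's values.
dims = dict(n_kv_layers=10, n_kv_heads=2, head_dim=256)

GiB = 1024 ** 3
one_copy = kv_cache_bytes(204800, **dims, n_copies=1)
four_copies = kv_cache_bytes(204800, **dims, n_copies=4)

print(f"1 copy:   {one_copy / GiB:.2f} GiB")
print(f"4 copies: {four_copies / GiB:.2f} GiB")
```

<p dir="auto">Whatever the real dimensions are, the cost scales linearly with the number of copies, so a cache that fits alongside the weights once can easily stop fitting at four times the size.</p>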
]]></description><link>https://lcz.me/topic/249/x99服务器配双3070-8g-qwen3.6-35b-a3b-50tk-s-200k上下文-多模态</link><generator>RSS for Node</generator><lastBuildDate>Sun, 31 May 2026 06:05:20 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/249.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 22 May 2026 01:07:29 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Tue, 26 May 2026 04:54:52 GMT]]></title><description><![CDATA[<p dir="auto"><img src="https://upload.lcz.me/uploads/786787aa-7959-4335-a2dd-7cd1ed10223e.jpg" alt="image1.jpg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/9a3111a5-d43b-46e5-b9e3-7b3c71bc582c.jpg" alt="image2.jpg" class=" img-fluid img-markdown" /><br />
Report from copying the setup: X99, E5-2673 v3, 4x16GB RAM at 1333, 3070M 20GB.</p>
]]></description><link>https://lcz.me/post/3728</link><guid isPermaLink="true">https://lcz.me/post/3728</guid><dc:creator><![CDATA[深圳律师陈扬波]]></dc:creator><pubDate>Tue, 26 May 2026 04:54:52 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Sat, 23 May 2026 03:36:28 GMT]]></title><description><![CDATA[<blockquote>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/lannykov" aria-label="Profile: lannykov">@<bdi>lannykov</bdi></a> <a href="/post/2982">said</a>:</p>
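<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/%E6%B7%B1%E5%9C%B3%E5%BE%8B%E5%B8%88%E9%99%88%E6%89%AC%E6%B3%A2" aria-label="Profile: 深圳律师陈扬波">@<bdi>深圳律师陈扬波</bdi></a> llama-server \<br />
-m Qwen3.6-35B-A3B-APEX-MTP-I-Compact.gguf \<br />
--mmproj mmproj-F16.gguf \<br />
-c 204800 \<br />
--n-cpu-moe 32 \<br />
--spec-type draft-mtp \<br />
--reasoning off \<br />
--jinja \<br />
-ngl 99 \<br />
-ts 4,1 \<br />
-np 1 \<br />
--port 8081 \<br />
--host 0.0.0.0<br />
<br />
Flag notes:<br />
-c 204800: 200K context<br />
--n-cpu-moe 32: keep the MoE expert weights of the first 32 layers on the CPU<br />
--spec-type draft-mtp: MTP speculative decoding<br />
--reasoning off: disable thinking mode<br />
--jinja: use the Jinja chat template<br />
-ngl 99: offload all layers to the GPU<br />
-ts 4,1: tensor split across GPU0:GPU1 at 4:1<br />
-np 1: a single parallel slot</p>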
</blockquote>
<p dir="auto">Thanks. Happily copying your homework.</p>
]]></description><link>https://lcz.me/post/3199</link><guid isPermaLink="true">https://lcz.me/post/3199</guid><dc:creator><![CDATA[深圳律师陈扬波]]></dc:creator><pubDate>Sat, 23 May 2026 03:36:28 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Sat, 23 May 2026 03:28:30 GMT]]></title><description><![CDATA[<blockquote>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/terry" aria-label="Profile: terry">@<bdi>terry</bdi></a> <a href="/post/3059">said</a>:</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/%E6%B7%B1%E5%9C%B3%E5%BE%8B%E5%B8%88%E9%99%88%E6%89%AC%E6%B3%A2" aria-label="Profile: 深圳律师陈扬波">@<bdi>深圳律师陈扬波</bdi></a> Even lawyers are playing with AI now? That's scary.</p>
</blockquote>
<p dir="auto">Lawyers using AI today are like accountants adopting computerized bookkeeping 40 years ago.</p>
]]></description><link>https://lcz.me/post/3197</link><guid isPermaLink="true">https://lcz.me/post/3197</guid><dc:creator><![CDATA[深圳律师陈扬波]]></dc:creator><pubDate>Sat, 23 May 2026 03:28:30 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Sat, 23 May 2026 03:26:08 GMT]]></title><description><![CDATA[<p dir="auto">AI will become the most just judge.</p>
]]></description><link>https://lcz.me/post/3196</link><guid isPermaLink="true">https://lcz.me/post/3196</guid><dc:creator><![CDATA[深圳律师陈扬波]]></dc:creator><pubDate>Sat, 23 May 2026 03:26:08 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Fri, 22 May 2026 09:04:43 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/%E6%B7%B1%E5%9C%B3%E5%BE%8B%E5%B8%88%E9%99%88%E6%89%AC%E6%B3%A2" aria-label="Profile: 深圳律师陈扬波">@<bdi>深圳律师陈扬波</bdi></a> Even lawyers are playing with AI now? That's scary.</p>
]]></description><link>https://lcz.me/post/3059</link><guid isPermaLink="true">https://lcz.me/post/3059</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Fri, 22 May 2026 09:04:43 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Fri, 22 May 2026 09:04:02 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/lannykov" aria-label="Profile: lannykov">@<bdi>lannykov</bdi></a> Even lawyers are into AI now? That's scary.</p>
]]></description><link>https://lcz.me/post/3058</link><guid isPermaLink="true">https://lcz.me/post/3058</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Fri, 22 May 2026 09:04:02 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Fri, 22 May 2026 04:14:52 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/%E6%B7%B1%E5%9C%B3%E5%BE%8B%E5%B8%88%E9%99%88%E6%89%AC%E6%B3%A2" aria-label="Profile: 深圳律师陈扬波">@<bdi>深圳律师陈扬波</bdi></a> llama-server \<br />
-m Qwen3.6-35B-A3B-APEX-MTP-I-Compact.gguf \<br />
--mmproj mmproj-F16.gguf \<br />
-c 204800 \<br />
--n-cpu-moe 32 \<br />
--spec-type draft-mtp \<br />
--reasoning off \<br />
--jinja \<br />
-ngl 99 \<br />
-ts 4,1 \<br />
-np 1 \<br />
--port 8081 \<br />
--host 0.0.0.0<br />
<br />
Flag notes:<br />
-c 204800: 200K context<br />
--n-cpu-moe 32: keep the MoE expert weights of the first 32 layers on the CPU<br />
--spec-type draft-mtp: MTP speculative decoding<br />
--reasoning off: disable thinking mode<br />
--jinja: use the Jinja chat template<br />
-ngl 99: offload all layers to the GPU<br />
-ts 4,1: tensor split across GPU0:GPU1 at 4:1<br />
-np 1: a single parallel slot</p>
]]></description><link>https://lcz.me/post/2982</link><guid isPermaLink="true">https://lcz.me/post/2982</guid><dc:creator><![CDATA[lannykov]]></dc:creator><pubDate>Fri, 22 May 2026 04:14:52 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Fri, 22 May 2026 03:45:56 GMT]]></title><description><![CDATA[<p dir="auto">I have a modded 3070M with 16GB. Could you share your parameters?</p>
]]></description><link>https://lcz.me/post/2978</link><guid isPermaLink="true">https://lcz.me/post/2978</guid><dc:creator><![CDATA[深圳律师陈扬波]]></dc:creator><pubDate>Fri, 22 May 2026 03:45:56 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Fri, 22 May 2026 03:24:49 GMT]]></title><description><![CDATA[<p dir="auto">I'm tempted to swap in two 3080 20GB cards now, haha.<br />
To run the two 3070s properly on this machine I swapped riser 3 for the Riser 3 Alternate (GPU version), bought specifically from Amazon, so both cards run at full PCIe 3.0 x16 speed. The other riser, riser 2, is PCIe 3.0 x16 by default and needed no tinkering.<br />
It's a stock rack server with adjustable fans and acceptable noise, so there is still some potential left to dig out.</p>
]]></description><link>https://lcz.me/post/2975</link><guid isPermaLink="true">https://lcz.me/post/2975</guid><dc:creator><![CDATA[lannykov]]></dc:creator><pubDate>Fri, 22 May 2026 03:24:49 GMT</pubDate></item><item><title><![CDATA[Reply to X99 server with dual RTX 3070 8GB: Qwen3.6-35B-A3B, 50 tok/s, 200K context, multimodal on Fri, 22 May 2026 01:17:13 GMT]]></title><description><![CDATA[<p dir="auto">The results were a bit of a surprise: I didn't expect these "scrap cards" to still be useful, especially given the generation speed, context window, multimodal support, and the startlingly low power draw.</p>
]]></description><link>https://lcz.me/post/2966</link><guid isPermaLink="true">https://lcz.me/post/2966</guid><dc:creator><![CDATA[lannykov]]></dc:creator><pubDate>Fri, 22 May 2026 01:17:13 GMT</pubDate></item></channel></rss>