<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB]]></title><description><![CDATA[<ol>
<li>
<p dir="auto">Clone llama.cpp<br />
git clone <a href="https://github.com/ggml-org/llama.cpp.git" rel="nofollow ugc">https://github.com/ggml-org/llama.cpp.git</a><br />
cd llama.cpp</p>
</li>
<li>
<p dir="auto">Fetch and switch to the Gemma 4 MTP PR branch<br />
git fetch origin pull/23398/head:gemma4-mtp<br />
git checkout gemma4-mtp</p>
</li>
<li>
<p dir="auto">Build with CUDA support for NVIDIA GPUs<br />
cmake -B build -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=OFF<br />
cmake --build build --config Release -j$(nproc)</p>
</li>
<li>
<p dir="auto">Download Unsloth's Gemma 4 12B QAT here: <a href="https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF" rel="nofollow ugc">https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF</a></p>
</li>
<li>
<p dir="auto">Download Google's Gemma 4 assistant / draft here <a href="https://huggingface.co/Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF" rel="nofollow ugc">https://huggingface.co/Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF</a></p>
</li>
<li>
<p dir="auto">Load the models with llama-server<br />
llama-server <br />
-m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf <br />
--model-draft gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf <br />
--spec-type draft-mtp <br />
--spec-draft-n-max 4 <br />
--ctx-size 131072 <br />
--temp 1.0 <br />
--top-p 0.95 <br />
--top-k 64</p>
</li>
</ol>
]]></description><link>https://lcz.me/topic/459/120-tok-s-gemma-4-12b-mtp-rtx-4070s-12gb</link><generator>RSS for Node</generator><lastBuildDate>Thu, 11 Jun 2026 13:58:20 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/459.rss" rel="self" type="application/rss+xml"/><pubDate>Sun, 07 Jun 2026 12:07:11 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Mon, 08 Jun 2026 23:36:03 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/johnnybegood" aria-label="Profile: johnnybegood">@<bdi>johnnybegood</bdi></a> 我是工作要用，肯定要选对简体中文支持强大的模型</p>
]]></description><link>https://lcz.me/post/5883</link><guid isPermaLink="true">https://lcz.me/post/5883</guid><dc:creator><![CDATA[joker_chang]]></dc:creator><pubDate>Mon, 08 Jun 2026 23:36:03 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Mon, 08 Jun 2026 13:55:02 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/stxpnet" aria-label="Profile: stxpnet">@<bdi>stxpnet</bdi></a> 直接让hermes帮我做的测试</p>
]]></description><link>https://lcz.me/post/5745</link><guid isPermaLink="true">https://lcz.me/post/5745</guid><dc:creator><![CDATA[暧昧光影]]></dc:creator><pubDate>Mon, 08 Jun 2026 13:55:02 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Mon, 08 Jun 2026 12:42:51 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/%E6%9A%A7%E6%98%A7%E5%85%89%E5%BD%B1" aria-label="Profile: 暧昧光影">@<bdi>暧昧光影</bdi></a> 这个测试脚本是如何生成的呢?</p>
]]></description><link>https://lcz.me/post/5730</link><guid isPermaLink="true">https://lcz.me/post/5730</guid><dc:creator><![CDATA[stxpnet]]></dc:creator><pubDate>Mon, 08 Jun 2026 12:42:51 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Mon, 08 Jun 2026 12:06:17 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/joker_chang" aria-label="Profile: joker_chang">@<bdi>joker_chang</bdi></a> 你写的是中文吧， 不要指望它用中文干活呢</p>
]]></description><link>https://lcz.me/post/5727</link><guid isPermaLink="true">https://lcz.me/post/5727</guid><dc:creator><![CDATA[johnnybegood]]></dc:creator><pubDate>Mon, 08 Jun 2026 12:06:17 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Mon, 08 Jun 2026 06:57:44 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/johnnybegood" aria-label="Profile: johnnybegood">@<bdi>johnnybegood</bdi></a> 对图片中的手写文字识别真不怎么样，相比Qwen3-VL-8B差太远了~</p>
]]></description><link>https://lcz.me/post/5689</link><guid isPermaLink="true">https://lcz.me/post/5689</guid><dc:creator><![CDATA[joker_chang]]></dc:creator><pubDate>Mon, 08 Jun 2026 06:57:44 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Sun, 07 Jun 2026 17:57:33 GMT]]></title><description><![CDATA[<p dir="auto"><img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/2705.png?v=d348ca29232" class="not-responsive emoji emoji-android emoji--white_check_mark" style="height:23px;width:auto;vertical-align:middle" title="✅" alt="✅" /> Gemma4 12B 能力测试报告</p>
<pre><code>环境： RTX 3060 (12GB) | 128K ctx | Q4_0 KV Cache | MTP n_max=2 | ~10.3GB VRAM

| #   | 测试项     | 结果 | 速度       | 关键表现                    |
|-----|------------|------|------------|-----------------------------|
| 1   | 逻辑推理   | ✅   | 50.9 tok/s | 正确识别三段论有效性        |
| 2   | 数学应用题 | ✅   | 53.8 tok/s | 分步计算，得出正确结论      |
| 3   | 多轮对话   | ✅   | 49.6 tok/s | 准确记住 Alice 的名字和爱好 |
| 4   | 长程检索   | ✅   | 29.9 tok/s | 在大量重复文本中找到答案    |
| 5   | 代码生成   | ✅   | 52.1 tok/s | 生成 Python 回文算法        |
| 6   | 文本摘要   | ✅   | 38.3 tok/s | 一句话准确概括              |
| 7   | 创意写作   | ✅   | 35.9 tok/s | 写出有氛围感的微型故事      |



📊 性能亮点

- 128K 上下文完全可用 — 长文检索准确命中
- 生成速度 ~35-53 tok/s — 比纯 CPU 快很多
- 显存占用 ~10.3GB — 12GB 卡有安全余量
- MTP 接受率正常 —  speculative decoding 工作稳定

结论： Gemma4 12B 在 3060 + 128K ctx 配置下，综合能力均衡，推理、代码、长文检索均表现良好，日常使用完全没问题。
</code></pre>
<p dir="auto">速度差异好大</p>
]]></description><link>https://lcz.me/post/5597</link><guid isPermaLink="true">https://lcz.me/post/5597</guid><dc:creator><![CDATA[暧昧光影]]></dc:creator><pubDate>Sun, 07 Jun 2026 17:57:33 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Sun, 07 Jun 2026 15:01:09 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/johnnybegood" aria-label="Profile: johnnybegood">@<bdi>johnnybegood</bdi></a></p>
<p dir="auto">其實也不算完全原生態就是了, 這模型單純沒有音頻Encoder, 圖像也沒完全弄走Encoder的樣子, 還留了一個小的Embedder.</p>
]]></description><link>https://lcz.me/post/5556</link><guid isPermaLink="true">https://lcz.me/post/5556</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Sun, 07 Jun 2026 15:01:09 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Sun, 07 Jun 2026 14:50:36 GMT]]></title><description><![CDATA[<p dir="auto">试了一下， 需要重新编译llama.cpp， 3090 下面能到120t/s， 速度不错， 跑128k上下文实际任务也能在80-90t/s ， 智商也算是在线，关键是多模态原生支持图像和音频， 试了一下也比较准确。不错。</p>
]]></description><link>https://lcz.me/post/5554</link><guid isPermaLink="true">https://lcz.me/post/5554</guid><dc:creator><![CDATA[johnnybegood]]></dc:creator><pubDate>Sun, 07 Jun 2026 14:50:36 GMT</pubDate></item><item><title><![CDATA[Reply to 120 tok&#x2F;s Gemma 4 12B + MTP RTX-4070S 12GB on Sun, 07 Jun 2026 14:48:04 GMT]]></title><description><![CDATA[<p dir="auto">以后不要发纯英文帖子，如果是AI生成的，会封号。</p>
]]></description><link>https://lcz.me/post/5551</link><guid isPermaLink="true">https://lcz.me/post/5551</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Sun, 07 Jun 2026 14:48:04 GMT</pubDate></item></channel></rss>