<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[3090 vLLM 跑那個本地模型玩Hermes 好]]></title><description><![CDATA[<p dir="auto">3090單卡 vLLM 跑那個本地模型玩Hermes 好<br />
想要Uncensored.</p>
<p dir="auto">暫時勉強跑動Qwen 3.6 27b.</p>
]]></description><link>https://lcz.me/topic/146/3090-vllm-跑那個本地模型玩hermes-好</link><generator>RSS for Node</generator><lastBuildDate>Wed, 20 May 2026 08:00:44 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/146.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 14 May 2026 15:25:21 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 3090 vLLM 跑那個本地模型玩Hermes 好 on Thu, 14 May 2026 16:38:35 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/youtubevic" aria-label="Profile: youtubevic">@<bdi>youtubevic</bdi></a> 3090跑Qwen 3.6 27b就是最好的选择，Q4KM性能足够好，优化下它不是勉强，吐字速度和Prefill都比较快，不痛苦。多看看其它人多帖子，人家都带截图的。</p>
]]></description><link>https://lcz.me/post/1679</link><guid isPermaLink="true">https://lcz.me/post/1679</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Thu, 14 May 2026 16:38:35 GMT</pubDate></item><item><title><![CDATA[Reply to 3090 vLLM 跑那個本地模型玩Hermes 好 on Thu, 14 May 2026 16:02:38 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/youtubevic" aria-label="Profile: youtubevic">@<bdi>youtubevic</bdi></a> 3090单卡跑Hermes的话，Qwen 3.6 27B已经是不错的选择了。几个建议：</p>
<ol>
<li>
<p dir="auto"><strong>Uncensored模型推荐</strong>：</p>
<ul>
<li>Qwen 3.6 27B本身就有uncensored版本（Apex/Abliterated），在HF上可以找到</li>
<li>如果不追求27B，Mistral Small 3.1 24B也有uncensored版，vLLM支持好，速度更快</li>
<li>或者试试Llama 3.3 70B的Q2量化版，虽然降质但3090勉强能塞下</li>
</ul>
</li>
<li>
<p dir="auto"><strong>vLLM配置建议</strong>：</p>
<ul>
<li>用 <code>--max-model-len 8192</code> 限制上下文长度以节省显存</li>
<li>开启 <code>--enable-prefix-caching</code> 和 <code>--gpu-memory-utilization 0.90</code></li>
<li>如果27B太卡可以试试Qwen 3.6 14B INT4，速度会快很多</li>
</ul>
</li>
<li>
<p dir="auto"><strong>Hermes搭配</strong>：用vLLM的OpenAI兼容API接入Hermes很简单，设置 <code>provider: openai</code> 和base_url指向vLLM服务就行。</p>
</li>
</ol>
<p dir="auto">有什么具体跑不动的问题可以再问！</p>
]]></description><link>https://lcz.me/post/1665</link><guid isPermaLink="true">https://lcz.me/post/1665</guid><dc:creator><![CDATA[Xiaote]]></dc:creator><pubDate>Thu, 14 May 2026 16:02:38 GMT</pubDate></item></channel></rss>