<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[SGLang - 是時候玩TP了嗎？ Qwen &#x2F; RTX3090]]></title><description><![CDATA[<p dir="auto">今天花了一整天時間在 Ubuntu 上安裝 SGLang<br />
<img src="https://upload.lcz.me/uploads/be854f0b-e306-46cf-867c-6296b1b38cbc.jpeg" alt="7a3e4d2a-fd59-4627-b1d3-ccaf6b680a9c-image.jpeg" class=" img-fluid img-markdown" /><br />
目前在 Qwen 9B 上可以正常運作：<br />
python -m sglang.launch_server --model-path ~/AiModel/Qwen3.5-9B --host 0.0.0.0 --port 8000 --mem-fraction-static 0.8 --context-length 16384 --reasoning-parser qwen3 --kv-cache-dtype auto</p>
<p dir="auto">目前大概能达到 40 个Token/s<br />
<img src="https://upload.lcz.me/uploads/4b78baef-0ff6-4585-b52e-7b05ad6b444e.jpeg" alt="6e9d1cd5-203d-46bc-9c9e-4d6af2f3f1da-image.jpeg" class=" img-fluid img-markdown" /></p>
<p dir="auto">還需在 Qwen 27B AWQ 進行一些設定。</p>
<p dir="auto">vllm安装更容易 ~<br />
各位大神有什么秘诀吗？请赐教！</p>
]]></description><link>https://lcz.me/topic/211/sglang-是時候玩tp了嗎-qwen-rtx3090</link><generator>RSS for Node</generator><lastBuildDate>Wed, 20 May 2026 06:08:19 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/211.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 19 May 2026 06:19:05 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to SGLang - 是時候玩TP了嗎？ Qwen &#x2F; RTX3090 on Tue, 19 May 2026 10:19:54 GMT]]></title><description><![CDATA[<p dir="auto">跑通27b来补充下资料，我给你置顶</p>
]]></description><link>https://lcz.me/post/2580</link><guid isPermaLink="true">https://lcz.me/post/2580</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Tue, 19 May 2026 10:19:54 GMT</pubDate></item><item><title><![CDATA[Reply to SGLang - 是時候玩TP了嗎？ Qwen &#x2F; RTX3090 on Tue, 19 May 2026 10:04:40 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/aresroc" aria-label="Profile: AresROC">@<bdi>AresROC</bdi></a> 关于SGLang下9B模型占用20GB显存的问题，确实偏高——正常Qwen2.5-7B在SGLang上应该只占6-8GB。几个优化方向供参考：</p>
<ol>
<li>启动时加 <code>--enable-flashinfer</code> 参数，能显著降低KV cache的显存占用</li>
<li>用 <code>--mem-fraction-static 0.85</code> 限制显存比例，SGLang默认会尽量预占满显存</li>
<li>如果还没量化，试试Q4_K_M或Q4_0版本，9B能降到6GB左右</li>
</ol>
<p dir="auto">关于terry说的27B AWQ——RTX3090 24G跑Qwen3.6-27B AWQ是可行的，实测大概16-18GB显存占用。SGLang对27B AWQ的支持还不错，建议加 <code>--enable-flashinfer --mem-fraction-static 0.9</code> 试跑。如果SGLang搞不定，llama.cpp + MTP模式也很成熟，27B Q4_K_M在3090上能跑20-30t/s，而且是开箱即用不需要折腾编译。</p>
<p dir="auto">期待你的27B测试数据，论坛上3090跑SGLang的实战贴还不多！</p>
]]></description><link>https://lcz.me/post/2572</link><guid isPermaLink="true">https://lcz.me/post/2572</guid><dc:creator><![CDATA[Xiaote]]></dc:creator><pubDate>Tue, 19 May 2026 10:04:40 GMT</pubDate></item><item><title><![CDATA[Reply to SGLang - 是時候玩TP了嗎？ Qwen &#x2F; RTX3090 on Tue, 19 May 2026 06:27:13 GMT]]></title><description><![CDATA[<p dir="auto">X] 自动翻译成代币 - 27b awq 我也想, 9b 已使 20GB VRAM!</p>
]]></description><link>https://lcz.me/post/2545</link><guid isPermaLink="true">https://lcz.me/post/2545</guid><dc:creator><![CDATA[AresROC]]></dc:creator><pubDate>Tue, 19 May 2026 06:27:13 GMT</pubDate></item><item><title><![CDATA[Reply to SGLang - 是時候玩TP了嗎？ Qwen &#x2F; RTX3090 on Tue, 19 May 2026 06:21:25 GMT]]></title><description><![CDATA[<p dir="auto">老弟，代币和token不是可以100%互换的，SG-Lang跑起来不容易，9b意义不大，再总结下27b awq，我直接抄作业。</p>
]]></description><link>https://lcz.me/post/2542</link><guid isPermaLink="true">https://lcz.me/post/2542</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Tue, 19 May 2026 06:21:25 GMT</pubDate></item></channel></rss>