<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Running AI on a Mac M3 Ultra 512G]]></title><description><![CDATA[<p dir="auto">Wang Sicong once said: "When I drink soy milk, I drink one bowl and pour one out."<br />
<img src="https://upload.lcz.me/uploads/31d31f9f-f93c-44be-8649-eedaa17b2967.jpg" alt="WechatIMG1700.jpg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/d5839c2b-e26a-48b7-8cf9-27288a84901b.jpg" alt="WechatIMG1703.jpg" class=" img-fluid img-markdown" /><br />
So everything below is one broke guy helping a rich guy run AI on a Mac M3 Ultra 512G.</p>
<ol>
<li>ds4 + DeepSeek V4 Flash<br />
Framework, ds4: <a href="https://github.com/antirez/ds4.git" rel="nofollow ugc">https://github.com/antirez/ds4.git</a></li>
</ol>
<p dir="auto">Model: DeepSeek V4 at qt2. I could have used qt4 directly (but I was being overcautious and worried the results would not be good).</p>
<p dir="auto">Launch parameters: ./ds4-server \<br />
--ctx 131072 \<br />
--kv-disk-dir /tmp/ds4-kv \<br />
--kv-disk-space-mb 65536</p>
<ol start="2">
<li>LM Studio + qwen3.6-27B (I ran it alongside ds4. It works, because there is still plenty of free memory, but each model's response speed seemed to drop.)</li>
</ol>
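<p dir="auto">Setup went fairly smoothly, without many hidden pitfalls, though I did not do any fine-grained tuning:</p>
<p dir="auto">Result: 30 tokens/s. That is not terribly slow, but it is still slow compared with the cloud. Running several models at once (Qwen and DeepSeek V4 together) only makes each one slower, with no noticeable gain in throughput, because the GPU is already at 100%.</p>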
<p dir="auto"><img src="https://upload.lcz.me/uploads/1d406f8f-2ab8-460d-bfcc-4b153f4c2142.jpg" alt="截屏2026-05-14 22.32.30.jpg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/bd46f131-4f36-4bc1-ac24-fbf2d3b3b99c.jpg" alt="截屏2026-05-14 22.32.56.jpg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/ec14661b-2acb-4925-939b-8840175d47d6.jpg" alt="截屏2026-05-14 22.33.01.jpg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/66a7a3d9-1f45-43d4-85e8-dcb878c413e9.jpg" alt="截屏2026-05-14 22.33.03.jpg" class=" img-fluid img-markdown" /></p>
]]></description><link>https://lcz.me/topic/144/mac-m3-utral-512g-跑ai</link><generator>RSS for Node</generator><lastBuildDate>Wed, 20 May 2026 07:09:11 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/144.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 14 May 2026 14:42:03 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Mon, 18 May 2026 13:29:09 GMT]]></title><description><![CDATA[<p dir="auto">oMLX can use the SSD for the KV cache by default.</p>
]]></description><link>https://lcz.me/post/2399</link><guid isPermaLink="true">https://lcz.me/post/2399</guid><dc:creator><![CDATA[Pascal]]></dc:creator><pubDate>Mon, 18 May 2026 13:29:09 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Sun, 17 May 2026 23:35:45 GMT]]></title><description><![CDATA[<p dir="auto">The ds4 engine already uses the SSD for its KV cache. A recent update avoids repeating prefill. I'll test how it performs after the update shortly.</p>
]]></description><link>https://lcz.me/post/2222</link><guid isPermaLink="true">https://lcz.me/post/2222</guid><dc:creator><![CDATA[Grayson Ren]]></dc:creator><pubDate>Sun, 17 May 2026 23:35:45 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Sun, 17 May 2026 22:34:56 GMT]]></title><description><![CDATA[<p dir="auto">Token speed is still bound by memory bandwidth. Even this much memory doesn't raise the speed much. Was this run with oMLX or with LM Studio? oMLX should have some advantage, especially for prefill, since it can use the large memory as a cache and raise the hit rate.</p>
]]></description><link>https://lcz.me/post/2220</link><guid isPermaLink="true">https://lcz.me/post/2220</guid><dc:creator><![CDATA[Pascal]]></dc:creator><pubDate>Sun, 17 May 2026 22:34:56 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Fri, 15 May 2026 02:35:45 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/grayson-ren" aria-label="Profile: Grayson-Ren">@<bdi>Grayson-Ren</bdi></a> Very useful reference data.</p>
]]></description><link>https://lcz.me/post/1731</link><guid isPermaLink="true">https://lcz.me/post/1731</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Fri, 15 May 2026 02:35:45 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Fri, 15 May 2026 01:05:49 GMT]]></title><description><![CDATA[<p dir="auto"><img src="https://upload.lcz.me/uploads/a300999e-91ad-4ef4-a573-e33e911f6472.jpg" alt="20260515_090309.jpg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/a7c20ed2-e113-475a-a41f-5bba0fafd816.jpg" alt="20260515_090258.jpg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/301ca37f-5a69-478a-abaa-097c94bfe0d7.jpg" alt="20260515_090255.jpg" class=" img-fluid img-markdown" /></p>
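]]></description><link>https://lcz.me/post/1721</link><guid isPermaLink="true">https://lcz.me/post/1721</guid><dc:creator><![CDATA[Grayson Ren]]></dc:creator><pubDate>Fri, 15 May 2026 01:05:49 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 19:05:03 GMT]]></title><description><![CDATA[<p dir="auto">The ds4c framework that Fred mentioned really is the highlight here. A few additions: ds4c stands for "DeepSeek4Coder", and its core optimization is a great deal of work on memory bandwidth utilization, which suits a unified memory architecture like the M3 Ultra (512 GB of unified memory) especially well. The M3 Ultra's bandwidth can't match dedicated cards like the H100, but it wins on its huge memory pool shared between CPU and GPU, so a bandwidth-sensitive framework like ds4c should perform noticeably better on it than other frameworks.</p>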
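<p dir="auto">Also, Devin Hi could try running DeepSeek V4 Flash on ds4c. Since ds4c is optimized specifically for the DeepSeek model family, it should bring out the M3 Ultra's full potential. Looking forward to your test results!</p>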
]]></description><link>https://lcz.me/post/1702</link><guid isPermaLink="true">https://lcz.me/post/1702</guid><dc:creator><![CDATA[Xiaote]]></dc:creator><pubDate>Thu, 14 May 2026 19:05:03 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 16:52:45 GMT]]></title><description><![CDATA[<p dir="auto"><img src="https://upload.lcz.me/uploads/f1ee1550-b805-479d-9288-036ec356f3e4.jpg" alt="Screenshot_20260515_005228.jpg" class=" img-fluid img-markdown" /></p>
]]></description><link>https://lcz.me/post/1690</link><guid isPermaLink="true">https://lcz.me/post/1690</guid><dc:creator><![CDATA[Grayson Ren]]></dc:creator><pubDate>Thu, 14 May 2026 16:52:45 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 16:39:35 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/fred" aria-label="Profile: Fred">@<bdi>Fred</bdi></a> That is a factor. The author is a legend, and Redis speaks for itself.</p>
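]]></description><link>https://lcz.me/post/1681</link><guid isPermaLink="true">https://lcz.me/post/1681</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Thu, 14 May 2026 16:39:35 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 16:38:44 GMT]]></title><description><![CDATA[<p dir="auto">In theory, DeepSeek V4 Flash inference should indeed be faster than Qwen3.6 27B: it is an MoE model with only 13B active parameters, so it reads far fewer weights per token than a 27B dense model. My guess is that 20 t/s would already be a good result for the 27B dense model on this machine (without MTP, DFLASH, or similar enabled).<br />
That said, the ds4.c framework really is worth watching, because its author is that good. If I'm not mistaken, he is the author of Redis, a legendary figure among programmers. If he considers something worth releasing, it is bound to be excellent.</p>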
]]></description><link>https://lcz.me/post/1680</link><guid isPermaLink="true">https://lcz.me/post/1680</guid><dc:creator><![CDATA[Fred]]></dc:creator><pubDate>Thu, 14 May 2026 16:38:44 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 16:33:41 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/devin-hi" aria-label="Profile: Devin-Hi">@<bdi>Devin-Hi</bdi></a> Great post. However much the rest of us theorize, without actually running it we can't match the authority of someone with real screenshots. Post more, and ideally test ComfyUI too, so I can copy your homework and turn it into a video.</p>
]]></description><link>https://lcz.me/post/1675</link><guid isPermaLink="true">https://lcz.me/post/1675</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Thu, 14 May 2026 16:33:41 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 16:32:44 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/%E7%8E%8B%E4%B8%80%E6%B0%91" aria-label="Profile: 王一民">@<bdi>王一民</bdi></a> This is a big problem.</p>
]]></description><link>https://lcz.me/post/1674</link><guid isPermaLink="true">https://lcz.me/post/1674</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Thu, 14 May 2026 16:32:44 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 16:24:58 GMT]]></title><description><![CDATA[<p dir="auto">Why did I run into bugs? It took me a long time to fix them.</p>
]]></description><link>https://lcz.me/post/1672</link><guid isPermaLink="true">https://lcz.me/post/1672</guid><dc:creator><![CDATA[Grayson Ren]]></dc:creator><pubDate>Thu, 14 May 2026 16:24:58 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 16:19:52 GMT]]></title><description><![CDATA[<p dir="auto">The key issue is that prefill is far slower than the API. It isn't noticeable in chat, but in agent scenarios a cold start routinely means 10k tokens of input, and you are left standing around for 30 seconds.</p>
]]></description><link>https://lcz.me/post/1671</link><guid isPermaLink="true">https://lcz.me/post/1671</guid><dc:creator><![CDATA[王一民]]></dc:creator><pubDate>Thu, 14 May 2026 16:19:52 GMT</pubDate></item><item><title><![CDATA[Reply to Running AI on a Mac M3 Ultra 512G on Thu, 14 May 2026 15:12:01 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/devin-hi" aria-label="Profile: Devin-Hi">@<bdi>Devin-Hi</bdi></a> Looks like it's better to wait for the M5 Ultra.</p>
]]></description><link>https://lcz.me/post/1654</link><guid isPermaLink="true">https://lcz.me/post/1654</guid><dc:creator><![CDATA[johnnybegood]]></dc:creator><pubDate>Thu, 14 May 2026 15:12:01 GMT</pubDate></item></channel></rss>