<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[[AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX]]></title><description><![CDATA[<h1>[AMD GPU/ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX</h1>
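<p dir="auto">This guide documents the in-depth troubleshooting process and the fix for a problem seen on an <strong>AMD Radeon RX 7900 XTX (RDNA3 / gfx1100)</strong> under <strong>Ubuntu/Linux (ROCm 7.x + PyTorch 2.x)</strong>: running ComfyUI with <strong>SageAttention (v1.0.6)</strong> enabled produces images that are completely black (a silent failure with no error).</p>
<p dir="auto">If you also get black images whenever the accelerator is on and normal images as soon as it is off, these notes should help you fix it.</p>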
<hr />
<h2><img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f4bb.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--computer" style="height:23px;width:auto;vertical-align:middle" title="💻" alt="💻" /> Our hardware and software test environment</h2>
<p dir="auto">To make comparison easier while troubleshooting, this is the actual hardware and software environment in which the problem occurred:</p>
<ul>
<li><strong>GPU</strong>: AMD Radeon RX 7900 XTX (24GB GDDR6 / RDNA3 / gfx1100)</li>
<li><strong>CPU</strong>: Dual-socket Intel Xeon E5-2682 v4</li>
<li><strong>Memory</strong>: 64GB DDR4 REG ECC (all slots populated)</li>
<li><strong>Operating system</strong>: Ubuntu (Kernel 5.15.0-181-generic)</li>
<li><strong>ROCm runtime</strong>: <code>torch 2.12.0+rocm7.2</code></li>
<li><strong>SageAttention version</strong>: <strong><code>1.0.6</code></strong> (pure Triton JIT build, compiled at run time)</li>
<li><strong>ComfyUI version</strong>: v0.24.0 (机智罗's bundle for AMD cards)</li>
</ul>
<hr />
<h2><img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f4fa.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--tv" style="height:23px;width:auto;vertical-align:middle" title="📺" alt="📺" /> Background and affected workflows</h2>
<p dir="auto">The problem is very easy to trigger when running the following workflows from AMD-optimized ComfyUI bundles (such as 机智罗's bundle for AMD cards):</p>
<ul>
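<li><strong>机智罗 workflow No. 44</strong> (a GGUF hybrid multimodal workflow based on the Qwen-Image / Flux architecture)</li>
<li><strong>机智罗 workflow No. 14</strong> (a Wan2.2 video generation workflow, when SageAttention acceleration is enabled at high resolution)</li>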
</ul>
<hr />
<h2><img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f6a8.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--rotating_light" style="height:23px;width:auto;vertical-align:middle" title="🚨" alt="🚨" /> Symptoms</h2>
<ul>
<li><strong>Behavior</strong>: If and only if the <strong><code>XB_SageAttentionAccelerator</code> (SageAttention operator accelerator)</strong> node is connected in the workflow, the final output image (or video frame) is <strong>completely black, 100% of the time</strong>.</li>
<li><strong>Console warning</strong>: The moment generation finishes and the image is about to be written out, the terminal prints a single RuntimeWarning, with no CUDA/ROCm crash stack trace of any kind:<pre><code class="language-text">/home/peter/ComfyUI/nodes.py:1657: RuntimeWarning: invalid value encountered in cast
  img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))
</code></pre>
</li>
</ul>
<hr />
<h2><img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f50d.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--mag" style="height:23px;width:auto;vertical-align:middle" title="🔍" alt="🔍" /> Root-cause analysis</h2>
<p dir="auto">Pulling and auditing the low-level source of SageAttention (v1.0.6) as installed on Linux with an AMD GPU turned up the following two core conflicts:</p>
<h3>1. Insufficient accumulation precision causes numeric overflow (NaN)</h3>
<p dir="auto">The SageAttention 1.0.6 currently installed directly via pip is a <strong>pure Triton JIT build</strong> (no precompiled <code>.so</code>); it compiles the attention kernel dynamically at run time on the GPU.<br />
In its <code>attn_qk_int8_per_block.py</code> source, the accumulation of the matrix multiplication is hard-coded as:</p>
<pre><code class="language-python">acc += tl.dot(p, v, out_dtype=tl.float16)  # accumulates in half precision (hard-coded)
</code></pre>
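<p dir="auto">On the Triton compiler backend for AMD RDNA3 (7900 XTX), half-precision accumulation overflows very easily with long sequences or certain activation values, <strong>producing large numbers of NaN (not-a-number) values</strong>.<br />
The NaNs propagate through the KSampler to the entire latent. After VAE decoding, when the image is cast to <code>uint8</code> (the line named in the warning above), the NaN pixels typically come out as <code>0</code>, so the final rendered image is completely black.</p>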
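<p dir="auto">To see why a half-precision accumulator is fragile, here is a small CPU-only sketch. It does not run the SageAttention kernel; it only emulates a float16 running sum with Python's <code>struct</code> module, and the input values are made up for illustration.</p>
<pre><code class="language-python">import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest float16 value (overflow becomes inf)."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # struct refuses values beyond the float16 range; a GPU would produce inf.
        return math.copysign(math.inf, x)


def accumulate(values, half_precision):
    """Sum values, rounding the running total to float16 after each step if requested."""
    acc = 0.0
    for v in values:
        acc = acc + v
        if half_precision:
            acc = to_fp16(acc)
    return acc


# Hypothetical partial products: large positives followed by large negatives.
products = [30000.0, 30000.0, 30000.0, -30000.0, -30000.0, -30000.0]

print(accumulate(products, half_precision=True))   # inf: the overflow is never undone
print(accumulate(products, half_precision=False))  # 0.0
print(math.inf - math.inf)                         # nan: one way inf becomes NaN later
</code></pre>
<p dir="auto">The largest finite float16 value is 65504, so the third addition already overflows, while a wider accumulator returns the correct sum.</p>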
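<h3>2. Shared memory exceeds the hardware limit</h3>
<p dir="auto">Stock SageAttention hard-codes its block sizes as <code>BLOCK_M=128, BLOCK_N=64</code>, which needs about <strong>106KB</strong> of shared memory at compile time.<br />
The physical shared memory limit per workgroup on AMD RDNA3 cards (7900 XTX) is only about <strong>64 KiB (65,536 bytes)</strong>. This can make the Triton compiler fail while allocating registers and shared memory, or fall back implicitly to slower memory, which further reduces speed and makes the precision problems worse.</p>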
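<p dir="auto">The budget arithmetic can be written down as a tiny sketch. The request sizes are the approximate figures quoted in this post, and the 64 KiB per-workgroup limit is an assumption about gfx1100; nothing here queries the real hardware or the compiler.</p>
<pre><code class="language-python">LDS_LIMIT_BYTES = 64 * 1024  # assumed per-workgroup limit on gfx1100 (65,536 bytes)

# Approximate shared-memory requests quoted in this post, in bytes.
REQUESTS = {
    "BLOCK_M=128, BLOCK_N=64": 106 * 1024,
    "BLOCK_M=32, BLOCK_N=16": 8 * 1024,
}


def headroom(required_bytes, limit_bytes=LDS_LIMIT_BYTES):
    """Bytes left over after the request; a negative value means it does not fit."""
    return limit_bytes - required_bytes


for name, required in REQUESTS.items():
    left = headroom(required)
    verdict = "fits" if left >= 0 else "exceeds the limit"
    print(f"{name}: {verdict} ({left} bytes of headroom)")
</code></pre>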
<hr />
<h2><img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f6e0.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--hammer_and_wrench" style="height:23px;width:auto;vertical-align:middle" title="🛠" alt="🛠" /> The fix (manual patching)</h2>
<p dir="auto">Now that we know the causes are <strong>"shared memory over the limit"</strong> and <strong>"Triton floating-point accumulation overflow"</strong>, the fix is to manually replace files in the SageAttention Python package with patched versions.</p>
<h3>Step 1: Locate the SageAttention package path</h3>
<p dir="auto">Inside the virtual environment (venv) that ComfyUI runs in, find the actual install path of the <code>sageattention</code> package:</p>
<pre><code class="language-bash">source /home/peter/ComfyUI/venv/bin/activate
SA_DIR=$(python3 -c 'import sageattention, os; print(os.path.dirname(sageattention.__file__))')
echo "Package path: $SA_DIR"
</code></pre>
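<p dir="auto">As an alternative to the one-liner above, the path can be looked up without importing the package at all, which avoids loading Triton just to find a directory. This is a minimal sketch using only the standard library; <code>package_dir</code> is a helper name made up for illustration.</p>
<pre><code class="language-python">import importlib.util
import os


def package_dir(name):
    """Return the install directory of a package, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


print(package_dir("sageattention"))
</code></pre>
<p dir="auto">Run it with the venv's Python; it prints <code>None</code> when the package is not installed in that environment.</p>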
<h3>Step 2: Back up the original files (safety first)</h3>
<pre><code class="language-bash">cp "$SA_DIR/attn_qk_int8_per_block.py" "$SA_DIR/attn_qk_int8_per_block.py.bak"
cp "$SA_DIR/attn_qk_int8_per_block_causal.py" "$SA_DIR/attn_qk_int8_per_block_causal.py.bak"
cp "$SA_DIR/quant_per_block.py" "$SA_DIR/quant_per_block.py.bak"
</code></pre>
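<p dir="auto">One thing to watch: running the <code>cp</code> commands again after patching would overwrite the backups with the patched files. A small sketch of a backup helper that refuses to do that is shown below; <code>backup_files</code> and <code>PATCHED_FILES</code> are names introduced here for illustration.</p>
<pre><code class="language-python">import shutil
from pathlib import Path

PATCHED_FILES = (
    "attn_qk_int8_per_block.py",
    "attn_qk_int8_per_block_causal.py",
    "quant_per_block.py",
)


def backup_files(package_dir, names=PATCHED_FILES):
    """Copy each file to NAME.bak once, never overwriting an existing backup."""
    made = []
    for name in names:
        src = Path(package_dir) / name
        dst = src.with_name(src.name + ".bak")
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            made.append(dst)
    return made
</code></pre>
<p dir="auto">Call it with the directory found in step 1; it returns the list of backups it actually created, which is empty on a second run.</p>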
<h3>Step 3: Download the Zluda-AMD optimized patches and overwrite the originals</h3>
<p dir="auto">Use the community Triton parameter patches tuned for AMD GPUs (from <code>patientx/ComfyUI-Zluda</code>) and overwrite the local files directly:</p>
<pre><code class="language-bash">BASE_URL='https://raw.githubusercontent.com/patientx/ComfyUI-Zluda/refs/heads/master/comfy/customzluda/sa'

curl -fsSL "$BASE_URL/attn_qk_int8_per_block.py" -o "$SA_DIR/attn_qk_int8_per_block.py"
curl -fsSL "$BASE_URL/attn_qk_int8_per_block_causal.py" -o "$SA_DIR/attn_qk_int8_per_block_causal.py"
curl -fsSL "$BASE_URL/quant_per_block.py" -o "$SA_DIR/quant_per_block.py"
</code></pre>
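<p dir="auto">After overwriting, it is worth confirming that the downloaded file really differs from the original in the way that matters. The sketch below is only a text heuristic: it assumes the unpatched file contains the <code>out_dtype=tl.float16</code> argument quoted earlier and that the patched file does not. The function name is hypothetical.</p>
<pre><code class="language-python">from pathlib import Path


def still_uses_fp16_accumulator(path):
    """Heuristic: True if the file still passes out_dtype=tl.float16 somewhere."""
    text = Path(path).read_text(encoding="utf-8")
    compact = "".join(text.split())  # ignore whitespace differences
    return "out_dtype=tl.float16" in compact
</code></pre>
<p dir="auto">For example, <code>still_uses_fp16_accumulator(sa_dir + "/attn_qk_int8_per_block.py")</code> should return <code>False</code> once the patch is in place, where <code>sa_dir</code> is the path from step 1.</p>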
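<h3>Step 4: Clear the Triton cache (the key step)</h3>
<p dir="auto">For the patched files to take effect, you must clear Triton's old compilation cache, forcing it to recompile the kernels on the next start:</p>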
<pre><code class="language-bash">rm -rf ~/.triton/cache
</code></pre>
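<p dir="auto">If the cache has been moved with Triton's <code>TRITON_CACHE_DIR</code> environment variable, the <code>rm</code> above will miss it. A minimal sketch that resolves the directory first is shown below; the helper names are made up, and the default path is the one used in this post.</p>
<pre><code class="language-python">import os
import shutil
from pathlib import Path


def triton_cache_dir(env=None):
    """Resolve the cache directory: TRITON_CACHE_DIR if set, else ~/.triton/cache."""
    env = os.environ if env is None else env
    override = env.get("TRITON_CACHE_DIR", "").strip()
    return Path(override) if override else Path.home() / ".triton" / "cache"


def clear_triton_cache(env=None):
    """Delete the cache directory if it exists and return the path that was targeted."""
    cache = triton_cache_dir(env)
    shutil.rmtree(cache, ignore_errors=True)
    return cache
</code></pre>
<p dir="auto">Calling <code>clear_triton_cache()</code> from the same environment that launches ComfyUI removes whichever directory that process would use.</p>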
<hr />
<h2><img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f4a1.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--bulb" style="height:23px;width:auto;vertical-align:middle" title="💡" alt="💡" /> What does the patch actually change?</h2>
<ol>
<li><strong><code>out_dtype: tl.float16</code> ➔ <code>tl.float32</code></strong>: every accumulator now uses <strong>full float32 precision</strong>. This removes the overflow on AMD cards and is the core fix for the black images (NaN).</li>
<li><strong><code>BLOCK_M=128, BLOCK_N=64</code> ➔ <code>BLOCK_M=32, BLOCK_N=16</code></strong>: the block sizes are shrunk drastically. The shared memory used by a single workgroup drops from 106KB to <strong>~8KB</strong>, comfortably under the roughly 64 KiB (65,536-byte) hardware limit on gfx1100.</li>
<li><strong>Autotune introduced</strong>: adds Triton autotuning keyed on <code>qo_len</code>, <code>kv_len</code>, and <code>h_qo</code>, so the GPU kernel configuration is chosen automatically for your generation resolution instead of relying on rigid hard-coded values.</li>
</ol>
<hr />
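<h2>🧪 Verification and wrap-up</h2>
<ol>
<li>
<p dir="auto"><strong>Import smoke test</strong>:<br />
Run the following test from the command line and confirm that the output contains no NaN and is not all zeros:</p>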
<pre><code class="language-bash">python3 -c "
import torch, sageattention
from sageattention import sageattn
q = torch.randn(1, 16, 256, 128, dtype=torch.float16, device='cuda')
k = torch.randn(1, 16, 256, 128, dtype=torch.float16, device='cuda')
v = torch.randn(1, 16, 256, 128, dtype=torch.float16, device='cuda')
out = sageattn(q, k, v, tensor_layout='HND')
print('Has NaN:', torch.isnan(out).any().item())
print('All zeros:', (out == 0).all().item())
"
# Expected output: Has NaN: False, All zeros: False  ➔  ✅ the kernel path works
</code></pre>
</li>
<li>
<p dir="auto"><strong>Restart ComfyUI</strong>:<br />
Restart your ComfyUI process, <strong>reconnect and enable the <code>XB_SageAttentionAccelerator</code> node</strong> in the web UI, then click “Queue Prompt” again.</p>
<p dir="auto">Images not only got their color back, but thanks to Triton autotuning the 7900 XTX can finally take full advantage of SageAttention's halved memory-bandwidth use and its inference speedup!<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f680.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--rocket" style="height:23px;width:auto;vertical-align:middle" title="🚀" alt="🚀" /><br />
<img src="https://upload.lcz.me/uploads/b0f4d183-5d72-4923-9cd4-5774efba0a79.jpeg" alt="6d20ca31-3b0a-4d02-8470-5e946adcae8a-image.jpeg" class=" img-fluid img-markdown" /><br />
At first the agent suggested simply bypassing the node<br />
<img src="https://upload.lcz.me/uploads/3fd4b87c-8309-44bf-91d3-00753277e70b.jpeg" alt="8e141ecd-792b-43d5-aea6-dc475614d912-image.jpeg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/dee47565-84b1-43d3-8f75-29da85c3719f.jpeg" alt="828333ab-690c-4bcb-87b7-baae77858953-image.jpeg" class=" img-fluid img-markdown" /><br />
It finally pinned down the problem<br />
<img src="https://upload.lcz.me/uploads/dce893f3-6dca-4d1b-af5f-cb53369f7191.jpeg" alt="8016daed-366f-4103-8078-718e6f6fa15a-image.jpeg" class=" img-fluid img-markdown" /><br />
The solution emerged<br />
<img src="https://upload.lcz.me/uploads/cf4fba8e-00fa-49aa-8b28-587642ac6dfd.jpeg" alt="f0fc47cf-058f-47c6-8935-6112dfe78d0e-image.jpeg" class=" img-fluid img-markdown" /><br />
Test passed<br />
<img src="https://upload.lcz.me/uploads/7d212b20-aa65-4f34-8a2f-5ff7fdb373c5.jpeg" alt="b5725ae2-1b40-4b6b-90a0-2c0e5e90094c-image.jpeg" class=" img-fluid img-markdown" /></p>
</li>
</ol>
]]></description><link>https://lcz.me/topic/574/a卡-rocm-7900-xtx-跑-comfyui-启用-sageattention-出黑图-nan-修复指南</link><generator>RSS for Node</generator><lastBuildDate>Wed, 01 Jul 2026 10:53:05 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/574.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 15 Jun 2026 13:54:09 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Wed, 17 Jun 2026 02:44:01 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/566656661" aria-label="Profile: 566656661">@<bdi>566656661</bdi></a> Haha, for now the back-to-back mounting is to blame. I bought an 8-card mining frame and riser cables, but they haven't arrived yet. Once the cards are no longer back to back they won't heat each other up.</p>
]]></description><link>https://lcz.me/post/7155</link><guid isPermaLink="true">https://lcz.me/post/7155</guid><dc:creator><![CDATA[abaalei]]></dc:creator><pubDate>Wed, 17 Jun 2026 02:44:01 GMT</pubDate></item><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Tue, 16 Jun 2026 14:44:29 GMT]]></title><description><![CDATA[<p dir="auto">At that temperature it's practically a barbecue <img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f602.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--joy" style="height:23px;width:auto;vertical-align:middle" title=":joy:" alt="😂" /> Add a few more fans.</p>
]]></description><link>https://lcz.me/post/7088</link><guid isPermaLink="true">https://lcz.me/post/7088</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Tue, 16 Jun 2026 14:44:29 GMT</pubDate></item><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Tue, 16 Jun 2026 14:08:22 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/terry" aria-label="Profile: terry">@<bdi>terry</bdi></a> <img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f60a.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--blush" style="height:23px;width:auto;vertical-align:middle" title=":blush:" alt="😊" /> Thanks, admin~<br />
<img src="https://upload.lcz.me/uploads/959ca5b2-8091-448e-92f7-7cfe4f8e174c.jpeg" alt="970640e3-22e4-4006-9671-f300dfe402e2-image.jpeg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/43b0fe1b-3aa0-46a3-b903-62c8629ff3b1.jpeg" alt="e367e19d-7668-4e46-a1ac-1100d1f80e47-image.jpeg" class=" img-fluid img-markdown" /><br />
I added a second 7900 XTX today and am still tinkering with it <img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f912.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--face_with_thermometer" style="height:23px;width:auto;vertical-align:middle" title=":face_with_thermometer:" alt="🤒" /><br />
This space heater is a lot of fun<br />
It's just that two cards back to back run too hot, so I'll need riser cables after all<br />
<img src="https://upload.lcz.me/uploads/3c658696-8786-4991-8fa3-89dad6ffd68e.jpeg" alt="5b9a4411-6f9c-4f4d-a2ad-d7a2e4b5509a-image.jpeg" class=" img-fluid img-markdown" /><br />
<img src="https://upload.lcz.me/uploads/b25d6828-fd4d-41de-b0d5-964ae23c23c1.jpeg" alt="4fc683bd-388d-430e-aaf0-935395953122-image.jpeg" class=" img-fluid img-markdown" /></p>
]]></description><link>https://lcz.me/post/7082</link><guid isPermaLink="true">https://lcz.me/post/7082</guid><dc:creator><![CDATA[abaalei]]></dc:creator><pubDate>Tue, 16 Jun 2026 14:08:22 GMT</pubDate></item><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Tue, 16 Jun 2026 14:06:13 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/joe1900" aria-label="Profile: joe1900">@<bdi>joe1900</bdi></a> Thanks. I only solved it by leaning on free Gemini tokens; I don't know the first thing about this myself<img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f627.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--anguished" style="height:23px;width:auto;vertical-align:middle" title=":anguished:" alt="😧" /> <img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f627.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--anguished" style="height:23px;width:auto;vertical-align:middle" title=":anguished:" alt="😧" /></p>
]]></description><link>https://lcz.me/post/7080</link><guid isPermaLink="true">https://lcz.me/post/7080</guid><dc:creator><![CDATA[abaalei]]></dc:creator><pubDate>Tue, 16 Jun 2026 14:06:13 GMT</pubDate></item><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Tue, 16 Jun 2026 11:06:25 GMT]]></title><description><![CDATA[<p dir="auto">Very practical, and seriously impressive</p>
]]></description><link>https://lcz.me/post/7073</link><guid isPermaLink="true">https://lcz.me/post/7073</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Tue, 16 Jun 2026 11:06:25 GMT</pubDate></item><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Tue, 16 Jun 2026 10:57:12 GMT]]></title><description><![CDATA[<p dir="auto">I've heard AMD is full of pitfalls; it takes veterans like you to fill them in</p>
]]></description><link>https://lcz.me/post/7072</link><guid isPermaLink="true">https://lcz.me/post/7072</guid><dc:creator><![CDATA[joe1900]]></dc:creator><pubDate>Tue, 16 Jun 2026 10:57:12 GMT</pubDate></item><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Tue, 16 Jun 2026 07:45:04 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/ye9ok" aria-label="Profile: ye9ok">@<bdi>ye9ok</bdi></a> <img src="https://lcz.me/assets/plugins/nodebb-plugin-emoji/emoji/android/1f62c.png?v=9a87c0a6150" class="not-responsive emoji emoji-android emoji--grimacing" style="height:23px;width:auto;vertical-align:middle" title=":grimacing:" alt="😬" /> Thanks for the support, but I'm just an engineering grunt, a middle-aged guy who has loved tinkering with computers since primary school</p>
]]></description><link>https://lcz.me/post/7053</link><guid isPermaLink="true">https://lcz.me/post/7053</guid><dc:creator><![CDATA[abaalei]]></dc:creator><pubDate>Tue, 16 Jun 2026 07:45:04 GMT</pubDate></item><item><title><![CDATA[Reply to [AMD GPU&#x2F;ROCm] Fixing black images (NaN) when running ComfyUI with SageAttention on a 7900 XTX on Mon, 15 Jun 2026 14:27:56 GMT]]></title><description><![CDATA[<p dir="auto">Programmers really are pros! An ordinary person would never have found this</p>
]]></description><link>https://lcz.me/post/6967</link><guid isPermaLink="true">https://lcz.me/post/6967</guid><dc:creator><![CDATA[ye9ok]]></dc:creator><pubDate>Mon, 15 Jun 2026 14:27:56 GMT</pubDate></item></channel></rss>