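<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Consumer-Grade AI Hardware Spec Comparison]]></title><description><![CDATA[<p dir="auto">Out of curiosity, I spent the past few days having Gemini summarize the specs of mainstream GPUs and compare AMD and Intel cards against NVIDIA cards of roughly equal compute performance. The data may not be fully accurate, so treat it as a rough reference only.</p>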
<p dir="auto">Here is my own understanding of the hardware; please correct me if anything is wrong:</p>
<ul>
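<li>These are theoretical figures only. Many other bottlenecks exist, so they do not fully represent real-world performance. The sparse matmul figures in particular are for reference only, and none of the numbers reflect performance with CUDA Graphs or other specially optimized kernels.</li>
<li>Unless told otherwise, some frameworks pick the most broadly compatible dtype for the hardware. For example, llama.cpp by default dequantizes activations to FP32 and weights to FP16, while vLLM and ComfyUI default to FP16/BF16.</li>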
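<li>Running LLMs involves more sparse matmul, meaning the matrices contain many zero weights. NVIDIA Tensor Cores have special optimizations for this kind of matrix operation (the sparse figures assume NVIDIA's 2:4 structured sparsity pattern).</li>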
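<li>Running ComfyUI is mostly dense matmul.</li>
<li>Compute throughput is only part of the picture. Some steps, such as LLM decode and video generation, are limited more by VRAM bandwidth and cannot saturate the compute units.</li>
<li>Intel's INT8 performance is strong, but it seems that only OpenVINO supports it.</li>
<li>Although the 7900 XTX and R9700 lack native FP16 hardware support, ROCm seems to have some trick for accelerating FP16 compute. The R9700 does have native FP8 hardware support.</li>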
</ul>
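<p dir="auto">To illustrate the bandwidth point for LLM decode, here is a rough sketch. It assumes every generated token has to stream all model weights from VRAM once and ignores KV-cache reads, batching, and kernel overhead, so it only gives a ceiling. The bandwidth values come from the table below; the 16 GB weight footprint is a hypothetical example, not a measurement.</p>

```python
# Memory bandwidth in GB/s, copied from the table in this post.
BANDWIDTH_GBPS = {
    "Intel Arc Pro B70": 608.0,
    "AMD Radeon AI PRO R9700": 644.6,
    "NVIDIA RTX 3090": 936.0,
    "NVIDIA RTX 5070 Ti": 896.0,
    "NVIDIA RTX 5090": 1792.0,
}

# Hypothetical model footprint (an assumption for illustration only).
ASSUMED_WEIGHTS_GB = 16.0


def decode_tokens_per_second_ceiling(bandwidth_gbps, weights_gb):
    """Upper bound on decode speed if each token streams all weights once."""
    if weights_gb == 0:
        raise ValueError("weights_gb must be non-zero")
    return bandwidth_gbps / weights_gb


if __name__ == "__main__":
    for card, bandwidth in BANDWIDTH_GBPS.items():
        ceiling = decode_tokens_per_second_ceiling(bandwidth, ASSUMED_WEIGHTS_GB)
        print(f"{card}: at most about {ceiling:.1f} tokens/s")
```

<p dir="auto">Under these assumptions the ranking follows memory bandwidth rather than TFLOPS, which is why cards with very different compute figures can land close together in single-user decode.</p>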
<hr />
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Hardware Metric</th>
<th>Intel Arc Pro B70</th>
<th>AMD Radeon AI PRO R9700</th>
<th>AMD Radeon RX 7900 XTX</th>
<th>NVIDIA RTX 3090</th>
<th>NVIDIA RTX 4070</th>
<th>NVIDIA RTX 5060 Ti</th>
<th>NVIDIA RTX 5070</th>
<th>NVIDIA RTX 5070 Ti</th>
<th>NVIDIA RTX 4090</th>
<th>NVIDIA RTX 5090</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Architecture</strong></td>
<td>Intel Xe2</td>
<td>AMD RDNA 4</td>
<td>AMD RDNA 3</td>
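<td>Ampere (3rd-Gen Tensor Cores)</td>
<td>Ada (4th-Gen Tensor Cores)</td>
<td>Blackwell (5th-Gen Tensor Cores)</td>
<td>Blackwell (5th-Gen Tensor Cores)</td>
<td>Blackwell (5th-Gen Tensor Cores)</td>
<td>Ada (4th-Gen Tensor Cores)</td>
<td>Blackwell (5th-Gen Tensor Cores)</td>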
</tr>
<tr>
<td><strong>VRAM Capacity</strong></td>
<td>32 GB GDDR6</td>
<td>32 GB GDDR6</td>
<td>24 GB GDDR6</td>
<td>24 GB GDDR6X</td>
<td>12 GB GDDR6X</td>
<td>16 GB GDDR7</td>
<td>12 GB GDDR7</td>
<td>16 GB GDDR7</td>
<td>24 GB GDDR6X</td>
<td>32 GB GDDR7</td>
</tr>
<tr>
<td><strong>Memory Bus Width</strong></td>
<td>256-bit</td>
<td>256-bit</td>
<td>384-bit</td>
<td>384-bit</td>
<td>192-bit</td>
<td>128-bit</td>
<td>192-bit</td>
<td>256-bit</td>
<td>384-bit</td>
<td>512-bit</td>
</tr>
<tr>
<td><strong>Memory Bandwidth</strong></td>
<td>608 GB/s</td>
<td>644.6 GB/s</td>
<td>960 GB/s</td>
<td>936 GB/s</td>
<td>504 GB/s</td>
<td>448 GB/s</td>
<td>672 GB/s</td>
<td>896 GB/s</td>
<td>1,008 GB/s</td>
<td>1,792 GB/s</td>
</tr>
<tr>
<td><strong>FP32 (Float32)</strong></td>
<td>~22.9 TFLOPS</td>
<td>~47.8 TFLOPS</td>
<td>~61.4 TFLOPS</td>
<td>~35.6 TFLOPS</td>
<td>~29.2 TFLOPS</td>
<td>~23.7 TFLOPS</td>
<td>~30.8 TFLOPS</td>
<td>~43.9 TFLOPS</td>
<td>~82.6 TFLOPS</td>
<td>~104.8 TFLOPS</td>
</tr>
<tr>
<td><strong>FP16 / BF16 (Dense)</strong></td>
<td>~46 TFLOPS</td>
<td>~95.7 TFLOPS</td>
<td>~123 TFLOPS</td>
<td>~71 TFLOPS</td>
<td>~117 TFLOPS</td>
<td>~94.8 TFLOPS</td>
<td>~124 TFLOPS</td>
<td>~175.7 TFLOPS</td>
<td>~165.2 TFLOPS</td>
<td>~419.2 TFLOPS</td>
</tr>
<tr>
<td><strong>FP16 / BF16 (Sparse)</strong></td>
<td><em>No Sparsity</em></td>
<td><em>No Sparsity</em></td>
<td><em>No Sparsity</em></td>
<td>~142 TFLOPS</td>
<td>~233 TFLOPS</td>
<td>~189.6 TFLOPS</td>
<td>~248 TFLOPS</td>
<td>~351.4 TFLOPS</td>
<td>~330.3 TFLOPS</td>
<td>~838.4 TFLOPS</td>
</tr>
<tr>
<td><strong>INT8 / FP8 (Dense)</strong></td>
<td>367 TOPS / ~46 TF</td>
<td>~191.4 TOPS / ~95.7 TF</td>
<td>~246 TOPS / <em>Emulated</em></td>
<td>~142 TOPS / <em>Emulated</em></td>
<td>~233 TOPS / ~233 TF</td>
<td>~189.6 TOPS / ~189.6 TF</td>
<td>~248 TOPS / ~248 TF</td>
<td>~351.4 TOPS / ~351.4 TF</td>
<td>~330.3 TOPS / ~330.3 TF</td>
<td>~838.4 TOPS / ~838.4 TF</td>
</tr>
<tr>
<td><strong>INT8 / FP8 (Sparse)</strong></td>
<td><em>No Sparsity</em></td>
<td><em>No Sparsity</em></td>
<td><em>No Sparsity</em></td>
<td>~284 TOPS / <em>Emulated</em></td>
<td>~466 TOPS / ~466 TF</td>
<td>~379.2 TOPS / ~379.2 TF</td>
<td>~496 TOPS / ~496 TF</td>
<td>~702.8 TOPS / ~702.8 TF</td>
<td>~660.6 TOPS / ~660.6 TF</td>
<td>~1,676.8 TOPS / ~1,676.8 TF</td>
</tr>
<tr>
<td><strong>INT4 (Dense / Sparse)</strong></td>
<td>~734 TOPS / <em>No Sparsity</em></td>
<td>~1,531 TOPS / <em>No Sparsity</em></td>
<td>~246 TOPS / <em>No Sparsity</em></td>
<td>~284 / ~568 TOPS</td>
<td>~466 / ~932 TOPS</td>
<td>~379.2 / ~758.4 TOPS</td>
<td>~496 / ~992 TOPS</td>
<td>~702.8 / ~1,405.6 TOPS</td>
<td>~660.6 / ~1,321 TOPS</td>
<td>~1,676.8 / ~3,353.6 TOPS</td>
</tr>
<tr>
<td><strong>FP4 (Dense / Sparse)</strong></td>
<td><em>N/A (Emulated)</em></td>
<td><em>N/A (Emulated)</em></td>
<td><em>N/A (Emulated)</em></td>
<td><em>N/A (Emulated)</em></td>
<td><em>N/A (Emulated)</em></td>
<td>~758.4 / ~1,518 TF</td>
<td>~988 / ~1,976 TF</td>
<td>~1,403 / ~2,806 TF</td>
<td><em>N/A (Emulated)</em></td>
<td>~1,676.8 / ~3,353.6 TF</td>
</tr>
</tbody>
</table>
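<p dir="auto">One way to read the compute and bandwidth rows together is the roofline "ridge point": peak FLOPs divided by bytes per second. A kernel needs at least that many FLOPs per byte moved before the compute units, rather than VRAM, become the limit. The sketch below uses the FP16/BF16 dense and bandwidth figures from the table; as a rough guide, batch-1 decode over FP16 weights does about 2 FLOPs per 2-byte weight, so roughly 1 FLOP per byte. The function and variable names are just for illustration.</p>

```python
# (FP16/BF16 dense TFLOPS, memory bandwidth in GB/s), copied from the table.
SPECS = {
    "Intel Arc Pro B70": (46.0, 608.0),
    "AMD Radeon AI PRO R9700": (95.7, 644.6),
    "NVIDIA RTX 3090": (71.0, 936.0),
    "NVIDIA RTX 5070 Ti": (175.7, 896.0),
    "NVIDIA RTX 5090": (419.2, 1792.0),
}


def ridge_flops_per_byte(tflops, bandwidth_gbps):
    """Arithmetic intensity at which peak compute and peak bandwidth balance."""
    return (tflops * 1e12) / (bandwidth_gbps * 1e9)


if __name__ == "__main__":
    for card, (tflops, bandwidth) in SPECS.items():
        ridge = ridge_flops_per_byte(tflops, bandwidth)
        print(f"{card}: about {ridge:.0f} FLOPs per byte to become compute-bound")
```

<p dir="auto">All of these ridge points sit far above 1 FLOP per byte, which fits the note that single-stream decode cannot saturate compute, while large-batch or dense diffusion workloads have a much better chance of doing so.</p>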
<hr />
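<p dir="auto">On the sparse rows: the NVIDIA sparse figures in the table are about twice the dense ones, which matches 2:4 structured sparsity, where two of every four consecutive weights are zero. The speedup only applies to weights that already follow that pattern. Below is a minimal pure-Python sketch of magnitude-based 2:4 pruning of one weight row; <code>prune_2_of_4</code> is a hypothetical helper for illustration, not an API from any library, and real pipelines usually prune and then fine-tune.</p>

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    weights = [0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.0]
    print(prune_2_of_4(weights))
```

<p dir="auto">Weights that are not pruned this way run through the dense path, so the dense rows are the safer baseline for comparing cards.</p>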
]]></description><link>https://lcz.me/topic/439/平民ai硬件参数对比</link><generator>RSS for Node</generator><lastBuildDate>Sat, 06 Jun 2026 02:29:33 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/439.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 05 Jun 2026 13:46:05 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Consumer-Grade AI Hardware Spec Comparison on Fri, 05 Jun 2026 18:05:04 GMT]]></title><description><![CDATA[<p dir="auto">All I can say is that this table lists theoretical performance, not real-world performance; far too many things affect what happens in actual use.</p>
]]></description><link>https://lcz.me/post/5256</link><guid isPermaLink="true">https://lcz.me/post/5256</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:05:04 GMT</pubDate></item><item><title><![CDATA[Reply to Consumer-Grade AI Hardware Spec Comparison on Fri, 05 Jun 2026 17:54:00 GMT]]></title><description><![CDATA[<p dir="auto">A 32 GB 5090 counts as aristocrat hardware, or at the very least upper-middle-class hardware.</p>
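]]></description><link>https://lcz.me/post/5253</link><guid isPermaLink="true">https://lcz.me/post/5253</guid><dc:creator><![CDATA[Tony Wang]]></dc:creator><pubDate>Fri, 05 Jun 2026 17:54:00 GMT</pubDate></item><item><title><![CDATA[Reply to Consumer-Grade AI Hardware Spec Comparison on Fri, 05 Jun 2026 17:46:46 GMT]]></title><description><![CDATA[<p dir="auto">Everyone says RTX cards save you a lot of hassle for AI. I originally had my eye on the RTX 5060 Ti 16GB, but then someone overseas said the 5070 Ti is so fast that its cores cannot even be kept fed. When actually running LLMs the card does not get hot, roughly 45 to 60 °C at most, probably because there is no sustained heavy workload.<br />
The 5070 Ti 16GB is something of a sweet spot, and getting two for a combined 32 GB is fairly comfortable too, though I have not used ComfyUI.<br />
It also comes with a five-year warranty and is probably one of the cards ordinary gamers like, so it should be easy to resell when upgrading later.</p>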
]]></description><link>https://lcz.me/post/5246</link><guid isPermaLink="true">https://lcz.me/post/5246</guid><dc:creator><![CDATA[kos or]]></dc:creator><pubDate>Fri, 05 Jun 2026 17:46:46 GMT</pubDate></item><item><title><![CDATA[Reply to Consumer-Grade AI Hardware Spec Comparison on Fri, 05 Jun 2026 14:47:57 GMT]]></title><description><![CDATA[<p dir="auto">Agreed, a 5070 Ti with 32 GB would be a killer, and a 5090 modded to 64 GB would be a beast too.</p>
]]></description><link>https://lcz.me/post/5225</link><guid isPermaLink="true">https://lcz.me/post/5225</guid><dc:creator><![CDATA[5ccccc]]></dc:creator><pubDate>Fri, 05 Jun 2026 14:47:57 GMT</pubDate></item><item><title><![CDATA[Reply to Consumer-Grade AI Hardware Spec Comparison on Fri, 05 Jun 2026 14:09:18 GMT]]></title><description><![CDATA[<p dir="auto">The PRO R9700 and the 5070 Ti look like good choices; I just don't know whether the 5070 Ti can be modded to 32 GB.</p>
]]></description><link>https://lcz.me/post/5219</link><guid isPermaLink="true">https://lcz.me/post/5219</guid><dc:creator><![CDATA[sirwang]]></dc:creator><pubDate>Fri, 05 Jun 2026 14:09:18 GMT</pubDate></item></channel></rss>