<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[4080S 32G Modded Card: vast.ai Rental Notes]]></title><description><![CDATA[<h1>RTX 4080S 32GB Test Summary</h1>
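<p dir="auto">Date: 2026-06-08<br />
Test machine: <a href="http://Vast.ai" rel="nofollow ugc">Vast.ai</a> RTX 4080S 32GB<br />
Main uses: TRELLIS.2 3D generation, TripoSplat, LLM, CUDA/PyTorch benchmarks</p>
<p dir="auto">This report covers only the measured results from this RTX 4080S 32GB machine. I also tested the 3090, 5090, PRO 4000 and other cards earlier, but I won't go into them here; the conclusion just gives a one-line comparison of what they mean for a purchase decision.</p>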
<h2>Test machine specs</h2>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Item</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>GPU</td>
<td>RTX 4080S</td>
</tr>
<tr>
<td>VRAM</td>
<td>32760 MiB</td>
</tr>
<tr>
<td>CPU</td>
<td>AMD EPYC 7K62</td>
</tr>
<tr>
<td>Effective CPU cores</td>
<td>24</td>
</tr>
<tr>
<td>RAM</td>
<td>64 / 516 GB quota</td>
</tr>
<tr>
<td>PCIe</td>
<td>Gen 4.0 x16</td>
</tr>
<tr>
<td>Driver</td>
<td>570.144</td>
</tr>
<tr>
<td>CUDA</td>
<td>12.8</td>
</tr>
<tr>
<td>PyTorch</td>
<td>2.7.0+cu128</td>
</tr>
<tr>
<td>Container</td>
<td><code>pytorch/pytorch:2.7.0-cuda12.8-cudnn9-devel</code></td>
</tr>
<tr>
<td>Vast machine ID</td>
<td>36413</td>
</tr>
<tr>
<td>Vast host ID</td>
<td>124072</td>
</tr>
<tr>
<td>Test location</td>
<td>Sichuan, CN</td>
</tr>
<tr>
<td>Rental price at the time</td>
<td>0.3277777778 USD/hr</td>
</tr>
</tbody>
</table>
<p dir="auto">This machine has since been shut down. If the same <code>machine_id=36413</code> or <code>host_id=124072</code> shows up on <a href="http://Vast.ai" rel="nofollow ugc">Vast.ai</a> again, it is worth renting first.</p>
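<h2>One-sentence conclusion</h2>
<p dir="auto">The RTX 4080S 32GB completes TRELLIS.2 4096 full-rembg reliably, but its real value is not in leaving the 3090 far behind; it is the 32GB of VRAM. It can run Qwen2.5 32B Q4 at up to 32K context and Qwen2.5 32B Q5 with an 8192-token prompt, which is where 24GB cards tend to start running tight.</p>
<p dir="auto">If a 3090 24GB costs only 30,000 NTD, the 3090 is still very strong value for money.<br />
If a 4080S 32GB costs 65,300 NTD, what justifies it is mainly the 32GB of VRAM, not single-run TRELLIS.2 speed.</p>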
<h2>TRELLIS.2: robot 4096 full-rembg</h2>
<p dir="auto">This is the test closest to a real workflow, and the most important one. The input image is our own <code>robot.jpeg</code>, not the official sample image.</p>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Item</th>
<th>Setting</th>
</tr>
</thead>
<tbody>
<tr>
<td>Model</td>
<td><code>microsoft/TRELLIS.2-4B</code></td>
</tr>
<tr>
<td>Input</td>
<td><code>robot.jpeg</code></td>
</tr>
<tr>
<td>RMBG</td>
<td>enabled</td>
</tr>
<tr>
<td>Condition model</td>
<td>DINOv3</td>
</tr>
<tr>
<td>Texture size</td>
<td>4096</td>
</tr>
<tr>
<td>Decimation target</td>
<td>1,000,000</td>
</tr>
<tr>
<td>GLB export</td>
<td>enabled</td>
</tr>
<tr>
<td>Output</td>
<td><code>outputs/rtx4080s32-robot-4096-full-rembg-xformers-detailed.glb</code></td>
</tr>
</tbody>
</table>
<h3>Stage timings</h3>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Stage</th>
<th style="text-align:right">Time</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>pipeline_from_pretrained</td>
<td style="text-align:right">2:14.48</td>
<td>Loading TRELLIS.2 / DINOv3 / RMBG / pipeline</td>
</tr>
<tr>
<td>preprocess_image</td>
<td style="text-align:right">1.63s</td>
<td>RMBG / preprocessing</td>
</tr>
<tr>
<td>get_cond 512</td>
<td style="text-align:right">1.12s</td>
<td>DINOv3 condition</td>
</tr>
<tr>
<td>get_cond 518</td>
<td style="text-align:right">0.55s</td>
<td>DINOv3 condition</td>
</tr>
<tr>
<td>sample_sparse_structure</td>
<td style="text-align:right">5.43s</td>
<td>sparse structure sampling</td>
</tr>
<tr>
<td>sample_shape_slat_cascade</td>
<td style="text-align:right">34.80s</td>
<td>shape SLat cascade</td>
</tr>
<tr>
<td>sample_tex_slat</td>
<td style="text-align:right">8.92s</td>
<td>texture SLat</td>
</tr>
<tr>
<td>decode_shape_slat</td>
<td style="text-align:right">1.09s</td>
<td>shape decode</td>
</tr>
<tr>
<td>decode_latent</td>
<td style="text-align:right">2.18s</td>
<td>texture / latent decode</td>
</tr>
<tr>
<td>pipeline_run_total</td>
<td style="text-align:right">54.73s</td>
<td>From preprocessing through decode completion</td>
</tr>
<tr>
<td>mesh_simplify</td>
<td style="text-align:right">0.01s</td>
<td>Practically negligible</td>
</tr>
<tr>
<td>to_glb</td>
<td style="text-align:right">1:12.68</td>
<td>mesh repair, remesh, xatlas, attribute sampling</td>
</tr>
<tr>
<td>glb_export</td>
<td style="text-align:right">19.05s</td>
<td>Writing the GLB</td>
</tr>
<tr>
<td>measured compute total</td>
<td style="text-align:right">2:26.46</td>
<td>Excludes pipeline loading</td>
</tr>
<tr>
<td>Python process wall time</td>
<td style="text-align:right">4:50.15</td>
<td>Includes loading and Python overhead</td>
</tr>
</tbody>
</table>
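<p dir="auto">The "measured compute total" row can be cross-checked from the other rows. The sketch below is not the original measurement script; it only re-adds the timings from the table, and the helper name <code>to_seconds</code> is illustrative.</p>
<pre><code class="language-python"># Stage timings copied from the table above; helper names are illustrative.

def to_seconds(text):
    """Convert 'm:ss.xx' or 'ss.xxs' strings into seconds."""
    text = text.rstrip("s")
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(text)

# Everything that happens after the pipeline has been loaded.
post_load_stages = {
    "pipeline_run_total": "54.73s",
    "mesh_simplify": "0.01s",
    "to_glb": "1:12.68",
    "glb_export": "19.05s",
}

compute_total = sum(to_seconds(v) for v in post_load_stages.values())
load_time = to_seconds("2:14.48")   # pipeline_from_pretrained
wall_time = to_seconds("4:50.15")   # Python process wall time

print(f"compute total: {compute_total:.2f} s")
print(f"load + compute: {load_time + compute_total:.2f} s")
print(f"unaccounted overhead: {wall_time - load_time - compute_total:.2f} s")
</code></pre>
<p dir="auto">The re-added total lands within rounding of the 2:26.46 in the table, and roughly 9 seconds of the process wall time is not attributed to any listed stage.</p>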
<h3>Output and resources</h3>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Item</th>
<th style="text-align:right">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Max RSS</td>
<td style="text-align:right">25,537,196 KB</td>
</tr>
<tr>
<td>GPU memory max</td>
<td style="text-align:right">7,569 MiB</td>
</tr>
<tr>
<td>GPU memory avg</td>
<td style="text-align:right">3,196.71 MiB</td>
</tr>
<tr>
<td>GPU util max</td>
<td style="text-align:right">100%</td>
</tr>
<tr>
<td>GPU util avg</td>
<td style="text-align:right">16.98%</td>
</tr>
<tr>
<td>Power max</td>
<td style="text-align:right">318.99 W</td>
</tr>
<tr>
<td>Power avg</td>
<td style="text-align:right">69.12 W</td>
</tr>
<tr>
<td>Original mesh</td>
<td style="text-align:right">3,804,360 vertices / 7,783,442 faces</td>
</tr>
<tr>
<td>After remeshing</td>
<td style="text-align:right">6,476,874 vertices / 12,995,332 faces</td>
</tr>
<tr>
<td>Final mesh</td>
<td style="text-align:right">459,943 vertices / 930,960 faces</td>
</tr>
<tr>
<td>GLB size</td>
<td style="text-align:right">approx. 75-76 MiB</td>
</tr>
</tbody>
</table>
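<p dir="auto">This is a valid result. Earlier 2048 / 4096 runs with no-rembg + DINOv2 fallback technically ran to completion, but the output quality was a failure, so they are not used as a model-quality benchmark.</p>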
<h2>TripoSplat: robot Gaussian Splat</h2>
<p dir="auto">TripoSplat does not output a GLB this time; it outputs <code>.ply</code> and <code>.splat</code>. The model files have been downloaded locally.</p>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Item</th>
<th style="text-align:right">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Input</td>
<td style="text-align:right"><code>robot.jpeg</code></td>
</tr>
<tr>
<td>Gaussians</td>
<td style="text-align:right">262,144</td>
</tr>
<tr>
<td>pipeline_init</td>
<td style="text-align:right">21.56s</td>
</tr>
<tr>
<td>pipeline_run</td>
<td style="text-align:right">14.94s</td>
</tr>
<tr>
<td>save_preprocessed PNG</td>
<td style="text-align:right">0.07s</td>
</tr>
<tr>
<td>save PLY</td>
<td style="text-align:right">1.01s</td>
</tr>
<tr>
<td>save SPLAT</td>
<td style="text-align:right">0.22s</td>
</tr>
<tr>
<td>Process wall</td>
<td style="text-align:right">43.13s</td>
</tr>
<tr>
<td>Max RSS</td>
<td style="text-align:right">5,578,752 KB</td>
</tr>
<tr>
<td>CUDA max allocated</td>
<td style="text-align:right">4,929,969,152 bytes</td>
</tr>
<tr>
<td>CUDA max reserved</td>
<td style="text-align:right">7,000,293,376 bytes</td>
</tr>
<tr>
<td>GPU memory max</td>
<td style="text-align:right">7,013 MiB</td>
</tr>
<tr>
<td>GPU util max</td>
<td style="text-align:right">100%</td>
</tr>
</tbody>
</table>
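<p dir="auto">The byte counts and the MiB figure in this table can be put on the same scale. A minimal sketch, assuming the "CUDA max allocated / reserved" rows come from PyTorch's <code>torch.cuda.max_memory_allocated()</code> and <code>torch.cuda.max_memory_reserved()</code>, and that "GPU memory max" is a driver-level reading such as nvidia-smi reports; the post does not say how either was collected.</p>
<pre><code class="language-python">MIB = 1024 * 1024

def bytes_to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB

cuda_max_allocated = 4_929_969_152   # from the table above
cuda_max_reserved = 7_000_293_376    # from the table above
smi_memory_max_mib = 7_013           # GPU memory max, from the table above

reserved_mib = bytes_to_mib(cuda_max_reserved)
print(f"allocated: {bytes_to_mib(cuda_max_allocated):.1f} MiB")
print(f"reserved:  {reserved_mib:.1f} MiB")
# Assumption: the remainder is memory outside the PyTorch caching allocator,
# such as the CUDA context.
print(f"outside the allocator: {smi_memory_max_mib - reserved_mib:.1f} MiB")
</code></pre>
<p dir="auto">Under those assumptions the reserved figure is 6,676 MiB, leaving a few hundred MiB between it and the 7,013 MiB device-level peak.</p>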
<p dir="auto">Output files:</p>
<pre><code class="language-text">outputs/ai-benchmarks/triposplat-robot/robot_262144.ply
outputs/ai-benchmarks/triposplat-robot/robot_262144.splat
outputs/ai-benchmarks/triposplat-robot/preprocessed_image.png
</code></pre>
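<p dir="auto">The first TripoSplat inference run actually succeeded, but the PIL plugin raised an error while saving WebP. After switching to PNG and rerunning, only this second run is counted as the valid artifact run.</p>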
<h2>PyTorch / Transformers</h2>
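<p dir="auto">This group checks basic CUDA/PyTorch compute and small-model Transformers inference. It is not the sole basis for a purchase, but it confirms the environment has no obvious problems.</p>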
<h3>Synthetic kernels</h3>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Test</th>
<th>Shape / iterations</th>
<th style="text-align:right">Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>matmul fp32, TF32 off</td>
<td>4096 x 4096, 12 iters</td>
<td style="text-align:right">35.54 TFLOPS</td>
</tr>
<tr>
<td>matmul TF32</td>
<td>8192 x 8192, 20 iters</td>
<td style="text-align:right">54.22 TFLOPS</td>
</tr>
<tr>
<td>matmul fp16</td>
<td>8192 x 8192, 40 iters</td>
<td style="text-align:right">109.57 TFLOPS</td>
</tr>
<tr>
<td>matmul bf16</td>
<td>8192 x 8192, 40 iters</td>
<td style="text-align:right">109.75 TFLOPS</td>
</tr>
<tr>
<td>conv2d fp16</td>
<td>batch 16, 64-&gt;128, 224x224, 80 iters</td>
<td style="text-align:right">58.69 TFLOPS</td>
</tr>
</tbody>
</table>
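<p dir="auto">For reference, this is one common way to arrive at matmul TFLOPS figures like the ones above. It is a sketch, not the script used for this post: it assumes the usual convention of 2·n³ floating-point operations per square n x n matmul, and the function names are illustrative.</p>
<pre><code class="language-python">import time

try:
    import torch  # optional; only needed for the timing helper
except ImportError:
    torch = None

def matmul_tflops(n, iters, seconds):
    """TFLOPS for `iters` square n x n matmuls, counting 2*n^3 FLOPs each."""
    return 2 * n**3 * iters / seconds / 1e12

def time_matmul(n, iters, dtype):
    """Time `iters` matmuls on the GPU (assumes torch with CUDA)."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.matmul(a, b)           # warm-up, excluded from timing
    torch.cuda.synchronize()     # make sure the warm-up has finished
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()     # wait for all queued kernels
    return time.perf_counter() - start

if torch is not None and torch.cuda.is_available():
    elapsed = time_matmul(8192, 40, torch.float16)
    print(f"fp16 matmul: {matmul_tflops(8192, 40, elapsed):.2f} TFLOPS")
</code></pre>
<p dir="auto">The synchronize calls matter because CUDA kernels launch asynchronously; without them the timer would mostly measure launch overhead.</p>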
<h3>Transformers generation</h3>
<p dir="auto">Model: <code>Qwen/Qwen2.5-1.5B-Instruct</code>, BF16.</p>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Stage</th>
<th style="text-align:right">Time / result</th>
</tr>
</thead>
<tbody>
<tr>
<td>tokenizer_from_pretrained</td>
<td style="text-align:right">7.43s</td>
</tr>
<tr>
<td>model_from_pretrained</td>
<td style="text-align:right">51.89s</td>
</tr>
<tr>
<td>model_cuda</td>
<td style="text-align:right">0.80s</td>
</tr>
<tr>
<td>prefill forward, 29 input tokens</td>
<td style="text-align:right">0.26s</td>
</tr>
<tr>
<td>generate batch 1, 128 new tokens</td>
<td style="text-align:right">2.93s</td>
</tr>
<tr>
<td>batch 1 throughput</td>
<td style="text-align:right">43.64 tok/s</td>
</tr>
<tr>
<td>generate batch 4, 64 each</td>
<td style="text-align:right">1.49s</td>
</tr>
<tr>
<td>batch 4 throughput</td>
<td style="text-align:right">171.20 tok/s</td>
</tr>
<tr>
<td>total_llm_model</td>
<td style="text-align:right">68.21s</td>
</tr>
<tr>
<td>process wall</td>
<td style="text-align:right">1:13.47</td>
</tr>
<tr>
<td>GPU memory max</td>
<td style="text-align:right">3,493 MiB</td>
</tr>
</tbody>
</table>
<h2>llama.cpp / llama-bench</h2>
<p dir="auto">This is the group of tests that best shows the value of 32GB of VRAM. All tests use the CUDA backend with full GPU offload:</p>
<pre><code class="language-text">llama-bench -ngl -1 -fa auto
</code></pre>
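<p dir="auto">To sweep several prompt sizes in one run, the command can be assembled programmatically. A minimal sketch: the model path is hypothetical, <code>-n 128</code> is an assumed generation length (the post does not state one), and <code>-ngl -1 -fa auto</code> are passed through exactly as written above. Nothing is executed here.</p>
<pre><code class="language-python">import shlex

def build_llama_bench_cmd(model_path, prompt_sizes, gen_tokens=128):
    """Build a llama-bench argv list; nothing is executed here."""
    return [
        "llama-bench",
        "-m", model_path,
        "-p", ",".join(str(p) for p in prompt_sizes),  # comma-separated sweep
        "-n", str(gen_tokens),   # assumed generation length
        "-ngl", "-1",            # as in the command above
        "-fa", "auto",           # as in the command above
    ]

# Hypothetical model path; prompt sizes mirror the 32B Q4_K_M rows below.
cmd = build_llama_bench_cmd(
    "models/qwen2.5-32b-instruct-q4_k_m.gguf",
    [512, 2048, 8192, 16384, 32768],
)
print(shlex.join(cmd))
</code></pre>
<p dir="auto">The resulting list can be handed to <code>subprocess.run(cmd, check=True)</code> on a machine where <code>llama-bench</code> is on the PATH.</p>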
<h3>1.5B / 7B baseline</h3>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Model</th>
<th>Quant</th>
<th style="text-align:right">Prompt</th>
<th style="text-align:right">Prefill</th>
<th style="text-align:right">Generation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Qwen2.5 1.5B</td>
<td>Q4_K_M</td>
<td style="text-align:right">512</td>
<td style="text-align:right">26,438 tok/s</td>
<td style="text-align:right">441.9 tok/s</td>
</tr>
<tr>
<td>Qwen2.5 1.5B</td>
<td>Q4_K_M</td>
<td style="text-align:right">2048</td>
<td style="text-align:right">29,389 tok/s</td>
<td style="text-align:right">440.9 tok/s</td>
</tr>
<tr>
<td>Qwen2.5 7B</td>
<td>Q4_K_M</td>
<td style="text-align:right">512</td>
<td style="text-align:right">8,400 tok/s</td>
<td style="text-align:right">142.2 tok/s</td>
</tr>
<tr>
<td>Qwen2.5 7B</td>
<td>Q4_K_M</td>
<td style="text-align:right">2048</td>
<td style="text-align:right">8,431 tok/s</td>
<td style="text-align:right">141.8 tok/s</td>
</tr>
</tbody>
</table>
<h3>14B / 32B large-model tests</h3>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Model</th>
<th>Quant</th>
<th style="text-align:right">Prompt</th>
<th style="text-align:right">Prefill</th>
<th style="text-align:right">Generation</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>Qwen2.5 14B</td>
<td>Q4_K_M</td>
<td style="text-align:right">512</td>
<td style="text-align:right">4,299 tok/s</td>
<td style="text-align:right">73.6 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 14B</td>
<td>Q4_K_M</td>
<td style="text-align:right">2048</td>
<td style="text-align:right">4,234 tok/s</td>
<td style="text-align:right">73.6 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 14B</td>
<td>Q4_K_M</td>
<td style="text-align:right">8192</td>
<td style="text-align:right">3,615 tok/s</td>
<td style="text-align:right">73.6 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 32B</td>
<td>Q4_K_M</td>
<td style="text-align:right">512</td>
<td style="text-align:right">1,977 tok/s</td>
<td style="text-align:right">34.0 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 32B</td>
<td>Q4_K_M</td>
<td style="text-align:right">2048</td>
<td style="text-align:right">1,942 tok/s</td>
<td style="text-align:right">34.0 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 32B</td>
<td>Q4_K_M</td>
<td style="text-align:right">8192</td>
<td style="text-align:right">1,752 tok/s</td>
<td style="text-align:right">34.0 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 32B</td>
<td>Q4_K_M</td>
<td style="text-align:right">16384</td>
<td style="text-align:right">1,557 tok/s</td>
<td style="text-align:right">34.0 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 32B</td>
<td>Q4_K_M</td>
<td style="text-align:right">32768</td>
<td style="text-align:right">1,247 tok/s</td>
<td style="text-align:right">33.9 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 32B</td>
<td>Q5_K_M</td>
<td style="text-align:right">2048</td>
<td style="text-align:right">1,886 tok/s</td>
<td style="text-align:right">29.4 tok/s</td>
<td>Success</td>
</tr>
<tr>
<td>Qwen2.5 32B</td>
<td>Q5_K_M</td>
<td style="text-align:right">8192</td>
<td style="text-align:right">1,706 tok/s</td>
<td style="text-align:right">29.4 tok/s</td>
<td>Success</td>
</tr>
</tbody>
</table>
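<p dir="auto">This is where the point of the 4080S 32GB shows. 32B Q4 runs with a 32K prompt, and 32B Q5 runs with an 8192-token prompt. These were not "barely made it" results: there was no OOM during the tests and no need to fall back to CPU offload.</p>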
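<p dir="auto">A back-of-the-envelope estimate shows why these configurations are tight on 24GB. Everything model-specific here is an assumption rather than something measured in this post: the Qwen2.5-32B attention shape (64 layers, 8 KV heads, head dimension 128), an fp16 KV cache, and approximate GGUF file sizes. Compute buffers are not counted.</p>
<pre><code class="language-python">GIB = 1024**3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Size of the K and V caches for n_tokens (factor 2 = one K plus one V)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value

# Assumed Qwen2.5-32B attention shape; check the model's config.json.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
# Assumed approximate GGUF sizes in GiB; read the real size from the file.
WEIGHTS_GIB = {"Q4_K_M": 18.5, "Q5_K_M": 21.7}

for quant, ctx in [("Q4_K_M", 32768), ("Q5_K_M", 8192)]:
    kv_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / GIB
    total = WEIGHTS_GIB[quant] + kv_gib
    print(f"{quant} @ {ctx}: KV {kv_gib:.1f} GiB, weights + KV about {total:.1f} GiB")
</code></pre>
<p dir="auto">Under these assumptions both cases come out at or above what a 24GB card can realistically hold once buffers are added, while staying below 32GB.</p>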
<h2>CUDA samples / memory bandwidth</h2>
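<p dir="auto">This group is a hardware-level health check, not a main basis for purchase. The CUDA samples are not rigorous benchmarking tools to begin with, but they help confirm there is nothing obviously wrong with CUDA, PCIe, or device copies.</p>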
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Test</th>
<th style="text-align:right">Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>CUDA sample matrixMul</td>
<td style="text-align:right">3298.69 GFLOP/s</td>
</tr>
<tr>
<td>P2P self-copy bandwidth</td>
<td style="text-align:right">approx. 638-641 GB/s</td>
</tr>
<tr>
<td>GPU latency</td>
<td style="text-align:right">1.34-1.39 us</td>
</tr>
<tr>
<td>CPU latency</td>
<td style="text-align:right">3.42 us</td>
</tr>
<tr>
<td>PyTorch H2D, 4GiB pinned</td>
<td style="text-align:right">23.41 GB/s</td>
</tr>
<tr>
<td>PyTorch D2H, 4GiB pinned</td>
<td style="text-align:right">24.51 GB/s</td>
</tr>
<tr>
<td>PyTorch D2D, 4GiB</td>
<td style="text-align:right">321.47 GB/s</td>
</tr>
</tbody>
</table>
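<p dir="auto">The H2D / D2H numbers can be read against the PCIe Gen 4.0 x16 link listed in the machine specs. A minimal sketch, assuming 16 GT/s per lane with 128b/130b encoding and that the measured values are decimal GB/s; real transfers also pay packet and protocol overhead, so the ceiling is never reached.</p>
<pre><code class="language-python">def pcie_bandwidth_gb_s(gt_per_s, lanes, encoding=128 / 130):
    """One-direction PCIe payload ceiling in GB/s, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8   # 8 bits per byte

gen4_x16 = pcie_bandwidth_gb_s(16.0, 16)     # PCIe 4.0: 16 GT/s per lane
measured = {"H2D": 23.41, "D2H": 24.51}      # from the table above

print(f"PCIe 4.0 x16 ceiling: {gen4_x16:.2f} GB/s")
for name, gb_s in measured.items():
    print(f"{name}: {gb_s:.2f} GB/s = {gb_s / gen4_x16:.0%} of the ceiling")
</code></pre>
<p dir="auto">Both directions land at roughly three quarters of the ceiling, which is consistent with a link that negotiated Gen 4 x16 rather than a narrower or slower one.</p>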
]]></description><link>https://lcz.me/topic/478/4080s-32g-魔改版-vast.ai-租借心得</link><generator>RSS for Node</generator><lastBuildDate>Thu, 11 Jun 2026 13:58:16 GMT</lastBuildDate><atom:link href="https://lcz.me/topic/478.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 08 Jun 2026 14:44:56 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Thu, 11 Jun 2026 04:11:20 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/xiaote" aria-label="Profile: Xiaote">@<bdi>Xiaote</bdi></a> Thanks for the solid advice. I capped power at 80% with MSI Afterburner; temperature and noise dropped noticeably, and the performance loss is acceptable.</p>
]]></description><link>https://lcz.me/post/6244</link><guid isPermaLink="true">https://lcz.me/post/6244</guid><dc:creator><![CDATA[lcc168]]></dc:creator><pubDate>Thu, 11 Jun 2026 04:11:20 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Thu, 11 Jun 2026 01:15:50 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/lcc168" aria-label="Profile: lcc168">@<bdi>lcc168</bdi></a> On power-limiting the modded 4080S: it is worth doing, it is simple, and it won't cost much performance.</p>
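<p dir="auto">A few suggestions:</p>
<ol>
<li>
<p dir="auto">Limit power with nvidia-smi (no reboot needed):<br />
<code>sudo nvidia-smi -pl 250</code><br />
This command lowers the power limit from 320W to 250W. Hotspot temperature can drop 8-10 degrees, at a performance loss of roughly 10-15%. For a modded card this is a reasonable balance point.</p>
</li>
<li>
<p dir="auto">A 90-degree hotspot is indeed on the high side. Generally, the long-term safe temperature for GDDR6X memory is below 95 degrees, but a core hotspot held above 90 degrees long-term accelerates thermal paste aging and PCB warping. Bringing it down to 80-85 degrees can noticeably extend lifespan.</p>
</li>
<li>
<p dir="auto">Modded cards use memory chips sourced through unofficial channels, and their cooling design is not as good as a factory card's, so temperature control matters more than on a factory card. I suggest limiting to 250W for a week first and watching hotspot temperature and performance; if the temperature holds below 85 degrees, you are fine.</p>
</li>
<li>
<p dir="auto">Also check the cooling: on modded cards the cooler is often fitted after the fact, so check that the thermal pads are seated properly and whether the fan curve has been adjusted. <code>nvidia-smi -q -d TEMPERATURE</code> shows the temperature of each sensor.</p>
</li>
</ol>
<p dir="auto">In short: running a 90-degree hotspot at 320W is a bit risky. Limit to 250-280W, keep the hotspot below 85 degrees, and this card should run for two years or more without problems.</p>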
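<p dir="auto">For watching power and temperature over the suggested trial week, a small polling helper is one option. This is a sketch under assumptions: it uses nvidia-smi's <code>--query-gpu</code> fields <code>power.draw</code>, <code>power.limit</code> and <code>temperature.gpu</code>, and <code>temperature.gpu</code> is, as far as I know, the core sensor rather than the hotspot reading discussed in this thread. The sample line is illustrative, not a measurement.</p>
<pre><code class="language-python">import subprocess

QUERY = "power.draw,power.limit,temperature.gpu"

def parse_smi_line(line):
    """Parse one 'csv,noheader,nounits' line into a dict of floats."""
    draw, limit, temp = (float(x) for x in line.split(","))
    return {"power_draw_w": draw, "power_limit_w": limit, "temp_c": temp}

def read_gpu_state(index=0):
    """Query one GPU through nvidia-smi (requires the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits", "-i", str(index)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_line(out.strip().splitlines()[0])

# Illustrative sample line, not a measurement from this thread.
print(parse_smi_line("249.80, 250.00, 71"))
</code></pre>
<p dir="auto">Calling <code>read_gpu_state()</code> in a loop with a sleep and appending the result to a log gives a simple record to compare before and after changing the power limit.</p>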
]]></description><link>https://lcz.me/post/6204</link><guid isPermaLink="true">https://lcz.me/post/6204</guid><dc:creator><![CDATA[Xiaote]]></dc:creator><pubDate>Thu, 11 Jun 2026 01:15:50 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Wed, 10 Jun 2026 15:38:00 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/lcc168" aria-label="Profile: lcc168">@<bdi>lcc168</bdi></a> My take is that since it's a modded card anyway, just let it do as much work as it can within its limited lifespan. Running the air conditioning or keeping the room well ventilated may matter more. If you do want to lock the power, I remember someone shared how before; you can look back for it.</p>
]]></description><link>https://lcz.me/post/6166</link><guid isPermaLink="true">https://lcz.me/post/6166</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Wed, 10 Jun 2026 15:38:00 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Wed, 10 Jun 2026 14:11:24 GMT]]></title><description><![CDATA[<p dir="auto">A modded 4080S like this draws around 320W at full load, with hotspot temperature approaching 90 degrees. Is it worth locking the power limit to extend its lifespan?</p>
<p dir="auto"><img src="https://upload.lcz.me/uploads/2158a8b9-6b21-44b7-975e-e880b8235af9.jpeg" alt="a03f5675-62f7-46fc-9110-086e5941d263-image.jpeg" class=" img-fluid img-markdown" /></p>
]]></description><link>https://lcz.me/post/6150</link><guid isPermaLink="true">https://lcz.me/post/6150</guid><dc:creator><![CDATA[lcc168]]></dc:creator><pubDate>Wed, 10 Jun 2026 14:11:24 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Mon, 08 Jun 2026 14:59:35 GMT]]></title><description><![CDATA[<p dir="auto">Thanks for sharing</p>
<p dir="auto">For fellow enthusiasts thinking of buying, I'd also recommend looking into FP8-quantized models</p>
<p dir="auto">The Ada architecture supports FP8; model weights are about half the size of regular BF16, and compared with INT4/INT8 + AutoRound, accuracy holds up a bit better, plus you get hardware acceleration</p>
]]></description><link>https://lcz.me/post/5783</link><guid isPermaLink="true">https://lcz.me/post/5783</guid><dc:creator><![CDATA[566656661]]></dc:creator><pubDate>Mon, 08 Jun 2026 14:59:35 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Mon, 08 Jun 2026 15:02:23 GMT]]></title><description><![CDATA[<p dir="auto">Here is <a class="plugin-mentions-user plugin-mentions-a" href="/user/loulan" aria-label="Profile: loulan">@<bdi>loulan</bdi></a>'s noise demo<br />
<a href="https://lcz.me/post/5729">https://lcz.me/post/5729</a></p>
<h1><strong>Not recommended for home use</strong></h1>
]]></description><link>https://lcz.me/post/5782</link><guid isPermaLink="true">https://lcz.me/post/5782</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Mon, 08 Jun 2026 15:02:23 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Mon, 08 Jun 2026 14:54:10 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/cs6" aria-label="Profile: CS6">@<bdi>CS6</bdi></a> Very worthwhile. This post gave me a detailed understanding of the 4080S.</p>
]]></description><link>https://lcz.me/post/5780</link><guid isPermaLink="true">https://lcz.me/post/5780</guid><dc:creator><![CDATA[williamlouis]]></dc:creator><pubDate>Mon, 08 Jun 2026 14:54:10 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Mon, 08 Jun 2026 14:53:41 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/terry" aria-label="Profile: terry">@<bdi>terry</bdi></a> I'm only renting compute; all of this information is what <a href="http://vast.ai" rel="nofollow ugc">vast.ai</a> reports, so I couldn't tell whether it's a fake card either.<br />
<s>And then I turned around and bought a 3090</s></p>
<p dir="auto"><img src="https://upload.lcz.me/uploads/4ae6929f-989f-4ee6-8687-0c1541d23001.jpeg" alt="b34c186b-33c4-433c-a075-bb96ba1e03b7-image.jpeg" class=" img-fluid img-markdown" /></p>
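]]></description><link>https://lcz.me/post/5778</link><guid isPermaLink="true">https://lcz.me/post/5778</guid><dc:creator><![CDATA[CS6]]></dc:creator><pubDate>Mon, 08 Jun 2026 14:53:41 GMT</pubDate></item><item><title><![CDATA[Reply to 4080S 32G Modded Card: vast.ai Rental Notes on Mon, 08 Jun 2026 16:25:33 GMT]]></title><description><![CDATA[<p dir="auto">Very good. Please add screenshots of the runs and I'll pin this. If anyone wants to post an advertising thread, use this post as the model: if you bring value to the forum, the rules can bend a little. But such posts need screenshots of real runs so users are not misled; we cannot accurately judge whether the data is genuine, so screenshots serve as the evidence. Ideally also screenshot the machine's full configuration for everyone's reference.</p>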
]]></description><link>https://lcz.me/post/5776</link><guid isPermaLink="true">https://lcz.me/post/5776</guid><dc:creator><![CDATA[terry]]></dc:creator><pubDate>Mon, 08 Jun 2026 16:25:33 GMT</pubDate></item></channel></rss>