抡锤者

刘

Buy early, enjoy early. Right now this card sells for under 6,000. Even better if it comes with a warranty!

刘

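I'm using the official llama.cpp build with a q8_0 KV cache, and I'm currently getting around 40 tps. I haven't tried dflash, MTP and the like yet; I'd rather wait until they are more mature.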
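For reference, "tps" here means generated tokens per second. A minimal sketch of how that figure can be computed, assuming a llama-server style response dict carrying a `timings` object with `predicted_n` (tokens generated) and `predicted_ms` (decode time in milliseconds); check the field names against your own build. The sample numbers are made up for illustration.

```python
def tokens_per_second(n_tokens, elapsed_ms):
    """Generation throughput in tokens per second."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return n_tokens / (elapsed_ms / 1000.0)


def decode_tps_from_timings(response):
    """Assumes a llama-server style dict: {"timings": {"predicted_n": ..., "predicted_ms": ...}}."""
    t = response["timings"]
    return tokens_per_second(t["predicted_n"], t["predicted_ms"])


if __name__ == "__main__":
    # Made-up sample: 400 tokens decoded in 10 seconds.
    sample = {"timings": {"predicted_n": 400, "predicted_ms": 10000.0}}
    print(decode_tps_from_timings(sample))  # → 40.0
```

This only measures the decode phase; prompt processing is reported separately and is usually much faster per token.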

刘

@terry OK, thanks. I'll give it a try.

刘

If this trend continues, future models will probably keep getting stronger. Good times ahead for anyone running AI locally.

刘

@terry bro, here are my launch parameters:
/root/llama.cpp/build/bin/llama-server -m /data/models/gguf/Qwen3.6-27B-UD-Q4_K_XL.gguf \
  --mmproj /data/models/gguf/Qwen3.6-27B-mmproj-F16.gguf --mmproj-offload \
  --alias qwen36-27B-Q4 --jinja \
  -ngl 999 -c 128000 -fa on \
  --cache-ram 16384 --cache-type-k q8_0 --cache-type-v q8_0 \
  -np 1 --sampling-seq k --top-k 1 \
  --host 0.0.0.0 --port 11434 \
  --reasoning on --reasoning-format deepseek --reasoning-budget 512
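With `-c 128000` the KV cache is a large share of VRAM, which is why `--cache-type-k q8_0 --cache-type-v q8_0` matters. A rough sketch of the estimate for standard full-attention layers, assuming ggml's q8_0 layout (blocks of 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 elements). The model dimensions below are hypothetical placeholders, NOT the real Qwen3.6-27B config; read the actual layer/head counts from the GGUF metadata.

```python
# Bytes per stored element for two KV-cache types.
# q8_0 (ggml): 32 int8 values + one fp16 scale = 34 bytes per 32 elements.
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, type_k="f16", type_v="f16"):
    """Estimate KV-cache size for full-attention layers.

    Every layer stores one K and one V vector of n_kv_heads * head_dim
    elements for each token of context.
    """
    elems = n_layers * n_kv_heads * head_dim * n_ctx
    return elems * BYTES_PER_ELEM[type_k] + elems * BYTES_PER_ELEM[type_v]


if __name__ == "__main__":
    # Hypothetical dimensions for illustration only.
    cfg = dict(n_layers=64, n_kv_heads=4, head_dim=128, n_ctx=128000)
    f16 = kv_cache_bytes(**cfg)
    q8 = kv_cache_bytes(**cfg, type_k="q8_0", type_v="q8_0")
    print(f"f16 : {f16 / 2**30:.2f} GiB")
    print(f"q8_0: {q8 / 2**30:.2f} GiB")
```

With these placeholder numbers q8_0 roughly halves the cache (about 15.6 GiB down to about 8.3 GiB). If the model mixes in layers that keep no per-token KV cache, only the full-attention layers should be counted.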

刘

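I'm currently running Qwen3.6 27B at Q4 quantization on an RTX 3090, and it basically works for Hermes, except that it occasionally gets stuck in an endless tool-call loop. I've locked down the Hermes persona prompt very tightly, which made it happen much less often, but it still happens now and then. I think it's a problem with the model itself.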
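Besides prompt restrictions, a loop like this can also be caught on the client side. A minimal sketch of a hypothetical guard (not part of Hermes or llama.cpp; the class name and thresholds are made up) that flags the agent when the same tool call, meaning the same name and the same arguments, shows up `max_repeats` times within the last `window` calls, so the caller can stop or inject a correction.

```python
import json
from collections import deque


class ToolLoopGuard:
    """Hypothetical client-side guard against repeated identical tool calls."""

    def __init__(self, max_repeats=3, window=6):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=window)  # only the last `window` calls are kept

    def check(self, name, arguments):
        """Record a call; return True if it now looks like a loop."""
        # Canonicalize arguments so differing key order can't hide a repeat.
        key = (name, json.dumps(arguments, sort_keys=True))
        self.recent.append(key)
        return self.recent.count(key) >= self.max_repeats


if __name__ == "__main__":
    guard = ToolLoopGuard(max_repeats=3, window=6)
    for _ in range(3):
        if guard.check("read_file", {"path": "notes.txt"}):
            print("tool-call loop detected, aborting this turn")
```

A window rather than a strict "consecutive" check also catches A-B-A-B style ping-pong between two tools.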

抡锤者

刘海彬