看目前這社區越來越多人買7900XTX了,大家為了一個爽度token無限發與反應速度,這幾天折騰的過程分享給大家(win11+vulkan & ubuntu +rocm)
-
@agi @chia-an-yang 两位老师我跑通了,但是我用hermes的时候工具调用感觉卡了额,我的7900xtx在疯狂的生成,但是hermes却卡住了。请问两位遇到过类似的问题吗?
llama.cpp 输出:65.27 t/s, tg_3s = 55.86 t/s
36.30.567.224 I slot print_timing: id 0 | task 8648 | n_decoded = 62072, tg = 65.24 t/s, tg_3s = 55.84 t/s
36.33.579.807 I slot print_timing: id 0 | task 8648 | n_decoded = 62240, tg = 65.21 t/s, tg_3s = 55.77 t/s
36.36.592.579 I slot print_timing: id 0 | task 8648 | n_decoded = 62408, tg = 65.18 t/s, tg_3s = 55.76 t/s
36.39.607.362 I slot print_timing: id 0 | task 8648 | n_decoded = 62576, tg = 65.15 t/s, tg_3s = 55.73 t/s
36.42.629.501 I slot print_timing: id 0 | task 8648 | n_decoded = 62744, tg = 65.12 t/s, tg_3s = 55.59 t/s
36.45.651.508 I slot print_timing: id 0 | task 8648 | n_decoded = 62912, tg = 65.09 t/s, tg_3s = 55.59 t/s
36.48.669.380 I slot print_timing: id 0 | task 8648 | n_decoded = 63080, tg = 65.06 t/s, tg_3s = 55.67 t/s
36.51.697.721 I slot print_timing: id 0 | task 8648 | n_decoded = 63247, tg = 65.03 t/s, tg_3s = 55.15 t/s
36.54.730.154 I slot print_timing: id 0 | task 8648 | n_decoded = 63415, tg = 65.00 t/s, tg_3s = 55.40 t/s
36.57.762.852 I slot print_timing: id 0 | task 8648 | n_decoded = 63583, tg = 64.97 t/s, tg_3s = 55.40 t/s
37.00.794.845 I slot print_timing: id 0 | task 8648 | n_decoded = 63751, tg = 64.94 t/s, tg_3s = 55.41 t/shermes输出:
c09f0fd3-2890-42e1-838f-8e36a2ab527b-bd93db497055bc01fe89b39dc4f1a308915fe680.rtfd
preparing browser_navigate...
navigate
search.yahoo.com
14.2s- Hermes
Let me try a more targeted search.
A
preparing browser_navigate... navigate www.google.com
3.35
Response truncated (finish_reason='length')
preparing browser_navigate...
navigate duckduckgo.com 20.5s preparing browser_scroll...
↓
scroll
down 0.2s
LOI
preparing browser_snapshot...
snapshot compact 0.2s preparing browser_navigate... navigate duckduckgo.com 1.5s
(>** cogitating...
model hit max output toke - qwen3.6-27b 30,9K/131.1K [
1]24% |36m |020
- Hermes
-
@terry 老师,但是它在的decode的时候生成将近60000个字符之后系统强制停止的。Response truncated (finish_reason='length'),感觉它不知道啥时候停止,最后hermes把结果截断了。
-
@terry 我就问了他一个问题:“吉利领克08车机如何通过adb安装应用?“,deepseek v4 flash 调用的时候感觉也没那么时间长。其实开始的时候挺顺利的,就是到navigate google.com 这个工具调用的时候生成了60000多个字符才被hermes强制截断结束。感觉挺奇怪的。不过这个vulkan比rocm快好多。这个挺好的。
@nami-ryuu 你先用deepseek v4 flash幫你把hermes搜索工具設定好,跟把soul跟memory也寫好搜索的時候要跑哪些工具,避免本地模型調用工具能力不足的地方他會不斷重試跑老半天跑不出來,讓在線雲端api (ds4 flash)幫你把本地的工作流都設計好,之後你就可以爽用本地端的hermes agent,