跳转至内容
  • 版块
  • 最新
  • 标签
  • 热门
  • 用户
  • 群组
皮肤
  • 浅色
  • Brite
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • 深色
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • 默认(不使用皮肤)
  • 不使用皮肤
折叠
品牌标识

抡锤者

  1. 主页
  2. LLM讨论区
  3. llama.cpp+qwen3.6-27b 初步测试

llama.cpp+qwen3.6-27b 初步测试

已定时 已固定 已锁定 已移动 LLM讨论区
amd7900xtx
25 帖子 9 发布者 485 浏览
  • 从旧到新
  • 从新到旧
  • 最多赞同
回复
  • 在新帖中回复
登录后回复
此主题已被删除。只有拥有主题管理权限的用户可以查看。
  • B blackjack

    @terry 说:

    @blackjack 相信你的测试个结果,但我实际跑hermes过程中,Q4_0确实拉胯,跑OpenClaw更是如此,就是经常会陷入死循环。

    qwen的工具调用极弱,让他专门做过patch工具测试,分不清工具名称patch和参数名称path。这个就是模型能力问题,再怎么提示也白扯,只能在hermes里把参数名称path改成路径等其他严重不让他花眼的文字,还有各种对他人性化的反馈。死循环基本就是掉入到各种工具调用的汪洋大海中了,你可以开个ai让他研究一下日志

    terryT 离线
    terryT 离线
    terry
    编写于 最后由 编辑
    #16

    @blackjack 没深入研究,我用Q8 kv就没这个问题了。

    油管:https://www.youtube.com/@抡锤者

    1 条回复 最后回复
    0
    • L 离线
      L 离线
      laobenxiong
      编写于 最后由 编辑
      #17

      昨天碰到一个 oom, 好像是 host ram 的 oom, 没搞懂为啥它使用那么多 system ram...

      [71766.725058] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2025.05-1~bpo12+1 03/12/2026
      [71766.725059] Call Trace:
      [71766.725073]  <TASK>
      [71766.725076]  dump_stack_lvl+0x5d/0x80
      [71766.725082]  dump_header+0x43/0x1aa
      [71766.725085]  oom_kill_process.cold+0xa/0xb2
      [71766.725088]  out_of_memory+0x217/0x4b0
      [71766.725091]  __alloc_pages_slowpath.constprop.0+0xc3b/0xdd0
      [71766.725098]  __alloc_frozen_pages_noprof+0x2cd/0x320
      [71766.725103]  alloc_pages_mpol+0x7d/0x180
      [71766.725107]  folio_alloc_noprof+0x5d/0xe0
      [71766.725110]  __filemap_get_folio+0x1dd/0x330
      [71766.725112]  filemap_fault+0x10c/0x12f0
      [71766.725116]  __do_fault+0x30/0x180
      [71766.725119]  do_fault+0x310/0x540
      [71766.725122]  __handle_mm_fault+0x8ee/0xf20
      [71766.725124]  ? srso_alias_return_thunk+0x5/0xfbef5
      [71766.725129]  handle_mm_fault+0xec/0x2e0
      [71766.725132]  do_user_addr_fault+0x2c3/0x7f0
      [71766.725135]  exc_page_fault+0x74/0x180
      [71766.725139]  asm_exc_page_fault+0x26/0x30
      [71766.725140] RIP: 0033:0x9bab0a
      [71766.725161] Code: Unable to access opcode bytes at 0x9baae0.
      [71766.725162] RSP: 002b:000000c00093cc60 EFLAGS: 00010216
      [71766.725164] RAX: 0000000001b35410 RBX: 0000000001b77928 RCX: 0000000001b77928
      [71766.725165] RDX: 0000000000a244e0 RSI: 00000000009baae0 RDI: 000000c000aaa6e0
      [71766.725166] RBP: 000000c00093cca0 R08: 0000000000000040 R09: 0000000000000082
      [71766.725167] R10: 00007f88af626fa8 R11: 00000000000000d0 R12: 0000000000000006
      [71766.725168] R13: 000000c001548410 R14: 000000c000171880 R15: ffffffffffffffff
      [71766.725172]  </TASK>
      [71766.725173] Mem-Info:
      [71766.725181] active_anon:3652772 inactive_anon:3225935 isolated_anon:0
                      active_file:74 inactive_file:592 isolated_file:0
                      unevictable:2004 dirty:0 writeback:0
                      slab_reclaimable:16967 slab_unreclaimable:49972
                      mapped:2076 shmem:4443 pagetables:27449
                      sec_pagetables:0 bounce:0
                      kernel_misc_reclaimable:0
                      free:57534 free_pcp:0 free_cma:0
      [71766.725185] Node 0 active_anon:14611088kB inactive_anon:12903740kB active_file:296kB inactive_file:2004kB unevictable:8016kB isolated(anon):0kB isolated(file):0kB mapped:8304kB dirty:0kB writeback:0kB shmem:17772kB shmem_thp:0kB shmem_pmdmapped:0
      kB anon_thp:17055744kB kernel_stack:12720kB pagetables:109796kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
      [71766.725189] Node 0 DMA free:11264kB boost:0kB min:28kB low:40kB high:52kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15864kB managed:153
      60kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      [71766.725193] lowmem_reserve[]: 0 1948 32063 32063 32063
      [71766.725198] Node 0 DMA32 free:124216kB boost:0kB min:3760kB low:5576kB high:7392kB reserved_highatomic:0KB free_highatomic:0KB active_anon:1852244kB inactive_anon:14144kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:
      2061152kB managed:1995156kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      [71766.725202] lowmem_reserve[]: 0 0 30114 30114 30114
      [71766.725207] Node 0 Normal free:94656kB boost:30720kB min:94508kB low:125332kB high:156156kB reserved_highatomic:0KB free_highatomic:0KB active_anon:12758844kB inactive_anon:12889596kB active_file:660kB inactive_file:2004kB unevictable:8016kB writ
      epending:0kB present:31457280kB managed:30837388kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      [71766.725210] lowmem_reserve[]: 0 0 0 0 0
      [71766.725215] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
      [71766.725228] Node 0 DMA32: 0*4kB 1*8kB (M) 3*16kB (UM) 8*32kB (UM) 10*64kB (UM) 5*128kB (UM) 3*256kB (UM) 4*512kB (UM) 1*1024kB (U) 2*2048kB (M) 28*4096kB (M) = 124216kB
      [71766.725245] Node 0 Normal: 635*4kB (UME) 484*8kB (UME) 664*16kB (UME) 503*32kB (UME) 315*64kB (UME) 196*128kB (UME) 59*256kB (UME) 3*512kB (ME) 0*1024kB 0*2048kB 0*4096kB = 95020kB
      [71766.725261] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [71766.725263] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [71766.725264] 12002 total pagecache pages
      [71766.725265] 6815 pages in swap cache
      [71766.725266] Free swap  = 72kB
      [71766.725267] Total swap = 8496124kB
      [71766.725268] 8383574 pages RAM
      [71766.725269] 0 pages HighMem/MovableOnly
      [71766.725269] 171598 pages reserved
      [71766.725270] 0 pages hwpoisoned
      [71766.725271] Tasks state (memory values in pages):
      [71766.725272] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
      [71766.725279] [    513]     0   513    16897      240       32      208         0   135168      256          -250 systemd-journal
      [71766.725283] [    535]   990   535    23023      185        0      185         0    81920      256             0 systemd-timesyn
      [71766.725285] [    541]     0   541     9456      707      544      163         0    98304      384         -1000 systemd-udevd
      [71766.725288] [    819]     0   819    77254      161       64       97         0   106496       96             0 accounts-daemon
      [71766.725290] [    821]   105   821     1564      239        0      239         0    53248       64             0 avahi-daemon
      [71766.725292] [    824]     0   824     1716      185       64      121         0    49152        0             0 cron
      [71766.725294] [    826]   989   826     2549      457      256      201         0    61440      256          -900 dbus-daemon
      [71766.725297] [    839]     0   839    20199      275       96      179         0    61440        0             0 irqbalance
      [71766.725299] [    845]   987   845    95842      717      569      148         0   114688      224             0 polkitd
      [71766.725301] [    846]     0   846    54923      493      384      109         0    77824        0             0 rsyslogd
      [71766.725303] [    848]     0   848    76962      254      128      126         0   102400        0             0 switcheroo-cont
      [71766.725305] [    861]     0   861     4761      267      128      139         0    73728      192             0 systemd-logind
      [71766.725308] [    862]     0   862    48059      131        0      131         0   126976      480             0 touchegg
      [71766.725310] [    865]     0   865   117523      709      544      165         0   151552        0             0 udisksd
      [71766.725312] [    881]   105   881     1518      151       37      114         0    53248       32             0 avahi-daemon
      [71766.725314] [    964]     0   964   102609      882      704      178         0   167936        0             0 NetworkManager
      [71766.725317] [    969]     0   969     4383      353      224      129         0    77824        0             0 wpa_supplicant
      [71766.725319] [   1000]     0  1000    97731      195       64      131         0   126976      384             0 ModemManager
      [71766.725321] [   1045]     0  1045    95236      179        0      179         0   110592      192             0 lightdm
      [71766.725323] [   1048]     0  1048     2944      447      256      191         0    61440       32         -1000 sshd
      [71766.725325] [   1082]     0  1082   654404     6572     5217      178      1177   782336     9472             0 Xorg
      [71766.725327] [   1085]     0  1085     2042      147       32      115         0    57344        0             0 agetty
      [71766.725329] [   1228]     0  1228    10995      185       40      145         0    86016       96             0 master
      [71766.725332] [   1230]   116  1230    11129      208       32      176         0    77824      128             0 qmgr
      [71766.725334] [   1246]     0  1246    43208      161        0      161         0    90112      256             0 lightdm
      [71766.725336] [   1254]  1000  1254     5813      822      640      182         0    98304       32           100 systemd
      [71766.725338] [   1256]  1000  1256     6387      531      421      110         0    81920      128           100 (sd-pam)
      [71766.725340] [   1278]  1000  1278     2343      644      480      164         0    53248        0           200 dbus-daemon
      [71766.725342] [   1279]  1000  1279    25582     1141      960      181         0    98304        0           200 pipewire
      [71766.725345] [   1280]  1000  1280    21187      311      128      183         0    77824        0           200 pipewire
      [71766.725347] [   1281]  1000  1281   119867     1068      896      172         0   155648        0           200 wireplumber
      [71766.725349] [   1282]  1000  1282    41570      791      561      230         0    90112        0           200 pipewire-pulse
      [71766.725351] [   1283]  1000  1283   119267      162        0      162         0   151552      544             0 cinnamon-sessio
      [71766.725353] [   1297]  1000  1297     1810      183       32      151         0    53248        0           200 mpris-proxy
      [71766.725355] [   1300]   114  1300     5369      193       32      161         0    61440        0             0 rtkit-daemon
      [71766.725358] [   1361]  1000  1361     2637      153       71       82         0    49152      192             0 ssh-agent
      [71766.725360] [   1372]  1000  1372    61760      178       21      157         0   225280     2816             0 fcitx
      [71766.725362] [   1378]  1000  1378     2054      139       32      107         0    57344       64             0 dbus-daemon
      [71766.725364] [   1382]  1000  1382     1279      117        3      114         0    49152       32             0 fcitx-dbus-watc
      [71766.725367] [   1395]  1000  1395    45784      376      224      152         0   102400        0           200 gnome-keyring-d
      [71766.725369] [   1404]  1000  1404   191033      682      517      165         0   462848     3232             0 csd-media-keys
      [71766.725371] [   1407]  1000  1407    76111      218       32      186         0    98304       96             0 csd-screensaver
      [71766.725373] [   1408]  1000  1408    79234      246       64      182         0   114688      256             0 csd-print-notif
      [71766.725375] [   1411]  1000  1411   154236      268        0      268         0   438272     3648             0 csd-automount
      [71766.725377] [   1416]  1000  1416   136010      270        1      269         0   425984     3872             0 csd-wacom
      [71766.725379] [   1419]  1000  1419   176584      726      530      196         0   458752     3104             0 csd-color
      [71766.725381] [   1421]  1000  1421   119372     1360     1084      276         0   421888     2496             0 csd-housekeepin
      [71766.725384] [   1422]  1000  1422   119717      724      544      180         0   421888     3168             0 csd-xsettings
      [71766.725387] [   1425]  1000  1425   127752     1320     1062      226        32   446464     2624             0 csd-background
      [71766.725391] [   1427]  1000  1427   100813      282       97      185         0   413696     3520             0 csd-clipboard
      [71766.725394] [   1428]  1000  1428   175014     1286     1073      213         0   454656     2560             0 csd-power
      [71766.725396] [   1430]  1000  1430    60105      130       32       98         0    86016      128             0 csd-a11y-settin
      [71766.725398] [   1431]  1000  1431    59835      114        0      114         0    81920      160             0 csd-settings-re
      [71766.725400] [   1432]  1000  1432   154174      248        0      248         0   434176     3584             0 csd-keyboard
      [71766.725402] [   1435]  1000  1435    95328      128        0      128         0   102400      128             0 at-spi-bus-laun
      [71766.725404] [   1446]  1000  1446    41342      209      128       81         0    69632        0           200 dconf-service
      [71766.725406] [   1447]  1000  1447     2120      172        0      172         0    53248      128             0 dbus-daemon
      [71766.725408] [   1466]  1000  1466    78186      373      160      213         0   110592        0           200 gvfsd
      [71766.725410] [   1487]  1000  1487   118027      287      160      127         0   122880        0           200 gvfsd-fuse
      [71766.725413] [   1510]  1000  1510    42226      171        0      171         0    86016      192             0 at-spi2-registr
      [71766.725415] [   1514]  1000  1514   134662      585      320      265         0   147456        0           200 gvfs-udisks2-vo
      [71766.725417] [   1520]   115  1520    79008      173        0      173         0   122880     1120             0 colord
      [71766.725419] [   1527]     0  1527    79740      509      320      189         0   122880      192             0 upowerd
      [71766.725421] [   1539]  1000  1539    76984      252       32      220         0    98304      128           200 gvfs-mtp-volume
      [71766.725423] [   1552]  1000  1552    97513      145        0      145         0   126976      288           200 gvfs-afc-volume
      [71766.725425] [   1558]  1000  1558    77224      388      128      260         0   102400       32           200 gvfs-gphoto2-vo
      [71766.725428] [   1563]  1000  1563    76958      244       96      148         0   102400        0           200 gvfs-goa-volume
      [71766.725430] [   1568]  1000  1568   129472     1092      928      164         0   233472        0           200 goa-daemon
      [71766.725432] [   1576]  1000  1576   103338      155       32      123         0   167936      448             0 csd-printer
      [71766.725435] [   1582]  1000  1582   112521     2526     2302      224         0   229376     2176             0 cinnamon-launch
      [71766.725437] [   1593]  1000  1593    96929      398      192      206         0   114688        0           200 goa-identity-se
      [71766.725439] [   1609]  1000  1609  1585603    21400    19495      175      1730  1376256     9984             0 cinnamon
      [71766.725441] [   1666]  1000  1666    96417      143        0      143         0   118784      960             0 ibus-daemon
      [71766.725443] [   1673]  1000  1673    42299      171       32      139         0    86016       96             0 ibus-memconf
      [71766.725445] [   1674]  1000  1674    69794      271       27      244         0   167936     3040             0 ibus-extension-
      [71766.725447] [   1676]  1000  1676    44591      237       32      205         0    98304      320             0 ibus-x11
      [71766.725449] [   1681]  1000  1681    77136      234      128      106         0   102400        0           200 ibus-portal
      [71766.725452] [   1693]  1000  1693   102902      252       64      188         0   163840     1280             0 xapp-sn-watcher
      [71766.725454] [   1715]  1000  1715    42298      196       32      164         0    86016       96             0 ibus-engine-sim
      [71766.725456] [   1721]  1000  1721   153619      785      526      227        32   299008     5696             0 nemo-desktop
      [71766.725458] [   1724]  1000  1724   132589     3439     3227      212         0   270336     2816             0 blueman-applet
      [71766.725460] [   1726]  1000  1726    75200      278       64      214         0   204800     3904             0 cinnamon-killer
      [71766.725462] [   1729]  1000  1729   263369      806      525      281         0   450560     4480             0 evolution-alarm
      [71766.725464] [   1770]  1000  1770   375036     2744     2536      208         0   446464     1472           200 evolution-sourc
      [71766.725466] [   1775]  1000  1775   113530      795      608      187         0   180224       32           200 obexd
      [71766.725469] [   1788]  1000  1788   206201     1077      928      149         0   274432        0           200 evolution-addre
      [71766.725498] [   1791]  1000  1791   225026      209        0      209         0   241664      864           200 evolution-calen
      [71766.725501] [   1823]     0  1823   320552    10808    10778       30         0   282624      797             0 netbird
      [71766.725504] [   1824]     0  1824    77293      260      128      132         0    98304        0             0 power-profiles-
      [71766.725506] [   1910]  1000  1910   133521      405      192      213         0   126976        0           200 gvfsd-trash
      [71766.725509] [   2079]  1000  2079   263021     2578     2306      272         0   413696     9664             0 mintUpdate
      [71766.725511] [   2137]  1000  2137    16271      287      115      172         0   172032     4544             0 applet.py
      [71766.725513] [   2145]     0  2145     5552      717      512      205         0    81920       32             0 sshd-session
      [71766.725516] [   2155]  1000  2155     4453      169        0      169         0    81920      480             0 ssh
      [71766.725518] [   2158]  1000  2158     4323      265       64      201         0    69632      352             0 ssh
      [71766.725520] [   2160]  1000  2160   114180      159        0      159         0   102400      352             0 sshfs
      [71766.725523] [   2161]  1000  2161    77314       35       35        0         0    86016       32             0 sshfs
      [71766.725526] [   2169]  1000  2169   145847     5382     5163      219         0   315392     4672             0 mintreport-tray
      [71766.725528] [   2184]  1000  2184   134921     2068     1716      224       128   225280      192           200 gnome-terminal-
      [71766.725531] [   2189]  1000  2189   157663     1011      832      179         0   184320        0           200 xdg-desktop-por
      [71766.725533] [   2194]  1000  2194    77191      257       96      161         0   102400        0           200 xdg-permission-
      [71766.725535] [   2199]  1000  2199   134930      273      128      145         0   139264        0           200 xdg-document-po
      [71766.725538] [   2205]  1000  2205      646       83        0       83         0    49152        0           200 fusermount3
      [71766.725540] [   2210]  1000  2210   102736     1353     1120      233         0   167936      192           200 xdg-desktop-por
      [71766.725543] [   2228]  1000  2228   103011     1398     1248      150         0   172032       64           200 xdg-desktop-por
      [71766.725546] [   2239]  1000  2239     2256      491      384      107         0    61440      160           200 bash
      [71766.725548] [   2285]  1000  2285     5666      788      557      231         0    81920      128             0 sshd-session
      [71766.725550] [   2286]     0  2286     5554      711      512      199         0    90112        0             0 sshd-session
      [71766.725552] [   2288]  1000  2288     2282      165       32      133         0    57344      512             0 bash
      [71766.725554] [   2319]  1000  2319     5595      770      590      180         0    90112        0             0 sshd-session
      [71766.725557] [   2331]  1000  2331      675      117        0      117         0    45056        0             0 sftp-server
      [71766.725559] [   4065]  1000  4065     4282      188       80      108         0    73728      576             0 tmux: client
      [71766.725561] [   4067]  1000  4067     8892      235       72      163         0   102400     1664             0 tmux: server
      [71766.725563] [   5508]  1000  5508     2323      258      128      130         0    61440      480             0 bash
      [71766.725566] [  38132]  1000 38132     2290      187       64      123         0    57344      512             0 bash
      [71766.725569] [  38178]  1000 38178     5185     1453     1275      178         0    73728        0             0 nvtop
      [71766.725571] [  50374]     0 50374   127230     1314     1115      199         0   266240     5024             0 fwupd
      [71766.725574] [  50951]  1000 50951     1771      206       64      142         0    57344        0             0 bash
      [71766.725576] [ 137326]     0 137326     5246      368      224      144         0    81920      224             0 cupsd
      [71766.725579] [ 137327]     0 137327    48293      395      224      171         0   131072      544             0 cups-browsed
      [71766.725581] [ 416850]     0 416850     5553      758      544      214         0    90112        0             0 sshd-session
      [71766.725584] [ 416920]  1000 416920     5594      714      558      156         0    90112       32             0 sshd-session
      [71766.725586] [ 416921]  1000 416921     2282      562      480       82         0    65536       96             0 bash
      [71766.725588] [ 416960]  1000 416960     4282      280      112      168         0    65536       64             0 tmux: client
      [71766.725591] [ 416961]  1000 416961     2323      454      384       70         0    57344      224             0 bash
      [71766.725593] [ 438401]     0 438401     5552      722      576      146         0    94208        0             0 sshd-session
      [71766.725596] [ 438453]  1000 438453     5633      850      621      229         0    94208       32             0 sshd-session
      [71766.725598] [ 438468]     0 438468     5552      745      512      233         0    81920        0             0 sshd-session
      [71766.725601] [ 438470]  1000 438470     2282      591      480      111         0    53248       96             0 bash
      [71766.725603] [ 438525]  1000 438525     5593      792      621      171         0    81920        0             0 sshd-session
      [71766.725605] [ 438547]  1000 438547      675       81        0       81         0    49152        0             0 sftp-server
      [71766.725607] [ 438608]  1000 438608     4282     1338     1136      202         0    73728        0             0 tmux: client
      [71766.725610] [ 793687]  1000 793687     1771      170       32      138         0    57344       32             0 run-qwen3-vulka
      [71766.725612] [ 793688]  1000 793688 17163172  6781252  6781035      217         0 88559616  1995769             0 llama-server
      [71766.725615] [ 945076]   116 945076    11117      188       32      156         0    77824      128             0 pickup
      [71766.725619] [1047815]  1000 1047815     1395      103        0      103         0    53248        0             0 sleep
      [71766.725621] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[email protected]/tmux-spawn-ff073762-1455-4c6b-8b1c-b737ee738d0c.scope,task=llama-server,pid=793688,uid=1000
      [71766.725781] Out of memory: Killed process 793688 (llama-server) total-vm:68652688kB, anon-rss:27124140kB, file-rss:868kB, shmem-rss:0kB, UID:1000 pgtables:86484kB oom_score_adj:0
      
      1 条回复 最后回复
      0
      • B blackjack

        @joker_chang 说:

        @rock-shi 那就对了,24G跑128K上下文+MTP资源不够

        我27 q4量化,kv均q8_0量化,上下文128k,MTP, 5090laptop 24GRAM,开thinking,50+tps,快的起飞啊

        J 离线
        J 离线
        joker_chang
        编写于 最后由 编辑
        #18

        @blackjack 我在论坛大神的指点下,也起飞了😄

        llama-server的启动参数

        --reasoning off ^
        --n-gpu-layers -1 ^
        --ctx-size 131072 ^
        --batch-size 2048 ^
        --ubatch-size 1024 ^
        --flash-attn on ^
        --cache-type-k q4_0 ^
        --cache-type-v q4_0 ^
        --spec-type draft-mtp,ngram-mod ^
        --spec-draft-n-max 3 ^
        --spec-ngram-mod-n-max 5 ^
        --spec-ngram-mod-n-min 3 ^
        --temp 0.7 ^
        --parallel 1

        1 条回复 最后回复
        0
        • B blackjack

          @joker_chang 说:

          7900xtx+Ubuntu性能这么好?
          我Windows10+RTX3090Ti,

          --n-gpu-layers 999 ^
          --ctx-size 131072 ^
          --batch-size 2048 ^
          --ubatch-size 1024 ^
          --flash-attn on ^
          --cache-type-k q4_0 ^
          --cache-type-v q4_0 ^
          --cache-type-k-draft q4_0 ^
          --cache-type-v-draft q4_0 ^

          不开MTP跑Qwen3.6 27B只能跑到30tokens/s;
          开MTP变得更慢

          特别是在长上下文时,例如:我让模型分析一个大约128K的md文件,然后就爆了

          你可以查一下编译llama-server的时候,用的mmq还是cuBLAS,或者有没有fallback到cuBLAS。亲测,两者性能差距巨大。

          33077301-ba4c-4f17-8c7f-8bf20a217b19-image.jpeg

          J 离线
          J 离线
          joker_chang
          编写于 最后由 编辑
          #19

          @blackjack 我用的是llama.cpp官方release:llama-b9329-bin-win-cuda-12.4-x64

          之前我自己编译,不知道是什么参数不对,build后的llama-server怎么调参数都只有10tokens/s

          1 条回复 最后回复
          0
          • B blackjack

            @joker_chang 说:

            @rock-shi 那就对了,24G跑128K上下文+MTP资源不够

            我27 q4量化,kv均q8_0量化,上下文128k,MTP, 5090laptop 24GRAM,开thinking,50+tps,快的起飞啊

            ran zR 离线
            ran zR 离线
            ran z
            编写于 最后由 ran z 编辑
            #20

            @blackjack 说:

            @joker_chang 说:

            @rock-shi 那就对了,24G跑128K上下文+MTP资源不够

            我27 q4量化,kv均q8_0量化,上下文128k,MTP, 5090laptop 24GRAM,开thinking,50+tps,快的起飞啊

            厉害!一样的卡,大哥能给个作业抄吗?14900k,32g内存,llama.cpp,感谢!

            ran zR 1 条回复 最后回复
            0
            • ran zR ran z

              @blackjack 说:

              @joker_chang 说:

              @rock-shi 那就对了,24G跑128K上下文+MTP资源不够

              我27 q4量化,kv均q8_0量化,上下文128k,MTP, 5090laptop 24GRAM,开thinking,50+tps,快的起飞啊

              厉害!一样的卡,大哥能给个作业抄吗?14900k,32g内存,llama.cpp,感谢!

              ran zR 离线
              ran zR 离线
              ran z
              编写于 最后由 编辑
              #21
              此主題已被删除!
              1 条回复 最后回复
              0
              • L 离线
                L 离线
                laobenxiong
                编写于 最后由 laobenxiong 编辑
                #22

                关于 hermes 接入 llama-server, 这两天有两个观察:

                1. hermes 新会话的第一条 prompt 大概是不到 20k token, 7900xtx大概需要30~40秒pp, 然后tg. 观察 llama-server 的log, 发现第一条prompt之后紧接着会更一个小 prompt, 这个 prompt 会把前面的 checkpoint (大概每8k个token一个 checkpoint)都冲掉, 这样下一次再接着聊, 前面的~20K prompt 还得重新 pp. 让 hermes 自己检查了一下, 第二个小 prompt 是它发的 title generation request. 为了避免这种情况, 可以禁止 title generation, 或者设一个辅助 aux model来生成 title (比如我让在线的 deepseek-v4-flash 干所有的 aux 工作);
                2. hermes stream mode 下有一个环境变量 HERMES_STREAM_READ_TIMEOUT, 它 控制收到第一个回复token的 timeout, 缺省为120s. 而在7900xtx 下pp花的时间大概是这样的(q5_1/q5_1 cache quant):
                 -  20k:  40s
                 -  40k: 100s
                 -  60k: 170s
                 -  80k: 260s
                 - 100k: 360s
                 - 120k: 460s
                 - 140k: 580s
                 - 160k: 710s
                

                如果hermes有~50k的prompt, 赶上 llama-server cache checkpoint 刚好都是清空的情况下, pp没有完成之前, hermes就超时了.超时以后 hermes会中断当前请求,再发第二次(共三次). 如果第二次又刚好checkpoint被清空(我碰到过,具体原因还没搞明白),那么三次必然都会失败. 碰到这种情况, 可以把这个环境变量增大一下.

                今天更新llama.cpp到最新, 还碰到了 llama-server RSS 超大给 oom-kill的情况, 以及 llama-server/hermes 进入死循环的情况. 具体还没有时间搞清楚. 目前我回到了 b9305.

                目前的启动脚本如下:

                #!/bin/bash
                
                LLAMA_SERVER=/home/bruin/github/llama.cpp/build-vulkan/bin/llama-server
                
                TOKEN_PER_CKPT=8192    # token per checkpoint, seems llama.cpp hardcoded
                NUM_CKPT=32
                CTX_SIZE=$((TOKEN_PER_CKPT * NUM_CKPT))
                
                ARGS=(
                  --model              /home/bruin/Qwen3.6-27B-Q4_K_M.gguf
                  --mmproj             /opt/gguf-models/unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-BF16.gguf
                  #--chat-template-file /opt/gguf-models/froggeric/Qwen-Fixed-Chat-Templates/chat_template.jinja
                  --spec-type          draft-mtp
                  --spec-draft-n-max   2                       # Max draft tokens
                  --ctx-checkpoints ${NUM_CKPT}                # 8k token per ckpt
                  --ctx-size ${CTX_SIZE}                       # 262144 for 256k context
                  #--swa-full                                   # qwen3.6-27b does not support it
                  --parallel   1                               # Single slot
                  --flash-attn on                              # Enable FlashAttention
                  --n-gpu-layers 999                           # All layers to GPU
                  --cache-type-k q5_1                          # Quantize KV cache keys
                  --cache-type-v q5_1                          # Quantize KV cache values
                  #--fit off                                    #
                  --threads 16                                 # CPU threads helping tg
                  --threads-batch 16                           # CPU threads helping pg
                  --batch-size 2048                            # Batch size
                  --ubatch-size 1024                           # Micro‑batch size
                  --cache-ram 0                                # seems not working
                  --reasoning auto                             # Auto reasoning
                  --reasoning-format deepseek                  # Reasoning format
                  --reasoning-budget 1024                      # Reasoning budget
                  --log-verbosity 4                            # Log verbosity
                  --host 0.0.0.0 --port 8000                   # Listen on all interfaces, port 8000
                  --cont-batching                              # Continuous batching
                  --no-warmup                                  # Skip warmup
                  --no-mmap                                    # Don’t memory‑map model
                  --mlock                                      # Lock model in RAM
                  --jinja                                      # Jinja chat template
                  --metrics                                    # View metrics by accessing http://<ip:port>/metrics
                )
                
                # print the cmdline
                echo "${LLAMA_SERVER}"
                for ((i=0; i<${#ARGS[@]}; i+=2)); do
                  echo "${ARGS[i]} ${ARGS[i+1]}"
                done
                
                # run the cmd
                ${LLAMA_SERVER} "${ARGS[@]}"
                
                John AtoJ 1 条回复 最后回复
                1
                • L laobenxiong

                  关于 hermes 接入 llama-server, 这两天有两个观察:

                  1. hermes 新会话的第一条 prompt 大概是不到 20k token, 7900xtx大概需要30~40秒pp, 然后tg. 观察 llama-server 的log, 发现第一条prompt之后紧接着会更一个小 prompt, 这个 prompt 会把前面的 checkpoint (大概每8k个token一个 checkpoint)都冲掉, 这样下一次再接着聊, 前面的~20K prompt 还得重新 pp. 让 hermes 自己检查了一下, 第二个小 prompt 是它发的 title generation request. 为了避免这种情况, 可以禁止 title generation, 或者设一个辅助 aux model来生成 title (比如我让在线的 deepseek-v4-flash 干所有的 aux 工作);
                  2. hermes stream mode 下有一个环境变量 HERMES_STREAM_READ_TIMEOUT, 它 控制收到第一个回复token的 timeout, 缺省为120s. 而在7900xtx 下pp花的时间大概是这样的(q5_1/q5_1 cache quant):
                   -  20k:  40s
                   -  40k: 100s
                   -  60k: 170s
                   -  80k: 260s
                   - 100k: 360s
                   - 120k: 460s
                   - 140k: 580s
                   - 160k: 710s
                  

                  如果hermes有~50k的prompt, 赶上 llama-server cache checkpoint 刚好都是清空的情况下, pp没有完成之前, hermes就超时了.超时以后 hermes会中断当前请求,再发第二次(共三次). 如果第二次又刚好checkpoint被清空(我碰到过,具体原因还没搞明白),那么三次必然都会失败. 碰到这种情况, 可以把这个环境变量增大一下.

                  今天更新llama.cpp到最新, 还碰到了 llama-server RSS 超大给 oom-kill的情况, 以及 llama-server/hermes 进入死循环的情况. 具体还没有时间搞清楚. 目前我回到了 b9305.

                  目前的启动脚本如下:

                  #!/bin/bash
                  
                  LLAMA_SERVER=/home/bruin/github/llama.cpp/build-vulkan/bin/llama-server
                  
                  TOKEN_PER_CKPT=8192    # token per checkpoint, seems llama.cpp hardcoded
                  NUM_CKPT=32
                  CTX_SIZE=$((TOKEN_PER_CKPT * NUM_CKPT))
                  
                  ARGS=(
                    --model              /home/bruin/Qwen3.6-27B-Q4_K_M.gguf
                    --mmproj             /opt/gguf-models/unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-BF16.gguf
                    #--chat-template-file /opt/gguf-models/froggeric/Qwen-Fixed-Chat-Templates/chat_template.jinja
                    --spec-type          draft-mtp
                    --spec-draft-n-max   2                       # Max draft tokens
                    --ctx-checkpoints ${NUM_CKPT}                # 8k token per ckpt
                    --ctx-size ${CTX_SIZE}                       # 262144 for 256k context
                    #--swa-full                                   # qwen3.6-27b does not support it
                    --parallel   1                               # Single slot
                    --flash-attn on                              # Enable FlashAttention
                    --n-gpu-layers 999                           # All layers to GPU
                    --cache-type-k q5_1                          # Quantize KV cache keys
                    --cache-type-v q5_1                          # Quantize KV cache values
                    #--fit off                                    #
                    --threads 16                                 # CPU threads helping tg
                    --threads-batch 16                           # CPU threads helping pg
                    --batch-size 2048                            # Batch size
                    --ubatch-size 1024                           # Micro‑batch size
                    --cache-ram 0                                # seems not working
                    --reasoning auto                             # Auto reasoning
                    --reasoning-format deepseek                  # Reasoning format
                    --reasoning-budget 1024                      # Reasoning budget
                    --log-verbosity 4                            # Log verbosity
                    --host 0.0.0.0 --port 8000                   # Listen on all interfaces, port 8000
                    --cont-batching                              # Continuous batching
                    --no-warmup                                  # Skip warmup
                    --no-mmap                                    # Don’t memory‑map model
                    --mlock                                      # Lock model in RAM
                    --jinja                                      # Jinja chat template
                    --metrics                                    # View metrics by accessing http://<ip:port>/metrics
                  )
                  
                  # print the cmdline
                  echo "${LLAMA_SERVER}"
                  for ((i=0; i<${#ARGS[@]}; i+=2)); do
                    echo "${ARGS[i]} ${ARGS[i+1]}"
                  done
                  
                  # run the cmd
                  ${LLAMA_SERVER} "${ARGS[@]}"
                  
                  John AtoJ 在线
                  John AtoJ 在线
                  John Ato
                  编写于 最后由 编辑
                  #23

                  @laobenxiong 你这套参数接近目前的极限了,还有一个SMITHY可以试试,--cache-ram多轮对话好像有用。

                  L 1 条回复 最后回复
                  0
                  • John AtoJ John Ato

                    @laobenxiong 你这套参数接近目前的极限了,还有一个SMITHY可以试试,--cache-ram多轮对话好像有用。

                    L 离线
                    L 离线
                    laobenxiong
                    编写于 最后由 编辑
                    #24

                    @John-Ato smithy是什么? 没搜到

                    1 条回复 最后回复
                    0
                    • G 离线
                      G 离线
                      goodhat5405
                      编写于 最后由 编辑
                      #25

                      最近在用这个llama.cp +qwen 35 a3b。 m5 max 蛮好的 百来token

                      1 条回复 最后回复
                      0

                      你好!看起来您对这段对话很感兴趣,但您还没有一个账号。

                      厌倦了每次访问都刷到同样的帖子?您注册账号后,您每次返回时都能精准定位到您上次浏览的位置,并可选择接收新回复通知(通过邮件或推送通知)。您还能收藏书签、为帖子顶,向社区成员表达您的欣赏。

                      有了你的建议,这篇帖子会更精彩哦 💗

                      注册 登录
                      回复
                      • 在新帖中回复
                      登录后回复
                      • 从旧到新
                      • 从新到旧
                      • 最多赞同


                      • 登录

                      • 没有帐号? 注册

                      • 登录或注册以进行搜索。
                      • 第一个帖子
                        最后一个帖子
                      0
                      • 版块
                      • 最新
                      • 标签
                      • 热门
                      • 用户
                      • 群组