@Liang-Wang RTX 5090 performance numbers, for reference:
Windows 11 Pro (Build 26200)
├── Ryzen 9 9950X3D · 64GB RAM · RTX 5090 32GB
└── WSL2 (Ubuntu 24.04) — vmmemWSL 30.3GB
    ├── llama.cpp v9294 (CUDA backend)
    │   ├── Qwen3.6-27B-Q5_K_M → :8080 (main model)
    │   └── MiniCPM-V 2.6-Q3 → :8081 (vision)
    ├── Hermes Agent v0.14.0 (Python 3.11.15)
~/llama.cpp/build/bin/llama-bench \
  --model ~/models/Qwen3-27B/Qwen3.6-27B-Q5_K_M.gguf \
  --n-gpu-layers 999 \
  --flash-attn 1 \
  -p 512,4096,32768 \
  -n 128
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 32606 MiB):
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes, VRAM: 32606 MiB
| model | size | params | backend | ngl | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen35 27B Q5_K - Medium | 18.46 GiB | 27.32 B | CUDA | 999 | 1 | pp512 | 3563.38 ± 231.17 |
| qwen35 27B Q5_K - Medium | 18.46 GiB | 27.32 B | CUDA | 999 | 1 | pp4096 | 3498.68 ± 9.65 |
| qwen35 27B Q5_K - Medium | 18.46 GiB | 27.32 B | CUDA | 999 | 1 | pp32768 | 3340.48 ± 350.69 |
| qwen35 27B Q5_K - Medium | 18.46 GiB | 27.32 B | CUDA | 999 | 1 | tg128 | 62.49 ± 0.99 |
build: d14ce3dab (9235)
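
To put the table in practical terms, here is a rough Python sketch that converts the reported tokens-per-second figures into estimated wall-clock time. The throughput values are copied from the table above; the helper names `seconds_for` and `slowdown_pct` are made up for illustration and are not part of llama.cpp. It assumes throughput stays constant over the run, which is only an approximation.

```python
# Mean throughput (tokens/second) copied from the llama-bench table above.
RESULTS = {
    "pp512": 3563.38,
    "pp4096": 3498.68,
    "pp32768": 3340.48,
    "tg128": 62.49,
}


def seconds_for(test: str, tokens: int) -> float:
    """Estimated seconds to process `tokens` at the mean rate measured for `test`."""
    return tokens / RESULTS[test]


def slowdown_pct(short: str, long: str) -> float:
    """Percentage drop in mean throughput going from test `short` to test `long`."""
    return (1.0 - RESULTS[long] / RESULTS[short]) * 100.0


if __name__ == "__main__":
    # Time to ingest a 32768-token prompt at the pp32768 rate.
    print(f"prefill 32768 tokens: {seconds_for('pp32768', 32768):.2f} s")
    # Time to generate 128 tokens at the tg128 rate.
    print(f"generate 128 tokens:  {seconds_for('tg128', 128):.2f} s")
    # Relative prefill slowdown between the shortest and longest prompt.
    print(f"pp512 -> pp32768:     {slowdown_pct('pp512', 'pp32768'):.1f} % slower")
```

By this estimate a 32K-token prompt takes about 10 seconds to ingest and 128 generated tokens take about 2 seconds. The pp32768 row has a large spread (± 350.69), so the roughly 6 % drop from pp512 is within measurement noise and should not be read as a firm trend.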