Hermes Agent 最新版本 v0.17.0 (June 19,2026)部署本地模型 bug
本地模型:qwen3.6-27b-fp8
推理引擎:vLLM 0.23.0
顯卡:NVIDIA RTX PRO 6000 Blackwell Workstation Edition
OS: Ubuntu 24.04.4 LTS
今天當我更新 Hermes Agent 從 v0.14.0 到 最新版本 v.017.0 後頻繁出現一下報錯碼,在加入 max_tokens 設定後問題得以解決。
報錯碼
API call failed (attempt 1/3): BadRequestError [HTTP 400]
Provider: custom Model: qwen36-27b
Endpoint: http://127.0.0.1:8000/v1
Error: HTTP 400: This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 81579 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output tokens. (par
Details: {'message': "This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 81579 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output
Elapsed: 0.21s Context: 2 msgs, ~4,861 tokens
Output cap too large for current prompt — retrying with max_tokens=38,279 (available_tokens=38,343; context_length unchanged at 65,536)
API call failed (attempt 1/3): BadRequestError [HTTP 400]
Provider: custom Model: qwen36-27b
Endpoint: http://127.0.0.1:8000/v1
Error: HTTP 400: This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 98728 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output tokens. (par
Details: {'message': "This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 98728 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output
Elapsed: 0.04s Context: 5 msgs, ~9,388 tokens
Output cap too large for current prompt — retrying with max_tokens=32,562 (available_tokens=32,626; context_length unchanged at 65,536)
修改前 config.yaml 設置
model:
provider: custom
default: qwen36-27b
base_url: http://127.0.0.1:8000/v1
解決方法:
加入 max_tokens: 8192
*可以根據需求調整 max_tokens 參數
修改後 config.yaml 設置
model:
provider: custom
default: qwen36-27b
base_url: http://127.0.0.1:8000/v1
max_tokens: 8192
希望這個對大家有幫助
Alan