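I use qwen3-vl 8b q4 for OCR and it works quite well, mainly because it does not take much memory, about 6 GB. qwen3.6-27b q4 reasons better but needs more memory, roughly 16 GB or more.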
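As a rough sanity check on those memory figures, here is a minimal sketch that estimates the size of the weights alone. The 4.5 bits per weight is an assumption (4-bit quantization formats usually store scaling factors alongside the weights), and `weight_memory_gb` is a hypothetical helper, not part of any library. KV cache, the vision encoder, and runtime overhead come on top, which is why real usage ends up higher than these numbers.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# Assumed average of ~4.5 bits per weight for a "q4" quantization.
for name, params in [("8B", 8.0), ("27B", 27.0)]:
    gb = weight_memory_gb(params, 4.5)
    print(f"{name} at ~4.5 bits/weight: about {gb:.1f} GB for weights")
```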
alanwoo
@alanwoo
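A few questions about upgrading our business with AI, for the experts here: 1. Which open-source model is suitable for visual image recognition? 2. Which open-source model does OCR well? 3. Which tools are good for developing small applications?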
Hermes Agent latest version v0.17.0: bug when deploying a local model
Hermes Agent latest version v0.17.0 (June 19, 2026): bug when deploying a local model
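Local model: qwen3.6-27b-fp8
Inference engine: vLLM 0.23.0
GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
OS: Ubuntu 24.04.4 LTS

Today, after I updated Hermes Agent from v0.14.0 to the latest version, v0.17.0, the error below started appearing frequently. Adding a max_tokens setting resolved the problem.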
Error output:

API call failed (attempt 1/3): BadRequestError [HTTP 400]
Provider: custom
Model: qwen36-27b
Endpoint: http://127.0.0.1:8000/v1
Error: HTTP 400: This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 81579 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output tokens. (par
Details: {'message': "This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 81579 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output
Elapsed: 0.21s
Context: 2 msgs, ~4,861 tokens
Output cap too large for current prompt — retrying with max_tokens=38,279 (available_tokens=38,343; context_length unchanged at 65,536)

API call failed (attempt 1/3): BadRequestError [HTTP 400]
Provider: custom
Model: qwen36-27b
Endpoint: http://127.0.0.1:8000/v1
Error: HTTP 400: This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 98728 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output tokens. (par
Details: {'message': "This model's maximum context length is 65536 tokens. However, you requested 65536 output tokens and your prompt contains 98728 characters (more than 0 characters, which is the upper bound for 0 input tokens). Please reduce the length of the input prompt or the number of requested output
Elapsed: 0.04s
Context: 5 msgs, ~9,388 tokens
Output cap too large for current prompt — retrying with max_tokens=32,562 (available_tokens=32,626; context_length unchanged at 65,536)

config.yaml before the change:
model:
  provider: custom
  default: qwen36-27b
  base_url: http://127.0.0.1:8000/v1

Solution:
Add max_tokens: 8192
* You can adjust the max_tokens value to suit your needs.

config.yaml after the change:
model:
  provider: custom
  default: qwen36-27b
  base_url: http://127.0.0.1:8000/v1
  max_tokens: 8192

I hope this helps everyone.
Alan
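
For anyone wondering why a smaller max_tokens helps: the prompt and the requested output have to fit into the 65536-token context together, and a request asking for 65536 output tokens leaves no room for any input. The sketch below reproduces the retry figures from the error log. The 3 characters per token estimate and the 64-token margin are assumptions inferred from those figures, not taken from the Hermes Agent source, and the function names are hypothetical.

```python
CONTEXT_LENGTH = 65536   # maximum context length reported in the error message
CHARS_PER_TOKEN = 3      # assumption: inferred from the figures in the log
SAFETY_MARGIN = 64       # assumption: available_tokens minus max_tokens in the log


def estimate_prompt_tokens(prompt_chars: int) -> int:
    """Rough token estimate from a character count, rounded up."""
    return (prompt_chars + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def available_output_tokens(prompt_chars: int,
                            context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for output once the estimated prompt is accounted for."""
    return max(context_length - estimate_prompt_tokens(prompt_chars), 0)


def capped_max_tokens(prompt_chars: int, requested: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp the requested output cap so prompt plus output fit the context."""
    available = available_output_tokens(prompt_chars, context_length)
    return min(requested, max(available - SAFETY_MARGIN, 0))


# Prompt sizes taken from the two failed requests in the log.
for chars in (81579, 98728):
    print(chars, available_output_tokens(chars), capped_max_tokens(chars, 65536))

# With max_tokens: 8192 the request fits without any clamping.
print(capped_max_tokens(81579, 8192))
```

With max_tokens set to 8192, up to 65536 - 8192 = 57344 tokens remain for the prompt, so both requests in the log would fit under this estimate.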