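PRD: Fixing Media Attachment Support in QQ Bot send_message

1. Problem Description

The _send_qqbot() function in tools/send_message_tool.py (line 1677) sends only plain-text messages (msg_type: 0) and completely ignores the media_files parameter.

When the AI Agent uses the send_message tool to send a message with media attachments (images, audio, video, documents) to the QQ Bot platform, the attachments are silently dropped and the user receives only the text. Worse, if the message consists solely of media attachments with no text, the system returns an error stating that media delivery is not supported for QQ Bot.

Meanwhile, the QQAdapter class in the gateway adapter gateway/platforms/qqbot/adapter.py already has complete media-sending capability: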

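| Method | Line | Purpose |
| --- | --- | --- |
| send_image() | 2601 | Send an image (URL or local file) |
| send_image_file() | 2627 | Send a local image file |
| send_voice() | 2641 | Send a voice message |
| send_video() | 2655 | Send a video |
| send_document() | 2669 | Send a file/document |
| _send_media() | 2690 | Low-level media upload (direct HTTP URL transfer and chunked upload of local files) |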

Core mismatch: the adapter already has full media capability, but the routing layer of the send_message tool is not wired to it.

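2. Scope of Impact

Direct impact

  • Every scenario that sends a message with media attachments through the QQ Bot platform is affected
  • This covers all media types: images, voice, video, documents/PDF, and so on

Indirect impact

  • When the AI Agent needs to send screenshots, generated images, files, or similar content to a QQ user or group, the user never receives them
  • The user experience on the QQ Bot platform is incomplete as a result

Platforms involved

  • Platform.QQBOT (QQ Bot Open Platform)

Not affected

  • Other platforms that already have working media support (Telegram, Discord, Matrix, WeChat/Weixin, Signal, Yuanbao, Feishu)
  • Inbound message handling in the QQ Bot gateway adapter (receiving media messages works normally)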

3. Root Cause Analysis

3.1 Call Chain Analysis

The current send path for QQ Bot:

send_message_tool()
  → split the message into chunks
  → lines 658-659: _send_qqbot(pconfig, chat_id, chunk)  ← no media_files argument
  → _send_qqbot() sends the REST request directly with httpx
  → payload = {"content": message, "msg_type": 0}  ← hard-coded plain text

Compare the correct path used by the Yuanbao platform:

send_message_tool()
  → lines 586-598: YUANBAO + media_files detected
  → _send_yuanbao(chat_id, chunk, media_files=media_files)
  → get_active_adapter() fetches the running gateway adapter
  → send_yuanbao_direct(adapter, chat_id, message, media_files=media_files)
  → the adapter handles media upload and sending

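3.2 Three Breakpoints

  1. The _send_qqbot() signature has no media_files parameter (line 1677)
  2. The call site does not pass media_files (line 659)
  3. QQAdapter has no get_active() singleton, so the tool layer cannot obtain the gateway adapter

3.3 Missing Singleton Registration

The Yuanbao adapter, YuanbaoAdapter, has a complete singleton pattern (lines 4392-4404):

  • the _active_instance class variable
  • the get_active() class method
  • the set_active() class method

QQAdapter lacks this mechanism, so the send_message tool cannot obtain the running gateway adapter instance.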
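
The registry described in section 3.3 can be sketched in isolation. This is a minimal illustration, not the project's code: DemoAdapter and send_media_via_adapter are hypothetical stand-ins, and only the names _active_instance, get_active() and set_active() come from the pattern listed above.

```python
from typing import ClassVar, Optional


class DemoAdapter:
    """Hypothetical stand-in for a gateway adapter (not the real QQAdapter)."""

    # Class-level slot holding the adapter that is currently connected, if any.
    _active_instance: ClassVar[Optional["DemoAdapter"]] = None

    @classmethod
    def get_active(cls) -> Optional["DemoAdapter"]:
        return cls._active_instance

    @classmethod
    def set_active(cls, adapter: Optional["DemoAdapter"]) -> None:
        cls._active_instance = adapter


def send_media_via_adapter(chat_id: str) -> dict:
    """Tool-layer lookup: return a clear error when no adapter is registered."""
    adapter = DemoAdapter.get_active()
    if adapter is None:
        return {"error": "adapter is not running"}
    return {"success": True, "chat_id": chat_id}


before = send_media_via_adapter("chat-1")  # nothing registered yet
DemoAdapter.set_active(DemoAdapter())
after = send_media_via_adapter("chat-1")   # adapter is now reachable
DemoAdapter.set_active(None)               # clean up the registry
```

Without the registry, the tool layer has no handle on the running instance, which is the third breakpoint listed in section 3.2.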

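4. Fix

4.1 Overview of Changes

Two files need to be modified, with four changes in total:

| File | Change | Description |
| --- | --- | --- |
| gateway/platforms/qqbot/adapter.py | Add the singleton pattern | get_active() / set_active() plus the connect/disconnect lifecycle |
| gateway/platforms/qqbot/adapter.py | Add module-level get_active_adapter() | For import by the tool layer |
| tools/send_message_tool.py | Change the _send_qqbot() signature and implementation | Support media_files, routed through the gateway adapter |
| tools/send_message_tool.py | Add a QQBOT + media routing branch | Add media handling to the main function, similar to Yuanbao/Feishu |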

4.2 Code Diff

4.2.1 gateway/platforms/qqbot/adapter.py: add the singleton pattern

Add the singleton registration mechanism inside the QQAdapter class definition (after line 155):

 class QQAdapter(BasePlatformAdapter):
     """QQ Bot adapter backed by the official QQ Bot WebSocket Gateway + REST API."""

     # QQ Bot API does not support editing sent messages.
     SUPPORTS_MESSAGE_EDITING = False
     MAX_MESSAGE_LENGTH = MAX_MESSAGE_LENGTH
     _TYPING_INPUT_SECONDS = 60
     _TYPING_DEBOUNCE_SECONDS = 50

+    # -- Active instance registry (class-level singleton) ---
+
+    _active_instance: ClassVar[Optional["QQAdapter"]] = None
+
+    @classmethod
+    def get_active(cls) -> Optional["QQAdapter"]:
+        """Return the currently connected QQAdapter, or None."""
+        return cls._active_instance
+
+    @classmethod
+    def set_active(cls, adapter: Optional["QQAdapter"]) -> None:
+        """Register (or clear) the active adapter instance."""
+        cls._active_instance = adapter
+
     @property
     def _log_tag(self) -> str:

Register the active instance in connect() (after _mark_connected() on line 301):

             # 4. Start listeners
             self._listen_task = asyncio.create_task(self._listen_loop())
             self._heartbeat_task = asyncio.create_task(self._heartbeat_loop())
             self._mark_connected()
+            QQAdapter.set_active(self)
             logger.info("[%s] Connected", self._log_tag)
             return True

Clear the active instance in disconnect() (after _mark_disconnected() on line 314):

     async def disconnect(self) -> None:
         """Close all connections and stop listeners."""
         self._running = False
         self._mark_disconnected()
+        if QQAdapter.get_active() is self:
+            QQAdapter.set_active(None)
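
The `is self` check in disconnect() matters when a replacement adapter connects before the old one finishes tearing down. The sketch below illustrates that case under stated assumptions: DemoAdapter is a hypothetical class, and the guarded flag exists only so that both behaviors can be compared side by side.

```python
import asyncio
from typing import ClassVar, Optional


class DemoAdapter:
    """Hypothetical adapter mirroring the connect/disconnect registration above."""

    _active_instance: ClassVar[Optional["DemoAdapter"]] = None

    def __init__(self, name: str, guarded: bool = True) -> None:
        self.name = name
        self.guarded = guarded  # illustration only: toggles the `is self` check

    @classmethod
    def get_active(cls) -> Optional["DemoAdapter"]:
        return cls._active_instance

    @classmethod
    def set_active(cls, adapter: Optional["DemoAdapter"]) -> None:
        cls._active_instance = adapter

    async def connect(self) -> bool:
        DemoAdapter.set_active(self)
        return True

    async def disconnect(self) -> None:
        # Guarded: clear the slot only if it still points at this instance.
        if self.guarded and DemoAdapter.get_active() is not self:
            return
        DemoAdapter.set_active(None)


async def reconnect_overlap(guarded: bool) -> Optional[str]:
    old, new = DemoAdapter("old", guarded), DemoAdapter("new", guarded)
    await old.connect()
    await new.connect()     # the replacement registers first
    await old.disconnect()  # late teardown of the old instance
    active = DemoAdapter.get_active()
    DemoAdapter.set_active(None)  # reset for the next run
    return active.name if active else None


with_guard = asyncio.run(reconnect_overlap(guarded=True))
without_guard = asyncio.run(reconnect_overlap(guarded=False))
```

With the guard, the newer adapter stays registered. Without it, the late disconnect clears the slot, and media sends would fail even though an adapter is connected.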

Add a module-level delegate function at the end of the file:

+# ---------------------------------------------------------------------------
+# Module-level thin delegates (preserve import compatibility for send_message tool)
+# ---------------------------------------------------------------------------
+
+
+def get_active_adapter() -> Optional["QQAdapter"]:
+    """Delegate to ``QQAdapter.get_active()``."""
+    return QQAdapter.get_active()

ClassVar must be imported at the top of the file (add it if it is not already imported):

-from typing import Any, ClassVar, Dict, List, Optional
+from typing import Any, ClassVar, Dict, List, Optional  # confirm ClassVar is imported

4.2.2 tools/send_message_tool.py: add QQ Bot media support

Change 1: modify the _send_qqbot() signature and implementation (line 1677)

-async def _send_qqbot(pconfig, chat_id, message):
-    """Send via QQBot using the REST API directly (no WebSocket needed).
-
-    Uses the QQ Bot Open Platform REST endpoints to get an access token
-    and post a message. Supports guild channels, C2C (private) chats,
-    and group chats by trying the appropriate endpoints.
-    """
-    try:
-        import httpx
-    except ImportError:
-        return _error("QQBot direct send requires httpx. Run: pip install httpx")
-
-    extra = pconfig.extra or {}
-    appid = extra.get("app_id") or os.getenv("QQ_APP_ID", "")
-    secret = (pconfig.token or extra.get("client_secret")
-              or os.getenv("QQ_CLIENT_SECRET", ""))
-    if not appid or not secret:
-        return _error("QQBot: QQ_APP_ID / QQ_CLIENT_SECRET not configured.")
-
-    try:
-        async with httpx.AsyncClient(timeout=15) as client:
-            # ... (rest of function unchanged until payload line)
-
-            payload = {"content": message[:4000], "msg_type": 0}
-
-            # ... (endpoint attempts unchanged)
-    except Exception as e:
-        return _error(f"QQBot send failed: {e}")
+async def _send_qqbot(pconfig, chat_id, message, media_files=None):
+    """Send via QQBot using the running gateway adapter.
+
+    When media_files are present, routes through the gateway adapter's
+    native media upload pipeline (send_image, send_document, etc.).
+    Falls back to REST API for text-only messages when the adapter
+    is not running.
+    """
+    # If we have media files, try the gateway adapter first
+    if media_files:
+        try:
+            from gateway.platforms.qqbot.adapter import get_active_adapter
+        except ImportError:
+            return _error("QQBot adapter module not available.")
+
+        adapter = get_active_adapter()
+        if adapter is None:
+            return _error(
+                "QQBot adapter is not running. "
+                "Start the gateway with qqbot platform enabled first "
+                "to send media attachments."
+            )
+
+        # Send text first (if any)
+        if message.strip():
+            text_result = await adapter.send(chat_id=chat_id, content=message)
+            if not text_result.success:
+                return {"error": f"QQBot text send failed: {text_result.error}"}
+
+        # Send each media file, choosing the adapter method from its MIME type
+        import mimetypes  # imported once, outside the loop
+
+        for file_path, _is_url in media_files:
+            mime, _ = mimetypes.guess_type(file_path)
+            mime = (mime or "").lower()
+
+            if mime.startswith("image/"):
+                result = await adapter.send_image(chat_id, file_path)
+            elif mime.startswith("video/"):
+                result = await adapter.send_video(chat_id, file_path)
+            elif mime.startswith("audio/"):
+                result = await adapter.send_voice(chat_id, file_path)
+            else:
+                # Unknown or non-media types are sent as generic documents
+                result = await adapter.send_document(chat_id, file_path)
+
+            if not result.success:
+                return {"error": f"QQBot media send failed: {result.error}"}
+
+        return {
+            "success": True,
+            "platform": "qqbot",
+            "chat_id": chat_id,
+            "media_sent": len(media_files),
+        }
+
+    # Text-only: use REST API directly (no gateway needed)
+    try:
+        import httpx
+    except ImportError:
+        return _error("QQBot direct send requires httpx. Run: pip install httpx")
+
+    extra = pconfig.extra or {}
+    appid = extra.get("app_id") or os.getenv("QQ_APP_ID", "")
+    secret = (pconfig.token or extra.get("client_secret")
+              or os.getenv("QQ_CLIENT_SECRET", ""))
+    if not appid or not secret:
+        return _error("QQBot: QQ_APP_ID / QQ_CLIENT_SECRET not configured.")
+
+    try:
+        async with httpx.AsyncClient(timeout=15) as client:
+            # (existing REST API logic unchanged)
+            # Step 1: Get access token
+            token_resp = await client.post(
+                "https://bots.qq.com/app/getAppAccessToken",
+                json={"appId": str(appid), "clientSecret": str(secret)},
+            )
+            if token_resp.status_code != 200:
+                return _error(f"QQBot token request failed: {token_resp.status_code}")
+            token_data = token_resp.json()
+            access_token = token_data.get("access_token")
+            if not access_token:
+                return _error("QQBot: no access_token in response")
+
+            headers = {
+                "Authorization": f"QQBot {access_token}",
+                "Content-Type": "application/json",
+            }
+            payload = {"content": message[:4000], "msg_type": 0}
+
+            # Try channel endpoint first
+            url = f"https://api.sgroup.qq.com/channels/{chat_id}/messages"
+            resp = await client.post(url, json=payload, headers=headers)
+            if resp.status_code in (200, 201):
+                data = resp.json()
+                return {"success": True, "platform": "qqbot", "chat_id": chat_id,
+                        "message_id": data.get("id")}
+
+            # Try C2C endpoint
+            url_c2c = f"https://api.sgroup.qq.com/v2/users/{chat_id}/messages"
+            resp_c2c = await client.post(url_c2c, json=payload, headers=headers)
+            if resp_c2c.status_code in (200, 201):
+                data = resp_c2c.json()
+                return {"success": True, "platform": "qqbot", "chat_id": chat_id,
+                        "message_id": data.get("id")}
+
+            # Try group endpoint
+            url_group = f"https://api.sgroup.qq.com/v2/groups/{chat_id}/messages"
+            resp_group = await client.post(url_group, json=payload, headers=headers)
+            if resp_group.status_code in (200, 201):
+                data = resp_group.json()
+                return {"success": True, "platform": "qqbot", "chat_id": chat_id,
+                        "message_id": data.get("id")}
+
+            return _error(f"QQBot send failed: channel={resp.status_code} c2c={resp_c2c.status_code} group={resp_group.status_code}")
+    except Exception as e:
+        return _error(f"QQBot send failed: {e}")
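
One consequence of the media branch in Change 1 is worth seeing in a small run: text is sent first, attachments follow one by one, and the first failure returns immediately, so anything sent before it has already been delivered. The following is a simplified sketch rather than the real function. send_text_then_media is a hypothetical stand-in, every attachment goes through send_image for brevity, and SimpleNamespace objects stand in for the adapter's result type.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def send_text_then_media(adapter, chat_id, message, media_files):
    """Simplified stand-in for the media branch: text first, then each attachment."""
    if message.strip():
        text_result = await adapter.send(chat_id=chat_id, content=message)
        if not text_result.success:
            return {"error": f"text send failed: {text_result.error}"}
    for file_path, _is_url in media_files:
        result = await adapter.send_image(chat_id, file_path)
        if not result.success:
            return {"error": f"media send failed: {result.error}"}
    return {"success": True, "media_sent": len(media_files)}


ok = SimpleNamespace(success=True, error=None)
failed = SimpleNamespace(success=False, error="upload rejected")

adapter = AsyncMock()
adapter.send.return_value = ok
adapter.send_image.side_effect = [ok, failed, ok]  # the second attachment fails

outcome = asyncio.run(
    send_text_then_media(
        adapter,
        "test_chat",
        "caption",
        [("a.png", False), ("b.png", False), ("c.png", False)],
    )
)
```

Here the caption and the first image reach the user, the second image fails, and the third is never attempted. Whether that partial delivery is acceptable is a product decision that this sketch does not settle.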

Change 2: add a QQBOT + media routing branch to the main function (after lines 585-598, before the Feishu branch on line 600)

+    # --- QQBot: native media attachment support via running gateway adapter ---
+    if platform == Platform.QQBOT and media_files:
+        last_result = None
+        for i, chunk in enumerate(chunks):
+            is_last = (i == len(chunks) - 1)
+            result = await _send_qqbot(
+                pconfig,
+                chat_id,
+                chunk,
+                media_files=media_files if is_last else None,
+            )
+            if isinstance(result, dict) and result.get("error"):
+                return result
+            last_result = result
+        return last_result
+
     # --- Feishu: native media attachment support via adapter ---
     if platform == Platform.FEISHU and media_files:
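
The chunk loop in Change 2 attaches media only to the final chunk. A minimal sketch of that behavior follows, assuming a hypothetical fake_send in place of _send_qqbot and the (path, is_url) tuple shape used elsewhere in this PRD.

```python
import asyncio
from typing import List, Optional, Tuple

MediaFiles = List[Tuple[str, bool]]  # (path_or_url, is_url)


async def fake_send(chunk: str, media_files: Optional[MediaFiles] = None) -> dict:
    """Hypothetical sender that only reports what it was asked to send."""
    return {"success": True, "chunk": chunk, "media_sent": len(media_files or [])}


async def send_chunks(chunks: List[str], media_files: MediaFiles) -> List[dict]:
    results = []
    for i, chunk in enumerate(chunks):
        is_last = i == len(chunks) - 1
        # Earlier chunks are text-only; attachments ride along with the last one.
        result = await fake_send(chunk, media_files if is_last else None)
        if result.get("error"):
            return results + [result]  # stop at the first failure
        results.append(result)
    return results


results = asyncio.run(
    send_chunks(["part 1", "part 2", "part 3"], [("/tmp/test.png", False)])
)
media_counts = [r["media_sent"] for r in results]
```

In the real branch this also means that earlier chunks take the text-only REST path while the last chunk goes through the gateway adapter.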

Change 3: update the error messages that list the platforms with media support (lines 618-630)

     if media_files and not message.strip():
         return {
             "error": (
-                "send_message MEDIA delivery is currently only supported for telegram, discord, matrix, weixin, signal, yuanbao and feishu; "
+                "send_message MEDIA delivery is currently only supported for telegram, discord, matrix, weixin, signal, yuanbao, feishu and qqbot; "
                 f"target {platform.value} had only media attachments"
             )
         }
     warning = None
     if media_files:
         warning = (
             f"MEDIA attachments were omitted for {platform.value}; "
-            "native send_message media delivery is currently only supported for telegram, discord, matrix, weixin, signal, yuanbao and feishu"
+            "native send_message media delivery is currently only supported for telegram, discord, matrix, weixin, signal, yuanbao, feishu and qqbot"
         )
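
The fallback edited in Change 3 is also what produces the two symptoms in section 1. A simplified sketch of its logic is shown here. It is an assumption-laden reduction: media_fallback and the supported set are hypothetical, and in the real main function these checks are reached only after the platform-specific branches have declined to handle the message.

```python
from typing import List, Optional, Set, Tuple

# Platforms named in the pre-fix error message.
PRE_FIX = {"telegram", "discord", "matrix", "weixin", "signal", "yuanbao", "feishu"}


def media_fallback(
    platform: str,
    message: str,
    media_files: List[Tuple[str, bool]],
    supported: Set[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (error, warning) for a platform without native media routing."""
    if platform in supported or not media_files:
        return None, None
    if not message.strip():
        # Media-only message: nothing can be delivered, so fail outright.
        return f"target {platform} had only media attachments", None
    # Text plus media: the text goes out and the attachments are omitted.
    return None, f"MEDIA attachments were omitted for {platform}"


media = [("/tmp/test.png", False)]
media_only = media_fallback("qqbot", "", media, PRE_FIX)
text_and_media = media_fallback("qqbot", "caption", media, PRE_FIX)
after_fix = media_fallback("qqbot", "caption", media, PRE_FIX | {"qqbot"})
```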

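4.3 Design Decisions

| Decision | Rationale |
| --- | --- |
| Send text and media separately | In the QQ Bot API, msg_type=0 (text) and msg_type=7 (media/rich media) are different message types and cannot be mixed in one message |
| Send media attachments only with the last chunk | Matches the Yuanbao/Feishu pattern and avoids resending the media after every text chunk |
| Keep the direct REST API path for text-only messages | Backward compatibility: text-only messages can still be sent when the gateway is not running, with no hard dependency on the gateway |
| Infer the media type from the MIME type | Reuses the Python standard library mimetypes module, so no extra dependency is needed |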
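
The MIME-based dispatch in the last row can be tried on its own. In this sketch pick_send_method is a hypothetical helper that returns the name of the adapter method the diff would call. It uses only mimetypes.guess_type from the standard library.

```python
import mimetypes


def pick_send_method(path_or_url: str) -> str:
    """Map a file name or URL to the adapter method used for its media type."""
    mime, _ = mimetypes.guess_type(path_or_url)
    mime = (mime or "").lower()
    if mime.startswith("image/"):
        return "send_image"
    if mime.startswith("video/"):
        return "send_video"
    if mime.startswith("audio/"):
        return "send_voice"
    # Unknown extensions and non-media types fall back to a generic document.
    return "send_document"


names = ["photo.png", "clip.mp4", "note.mp3", "report.pdf", "blob.unknownext"]
choices = {name: pick_send_method(name) for name in names}
```

Because the guess relies on the file extension alone, a file or URL without a recognizable extension is sent as a document even when its content is an image.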

5. Verification

5.1 Unit Tests

# Test 1: QQAdapter singleton pattern
async def test_qqbot_adapter_singleton():
    """Verify that set_active/get_active register and clear the instance correctly."""
    from gateway.platforms.qqbot.adapter import QQAdapter

    assert QQAdapter.get_active() is None

    # Mock adapter instance
    mock_config = PlatformConfig(platform=Platform.QQBOT, ...)
    adapter = QQAdapter(mock_config)
    QQAdapter.set_active(adapter)
    assert QQAdapter.get_active() is adapter

    QQAdapter.set_active(None)
    assert QQAdapter.get_active() is None


# Test 2: _send_qqbot media routing
async def test_send_qqbot_with_media(mocker):
    """Verify that messages with media_files are routed through the gateway adapter."""
    mock_adapter = mocker.MagicMock()
    mock_adapter.send = mocker.AsyncMock(return_value=SendResult(success=True))
    mock_adapter.send_image = mocker.AsyncMock(return_value=SendResult(success=True))

    mocker.patch(
        "gateway.platforms.qqbot.adapter.QQAdapter.get_active",
        return_value=mock_adapter,
    )

    result = await _send_qqbot(
        pconfig=...,
        chat_id="test_chat",
        message="caption",
        media_files=[("/tmp/test.png", False)],
    )

    assert result["success"] is True
    assert result["media_sent"] == 1
    mock_adapter.send.assert_called_once()
    mock_adapter.send_image.assert_called_once()


# Test 3: _send_qqbot error handling without a gateway
async def test_send_qqbot_media_no_adapter(mocker):
    """Verify that a clear error is returned when the gateway is not running."""
    mocker.patch(
        "gateway.platforms.qqbot.adapter.QQAdapter.get_active",
        return_value=None,
    )

    result = await _send_qqbot(
        pconfig=...,
        chat_id="test_chat",
        message="",
        media_files=[("/tmp/test.png", False)],
    )

    assert "error" in result
    assert "adapter is not running" in result["error"]


# Test 4: text-only messages keep their existing behavior
async def test_send_qqbot_text_only_unchanged(mocker):
    """Verify that text-only messages are unaffected and still go through the REST API."""
    # (mock httpx and verify REST API path is taken)

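5.2 Integration Tests

  1. Start the gateway: make sure platforms.qq.enabled: true is set in config.yaml and the gateway connects to QQ Bot successfully
  2. Send plain text: send a text-only message to a QQ user through send_message and verify that it arrives
  3. Send an image: send a message with an image attachment to a QQ user through send_message and verify that the image is uploaded natively and displayed
  4. Send a document: send a PDF/file attachment and verify that the file can be downloaded
  5. Send voice/video: send audio and video files and verify native playback
  6. Fallback without the gateway: stop the gateway, send plain text, and verify that it is still delivered through the direct REST API path
  7. Media without the gateway: stop the gateway, send a message with media, and verify that a clear error is returned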

5.3 Regression Tests

# Run the existing test suite to make sure nothing regresses
cd /home/ubuntu/.hermes/hermes-agent
source .venv/bin/activate
python -m pytest tests/ -x -q

# Run the QQ Bot tests (if any)
python -m pytest tests/ -k "qqbot" -v

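5.4 Manual Verification Checklist

  • The QQ Bot gateway connects normally (the log shows QQBot: Connected)
  • A text-only message is sent successfully
  • A message with an image attachment is sent successfully (the user sees a native image)
  • A message with a document attachment is sent successfully (the user can download it)
  • A message with a voice attachment is sent successfully
  • A message with a video attachment is sent successfully
  • The text-only fallback works when the gateway is not running
  • Media attachments return a clear error when the gateway is not running
  • Sending on other platforms (Telegram, Discord, and so on) is unaffected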