agent-skills/sn-image-base/SKILL.md
Hermes Agent ccc63d1e70 first commit
2026-05-10 13:52:46 +08:00

name: sn-image-base
description: Base-layer skill for the SenseNova-Skills project, providing low-level APIs for image generation, recognition (VLM), and text optimization (LLM). This skill does not preprocess inputs; it only calls backend services and returns results. This skill is not user-facing and is intended for upper-layer skills only.
triggers: SenseNova-Skills Image Generation; SenseNova-Skills 图像基础工具; sn 图像基础工具; SenseNova 图像基础工具; SenseNova Image Generation; sn-image-base
metadata: project: SenseNova-Skills; tier: 0; category: infrastructure; user_visible: false

sn-image-base

Dependency Installation

pip install -r requirements.txt

Overview

sn-image-base is the base-layer skill (tier 0) of the SenseNova-Skills project and provides three low-level tools:

  • sn-image-generate: image generation (calls text-to-image-no-enhance API)
  • sn-image-recognize: image recognition (uses VLM to analyze image content)
  • sn-text-optimize: text optimization (uses LLM to process text)

This skill performs no input preprocessing; it only calls backend services and returns their results.

Tools List

sn-image-generate

Image generation tool that calls the text-to-image-no-enhance API.

--prompt is required; all other parameters are optional:

| Parameter | Type | Default | Description |
|---|---|---|---|
| --prompt | string | Required | Prompt text for image generation |
| --negative-prompt | string | "" | Negative prompt |
| --image-size | string | 2k | Image size preset, supports 2k only |
| --aspect-ratio | string | 16:9 | Aspect ratio, e.g. 1:1, 16:9, 9:16 |
| --seed | int | None | Random seed for reproducible generation |
| --unet-name | string | None | Specify a UNet model name |
| --api-key | string | SN_IMAGE_GEN_API_KEY -> SN_API_KEY | API key (CLI argument has priority; MissingApiKeyError is raised when all are empty) |
| --base-url | string | SN_IMAGE_GEN_BASE_URL -> SN_BASE_URL | API base URL (CLI argument has priority) |
| --poll-interval | float | 5.0 | Polling interval (seconds) |
| --timeout | float | 300.0 | Timeout (seconds) |
| --insecure | flag | False | Disable TLS verification |
| --save-path | Path | Auto-generated | Save path |

sn-image-recognize

Image recognition tool that uses a VLM (vision-language model) to analyze image content. Supports multiple image inputs.

--images and --user-prompt (or --user-prompt-path) are required. All other parameters use three-level defaults (CLI > env var > built-in default):

| Parameter | Type | Built-in Default | Env Var | Description |
|---|---|---|---|---|
| --api-key | string | No hardcoded default | SN_VISION_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY | Chat runtime API key; raises MissingApiKeyError when all are unset |
| --base-url | string | SN_CHAT_BASE_URL default | SN_VISION_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL | Vision provider base URL; falls back to shared chat/global provider |
| --model | string | sensenova-6.7-flash-lite | SN_VISION_MODEL -> SN_CHAT_MODEL | Vision-capable model name |
| --vlm-type | string | openai-completions | SN_VISION_TYPE -> SN_CHAT_TYPE | Chat protocol type override |
| --user-prompt-path | string | None | - | Local file path, mutually exclusive with --user-prompt |
| --system-prompt-path | string | None | - | Local file path, mutually exclusive with --system-prompt |

Available values for --vlm-type:

  • openai-completions: OpenAI-compatible /v1/chat/completions interface
  • anthropic-messages: Anthropic Messages /v1/messages interface

sn-text-optimize

Text optimization tool that uses an LLM (large language model) to optimize text content. Does not accept image inputs.

--user-prompt (or --user-prompt-path) is required. All other parameters use three-level defaults (CLI > env var > built-in default):

| Parameter | Type | Built-in Default | Env Var | Description |
|---|---|---|---|---|
| --api-key | string | No hardcoded default | SN_TEXT_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY | Chat runtime API key; raises MissingApiKeyError when all are unset |
| --base-url | string | SN_CHAT_BASE_URL default | SN_TEXT_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL | Text provider base URL; falls back to shared chat/global provider |
| --model | string | sensenova-6.7-flash-lite | SN_TEXT_MODEL -> SN_CHAT_MODEL | Text model name |
| --llm-type | string | openai-completions | SN_TEXT_TYPE -> SN_CHAT_TYPE | Chat protocol type override |
| --user-prompt-path | string | None | - | Local file path, mutually exclusive with --user-prompt |
| --system-prompt-path | string | None | - | Local file path, mutually exclusive with --system-prompt |

Available values for --llm-type:

  • openai-completions: OpenAI-compatible /v1/chat/completions interface
  • anthropic-messages: Anthropic Messages /v1/messages interface

VLM vs LLM

| Tool | Model Type | Image Input | Interface Type Parameter |
|---|---|---|---|
| sn-image-recognize | VLM (Vision Language Model) | Yes, supports multiple images | --vlm-type |
| sn-text-optimize | LLM (Language Model) | No, text only | --llm-type |

Usage

All tools are called through the unified sn_agent_runner.py entrypoint:

# Image generation (only --prompt is required; api-key/base-url are resolved from environment variables)
python scripts/sn_agent_runner.py sn-image-generate \
    --prompt "..."

# Image generation (override base-url)
python scripts/sn_agent_runner.py sn-image-generate \
    --prompt "..." \
    --base-url "https://custom-endpoint.com/v1"

# Image generation (explicitly override api-key)
python scripts/sn_agent_runner.py sn-image-generate \
    --prompt "..." \
    --api-key "sk-xxx"

# Image recognition (VLM) - minimal call (uses built-in SenseNova defaults)
python scripts/sn_agent_runner.py sn-image-recognize \
    --user-prompt "Describe the image" \
    --images "path/to/image.png"

# Image recognition (VLM) - switch to an Anthropic-compatible API (Messages interface)
python scripts/sn_agent_runner.py sn-image-recognize \
    --user-prompt "Describe the image" \
    --images "path/to/image.png" \
    --api-key "sk-ant-xxx" \
    --base-url "https://api.anthropic.com" \
    --model "claude-sonnet-4-6" \
    --vlm-type "anthropic-messages"

# Text optimization (LLM) - minimal call (uses built-in SenseNova defaults)
python scripts/sn_agent_runner.py sn-text-optimize \
    --user-prompt "Optimize the text: ..."

# Text optimization (LLM) - switch to an Anthropic-compatible API (Messages interface)
python scripts/sn_agent_runner.py sn-text-optimize \
    --user-prompt "Optimize the text: ..." \
    --api-key "sk-ant-xxx" \
    --base-url "https://api.anthropic.com" \
    --model "claude-sonnet-4-6" \
    --llm-type "anthropic-messages"
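
When an upper-layer skill calls the runner from Python instead of a shell, the commands above can be assembled as an argument list. The following is a minimal sketch, not part of the skill itself: `build_command` is a hypothetical helper, and it only covers options that take a single value (boolean flags such as --insecure and multi-image input are left out).

```python
import sys
from typing import Optional

RUNNER = "scripts/sn_agent_runner.py"  # entrypoint path used in the examples above


def build_command(tool: str, options: dict, python: Optional[str] = None) -> list:
    """Return an argv list for one runner call (hypothetical helper).

    Options whose value is None are skipped, so the runner falls back to
    its own env-var or built-in defaults for them.
    """
    argv = [python or sys.executable, RUNNER, tool]
    for name, value in options.items():
        if value is None:
            continue
        argv.extend([f"--{name}", str(value)])
    return argv


cmd = build_command(
    "sn-text-optimize",
    {"user-prompt": "Optimize the text: ...", "model": None, "output-format": "json"},
    python="python",
)
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; using a list rather than a shell string avoids quoting problems in prompts.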

Default Parameter Behavior

Authentication parameters for sn-image-generate have the following default behavior:

| Parameter | Default | Override | Description |
|---|---|---|---|
| --base-url | SN_IMAGE_GEN_BASE_URL -> SN_BASE_URL | --base-url "..." | CLI argument has priority |
| --api-key | SN_IMAGE_GEN_API_KEY -> SN_API_KEY | --api-key "..." | CLI argument has priority; raises MissingApiKeyError if all values are empty |

sn-image-recognize and sn-text-optimize use priority: CLI argument > command-specific env var > shared SN_CHAT_* env var > global SN_* env var > built-in default.

| Parameter | Built-in Default | Vision Env Var | Text Env Var |
|---|---|---|---|
| --api-key | None (must be provided) | SN_VISION_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY | SN_TEXT_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY |
| --base-url | https://token.sensenova.cn/v1 | SN_VISION_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL | SN_TEXT_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL |
| --model | sensenova-6.7-flash-lite | SN_VISION_MODEL -> SN_CHAT_MODEL | SN_TEXT_MODEL -> SN_CHAT_MODEL |
| --vlm-type / --llm-type | openai-completions | SN_VISION_TYPE -> SN_CHAT_TYPE | SN_TEXT_TYPE -> SN_CHAT_TYPE |

api_key resolution order (high to low): CLI --api-key > command-specific key (SN_VISION_API_KEY/SN_TEXT_API_KEY) > SN_CHAT_API_KEY > SN_API_KEY. If all are unset, MissingApiKeyError is raised.

Only --api-key must be provided via CLI or environment; base URL, model, and interface type have shared chat defaults.
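
The resolution order described above can be sketched in a few lines of Python. This is an illustration of the documented priority chain, not the runner's actual code: `resolve_api_key` and the local `MissingApiKeyError` class are stand-ins, and treating empty strings as unset is an assumption.

```python
import os
from typing import Mapping, Optional, Sequence


class MissingApiKeyError(RuntimeError):
    """Stand-in for the runner's error of the same name (assumed shape)."""


# Env var chains as listed in the tables above, highest priority first.
VISION_KEY_CHAIN = ("SN_VISION_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY")
TEXT_KEY_CHAIN = ("SN_TEXT_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY")


def resolve_api_key(
    cli_value: Optional[str],
    chain: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first non-empty value: CLI argument, then each env var in order."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    for name in chain:
        value = env.get(name, "")
        if value:  # assumption: an empty string counts as unset
            return value
    raise MissingApiKeyError("no API key found; checked --api-key and " + ", ".join(chain))


fake_env = {"SN_API_KEY": "global-key", "SN_CHAT_API_KEY": "chat-key"}
print(resolve_api_key(None, VISION_KEY_CHAIN, fake_env))  # shared chat key beats the global key
```

Passing the environment as a mapping keeps the function easy to test without touching `os.environ`.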

Agent Configuration Integration

The agent can automatically read parameters from openclaw.json without manual input:

| CLI Parameter | openclaw.json Field | Example |
|---|---|---|
| --base-url | providers.<name>.baseUrl | https://api.anthropic.com |
| --llm-type | providers.<name>.api | anthropic-messages / openai-completions |
| --vlm-type | providers.<name>.api | anthropic-messages / openai-completions |
| --model | providers.<name>.models[].id | claude-sonnet-4-6 |
| --api-key | providers.<name>.apiKey or env var | sk-cp-... |

Note: --llm-type and --vlm-type share the same providers.<name>.api field and are used by LLM and VLM tools respectively.

Mapping between provider.api and interface type:

| api Value | Corresponding --llm-type / --vlm-type | Endpoint Path |
|---|---|---|
| anthropic-messages | anthropic-messages | /v1/messages |
| openai-completions | openai-completions | /v1/chat/completions |
| openai-responses | (future extension) | /responses |
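
To make the field mapping concrete, here is a hedged sketch that turns one provider entry into runner arguments. The top-level `providers` key, the choice of the first entry in `models`, and the helper name `provider_to_cli_args` are assumptions made for illustration; the real openclaw.json layout may nest these fields differently.

```python
import json

# Which interface-type flag each tool expects (from the tool tables above).
TYPE_FLAG = {"sn-image-recognize": "--vlm-type", "sn-text-optimize": "--llm-type"}
SUPPORTED_TYPES = {"anthropic-messages", "openai-completions"}


def provider_to_cli_args(config: dict, provider_name: str, tool: str) -> list:
    """Map providers.<name> fields to runner CLI arguments (hypothetical helper)."""
    provider = config["providers"][provider_name]  # assumed top-level key
    api = provider["api"]
    if api not in SUPPORTED_TYPES:
        # e.g. openai-responses is listed only as a future extension
        raise ValueError(f"unsupported api type: {api}")
    args = ["--base-url", provider["baseUrl"], TYPE_FLAG[tool], api]
    models = provider.get("models") or []
    if models:
        args += ["--model", models[0]["id"]]  # assumption: use the first listed model
    if provider.get("apiKey"):
        args += ["--api-key", provider["apiKey"]]  # otherwise rely on env vars
    return args


sample = json.loads("""
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://api.anthropic.com",
      "api": "anthropic-messages",
      "models": [{"id": "claude-sonnet-4-6"}]
    }
  }
}
""")
print(provider_to_cli_args(sample, "anthropic", "sn-text-optimize"))
```

Because no `apiKey` is present in the sample entry, the sketch omits --api-key and leaves key lookup to the environment variables.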

Mapping Between base-url and Interface Type

Different API types have different requirements for base-url format:

| Type | --llm-type / --vlm-type | Recommended base-url | Code Appended Path | Final URL Example |
|---|---|---|---|---|
| LLM | openai-completions | https://token.sensenova.cn/v1 | /chat/completions | https://token.sensenova.cn/v1/chat/completions |
| LLM | anthropic-messages | https://api.anthropic.com/v1 | /messages | https://api.anthropic.com/v1/messages |
| VLM | openai-completions | https://token.sensenova.cn/v1 | /chat/completions | https://token.sensenova.cn/v1/chat/completions |
| VLM | anthropic-messages | https://api.anthropic.com/v1 | /messages | https://api.anthropic.com/v1/messages |

Note:

  • Recommended chat base URLs include the provider API version path, for example /v1.
  • For compatibility, if the configured chat base URL has no path, the runner appends /v1/chat/completions or /v1/messages.
  • If the configured chat base URL already has a path such as /v1, the runner appends only /chat/completions or /messages.
  • Some providers use versioned paths other than /v1, such as Gemini's /v1beta/openai.
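
The path rules in the notes above can be expressed as a small function. This is a sketch of the described behavior, assuming that a lone trailing slash counts as "no path"; `build_chat_url` is a hypothetical name, and the runner's real implementation may differ in edge cases.

```python
from urllib.parse import urlsplit

# Suffix appended per interface type (from the mapping table above).
ENDPOINT_SUFFIX = {
    "openai-completions": "/chat/completions",
    "anthropic-messages": "/messages",
}


def build_chat_url(base_url: str, interface_type: str) -> str:
    """Append the endpoint path, adding /v1 only when the base URL has no path."""
    suffix = ENDPOINT_SUFFIX[interface_type]
    base = base_url.rstrip("/")  # assumption: a trailing slash is not a path
    has_path = urlsplit(base).path != ""
    if has_path:
        return base + suffix
    return base + "/v1" + suffix


for url, kind in [
    ("https://token.sensenova.cn/v1", "openai-completions"),
    ("https://api.anthropic.com", "anthropic-messages"),
    ("https://example.com/v1beta/openai", "openai-completions"),  # placeholder host
]:
    print(build_chat_url(url, kind))
```

The third call shows why the version segment is not hard-coded: a base URL that already carries a path such as /v1beta/openai is left as-is and only the endpoint suffix is added.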

Output Format

All tools support two output formats:

  • --output-format text (default): outputs plain text result
  • --output-format json: outputs JSON, including status and elapsed_seconds (runtime in seconds, rounded to 2 decimals)

JSON output for sn-image-recognize and sn-text-optimize also includes model, base_url, and interface_type to verify the effective runtime configuration:

{
  "status": "ok",
  "result": "...",
  "model": "sensenova-6.7-flash-lite",
  "base_url": "https://token.sensenova.cn/v1",
  "interface_type": "openai-completions",
  "elapsed_seconds": 1.23
}

On failure:

{
  "status": "failed",
  "error": "error message",
  "elapsed_seconds": 0.05
}
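
A caller using --output-format json can branch on the status field shown in the two payloads above. The sketch below assumes the runner prints exactly one JSON object to stdout; `parse_runner_output` and `RunnerError` are hypothetical names introduced for illustration.

```python
import json


class RunnerError(RuntimeError):
    """Raised when the runner reports a non-ok status (hypothetical helper class)."""


def parse_runner_output(stdout: str) -> dict:
    """Decode the JSON payload and raise if the status is not "ok"."""
    payload = json.loads(stdout)
    if payload.get("status") != "ok":
        raise RunnerError(payload.get("error", "unknown error"))
    return payload


ok_text = (
    '{"status": "ok", "result": "...", "model": "sensenova-6.7-flash-lite", '
    '"base_url": "https://token.sensenova.cn/v1", '
    '"interface_type": "openai-completions", "elapsed_seconds": 1.23}'
)
failed_text = '{"status": "failed", "error": "error message", "elapsed_seconds": 0.05}'

payload = parse_runner_output(ok_text)
print(payload["model"], payload["interface_type"], payload["elapsed_seconds"])

try:
    parse_runner_output(failed_text)
except RunnerError as exc:
    print("runner failed:", exc)
```

Checking model, base_url, and interface_type in the success payload is a quick way to confirm which environment variables actually took effect.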

Input/Output Specification

See references/api_spec.md for details.


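⚠️ Vendor lock-in: this skill is bound to SenseNova-specific APIs; image generation, recognition, and text optimization cannot be switched to other models. If SenseNova stops being free or you have no plan, this skill becomes unusable.

Dependencies: SN_API_KEY (SenseNova platform API key), Pillow (~/.hermes/hermes-agent/venv/bin/pip3 install Pillow)
Configuration reference: references/sensenova-config.md
Alternatives: comfyui (local image generation) + mmx vision (image understanding)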

Pitfalls

Pillow dependency not installed

Symptom: ModuleNotFoundError: No module named 'PIL'
Root cause: sn-image-generate uses PIL to process images, but Pillow is not installed in the system Python or the venv.
Fix: pip install Pillow (when using the hermes-agent venv, run ~/.hermes/hermes-agent/venv/bin/pip3 install Pillow).
Note: the hermes-agent Python path is ~/.hermes/hermes-agent/venv/bin/python3, not the system python3.

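API rate limiting policy

SenseNova rate limits are counted over a 5-hour window, not per minute:

  • sensenova-6.7-flash-lite: 1500 requests per 5 hours
  • sensenova-u1-fast: 1500 requests per 5 hours
  • deepseek-v4-flash: 150 requests per 5 hours (the strictest)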
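
A caller that issues many requests may want to track the 5-hour budget locally before hitting the server-side limit. The sketch below is a client-side sliding-window counter; whether SenseNova itself counts with a sliding or a fixed window is not stated here, so treat this as a conservative approximation. `WindowBudget` is a hypothetical class.

```python
import time
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window from the list above

# Per-model request limits as listed above.
LIMITS = {
    "sensenova-6.7-flash-lite": 1500,
    "sensenova-u1-fast": 1500,
    "deepseek-v4-flash": 150,
}


class WindowBudget:
    """Track call timestamps and refuse calls beyond `limit` per window."""

    def __init__(self, limit: int, window: float = WINDOW_SECONDS, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the class can be tested without waiting
        self.calls = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if the budget allows it, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True


budget = WindowBudget(LIMITS["deepseek-v4-flash"])
allowed = sum(budget.try_acquire() for _ in range(200))
print(allowed)  # 150: the remaining 50 attempts are refused within the same window
```

The counter lives in memory only, so it resets when the process restarts; a persistent store would be needed to track usage across runs.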

Base URL

All SenseNova models share the same base URL: https://token.sensenova.cn/v1