| name | description | triggers | metadata |
|---|---|---|---|
| sn-image-base | Base-layer skill for the SenseNova-Skills project, providing low-level APIs for image generation, recognition (VLM), and text optimization (LLM). This skill does not preprocess inputs; it only calls backend services and returns results. This skill is not user-facing and is intended for upper-layer skills only. | | |
sn-image-base
Dependency Installation
pip install -r requirements.txt
Overview
sn-image-base is the base-layer skill (tier 0) of the SenseNova-Skills project and provides three low-level tools:
- sn-image-generate: image generation (calls the text-to-image-no-enhance API)
- sn-image-recognize: image recognition (uses a VLM to analyze image content)
- sn-text-optimize: text optimization (uses an LLM to process text)
This skill does not perform any input preprocessing and only calls backend services to return results.
Tools List
sn-image-generate
Image generation tool that calls the text-to-image-no-enhance API.
--prompt is required; all other parameters are optional:
| Parameter | Type | Default | Description |
|---|---|---|---|
| --prompt | string | Required | Prompt text for image generation |
| --negative-prompt | string | "" | Negative prompt |
| --image-size | string | 2k | Image size preset, supports 2k only |
| --aspect-ratio | string | 16:9 | Aspect ratio, e.g. 1:1, 16:9, 9:16 |
| --seed | int | None | Random seed for reproducible generation |
| --unet-name | string | None | Specify a UNet model name |
| --api-key | string | SN_IMAGE_GEN_API_KEY -> SN_API_KEY | API key (CLI argument has priority; MissingApiKeyError is raised when all are empty) |
| --base-url | string | SN_IMAGE_GEN_BASE_URL -> SN_BASE_URL | API base URL (CLI argument has priority) |
| --poll-interval | float | 5.0 | Polling interval (seconds) |
| --timeout | float | 300.0 | Timeout (seconds) |
| --insecure | flag | False | Disable TLS verification |
| --save-path | Path | Auto-generated | Save path |
sn-image-recognize
Image recognition tool that uses VLM (Vision Language Model) to analyze image content. Supports multiple image inputs.
--images and --user-prompt (or --user-prompt-path) are required. All other parameters use three-level defaults (CLI > env var > built-in default):
| Parameter | Type | Built-in Default | Env Var | Description |
|---|---|---|---|---|
| --api-key | string | No hardcoded default | SN_VISION_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY | Chat runtime API key; raises MissingApiKeyError when all are unset |
| --base-url | string | SN_CHAT_BASE_URL default | SN_VISION_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL | Vision provider base URL; falls back to shared chat/global provider |
| --model | string | sensenova-6.7-flash-lite | SN_VISION_MODEL -> SN_CHAT_MODEL | Vision-capable model name |
| --vlm-type | string | openai-completions | SN_VISION_TYPE -> SN_CHAT_TYPE | Chat protocol type override |
| --user-prompt-path | string | None | - | Local file path, mutually exclusive with --user-prompt |
| --system-prompt-path | string | None | - | Local file path, mutually exclusive with --system-prompt |
Available values for --vlm-type:
- openai-completions: OpenAI-compatible /v1/chat/completions interface
- anthropic-messages: Anthropic Messages /v1/messages interface
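The argument rules above (required inputs, mutually exclusive prompt sources, a fixed set of interface types) can be sketched with `argparse`. This is an illustrative sketch only, not the runner's actual parser: `build_parser` is a hypothetical name, and passing several images as space-separated values (`nargs="+"`) is an assumption, since this document does not say how multiple images are supplied.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented sn-image-recognize rules."""
    parser = argparse.ArgumentParser(prog="sn-image-recognize")

    # Assumption: multiple images are given as space-separated paths.
    parser.add_argument("--images", nargs="+", required=True)

    # Exactly one of --user-prompt / --user-prompt-path must be given.
    user = parser.add_mutually_exclusive_group(required=True)
    user.add_argument("--user-prompt")
    user.add_argument("--user-prompt-path")

    # The system prompt is optional, but its two sources still exclude each other.
    system = parser.add_mutually_exclusive_group()
    system.add_argument("--system-prompt")
    system.add_argument("--system-prompt-path")

    # None means "not set on the CLI", so env vars and built-in defaults can apply later.
    parser.add_argument(
        "--vlm-type",
        choices=["openai-completions", "anthropic-messages"],
        default=None,
    )
    return parser
```

Leaving `--vlm-type` at `None` when it is omitted keeps the CLI layer separate from the env var and built-in default layers.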
sn-text-optimize
Text optimization tool that uses LLM (Language Model) to optimize text content. Does not accept image inputs.
--user-prompt (or --user-prompt-path) is required. All other parameters use three-level defaults (CLI > env var > built-in default):
| Parameter | Type | Built-in Default | Env Var | Description |
|---|---|---|---|---|
| --api-key | string | No hardcoded default | SN_TEXT_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY | Chat runtime API key; raises MissingApiKeyError when all are unset |
| --base-url | string | SN_CHAT_BASE_URL default | SN_TEXT_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL | Text provider base URL; falls back to shared chat/global provider |
| --model | string | sensenova-6.7-flash-lite | SN_TEXT_MODEL -> SN_CHAT_MODEL | Text model name |
| --llm-type | string | openai-completions | SN_TEXT_TYPE -> SN_CHAT_TYPE | Chat protocol type override |
| --user-prompt-path | string | None | - | Local file path, mutually exclusive with --user-prompt |
| --system-prompt-path | string | None | - | Local file path, mutually exclusive with --system-prompt |
Available values for --llm-type:
- openai-completions: OpenAI-compatible /v1/chat/completions interface
- anthropic-messages: Anthropic Messages /v1/messages interface
VLM vs LLM
| Tool | Model Type | Image Input | Interface Type Parameter |
|---|---|---|---|
| sn-image-recognize | VLM (Vision Language Model) | Yes, supports multiple images | --vlm-type |
| sn-text-optimize | LLM (Language Model) | No, text only | --llm-type |
Usage
All tools are called through the unified sn_agent_runner.py entrypoint:
# Image generation (only prompt required; api-key/base-url have defaults)
python scripts/sn_agent_runner.py sn-image-generate \
--prompt "..."
# Image generation (override base-url)
python scripts/sn_agent_runner.py sn-image-generate \
--prompt "..." \
--base-url "https://custom-endpoint.com/v1"
# Image generation (explicitly override api-key)
python scripts/sn_agent_runner.py sn-image-generate \
--prompt "..." \
--api-key "sk-xxx"
# Image recognition (VLM) - minimal call (uses built-in Sensenova defaults)
python scripts/sn_agent_runner.py sn-image-recognize \
--user-prompt "Describe the image" \
--images "path/to/image.png"
# Image recognition (VLM) - override to Anthropic Claude API compatible (messages interface)
python scripts/sn_agent_runner.py sn-image-recognize \
--user-prompt "Describe the image" \
--images "path/to/image.png" \
--api-key "sk-ant-xxx" \
--base-url "https://api.anthropic.com" \
--model "claude-sonnet-4-6" \
--vlm-type "anthropic-messages"
# Text optimization (LLM) - minimal call (uses built-in Sensenova defaults)
python scripts/sn_agent_runner.py sn-text-optimize \
--user-prompt "Optimize the text: ..."
# Text optimization (LLM) - override to Anthropic Claude API compatible (messages interface)
python scripts/sn_agent_runner.py sn-text-optimize \
--user-prompt "Optimize the text: ..." \
--api-key "sk-ant-xxx" \
--base-url "https://api.anthropic.com" \
--model "claude-sonnet-4-6" \
--llm-type "anthropic-messages"
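An upper-layer skill would typically assemble these commands programmatically rather than by hand. The sketch below shows one way to do that. `build_command` and `RUNNER` are hypothetical names, and using `sys.executable` as the interpreter is an assumption; an agent with its own virtual environment may need a different Python path.

```python
import sys
from typing import Dict, List

# Entrypoint path as shown in the commands above, relative to the skill directory.
RUNNER = "scripts/sn_agent_runner.py"


def build_command(tool: str, options: Dict[str, str], json_output: bool = True) -> List[str]:
    """Build an argv list for one runner call (hypothetical helper).

    `options` maps flag names without the leading dashes to their values,
    e.g. {"prompt": "a cat", "aspect-ratio": "1:1"}.
    """
    cmd = [sys.executable, RUNNER, tool]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    if json_output:
        # JSON output is easier for a calling skill to parse than plain text.
        cmd += ["--output-format", "json"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Passing a list instead of a shell string avoids quoting problems in prompts.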
Default Parameter Behavior
Authentication parameters for sn-image-generate have the following default behavior:
| Parameter | Default | Override | Description |
|---|---|---|---|
| --base-url | SN_IMAGE_GEN_BASE_URL -> SN_BASE_URL | --base-url "..." | CLI argument has priority |
| --api-key | SN_IMAGE_GEN_API_KEY -> SN_API_KEY | --api-key "..." | CLI argument has priority; throws MissingApiKeyError if all values are empty |
sn-image-recognize and sn-text-optimize use priority: CLI argument > command-specific env var > shared SN_CHAT_* env var > global SN_* env var > built-in default.
| Parameter | Built-in Default | Vision Env Var | Text Env Var |
|---|---|---|---|
| --api-key | None (must be provided) | SN_VISION_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY | SN_TEXT_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY |
| --base-url | https://token.sensenova.cn/v1 | SN_VISION_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL | SN_TEXT_BASE_URL -> SN_CHAT_BASE_URL -> SN_BASE_URL |
| --model | sensenova-6.7-flash-lite | SN_VISION_MODEL -> SN_CHAT_MODEL | SN_TEXT_MODEL -> SN_CHAT_MODEL |
| --vlm-type / --llm-type | openai-completions | SN_VISION_TYPE -> SN_CHAT_TYPE | SN_TEXT_TYPE -> SN_CHAT_TYPE |
api_key resolution order (high to low): CLI --api-key > command-specific key (SN_VISION_API_KEY/SN_TEXT_API_KEY) > SN_CHAT_API_KEY > SN_API_KEY. If all are unset, MissingApiKeyError is raised.
Only --api-key must be provided via CLI or environment; base URL, model, and interface type have shared chat defaults.
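The resolution order described above can be sketched as a small lookup chain. This is a minimal sketch, not the runner's actual code. The `MissingApiKeyError` class here is a stand-in for the runner's own error of the same name, `first_set` and `resolve_api_key` are hypothetical helpers, and treating an empty string the same as an unset variable is an assumption.

```python
import os
from typing import Mapping, Optional, Sequence


class MissingApiKeyError(RuntimeError):
    """Stand-in for the runner's error of the same name."""


def first_set(names: Sequence[str], env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty value among the named variables, or None."""
    for name in names:
        value = env.get(name, "")
        if value:  # assumption: an empty string counts as unset
            return value
    return None


def resolve_api_key(
    cli_value: Optional[str],
    command_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """CLI --api-key > command-specific key > SN_CHAT_API_KEY > SN_API_KEY."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    chain = [command_var, "SN_CHAT_API_KEY", "SN_API_KEY"]
    value = first_set(chain, env)
    if value is None:
        raise MissingApiKeyError("no API key found in: --api-key, " + ", ".join(chain))
    return value
```

For sn-image-recognize the `command_var` would be `SN_VISION_API_KEY`; for sn-text-optimize it would be `SN_TEXT_API_KEY`. The same chain shape applies to base URL, model, and interface type, except that those end in a built-in default instead of an error.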
Agent Configuration Integration
The agent can automatically read parameters from openclaw.json without manual input:
| CLI Parameter | openclaw.json Field | Example |
|---|---|---|
| --base-url | providers.<name>.baseUrl | https://api.anthropic.com |
| --llm-type | providers.<name>.api | anthropic-messages / openai-completions |
| --vlm-type | providers.<name>.api | anthropic-messages / openai-completions |
| --model | providers.<name>.models[].id | claude-sonnet-4-6 |
| --api-key | providers.<name>.apiKey or env var | sk-cp-... |
Note: --llm-type and --vlm-type share the same providers.<name>.api field and are used by LLM and VLM tools respectively.
Mapping between provider.api and interface type:
| api Value | Corresponding --llm-type / --vlm-type | Endpoint Path |
|---|---|---|
| anthropic-messages | anthropic-messages | /v1/messages |
| openai-completions | openai-completions | /v1/chat/completions |
| openai-responses | (future extension) | /responses |
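The field mapping above can be illustrated with a helper that turns one provider entry into CLI arguments. This is a sketch under assumptions. `provider_to_args` is a hypothetical name, the provider dict is assumed to have exactly the fields listed in the tables (`baseUrl`, `api`, `models[].id`, `apiKey`), and picking the first model in the list is an arbitrary choice made for illustration.

```python
from typing import Any, Dict, List

# Interface-type flag per tool, as documented for the two chat tools.
TYPE_FLAG = {
    "sn-image-recognize": "--vlm-type",
    "sn-text-optimize": "--llm-type",
}

# openai-responses is listed as a future extension, so it is rejected here.
SUPPORTED_APIS = ("anthropic-messages", "openai-completions")


def provider_to_args(provider: Dict[str, Any], tool: str) -> List[str]:
    """Translate one providers.<name> entry into runner CLI arguments."""
    api = provider["api"]
    if api not in SUPPORTED_APIS:
        raise ValueError(f"unsupported provider api: {api}")

    args = ["--base-url", provider["baseUrl"], TYPE_FLAG[tool], api]

    models = provider.get("models") or []
    if models:
        # Assumption: use the first configured model.
        args += ["--model", models[0]["id"]]

    # When apiKey is absent, the env var chain is left to supply the key.
    if provider.get("apiKey"):
        args += ["--api-key", provider["apiKey"]]
    return args
```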
Mapping Between base-url and Interface Type
Different API types have different requirements for base-url format:
| Type | --llm-type / --vlm-type | Recommended base-url | Code Appended Path | Final URL Example |
|---|---|---|---|---|
| LLM | openai-completions | https://token.sensenova.cn/v1 | /chat/completions | https://token.sensenova.cn/v1/chat/completions |
| LLM | anthropic-messages | https://api.anthropic.com/v1 | /messages | https://api.anthropic.com/v1/messages |
| VLM | openai-completions | https://token.sensenova.cn/v1 | /chat/completions | https://token.sensenova.cn/v1/chat/completions |
| VLM | anthropic-messages | https://api.anthropic.com/v1 | /messages | https://api.anthropic.com/v1/messages |
Note:
- Recommended chat base URLs include the provider API version path, for example /v1.
- For compatibility, if the configured chat base URL has no path, the runner appends /v1/chat/completions or /v1/messages.
- If the configured chat base URL already has a path such as /v1, the runner appends only /chat/completions or /messages.
- Some providers use versioned paths other than /v1, such as Gemini's /v1beta/openai.
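The path-appending rules in the note can be sketched as follows. This is an approximation of the described behavior, not the runner's code. `build_endpoint` and `SUFFIX` are hypothetical names, and treating a bare trailing slash as "no path" is an assumption.

```python
from urllib.parse import urlsplit

# Path suffix per interface type, taken from the "Code Appended Path" column.
SUFFIX = {
    "openai-completions": "/chat/completions",
    "anthropic-messages": "/messages",
}


def build_endpoint(base_url: str, interface_type: str) -> str:
    """Join a configured chat base URL with the endpoint path for its interface type."""
    suffix = SUFFIX[interface_type]
    base = base_url.rstrip("/")  # assumption: "https://host/" counts as having no path
    if not urlsplit(base).path:
        # No path configured: add the default /v1 version segment as well.
        return f"{base}/v1{suffix}"
    # A path such as /v1 is already present: append only the endpoint suffix.
    return f"{base}{suffix}"
```

With this rule, both `https://api.anthropic.com` and `https://api.anthropic.com/v1` resolve to `https://api.anthropic.com/v1/messages` for `anthropic-messages`. That is consistent with the usage examples passing the base URL without `/v1`.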
Output Format
All tools support two output formats:
- --output-format text (default): outputs the plain-text result
- --output-format json: outputs JSON, including status and elapsed_seconds (runtime in seconds, rounded to 2 decimals)
JSON output for sn-image-recognize and sn-text-optimize also includes model, base_url, and interface_type to verify the effective runtime configuration:
{
"status": "ok",
"result": "...",
"model": "sensenova-6.7-flash-lite",
"base_url": "https://token.sensenova.cn/v1",
"interface_type": "openai-completions",
"elapsed_seconds": 1.23
}
On failure:
{
"status": "failed",
"error": "error message",
"elapsed_seconds": 0.05
}
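A caller can branch on the `status` field to separate successful results from failures. The sketch below assumes that the JSON object is the only content written to stdout when `--output-format json` is used. `SkillCallError` and `parse_runner_output` are hypothetical names.

```python
import json
from typing import Any, Dict


class SkillCallError(RuntimeError):
    """Raised when the runner reports a status other than "ok" (hypothetical)."""


def parse_runner_output(stdout: str) -> Dict[str, Any]:
    """Parse the JSON output of one runner call and raise on failure."""
    payload = json.loads(stdout)
    if payload.get("status") != "ok":
        raise SkillCallError(payload.get("error", "unknown error"))
    return payload
```

On success, fields such as `model`, `base_url`, and `interface_type` in the returned dict show which configuration actually took effect.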
Input/Output Specification
See references/api_spec.md for details.
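⚠️ Vendor lock-in: this skill is bound to SenseNova-specific APIs (image generation, recognition, text optimization) and cannot be switched to other models. If SenseNova is no longer free, or you have no plan, this skill becomes unusable.
Dependencies: SN_API_KEY (SenseNova platform API key), Pillow (~/.hermes/hermes-agent/venv/bin/pip3 install Pillow)
Configuration reference: references/sensenova-config.md
Alternatives: comfyui (local image generation) + mmx vision (image understanding)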
Pitfalls
Pillow dependency not installed
Symptom: ModuleNotFoundError: No module named 'PIL'
Root cause: sn-image-generate uses PIL to process images, but Pillow is not installed in the system Python or the venv.
Fix: pip install Pillow (when using the hermes-agent venv, run ~/.hermes/hermes-agent/venv/bin/pip3 install Pillow instead).
Note: the hermes-agent Python path is ~/.hermes/hermes-agent/venv/bin/python3, not the system python3.
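API rate limits
SenseNova rate limits are counted per 5-hour window, not per minute:
- sensenova-6.7-flash-lite: 1500 requests per 5 hours
- sensenova-u1-fast: 1500 requests per 5 hours
- deepseek-v4-flash: 150 requests per 5 hours (the strictest)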
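Because the quota spans five hours, a caller may want to track its own usage locally before hitting the limit. The following is a client-side sketch only. `WindowBudget` is a hypothetical class, and the rolling-window model is an assumption: this document does not say whether the server uses a rolling or a fixed window, so the server's own accounting may differ.

```python
import time
from collections import deque
from typing import Callable, Deque

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window described above

# Per-model request limits as listed above.
LIMITS = {
    "sensenova-6.7-flash-lite": 1500,
    "sensenova-u1-fast": 1500,
    "deepseek-v4-flash": 150,
}


class WindowBudget:
    """Track local request timestamps inside a rolling window (assumed model)."""

    def __init__(
        self,
        limit: int,
        window: float = WINDOW_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if the local budget allows it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

A caller could keep one budget per model, for example `WindowBudget(LIMITS["deepseek-v4-flash"])`, and skip or delay a request whenever `try_acquire()` returns `False`.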
Base URL
All SenseNova models use the same base URL: https://token.sensenova.cn/v1