first commit
This commit is contained in:
276
sn-image-base/SKILL.md
Normal file
276
sn-image-base/SKILL.md
Normal file
@@ -0,0 +1,276 @@
|
||||
---
|
||||
name: sn-image-base
|
||||
description: |
|
||||
Base-layer skill for the SenseNova-Skills project, providing low-level APIs for image generation, recognition (VLM), and text optimization (LLM).
|
||||
This skill does not preprocess inputs; it only calls backend services and returns results.
|
||||
This skill is not user-facing and is intended for upper-layer skills only.
|
||||
triggers:
|
||||
- "SenseNova-Skills Image Generation"
|
||||
- "SenseNova-Skills 图像基础工具"
|
||||
- "sn 图像基础工具"
|
||||
- "SenseNova 图像基础工具"
|
||||
- "SenseNova Image Generation"
|
||||
- "sn-image-base"
|
||||
metadata:
|
||||
project: SenseNova-Skills
|
||||
tier: 0
|
||||
category: infrastructure
|
||||
user_visible: false
|
||||
---
|
||||
|
||||
# sn-image-base
|
||||
|
||||
## Dependency Installation
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Overview
|
||||
|
||||
`sn-image-base` is the base-layer skill (tier 0) of the SenseNova-Skills project and provides three low-level tools:
|
||||
|
||||
- `sn-image-generate`: image generation (calls text-to-image-no-enhance API)
|
||||
- `sn-image-recognize`: image recognition (uses VLM to analyze image content)
|
||||
- `sn-text-optimize`: text optimization (uses LLM to process text)
|
||||
|
||||
This skill **does not perform any input preprocessing** and only calls backend services to return results.
|
||||
|
||||
## Tools List
|
||||
|
||||
### sn-image-generate
|
||||
|
||||
Image generation tool that calls the text-to-image-no-enhance API.
|
||||
|
||||
`--prompt` is required; all other parameters are optional:
|
||||
|
||||
| Parameter | Type | Default | Description |
|
||||
|------|------|--------|------|
|
||||
| `--prompt` | string | **Required** | Prompt text for image generation |
|
||||
| `--negative-prompt` | string | `""` | Negative prompt |
|
||||
| `--image-size` | string | `2k` | Image size preset, supports `2k` only |
|
||||
| `--aspect-ratio` | string | `16:9` | Aspect ratio, e.g. `1:1`, `16:9`, `9:16` |
|
||||
| `--seed` | int | `None` | Random seed for reproducible generation |
|
||||
| `--unet-name` | string | `None` | Specify a UNet model name |
|
||||
| `--api-key` | string | `SN_IMAGE_GEN_API_KEY` -> `SN_API_KEY` | API key (CLI argument has priority; `MissingApiKeyError` is raised when all are empty) |
|
||||
| `--base-url` | string | `SN_IMAGE_GEN_BASE_URL` -> `SN_BASE_URL` | API base URL (CLI argument has priority) |
|
||||
| `--poll-interval` | float | `5.0` | Polling interval (seconds) |
|
||||
| `--timeout` | float | `300.0` | Timeout (seconds) |
|
||||
| `--insecure` | flag | `False` | Disable TLS verification |
|
||||
| `--save-path` | Path | Auto-generated | Save path |
|
||||
|
||||
### sn-image-recognize
|
||||
|
||||
Image recognition tool that uses VLM (Vision Language Model) to analyze image content. Supports multiple image inputs.
|
||||
|
||||
`--images` and `--user-prompt` (or `--user-prompt-path`) are required. All other parameters use three-level defaults (CLI > env var > built-in default):
|
||||
|
||||
| Parameter | Type | Built-in Default | Env Var | Description |
|
||||
|------|------|-----------|---------|------|
|
||||
| `--api-key` | string | No hardcoded default | `SN_VISION_API_KEY` -> `SN_CHAT_API_KEY` -> `SN_API_KEY` | Chat runtime API key; raises `MissingApiKeyError` when all are unset |
|
||||
| `--base-url` | string | `SN_CHAT_BASE_URL` default | `SN_VISION_BASE_URL` -> `SN_CHAT_BASE_URL` -> `SN_BASE_URL` | Vision provider base URL; falls back to shared chat/global provider |
|
||||
| `--model` | string | `sensenova-6.7-flash-lite` | `SN_VISION_MODEL` -> `SN_CHAT_MODEL` | Vision-capable model name |
|
||||
| `--vlm-type` | string | `openai-completions` | `SN_VISION_TYPE` -> `SN_CHAT_TYPE` | Chat protocol type override |
|
||||
| `--user-prompt-path` | string | `None` | - | Local file path, mutually exclusive with `--user-prompt` |
|
||||
| `--system-prompt-path` | string | `None` | - | Local file path, mutually exclusive with `--system-prompt` |
|
||||
|
||||
Available values for `--vlm-type`:
|
||||
|
||||
- `openai-completions`: OpenAI-compatible `/v1/chat/completions` interface
|
||||
- `anthropic-messages`: Anthropic Messages `/v1/messages` interface
|
||||
|
||||
### sn-text-optimize
|
||||
|
||||
Text optimization tool that uses LLM (Language Model) to optimize text content. Does not accept image inputs.
|
||||
|
||||
`--user-prompt` (or `--user-prompt-path`) is required. All other parameters use three-level defaults (CLI > env var > built-in default):
|
||||
|
||||
| Parameter | Type | Built-in Default | Env Var | Description |
|
||||
|------|------|-----------|---------|------|
|
||||
| `--api-key` | string | No hardcoded default | `SN_TEXT_API_KEY` -> `SN_CHAT_API_KEY` -> `SN_API_KEY` | Chat runtime API key; raises `MissingApiKeyError` when all are unset |
|
||||
| `--base-url` | string | `SN_CHAT_BASE_URL` default | `SN_TEXT_BASE_URL` -> `SN_CHAT_BASE_URL` -> `SN_BASE_URL` | Text provider base URL; falls back to shared chat/global provider |
|
||||
| `--model` | string | `sensenova-6.7-flash-lite` | `SN_TEXT_MODEL` -> `SN_CHAT_MODEL` | Text model name |
|
||||
| `--llm-type` | string | `openai-completions` | `SN_TEXT_TYPE` -> `SN_CHAT_TYPE` | Chat protocol type override |
|
||||
| `--user-prompt-path` | string | `None` | - | Local file path, mutually exclusive with `--user-prompt` |
|
||||
| `--system-prompt-path` | string | `None` | - | Local file path, mutually exclusive with `--system-prompt` |
|
||||
|
||||
Available values for `--llm-type`:
|
||||
|
||||
- `openai-completions`: OpenAI-compatible `/v1/chat/completions` interface
|
||||
- `anthropic-messages`: Anthropic Messages `/v1/messages` interface
|
||||
|
||||
## VLM vs LLM
|
||||
|
||||
| Tool | Model Type | Image Input | Interface Type Parameter |
|
||||
|------|----------|-----------------|-------------|
|
||||
| `sn-image-recognize` | VLM (Vision Language Model) | Yes, supports multiple images | `--vlm-type` |
|
||||
| `sn-text-optimize` | LLM (Language Model) | No, text only | `--llm-type` |
|
||||
|
||||
## Usage
|
||||
|
||||
All tools are called through the unified `sn_agent_runner.py` entrypoint:
|
||||
|
||||
```bash
|
||||
# Image generation (only prompt required; api-key/base-url have defaults)
|
||||
python scripts/sn_agent_runner.py sn-image-generate \
|
||||
--prompt "..."
|
||||
|
||||
# Image generation (override base-url)
|
||||
python scripts/sn_agent_runner.py sn-image-generate \
|
||||
--prompt "..." \
|
||||
--base-url "https://custom-endpoint.com/v1"
|
||||
|
||||
# Image generation (explicitly override api-key)
|
||||
python scripts/sn_agent_runner.py sn-image-generate \
|
||||
--prompt "..." \
|
||||
--api-key "sk-xxx"
|
||||
|
||||
# Image recognition (VLM) - minimal call (uses built-in Sensenova defaults)
|
||||
python scripts/sn_agent_runner.py sn-image-recognize \
|
||||
--user-prompt "Describe the image" \
|
||||
--images "path/to/image.png"
|
||||
|
||||
# Image recognition (VLM) - override to Anthropic Claude API compatible (messages interface)
|
||||
python scripts/sn_agent_runner.py sn-image-recognize \
|
||||
--user-prompt "Describe the image" \
|
||||
--images "path/to/image.png" \
|
||||
--api-key "sk-ant-xxx" \
|
||||
--base-url "https://api.anthropic.com" \
|
||||
--model "claude-sonnet-4-6" \
|
||||
--vlm-type "anthropic-messages"
|
||||
|
||||
# Text optimization (LLM) - minimal call (uses built-in Sensenova defaults)
|
||||
python scripts/sn_agent_runner.py sn-text-optimize \
|
||||
--user-prompt "Optimize the text: ..."
|
||||
|
||||
# Text optimization (LLM) - override to Anthropic Claude API compatible (messages interface)
|
||||
python scripts/sn_agent_runner.py sn-text-optimize \
|
||||
--user-prompt "Optimize the text: ..." \
|
||||
--api-key "sk-ant-xxx" \
|
||||
--base-url "https://api.anthropic.com" \
|
||||
--model "claude-sonnet-4-6" \
|
||||
--llm-type "anthropic-messages"
|
||||
```
|
||||
|
||||
### Default Parameter Behavior
|
||||
|
||||
Authentication parameters for `sn-image-generate` have the following default behavior:
|
||||
|
||||
| Parameter | Default | Override | Description |
|
||||
|------|--------|----------|------|
|
||||
| `--base-url` | `SN_IMAGE_GEN_BASE_URL` -> `SN_BASE_URL` | `--base-url "..."` | CLI argument has priority |
|
||||
| `--api-key` | `SN_IMAGE_GEN_API_KEY` -> `SN_API_KEY` | `--api-key "..."` | CLI argument has priority; throws `MissingApiKeyError` if all values are empty |
|
||||
|
||||
`sn-image-recognize` and `sn-text-optimize` use priority: **CLI argument > command-specific env var > shared `SN_CHAT_*` env var > global `SN_*` env var > built-in default**.
|
||||
|
||||
| Parameter | Built-in Default | Vision Env Var | Text Env Var |
|
||||
|------|-----------|-------------|-------------|
|
||||
| `--api-key` | None (must be provided) | `SN_VISION_API_KEY` -> `SN_CHAT_API_KEY` -> `SN_API_KEY` | `SN_TEXT_API_KEY` -> `SN_CHAT_API_KEY` -> `SN_API_KEY` |
|
||||
| `--base-url` | `https://token.sensenova.cn/v1` | `SN_VISION_BASE_URL` -> `SN_CHAT_BASE_URL` -> `SN_BASE_URL` | `SN_TEXT_BASE_URL` -> `SN_CHAT_BASE_URL` -> `SN_BASE_URL` |
|
||||
| `--model` | `sensenova-6.7-flash-lite` | `SN_VISION_MODEL` -> `SN_CHAT_MODEL` | `SN_TEXT_MODEL` -> `SN_CHAT_MODEL` |
|
||||
| `--vlm-type` / `--llm-type` | `openai-completions` | `SN_VISION_TYPE` -> `SN_CHAT_TYPE` | `SN_TEXT_TYPE` -> `SN_CHAT_TYPE` |
|
||||
|
||||
`api_key` resolution order (high to low): CLI `--api-key` > command-specific key (`SN_VISION_API_KEY`/`SN_TEXT_API_KEY`) > `SN_CHAT_API_KEY` > `SN_API_KEY`. If all are unset, `MissingApiKeyError` is raised.
|
||||
|
||||
Only `--api-key` must be provided via CLI or environment; base URL, model, and interface type have shared chat defaults.
|
||||
|
||||
## Agent Configuration Integration
|
||||
|
||||
The agent can automatically read parameters from `openclaw.json` without manual input:
|
||||
|
||||
| CLI Parameter | openclaw.json Field | Example |
|
||||
|-----------|-------------------|--------|
|
||||
| `--base-url` | `providers.<name>.baseUrl` | `https://api.anthropic.com` |
|
||||
| `--llm-type` | `providers.<name>.api` | `anthropic-messages` / `openai-completions` |
|
||||
| `--vlm-type` | `providers.<name>.api` | `anthropic-messages` / `openai-completions` |
|
||||
| `--model` | `providers.<name>.models[].id` | `claude-sonnet-4-6` |
|
||||
| `--api-key` | `providers.<name>.apiKey` or env var | `sk-cp-...` |
|
||||
|
||||
Note: `--llm-type` and `--vlm-type` share the same `providers.<name>.api` field and are used by LLM and VLM tools respectively.
|
||||
|
||||
Mapping between `provider.api` and interface type:
|
||||
|
||||
| api Value | Corresponding `--llm-type` / `--vlm-type` | Endpoint Path |
|
||||
|--------|----------------------------------|---------------|
|
||||
| `anthropic-messages` | `anthropic-messages` | `/v1/messages` |
|
||||
| `openai-completions` | `openai-completions` | `/v1/chat/completions` |
|
||||
| `openai-responses` | (future extension) | `/responses` |
|
||||
|
||||
## Mapping Between base-url and Interface Type
|
||||
|
||||
Different API types have different requirements for base-url format:
|
||||
|
||||
| Type | `--llm-type` / `--vlm-type` | Recommended base-url | Code Appended Path | Final URL Example |
|
||||
|------|------------------------------|---------------|--------------|---------------|
|
||||
| LLM | `openai-completions` | `https://token.sensenova.cn/v1` | `/chat/completions` | `https://token.sensenova.cn/v1/chat/completions` |
|
||||
| LLM | `anthropic-messages` | `https://api.anthropic.com/v1` | `/messages` | `https://api.anthropic.com/v1/messages` |
|
||||
| VLM | `openai-completions` | `https://token.sensenova.cn/v1` | `/chat/completions` | `https://token.sensenova.cn/v1/chat/completions` |
|
||||
| VLM | `anthropic-messages` | `https://api.anthropic.com/v1` | `/messages` | `https://api.anthropic.com/v1/messages` |
|
||||
|
||||
**Note**:
|
||||
|
||||
- Recommended chat base URLs include the provider API version path, for example `/v1`.
|
||||
- For compatibility, if the configured chat base URL has no path, the runner appends `/v1/chat/completions` or `/v1/messages`.
|
||||
- If the configured chat base URL already has a path such as `/v1`, the runner appends only `/chat/completions` or `/messages`.
|
||||
- Some providers use versioned paths other than `/v1`, such as Gemini's `/v1beta/openai`.
|
||||
|
||||
## Output Format
|
||||
|
||||
All tools support two output formats:
|
||||
|
||||
- `--output-format text` (default): outputs plain text result
|
||||
- `--output-format json`: outputs JSON, including `status` and `elapsed_seconds` (runtime in seconds, rounded to 2 decimals)
|
||||
|
||||
JSON output for `sn-image-recognize` and `sn-text-optimize` also includes `model`, `base_url`, and `interface_type` to verify the effective runtime configuration:
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "ok",
|
||||
"result": "...",
|
||||
"model": "sensenova-6.7-flash-lite",
|
||||
"base_url": "https://token.sensenova.cn/v1",
|
||||
"interface_type": "openai-completions",
|
||||
"elapsed_seconds": 1.23
|
||||
}
|
||||
```
|
||||
|
||||
On failure:
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "failed",
|
||||
"error": "error message",
|
||||
"elapsed_seconds": 0.05
|
||||
}
|
||||
```
|
||||
|
||||
## Input/Output Specification
|
||||
|
||||
See `references/api_spec.md` for details.
|
||||
|
||||
|
||||
---
|
||||
|
||||
> ⚠️ **厂商绑定**:此 skill 绑定 SenseNova 专用 API(图像生成、识别、文本优化),无法替换为其他模型。如果 SenseNova 不再免费或无 plan,此 skill 将不可用。
|
||||
>
|
||||
**依赖**: SN_API_KEY (SenseNova 平台 API key), Pillow (`~/.hermes/hermes-agent/venv/bin/pip3 install Pillow`)
|
||||
**配置参考**: `references/sensenova-config.md`
|
||||
**可替代方案**: comfyui (本地图像生成) + mmx vision (图像理解)
|
||||
|
||||
## Pitfalls
|
||||
|
||||
### Pillow 依赖未安装
|
||||
**Symptom**: `ModuleNotFoundError: No module named 'PIL'`
|
||||
**Root cause**: sn-image-generate 使用 PIL 处理图像,但系统 Python 或 venv 中未安装 Pillow。
|
||||
**Fix**: `pip install Pillow`(如果使用 hermes-agent 的 venv,需要用 `~/.hermes/hermes-agent/venv/bin/pip3 install Pillow`)。
|
||||
**Note**: hermes-agent 的 Python 路径是 `~/.hermes/hermes-agent/venv/bin/python3`,不是系统 python3。
|
||||
|
||||
### API 限流策略
|
||||
SenseNova 的限流是按 **5 小时窗口**计算,不是按分钟:
|
||||
- sensenova-6.7-flash-lite: 1500 次/5小时
|
||||
- sensenova-u1-fast: 1500 次/5小时
|
||||
- deepseek-v4-flash: 150 次/5小时(最严)
|
||||
|
||||
### Base URL
|
||||
所有 SenseNova 模型统一使用: `https://token.sensenova.cn/v1`
|
||||
|
||||
Reference in New Issue
Block a user