Hermes Agent ccc63d1e70 first commit
2026-05-10 13:52:46 +08:00


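name: sn-infographic
description: Generates professional infographics. 87 layouts x 66 styles, with multi-round automatic review and refinement. Requires the SenseNova API. Trigger words: infographic, 信息图 (infographic), 可视化 (visualization), visual summary. Best suited to scenarios that want automated quality control. Without the SenseNova API, use baoyu-infographic.
metadata:
  project: SenseNova-Skills
  tier: 1
  category: scene
  priority: 9
  user_visible: true
triggers:
  - infographic
  - information graphic
  - infographics generation
  - visual summary
  - data visualization
  - visual explanation
  - diagram
  - 生成信息图  # "generate an infographic"
  - 信息图生成  # "infographic generation"
  - 生成 infographic  # "generate infographic"
  - 信息图表  # "information chart"
  - 图表生成  # "chart generation"
  - 数据可视化  # "data visualization"
  - 图解  # "illustrated explanation"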

sn-infographic

Infographic generation scene skill (tier 1). It relies on the sn-image-generate, sn-image-recognize, and sn-text-optimize tools provided by sn-image-base (tier 0).

Features:

  • Evaluation of prompt quality (auto mode)
  • Prompt expansion (force/auto mode)
  • Multiple rounds of image generation and VLM review
  • Output the best result based on quality ranking

Input Specification

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| user_prompt | string | Required | Original user request |
| max_rounds | int | 1 | Maximum number of generation rounds |
| output_mode | string | friendly | Output mode: friendly / verbose |
| prompts_expand_mode | string | auto | Expansion strategy: auto / force / disable |

API Configuration

All API calls in this skill are executed through sn_agent_runner.py of the sn-image-base skill. Authentication parameters are resolved in the order CLI > environment variables > built-in defaults, so they do not need to be passed explicitly.

| Call Type | Tool | Authentication Parameters | Description |
| --- | --- | --- | --- |
| LLM | sn-text-optimize (evaluation/expansion) | Reads SN_TEXT_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY by default | Built-in default points to the SenseNova internal network service |
| VLM | sn-image-recognize (image review) | Reads SN_VISION_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY by default | Built-in default points to the SenseNova internal network service |
| Image Generation | sn-image-generate | Reads SN_IMAGE_GEN_API_KEY -> SN_API_KEY by default; SN_IMAGE_GEN_API_KEY is only needed for an image-specific override | Uses the image generation configuration of sn-image-base by default |
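
The environment-variable fallback order in the table can be sketched as a small lookup. This is an illustration only: the helper name `resolve_api_key` and the `KEY_CHAINS` mapping are hypothetical, and the real resolution (including the CLI and built-in default layers) happens inside sn_agent_runner.py.

```python
import os
from typing import Mapping, Optional

# Environment-variable fallback order per call type, copied from the table above.
# The dictionary keys ("llm", "vlm", "image") are hypothetical labels.
KEY_CHAINS = {
    "llm": ["SN_TEXT_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY"],
    "vlm": ["SN_VISION_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY"],
    "image": ["SN_IMAGE_GEN_API_KEY", "SN_API_KEY"],
}


def resolve_api_key(call_type: str,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key in the chain, or None if none is set."""
    env = os.environ if env is None else env
    for name in KEY_CHAINS[call_type]:
        value = env.get(name)
        if value:
            return value
    return None
```

Setting only SN_API_KEY is therefore enough for all three call types in this sketch; the more specific variables act as overrides.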

If MissingApiKeyError occurs, or a specific model must be used, pass the values explicitly via CLI parameters. For the parameter reference, see $SN_IMAGE_BASE/references/api_spec.md.

$SN_IMAGE_BASE path explanation: $SN_IMAGE_BASE is the installation directory of the sn-image-base skill (the directory that contains its SKILL.md). The agent can locate this path by looking up the skill name sn-image-base in the list of installed skills.

Architecture: Main Agent + Worker Agent

This skill uses a two-tier agent architecture:

| Role | Responsibility |
| --- | --- |
| Main Agent | Receive user request, normalize parameters, send preflight, start Worker, collect results, send text and images to user |
| Worker Agent | Execute orchestration loop (expand → multiple rounds of generation + review → sort), return structured JSON |

Responsibility Boundaries:

  • Worker Agent does not send any messages to the user directly, only returns structured JSON
  • Main Agent is responsible for sending all user-visible messages
  • Worker Agent's last message must be and only be the JSON string defined in the Return Contract
  • Worker Agent's internal VLM calls always execute directly, without spawning subagents

Workflow

Main Agent Workflow

  1. Extract user_prompt, max_rounds (default 1), output_mode (default friendly), and prompts_expand_mode (default auto) from user request
  2. Send uniform preflight message: "Using sn-infographic skill to generate infographic, please wait..."
  3. Start Worker Agent (Sub-Agent), passing in complete parameters and working directory
  4. When Worker Agent returns status=ok and need_main_agent_send=true:
    • max_rounds = 1: Send a one-sentence description of the image content, then send the rank=1 single image
    • max_rounds > 1, friendly mode: Generate a one-sentence natural language description based on result and violations, send the evaluation text, then send the rank=1 single image
    • max_rounds > 1, verbose mode: Send complete text summary message, then send all images in rank order to the user
  5. If Worker Agent returns status=error, report the real error field content to the user
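
The branching in steps 4 and 5 can be sketched as a pure function that decides which kind of text to produce and which images to send. The function name `plan_delivery` and the returned labels are hypothetical, and the sketch assumes that `rounds` arrives already sorted by rank, so the first entry is the rank=1 image.

```python
from typing import List, Tuple


def plan_delivery(worker_result: dict, max_rounds: int) -> Tuple[str, List[str]]:
    """Return (kind_of_text, images_to_send) for a Worker Agent result."""
    if worker_result.get("status") == "error":
        # Step 5: the caller reports worker_result["error"] unchanged.
        return ("error", [])
    if not worker_result.get("need_main_agent_send"):
        return ("nothing", [])

    # Assumption: rounds are already in rank order (Worker Step 4).
    images = [r["image"] for r in worker_result["rounds"]]

    if max_rounds == 1:
        return ("content_sentence", images[:1])
    if worker_result.get("output_mode") == "verbose":
        return ("verbose_summary", images)
    return ("evaluation_sentence", images[:1])
```

Generating the sentence itself is left to the Main Agent; the sketch only covers the routing.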

Worker Agent Workflow

Worker Agent receives user_prompt, max_rounds, output_mode, prompts_expand_mode, and the working directory of this skill (SN_IMAGE_INFOG).

Step 0 — Initialization

  1. Generate task_id (using timestamp, format YYYYMMDD_HHMMSS)
  2. Create a uniform temporary directory: /tmp/openclaw/sn-infographic/<task_id>/ as TEMP_DIR
  3. Initialize an empty rounds list
  4. Infer aspect_ratio (default 16:9) and image_size (default 2k) from user_prompt based on the rules in $SKILL_DIR/references/runtime-parameters.md
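
A minimal sketch of items 1 to 3, assuming Python is used for the setup. The function name `init_task` and its parameters are hypothetical; the `now` argument exists only so the timestamp can be fixed in tests.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple


def init_task(base_dir: str = "/tmp/openclaw/sn-infographic",
              now: Optional[datetime] = None) -> Tuple[str, Path, List[dict]]:
    """Create the per-task temporary directory and an empty rounds list."""
    now = datetime.now() if now is None else now
    task_id = now.strftime("%Y%m%d_%H%M%S")  # format YYYYMMDD_HHMMSS
    temp_dir = Path(base_dir) / task_id
    # exist_ok=True: two tasks started within the same second share a directory.
    temp_dir.mkdir(parents=True, exist_ok=True)
    rounds: List[dict] = []
    return task_id, temp_dir, rounds
```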

Step 1 — prompts_expand_mode Processing

disable mode:

  • Skip expand, directly use user_prompt as expanded_prompt

  • Assign variable and write to temporary directory:

    EXPANDED_PROMPT="$USER_PROMPT"
    echo "$EXPANDED_PROMPT" > "$TEMP_DIR/expanded-prompt.txt"
    
  • Record prompts_expand_skipped = true

force mode:

  • Directly execute Step 2

auto mode:

  1. Call sn-text-optimize for evaluation

  2. Parse JSON, extract required_results and optional_results

  3. Decision logic:

    • required_pass: every answer in required_results is "yes"
    • optional_pass: (number of optional_results with answer="yes") / (total number of optional_results) ≥ 0.6
    • should_expand = not (required_pass and optional_pass)
  4. If JSON parsing fails, default should_expand = true (conservative strategy)

  5. If should_expand = false: Skip Step 2, assign variable and write to temporary directory, record prompts_expand_skipped = true:

    EXPANDED_PROMPT="$USER_PROMPT"
    echo "$EXPANDED_PROMPT" > "$TEMP_DIR/expanded-prompt.txt"
    
  6. If should_expand = true: Execute Step 2
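
The auto-mode decision can be sketched as follows. The function name `should_expand` is hypothetical, the sketch assumes each entry in required_results and optional_results is an object with an `answer` field, and treating an empty optional_results list as passing is an assumption the source does not state.

```python
import json


def should_expand(evaluation_stdout: str, threshold: float = 0.6) -> bool:
    """Return True when the prompt should go through Step 2 expansion."""
    try:
        data = json.loads(evaluation_stdout)
        required = data["required_results"]
        optional = data["optional_results"]
        required_pass = all(item["answer"] == "yes" for item in required)
        # Assumption: with no optional checks there is nothing to fail.
        optional_pass = (
            not optional
            or sum(item["answer"] == "yes" for item in optional) / len(optional) >= threshold
        )
    except (json.JSONDecodeError, KeyError, TypeError):
        # Conservative strategy: unparsable evaluation output means expand.
        return True
    return not (required_pass and optional_pass)
```

For example, one required "yes" plus three of five optional "yes" answers gives a ratio of exactly 0.6, which passes, so expansion is skipped.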

Evaluation Call (using sn-image-base's sn-text-optimize tool):

python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-text-optimize \
  --system-prompt-path "$SKILL_DIR/references/evaluation-standard.md" \
  --user-prompt "$USER_PROMPT" \
  --output-format json

Step 2 — Content Analysis + Layout & Style Selection + Prompt Expansion

2.0 Content Analysis (using sn-image-base's sn-text-optimize tool):

ANALYSIS=$(python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-text-optimize \
  --system-prompt-path "$SKILL_DIR/references/analysis-framework.md" \
  --user-prompt "$USER_PROMPT" \
  --output-format json)

Save the analysis stdout to $TEMP_DIR/analysis.json:

echo "$ANALYSIS" > "$TEMP_DIR/analysis.json"

2.1 Layout & Style Selection

  1. Read the analysis result from $TEMP_DIR/analysis.json:
ANALYSIS=$(cat "$TEMP_DIR/analysis.json")
  2. Using data_type, tone, and audience, select the layout and style according to the rules in $SKILL_DIR/references/layout-style-selection.md.
  3. Read the layout/style definition files:
LAYOUT_DEF=$(cat "$SKILL_DIR/references/layouts/<layout>.md")
STYLE_DEF=$(cat "$SKILL_DIR/references/styles/<style>.md")

If a definition file does not exist, fall back to hub-spoke + corporate-memphis.

  4. Save the selection result to $TEMP_DIR/layout-style.json.

Format of layout-style.json:

{
  "layout": "<layout>",
  "style": "<style>"
}
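
The fallback rule and the layout-style.json format can be sketched together. The function names are hypothetical, and the sketch assumes that a missing layout or style file causes the whole pair to fall back, which the source does not spell out.

```python
import json
from pathlib import Path
from typing import Tuple

FALLBACK_LAYOUT = "hub-spoke"
FALLBACK_STYLE = "corporate-memphis"


def resolve_layout_style(skill_dir: str, layout: str, style: str) -> Tuple[str, str]:
    """Return the selected pair, or the fallback pair if a definition is missing."""
    refs = Path(skill_dir) / "references"
    layout_ok = (refs / "layouts" / f"{layout}.md").is_file()
    style_ok = (refs / "styles" / f"{style}.md").is_file()
    # Assumption: if either file is missing, both values fall back together.
    if layout_ok and style_ok:
        return layout, style
    return FALLBACK_LAYOUT, FALLBACK_STYLE


def write_layout_style(temp_dir: str, layout: str, style: str) -> Path:
    """Write the selection in the layout-style.json format shown above."""
    path = Path(temp_dir) / "layout-style.json"
    path.write_text(json.dumps({"layout": layout, "style": style}, indent=2),
                    encoding="utf-8")
    return path
```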

2.2 Structured Content Generation

Read the analysis result and the structured content template, then convert user_prompt into design-ready structured content following the template rules:

ANALYSIS=$(cat "$TEMP_DIR/analysis.json")
LAYOUT_STYLE=$(cat "$TEMP_DIR/layout-style.json")
STRUCTURED_CONTENT_TEMPLATE=$(cat "$SKILL_DIR/references/structured-content-template.md")

Follow the three phases defined in the template (High-Level Outline → Section Development → Data Integrity Check), combine the learning objectives, visual opportunities, and key data in analysis.json, generate structured content, and save it to the temporary directory:

cat > "$TEMP_DIR/structured-content.md" << 'EOF'
<Content generated based on structured-content-template.md format>
EOF

Rules: All data must be preserved exactly. Do not rewrite. Do not add information that is not in the source.

2.3 Prompt Expansion (using sn-image-base's sn-text-optimize tool):

Read structured content and layout/style selection from temporary directory, dynamically concatenate system prompt, and write to temporary file:

STRUCTURED_CONTENT=$(cat "$TEMP_DIR/structured-content.md")
LAYOUT_STYLE=$(cat "$TEMP_DIR/layout-style.json")
LAYOUT=$(echo "$LAYOUT_STYLE" | jq -r '.layout')
STYLE=$(echo "$LAYOUT_STYLE" | jq -r '.style')
LAYOUT_DEF=$(cat "$SKILL_DIR/references/layouts/${LAYOUT}.md")
STYLE_DEF=$(cat "$SKILL_DIR/references/styles/${STYLE}.md")

cat > "$TEMP_DIR/expand-system-prompt.md" << EOF
$(cat "$SKILL_DIR/references/prompts-expand-system.md")

---

## Selected Layout: $LAYOUT

$LAYOUT_DEF

---

## Selected Style: $STYLE

$STYLE_DEF

---

## Output Template Reference

$(cat "$SKILL_DIR/references/base-prompt.md")
EOF

Use the content of structured-content.md as user-prompt, read system prompt from temporary file and call sn-text-optimize:

EXPAND_RESULT=$(python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-text-optimize \
  --system-prompt-path "$TEMP_DIR/expand-system-prompt.md" \
  --user-prompt "$STRUCTURED_CONTENT" \
  --output-format json)

Parse the JSON stdout, extract the result field as expanded_prompt, and write it to the temporary directory:

# Extract the result field from the JSON stdout captured above
EXPANDED_PROMPT=$(echo "$EXPAND_RESULT" | jq -r '.result')
echo "$EXPANDED_PROMPT" > "$TEMP_DIR/expanded-prompt.txt"

If parsing fails or truncation is suspected (the returned content is incomplete), notify the user and terminate the workflow.

Step 3 — Image Generation Loop

Execute rounds 1 through max_rounds sequentially:

Generate Image (using sn-image-base's sn-image-generate tool):

python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-generate \
  --prompt "$EXPANDED_PROMPT" \
  --image-size "$IMAGE_SIZE" \
  --aspect-ratio "$ASPECT_RATIO" \
  --save-path "$TEMP_DIR/round_<N>.png" \
  -o json

Review Image (only executed when max_rounds > 1):

VLM configuration requirements:

  • When max_rounds > 1, call VLM for review
  • Select VLM model from OpenClaw configuration as parameter for image recognition
  • If no suitable VLM model exists in OpenClaw configuration:
    • Notify user that current parameter combination cannot be executed
    • Suggest adding VLM configuration or changing max_rounds to 1 to avoid VLM calls
  • If the VLM call times out or fails: do not fall back; report the real error directly

python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-recognize \
  --system-prompt-path "$SN_IMAGE_INFOG/references/prompts-critic-system.md" \
  --user-prompt "Evaluate the diagram in the image against the rules. Output your assessment." \
  --images "$TEMP_DIR/round_<N>.png" \
  --output-format json

System prompt comes from references/prompts-critic-system.md, user prompt is provided directly.

Save Round Result

{
  "round": 1,
  "image": "$TEMP_DIR/round_1.png",
  "result": "PASS|FAIL",
  "violations_count": 0,
  "violations": [],
  "reasoning": "<Reasoning process, empty string when max_rounds=1>",
  "timing": {
    "image_generation": { "elapsed_seconds": 12.34, "model": "sn_image_model" },
    "vlm_review": { "elapsed_seconds": 5.67, "model": "sensenova-6.7-flash-lite" }
  }
}

Note: elapsed_seconds is read from the --output-format json return of each CLI call; image_generation.model is fixed to the hardcoded placeholder "sn_image_model" (sn-image-generate does not return the model field); vlm_review.model is read from the JSON return of sn-image-recognize. timing.vlm_review is omitted when max_rounds=1.

Early Termination Check (only executed when max_rounds > 1):

  • If result=PASS, immediately exit the loop, do not continue generating
  • If result=FAIL, continue to the next round (if there are remaining rounds)
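
The loop with early termination can be sketched with the two CLI calls abstracted as callables. Everything here is illustrative: `run_rounds`, `generate`, and `review` are hypothetical names, the `result` value for an unreviewed round (None) is an assumption because the source does not specify it, and so is counting a PASS on the final round as normal completion rather than early termination.

```python
from typing import Callable, Dict, List, Tuple


def run_rounds(max_rounds: int,
               generate: Callable[[int], str],
               review: Callable[[str], Dict]) -> Tuple[List[Dict], bool]:
    """Generate up to max_rounds images, stopping at the first PASS."""
    rounds: List[Dict] = []
    early_terminated = False
    for n in range(1, max_rounds + 1):
        image = generate(n)  # stands in for the sn-image-generate call
        entry = {"round": n, "image": image, "result": None,
                 "violations_count": 0, "violations": [], "reasoning": ""}
        if max_rounds > 1:
            verdict = review(image)  # stands in for the sn-image-recognize call
            entry["result"] = verdict["result"]
            entry["violations"] = verdict.get("violations", [])
            entry["violations_count"] = len(entry["violations"])
            entry["reasoning"] = verdict.get("reasoning", "")
        rounds.append(entry)
        if max_rounds > 1 and entry["result"] == "PASS":
            # Assumption: "early" only if rounds were actually skipped.
            early_terminated = n < max_rounds
            break
    return rounds, early_terminated
```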

Step 4 — Image Quality Ranking

Sort images by violations_count ascending, breaking ties by round ascending, then return structured JSON to the Main Agent.
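
The ranking rule maps directly onto a two-part sort key. A minimal sketch, with the hypothetical helper name `rank_rounds`:

```python
from typing import Dict, List


def rank_rounds(rounds: List[Dict]) -> List[Dict]:
    """Fewest violations first; earlier round wins a tie."""
    return sorted(rounds, key=lambda r: (r["violations_count"], r["round"]))
```

The first element of the returned list is the rank=1 image.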

Return Contract

After Worker Agent completes, its last message must be and only be the following JSON string (bare JSON, no code fences, no preceding or trailing text).

Normal Flow:

{
  "status": "ok",
  "need_main_agent_send": true,
  "output_mode": "friendly|verbose",
  "expanded_prompt": "<always present when output_mode=verbose; the original user_prompt when prompts_expand_skipped=true, otherwise the expanded result>",
  "prompts_expand_skipped": true,
  "early_terminated": true,
  "timing": {
    "total_elapsed_seconds": 35.12,
    "prompt_detection": { "elapsed_seconds": 2.11, "model": "sensenova-6.7-flash-lite" },
    "content_analysis": { "elapsed_seconds": 3.22, "model": "sensenova-6.7-flash-lite" },
    "prompt_expand": { "elapsed_seconds": 8.45, "model": "sensenova-6.7-flash-lite" }
  },
  "rounds": [
    {
      "round": 1,
      "image": "$TEMP_DIR/round_1.png",
      "result": "PASS|FAIL",
      "violations_count": 0,
      "violations": [],
      "reasoning": "<Reasoning process, empty string when max_rounds=1>",
      "timing": {
        "image_generation": { "elapsed_seconds": 12.34, "model": "sn_image_model" },
        "vlm_review": { "elapsed_seconds": 5.67, "model": "sensenova-6.7-flash-lite" }
      }
    }
  ]
}

Error Flow:

{
  "status": "error",
  "error": "<Actual error information>"
}

Rules:

  • A status=ok result must contain need_main_agent_send: true
  • expanded_prompt must be present when output_mode=verbose; its value is the original user_prompt when prompts_expand_skipped=true
  • prompts_expand_skipped must be present (value true) whenever expansion is not executed. This covers two cases: prompts_expand_mode=disable, and prompts_expand_mode=auto where the evaluation passes and expansion is skipped
  • early_terminated must be present (value true) on early termination, and is omitted when execution completes normally
  • violations is an array of strings, from review results
  • reasoning is an empty string when max_rounds=1
  • Top-level timing contains:
    • total_elapsed_seconds: Worker Agent's wall time from Step 0 to returning JSON, calculated by Worker Agent itself
    • prompt_detection: Step 1 evaluation call, containing elapsed_seconds and model (read from sn-text-optimize JSON return); omitted when prompts_expand_mode=disable
    • content_analysis: Step 2.0 content analysis call, containing elapsed_seconds and model (read from sn-text-optimize JSON return); omitted when expand is skipped
    • prompt_expand: Step 2.3 prompt expansion call, containing elapsed_seconds and model (read from sn-text-optimize JSON return); omitted when expand is skipped
  • rounds[].timing.image_generation.model is fixed to the hardcoded placeholder "sn_image_model"
  • rounds[].timing.vlm_review is omitted when max_rounds=1
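
Some of these rules can be checked mechanically before the Main Agent acts on the Worker's message. The sketch below is a partial check under stated assumptions: `validate_worker_result` is a hypothetical helper, and it covers only the bare-JSON requirement, the status values, need_main_agent_send, expanded_prompt in verbose mode, and the type of violations.

```python
import json
from typing import List


def validate_worker_result(raw: str) -> List[str]:
    """Return a list of contract problems; an empty list means these checks passed."""
    try:
        data = json.loads(raw)  # code fences or surrounding text fail here
    except json.JSONDecodeError as exc:
        return [f"not bare JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    status = data.get("status")
    if status == "error":
        return [] if "error" in data else ["status=error requires an error field"]
    if status != "ok":
        return [f"unexpected status: {status!r}"]

    problems: List[str] = []
    if data.get("need_main_agent_send") is not True:
        problems.append("status=ok requires need_main_agent_send: true")
    if data.get("output_mode") == "verbose" and "expanded_prompt" not in data:
        problems.append("verbose mode requires expanded_prompt")
    for entry in data.get("rounds", []):
        if not all(isinstance(v, str) for v in entry.get("violations", [])):
            problems.append(f"round {entry.get('round')}: violations must be strings")
    return problems
```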

Output Format

friendly mode (default)

Text Summary:

  • When max_rounds = 1: generate a one-sentence description of the image content based on expanded_prompt, no more than 50 characters
  • When max_rounds > 1: generate a one-sentence description of the image content based on result and violations, no more than 50 characters
    • result=PASS: Describe in a positive tone
    • result=FAIL (1-2 violations): Gently point out specific issues
    • result=FAIL (3 or more): Objectively summarize the main issues

Image: rank=1 best single image

verbose mode

Quality ranking result (high -> low)
---
Expanded prompt: [expanded | not expanded, using original prompt]
<expanded_prompt>
---
#1 round=<n> result=<PASS|FAIL> violations=<n> [early terminated]
#2 round=<n> result=<PASS|FAIL> violations=<n>
...
---
Time statistics: Total <total>s | Prompt evaluation <t>s | Content analysis <t>s | Prompt expansion <t>s | Image generation <t>s×<n> rounds | VLM review <t>s×<n> rounds
---
Images (sent in rank order)

Call Relationship

  • Bottom-level dependency: sn-image-base (sn-image-generate, sn-image-recognize, sn-text-optimize)

References

  • references/analysis-framework.md - Analysis methodology
  • references/base-prompt.md - Prompt template
  • references/evaluation-standard.md - Evaluation standard
  • references/layout-style-selection.md - Layout and style selection rules
  • references/prompts-expand-system.md - Prompt expansion system prompt
  • references/prompts-critic-system.md - Critic system prompt for the VLM image review
  • references/runtime-parameters.md - Runtime parameters
  • references/structured-content-template.md - Structured content template
  • references/layouts/<layout>.md - Layout definitions (87 layouts)
  • references/styles/<style>.md - Style definitions (66 styles)

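⚠️ Vendor lock-in: this skill depends on the SenseNova image generation API of sn-image-base and cannot be switched to other models. If SenseNova is no longer free or no plan is available, this skill will be unusable.

Dependencies: SN_API_KEY (SenseNova platform API key), sn-image-base, Pillow
Configuration reference: references/sensenova-config.md (see sn-image-base)
Alternative: comfyui (local image generation, but without infographic templates or VLM review)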