agent-skills/software-development/hermes-agent-skill-authoring/references/skill-routing-optimization.md
Hermes Agent ccc63d1e70 first commit
2026-05-10 13:52:46 +08:00

Skill Description Optimization for Routing

Based on SkillRouter (arXiv:2603.22455) methodology.

Core Finding

In large, overlapping skill pools, the full skill text is the critical routing signal, not just the name and metadata. Hiding the skill body causes a 31-44 percentage-point drop in routing accuracy at 80K scale. For Hermes, at ~120 skills, the impact is smaller but still meaningful for overlapping clusters.

Description Writing Rules

1. Trigger Words (Required)

Every description must include explicit trigger words — the exact phrases users would say.

Bad:  "Generates professional infographics."
Good: "Generates infographics. Trigger words: infographic, 信息图 (infographic), 可视化 (visualization), visual summary."

2. Negative Boundaries ("Don't use for")

For skills in overlapping domains, specify what they DON'T cover.

Good: "Trigger words: 学术论文 (academic paper), 文献调研 (literature survey). Not for: general search (use web_search)."

3. No Competitive Recommendations

Never recommend skill B inside skill A's description.

Bad:  "For multi-source search, prefer sn-search-academic over arxiv."
Good: Each skill describes itself independently.

4. No Implementation Details

Use user-facing concepts, not internal names.

Bad:  "Requires SN_API_KEY via sn-image-base's sn_agent_runner.py."
Good: "Requires SenseNova API."
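
Rules 1 to 4 lend themselves to a mechanical first pass before human review. The following is a minimal sketch, not part of Hermes: the marker strings, the regular expression, and the `lint_description` function are all hypothetical heuristics chosen for illustration. A leading bracketed label is ignored for the rule 3 check so that pipeline-stage labels are not reported.

```python
import re

# Hypothetical markers; adjust to whatever convention your descriptions use.
TRIGGER_MARKERS = ("trigger words:", "触发词")
BOUNDARY_MARKERS = ("not for:", "don't use for", "不用于")
# Heuristic for implementation details: ENV_VAR-style names and .py filenames.
IMPL_DETAIL = re.compile(r"\b[A-Z][A-Z0-9]*_[A-Z0-9_]+\b|\b[\w-]+\.py\b")
# A leading "[...]" label is treated as a pipeline-stage tag, not a recommendation.
LEADING_LABEL = re.compile(r"^\s*\[[^\]]*\]\s*")


def lint_description(name, description, other_skill_names, overlapping=False):
    """Return a list of suspected rule violations for one skill description."""
    text = description.lower()
    body = LEADING_LABEL.sub("", description).lower()
    problems = []
    if not any(marker in text for marker in TRIGGER_MARKERS):
        problems.append("rule 1: no trigger words")
    if overlapping and not any(marker in text for marker in BOUNDARY_MARKERS):
        problems.append("rule 2: no negative boundary")
    mentioned = sorted(
        s for s in other_skill_names if s != name and s.lower() in body
    )
    if mentioned:
        problems.append("rule 3: mentions other skills: " + ", ".join(mentioned))
    if IMPL_DETAIL.search(description):
        problems.append("rule 4: implementation detail")
    return problems
```

Substring matching on skill names will produce false positives for short names, so the output is best treated as a review queue rather than a verdict.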

5. Pipeline Relationships (for sub-skills)

If a skill is part of a pipeline, label its stage.

Good: "[sn-deep-research sub-stage] Runs a single-dimension search according to plan.json."
Good: "[sn-deep-research final stage] Writes the final report based on synthesis.md."

6. Differentiation Over Function Listing

When multiple skills serve similar goals, describe what makes THIS one distinct.

Bad:  "Generates infographics" (both sn-infographic and baoyu-infographic say this)
Good: sn-infographic: "87 layouts, with multi-round automatic review and refinement."
      baoyu-infographic: "21 layouts, with an interactive user-confirmation flow."
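
One way to find descriptions that fail rule 6 is to compare every pair and report the ones that read almost the same. This sketch uses `difflib.SequenceMatcher` from the Python standard library; the `similar_pairs` helper and the 0.8 threshold are assumptions for illustration, not values taken from SkillRouter or Hermes.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(descriptions, threshold=0.8):
    """Return (name_a, name_b, ratio) for description pairs at or above threshold.

    `descriptions` maps skill name to description text.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(descriptions.items()), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((name_a, name_b, round(ratio, 2)))
    return pairs


descriptions = {
    "sn-infographic": "Generates infographics.",
    "baoyu-infographic": "Generates infographics.",
    "pptx-generator": "Builds slide decks with python-pptx.",
}
for name_a, name_b, ratio in similar_pairs(descriptions):
    print(f"{name_a} and {name_b} look alike (ratio {ratio})")
```

Character-level similarity is crude, but it is enough to surface identical or near-identical function listings such as the "Bad" case above.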

Overlap Detection

"Overlap" = same user intent AND same implementation approach. Two skills are complementary (keep both) when:

  • Same output type, different tech stack (Python vs Node.js)
  • Same domain, different complexity level (lightweight vs full-featured)
  • Same tool, different workflow (quick vs QA-heavy)

Examples of complementary pairs that should NOT be merged:

  • pptx-generator (python-pptx) + powerpoint (pptxgenjs)
  • WeChat-article-reader (Python/Markdown) + wechat-article-extractor (Node.js/JSON)
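
The definition above can be written down as a small predicate. This is only a sketch: the `SkillProfile` record and its `intent` and `approach` tags are hypothetical and would have to be assigned by hand for each skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillProfile:
    name: str
    intent: str    # hypothetical hand-assigned tag: what the user wants
    approach: str  # hypothetical hand-assigned tag: tech stack, complexity level, or workflow


def is_overlap(a: SkillProfile, b: SkillProfile) -> bool:
    """True only when both the user intent and the implementation approach match."""
    return a.intent == b.intent and a.approach == b.approach


pptx_generator = SkillProfile("pptx-generator", "create slide deck", "python-pptx")
powerpoint = SkillProfile("powerpoint", "create slide deck", "pptxgenjs")
print(is_overlap(pptx_generator, powerpoint))  # False: same intent, different stack
```

Requiring both conditions keeps complementary pairs out of the merge list even when their outputs look identical to the user.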

Usage Measurement

To find which skills are actually used:

  1. Search the messages table in ~/.hermes/state.db for skill_view tool results
  2. Search ~/.hermes/sessions/*.jsonl for skill_view function calls
  3. .json files in sessions/ are request dumps — no message history
  4. Auto-loaded skills (via system prompt matching) don't generate skill_view calls — counts are lower bounds

-- Find skill_view results in SQLite
SELECT content FROM messages
WHERE role = 'tool'
  AND content LIKE '%"skill_dir"%'
  AND content LIKE '%"success": true%';
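
The two searches above can be scripted with the Python standard library. This sketch rests on assumptions that should be checked against a real database first: that each matching `content` value is a JSON object with a top-level `skill_dir` key, and that a plain substring search for `skill_view` is good enough for the `.jsonl` files. The function names are hypothetical.

```python
import json
import sqlite3
from collections import Counter
from pathlib import Path

QUERY = """
SELECT content FROM messages
WHERE role = 'tool'
  AND content LIKE '%"skill_dir"%'
  AND content LIKE '%"success": true%'
"""


def count_skill_views(conn):
    """Count successful skill_view results per skill_dir (assumes JSON content)."""
    counts = Counter()
    for (content,) in conn.execute(QUERY):
        try:
            payload = json.loads(content)
        except (TypeError, ValueError):
            continue  # row matched the LIKE filters but is not valid JSON
        if isinstance(payload, dict) and payload.get("skill_dir"):
            counts[str(payload["skill_dir"])] += 1
    return counts


def count_session_mentions(session_dir):
    """Count lines mentioning skill_view across *.jsonl files (plain substring match)."""
    total = 0
    for path in Path(session_dir).expanduser().glob("*.jsonl"):
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if "skill_view" in line)
    return total
```

A typical call would be `count_skill_views(sqlite3.connect(Path("~/.hermes/state.db").expanduser()))` followed by `count_session_mentions("~/.hermes/sessions")`. Both numbers remain lower bounds because auto-loaded skills leave no `skill_view` trace.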

Pool Size vs Description Quality

At Hermes's current scale (~120 skills):

  • Reducing pool size (removing unused skills) has the highest impact
  • Improving descriptions helps for the remaining overlapping clusters
  • Code-level changes (prompt restructuring) are NOT worth the complexity

The optimal strategy: delete genuinely unused skills → fix descriptions for overlapping pairs → stop.
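
The strategy can be sketched as a two-step plan built from the usage counts and the overlap list. The `cleanup_plan` function and its input shapes are hypothetical. Since usage counts are lower bounds, the first list holds candidates for manual review, not skills to delete blindly.

```python
def cleanup_plan(all_skills, usage_counts, overlapping_pairs):
    """Return deletion candidates, then the overlapping pairs still worth rewriting."""
    unused = sorted(s for s in all_skills if usage_counts.get(s, 0) == 0)
    survivors = set(all_skills) - set(unused)
    # Only pairs where both skills survive pruning need description work.
    rewrite = [(a, b) for a, b in overlapping_pairs if a in survivors and b in survivors]
    return {"delete_candidates": unused, "rewrite_descriptions": rewrite}


plan = cleanup_plan(
    all_skills=["a", "b", "c", "d"],
    usage_counts={"a": 3, "b": 1, "c": 0},
    overlapping_pairs=[("a", "b"), ("a", "d")],
)
print(plan)
# {'delete_candidates': ['c', 'd'], 'rewrite_descriptions': [('a', 'b')]}
```

Pruning first shrinks the set of overlapping pairs, which is why description work comes second.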