# Skill Description Optimization for Routing

Based on the [SkillRouter (arXiv:2603.22455)](https://arxiv.org/abs/2603.22455) methodology.

## Core Finding

In large, overlapping skill pools, **full skill text is the critical routing signal**, not just name + metadata. Hiding the skill body causes a 31-44 pp drop in routing accuracy at 80K scale. For Hermes at ~120 skills, the impact is smaller but still meaningful for overlapping clusters.

## Description Writing Rules

### 1. Trigger Words (Required)

Every description must include explicit trigger words: the exact phrases users would say.

```
Bad:  "Generates professional infographics."
Good: "Generates infographics. Trigger words: infographic, 信息图, 可视化, visual summary."
```

(信息图 = "infographic", 可视化 = "visualization"; trigger words stay in the language users actually type.)

### 2. Negative Boundaries ("Don't use for")

For skills in overlapping domains, specify what they DON'T cover.

```
Good: "Trigger words: 学术论文, 文献调研. Not for: general search (use web_search)."
```

(学术论文 = "academic paper", 文献调研 = "literature survey".)

### 3. No Competitive Recommendations

Never recommend skill B inside skill A's description.

```
Bad:  "For multi-source search, prefer sn-search-academic over arxiv."
Good: Each skill describes itself independently.
```

### 4. No Implementation Details

Use user-facing concepts, not internal names.

```
Bad:  "Requires SN_API_KEY via sn-image-base's sn_agent_runner.py."
Good: "Requires SenseNova API."
```

### 5. Pipeline Relationships (for sub-skills)

If a skill is part of a pipeline, label its stage.

```
Good: "[sn-deep-research sub-stage] Runs a single-dimension search according to plan.json."
Good: "[sn-deep-research final stage] Writes the final report based on synthesis.md."
```

### 6. Differentiation Over Function Listing

When multiple skills serve similar goals, describe what makes THIS one distinct.

```
Bad:  "Generates infographics" (both sn-infographic and baoyu-infographic say this)
Good: sn-infographic: "87 layouts, with multi-round automated review and refinement."
      baoyu-infographic: "21 layouts, with an interactive user confirmation flow."
```

## Overlap Detection

"Overlap" = same user intent AND same implementation approach.

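
The overlap definition above is a conjunction of two conditions, which a short sketch can make explicit. The `Skill` record, its `intent` and `approach` labels, and the `slides-legacy` entry are hypothetical and exist only for illustration; in practice a person assigns these labels while reviewing the pool.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Skill:
    name: str
    intent: str    # hypothetical label: the user intent the skill serves
    approach: str  # hypothetical label: how the skill is implemented


def is_overlap(a: Skill, b: Skill) -> bool:
    """True only when BOTH the intent and the implementation approach match."""
    return a.intent == b.intent and a.approach == b.approach


def overlapping_pairs(pool: list[Skill]) -> list[tuple[str, str]]:
    """Return the names of every pair in the pool that meets the overlap definition."""
    return [(a.name, b.name) for a, b in combinations(pool, 2) if is_overlap(a, b)]


if __name__ == "__main__":
    pool = [
        Skill("pptx-generator", "slide deck", "python-pptx"),
        Skill("powerpoint", "slide deck", "pptxgenjs"),
        # Hypothetical third skill, added only to show a true overlap.
        Skill("slides-legacy", "slide deck", "python-pptx"),
    ]
    print(overlapping_pairs(pool))
```

Under these assumed labels, `pptx-generator` and `powerpoint` share an intent but not an approach, so only the hypothetical `slides-legacy` is flagged against `pptx-generator`.
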

Two skills are **complementary** (keep both) when:

- Same output type, different tech stack (Python vs Node.js)
- Same domain, different complexity level (lightweight vs full-featured)
- Same tool, different workflow (quick vs QA-heavy)

Examples of complementary pairs that should NOT be merged:

- `pptx-generator` (python-pptx) + `powerpoint` (pptxgenjs)
- `WeChat-article-reader` (Python/Markdown) + `wechat-article-extractor` (Node.js/JSON)

## Usage Measurement

To find which skills are actually used:

1. Search `~/.hermes/state.db` → `messages` table for `skill_view` tool results
2. Search `~/.hermes/sessions/*.jsonl` for `skill_view` function calls
3. `.json` files in sessions/ are request dumps and contain no message history
4. Auto-loaded skills (via system prompt matching) don't generate `skill_view` calls, so counts are lower bounds

```sql
-- Find skill_view results in SQLite
SELECT content
FROM messages
WHERE role = 'tool'
  AND content LIKE '%"skill_dir"%'
  AND content LIKE '%"success": true%';
```

## Pool Size vs Description Quality

At Hermes's current scale (~120 skills):

- **Reducing pool size** (removing unused skills) has the highest impact
- **Improving descriptions** helps for the remaining overlapping clusters
- **Code-level changes** (prompt restructuring) are NOT worth the complexity

The optimal strategy: delete genuinely unused skills → fix descriptions for overlapping pairs → stop.
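
The "delete genuinely unused skills" step depends on the usage counts from the Usage Measurement section, so the two can be combined into a small script. This is a minimal sketch that assumes each matching `content` value is a JSON object with a `skill_dir` path and a boolean `success` field; that payload shape is a guess based on the SQL filter, and `count_skill_views` and `never_viewed` are hypothetical helper names. It parses the JSON rather than matching `"success": true` as text, so whitespace differences in the stored JSON do not matter.

```python
import json
import sqlite3
from collections import Counter
from pathlib import PurePosixPath

# Same role/skill_dir filter as the SQL above; success is checked after parsing.
QUERY = """
    SELECT content FROM messages
    WHERE role = 'tool' AND content LIKE '%"skill_dir"%'
"""


def count_skill_views(conn: sqlite3.Connection) -> Counter:
    """Count successful skill_view results per skill directory name."""
    counts: Counter = Counter()
    for (content,) in conn.execute(QUERY):
        try:
            payload = json.loads(content)
        except (TypeError, ValueError):
            continue  # skip rows whose content is not plain JSON
        if not isinstance(payload, dict) or payload.get("success") is not True:
            continue
        skill_dir = payload.get("skill_dir")
        if isinstance(skill_dir, str) and skill_dir:
            # Assumed layout: the last path component is the skill name.
            counts[PurePosixPath(skill_dir).name] += 1
    return counts


def never_viewed(installed: list[str], counts: Counter) -> list[str]:
    """Installed skills with no recorded skill_view call (deletion candidates only)."""
    return sorted(name for name in installed if counts[name] == 0)
```

Because auto-loaded skills never produce `skill_view` calls, the output of `never_viewed` is a list of candidates to review by hand, not a list that is safe to delete automatically.
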