agent-skills/autonomous-ai-agents/hermes-agent/references/skill-management-pitfalls.md
Hermes Agent ccc63d1e70 first commit
2026-05-10 13:52:46 +08:00

Skill Management Pitfalls

Lessons learned from an attempt to optimize the skill library based on findings from the SkillRouter paper.

Pitfall 1: "Same Output" ≠ "Functionally Overlapping"

Wrong: Deleted pptx-generator (python-pptx) because powerpoint (pptxgenjs) also makes .pptx files.

Right: Different tech stacks = different fallback options. python-pptx is pure Python, pptxgenjs needs Node.js. Keep both.

Rule: Two skills overlap only when they use the same tools AND serve the same user intent. Same output format is not enough.
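
The rule can be written as a small predicate. This is only a sketch: the `Skill` record, its field names, and the intent label "create slides" are hypothetical, chosen for illustration rather than taken from any Hermes schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    # Hypothetical record; field names are illustrative only.
    name: str
    tools: frozenset
    intent: str


def overlaps(a: Skill, b: Skill) -> bool:
    """Two skills overlap only when their tools AND their intent both match."""
    return a.tools == b.tools and a.intent == b.intent


# Same output format (.pptx) and same intent, but different tech stacks.
pptx_generator = Skill("pptx-generator", frozenset({"python-pptx"}), "create slides")
powerpoint = Skill("powerpoint", frozenset({"pptxgenjs"}), "create slides")

print(overlaps(pptx_generator, powerpoint))  # False
```

Output format is deliberately not part of the comparison, so the two .pptx skills are kept as separate fallback options.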

Pitfall 2: Don't Cross-Reference in Descriptions

Wrong: In arxiv's description: "For multi-source academic search, prefer sn-search-academic"

Right: Each skill describes itself only. No competitive recommendations.

Why: It creates circular dependencies. If skill A recommends B and B recommends A, the LLM can loop between them.
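
One way to catch such cross-references before they ship is to scan every description for the names of other skills. The sketch below assumes descriptions are already available as a name-to-text mapping. The function name and the sample description texts are made up for illustration.

```python
import re


def find_cross_references(descriptions: dict[str, str]) -> dict[str, list[str]]:
    """Map each skill to the other skill names mentioned in its description."""
    hits = {}
    for name, text in descriptions.items():
        others = [
            other
            for other in descriptions
            # Match whole names only, so "arxiv" does not match "arxiv-extra".
            if other != name
            and re.search(rf"(?<![\w-]){re.escape(other)}(?![\w-])", text)
        ]
        if others:
            hits[name] = sorted(others)
    return hits


# Illustrative description texts, not the real ones.
descriptions = {
    "arxiv": "Search arXiv papers. For multi-source search, prefer sn-search-academic.",
    "sn-search-academic": "Multi-source academic search.",
}

print(find_cross_references(descriptions))
# {'arxiv': ['sn-search-academic']}
```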

Pitfall 3: Don't Expose Implementation Details

Wrong: In sn-infographic's description: "Requires SN_API_KEY"

Right: "Requires the SenseNova API"

Rule: Descriptions should express user-facing capabilities and requirements, not internal identifiers such as environment variable names.
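
A rough lint for this rule might flag anything that looks like an environment variable. The pattern below assumes internal identifiers are upper-case names joined by underscores, as SN_API_KEY is. That is a heuristic, and the function name is hypothetical. The inputs are English renderings of the two descriptions above.

```python
import re

# Assumption: internal identifiers look like UPPER_CASE_WITH_UNDERSCORES.
ENV_VAR_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")


def leaked_identifiers(description: str) -> list[str]:
    """Return env-var-style tokens found in a skill description."""
    return ENV_VAR_PATTERN.findall(description)


print(leaked_identifiers("Requires SN_API_KEY"))         # ['SN_API_KEY']
print(leaked_identifiers("Requires the SenseNova API"))  # []
```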

Pitfall 4: Check Hard Dependencies Before Deleting

Wrong: Marked sn-research-planning for deletion because it was "never called."

Right: sn-deep-research calls it via skill_view("sn-research-planning") at runtime. Deleting breaks the pipeline.

How to check:

# Search all SKILL.md files for references to the target skill name
# Only HARD dependencies count: skill_view("target-name") or "读取 target-name" ("read target-name")
# "Related skills" mentions are SOFT and don't block deletion
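
The check described above could be scripted roughly as follows. This is a sketch under two assumptions: each skill lives in a directory named after the skill, with a SKILL.md inside, and hard references use exactly the two forms listed. The function name `hard_dependents` is hypothetical.

```python
import re
from pathlib import Path


def hard_dependents(skills_root: Path, target: str) -> list[Path]:
    """Return SKILL.md files (other than the target's own) that hard-reference `target`."""
    name = re.escape(target)
    pattern = re.compile(
        rf"""skill_view\(\s*["']{name}["']\s*\)"""  # skill_view("target-name")
        rf"""|读取\s*{name}(?![\w-])"""              # "读取 target-name" ("read target-name")
    )
    matches = []
    for path in sorted(skills_root.rglob("SKILL.md")):
        # Assumed layout: <skills_root>/.../<skill-name>/SKILL.md
        if path.parent.name == target:
            continue  # a skill mentioning itself is not a dependency
        if pattern.search(path.read_text(encoding="utf-8")):
            matches.append(path)
    return matches
```

Calling `hard_dependents(Path("skills"), "sn-research-planning")`, where `skills` stands in for the real library root, returns the files that would break if the target were deleted. A bare "Related skills" mention matches neither pattern, so soft references are ignored.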

Pitfall 5: "Never skill_view'd" ≠ "Unused"

Skills can be auto-loaded via the system prompt's "MUST load" instruction without explicit skill_view() calls. Session data only shows explicit tool calls.

Better metric: Check if the skill is referenced as a runtime dependency by other skills.
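
Put together, a deletion shortlist would exclude anything that is viewed, required at runtime, or auto-loaded. The sketch below assumes those three sets have already been collected by other means. All parameter names are hypothetical, as is the "old-skill" entry.

```python
def deletion_candidates(all_skills, viewed, hard_deps, must_load):
    """Skills never viewed, never required at runtime, and not auto-loaded."""
    # hard_deps maps a skill to the skills it loads at runtime.
    required = {dep for deps in hard_deps.values() for dep in deps}
    return sorted(set(all_skills) - set(viewed) - required - set(must_load))


all_skills = ["arxiv", "old-skill", "sn-deep-research", "sn-research-planning"]
viewed = {"arxiv", "sn-deep-research"}  # explicit skill_view() calls in session data
hard_deps = {"sn-deep-research": ["sn-research-planning"]}
must_load = set()  # skills pulled in by the system prompt

print(deletion_candidates(all_skills, viewed, hard_deps, must_load))
# ['old-skill']
```

Here sn-research-planning never appears in session data, yet it stays off the shortlist because another skill depends on it.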

Pitfall 6: Don't Batch Recommendations Without Verification

Wrong: Generated all 87 recommendations at once, sent to user, then had to fix multiple errors.

Right: Verify each category before sending. Check dependencies. Then send once.

User feedback: "Can't you confirm these recommendations before sending them to me?"

Description Quality Formula

Good skill description = What it does + Trigger words + Negative boundary

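Example:

"Deep-research full-pipeline orchestrator (entry skill). Automatically handles planning → per-dimension evidence gathering → synthesis → final write-up.
Trigger words: deep research / investigation / comprehensive research / research report.
Not for: single-fact Q&A, one-sentence summaries."
  • What: deep-research full-pipeline orchestrator
  • Triggers: deep research / investigation / comprehensive research
  • Boundary: not for single-fact Q&A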
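
The formula can also be enforced when descriptions are written, by refusing to build one that lacks any of the three parts. This is a minimal sketch: `build_description` and its English labels ("Trigger words:", "Not for:") are illustrative choices, not an existing Hermes helper.

```python
def build_description(what: str, triggers: list[str], not_for: list[str]) -> str:
    """Compose What + Trigger words + Negative boundary into one description."""
    if not what.strip():
        raise ValueError("'what' must not be empty")
    if not triggers:
        raise ValueError("at least one trigger word is required")
    if not not_for:
        raise ValueError("at least one negative boundary is required")
    return (
        f"{what.strip()} "
        f"Trigger words: {' / '.join(triggers)}. "
        f"Not for: {', '.join(not_for)}."
    )


print(build_description(
    "Deep-research orchestrator.",
    ["deep research", "research report"],
    ["single-fact Q&A"],
))
# Deep-research orchestrator. Trigger words: deep research / research report. Not for: single-fact Q&A.
```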