# SkillRouter: Key Takeaways for LLM Agent Skill Routing

Paper: https://arxiv.org/abs/2603.22455 (Apr 2026, Alibaba)
Code: https://github.com/zhengyanzhao1997/SkillRouter
Models: https://huggingface.co/pipizhao/SkillRouter-Embedding-0.6B, SkillRouter-Reranker-0.6B

## Core Finding

At ~80K skill scale with heavy overlap, exposing only name+description causes a 31-44pp Hit@1 drop vs full skill text. Full body is THE critical routing signal, not metadata.

## Architecture (1.2B total)

```
query → SR-Emb-0.6B (bi-encoder) → top-20 from 80K → SR-Rank-0.6B (cross-encoder) → final rank
```

## Training Recipe

### Data: 37,979 synthetic (query, skill) pairs

- Skills sampled with category stratification from ~80K pool
- Queries generated by GPT-4o-mini; prompt forbids revealing skill name
- Benchmark skills excluded from training

### Hard Negative Mining (10 per query)

- 4 semantic neighbors (embedding NN)
- 3 BM25 lexical matches
- 2 same-category distractors
- 1 random cross-category

### False Negative Filtering (critical: +4.0pp)

Three-layer filter removes ~10% of mined negatives:

1. Name dedup (24,879 pairs)
2. Body trigram Jaccard > 0.6 (13,860 pairs)
3. Embedding cosine > 0.92 (326 pairs)

### Loss: Listwise CE >> Pointwise BCE

- Pointwise: 43.3% Hit@1 (fails because homogeneous candidates get similar scores)
- Listwise: 74.0% Hit@1 (compares candidates against each other)
- This is THE key training choice for reranker

### Hyperparams

- Encoder: InfoNCE τ=0.05, LR 2e-5, batch 8, GA 4, 1 epoch, max 2048 tokens
- Reranker: Listwise CE τ=1.0, LR 1e-5, 1 epoch, max 4096 tokens
- Both: single GPU, Qwen3-Emb/Rank-0.6B base

### Input Templates

- Encoder query: `Instruct: ...\nQuery: ` (1500 char cap)
- Encoder skill: ` | | ` (no instruction prefix)
- Reranker: `: ...\n: ...\n: | | `

## Results

| System | Params | Avg Hit@1 | Speed |
|--------|--------|-----------|-------|
| Qwen3-Emb-8B + Qwen3-Rank-8B | 16B | 68.0% | 0.32 QPS |
| SR-Emb-0.6B + SR-Rank-0.6B | 1.2B | 74.0% | 1.83 QPS |
| SR-Emb-8B + SR-Rank-8B | 16B | 76.0% | - |

## Relevance to Hermes

- Hermes currently exposes ~100 skills via name+desc in system prompt, full SKILL.md on demand
- At current scale this works; at 1000+ skills, a routing layer becomes necessary
- False-negative filtering concept applies to Hermes skill deduplication
- Listwise reranking matters when many skills look similar (e.g., multiple research skills)
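
The body-trigram layer of the false-negative filter is probably the easiest piece to reuse for skill deduplication. Below is a minimal sketch, not the paper's implementation. It assumes word-level trigrams over lowercased, whitespace-split text; these notes do not say whether the paper uses word or character trigrams. The function names are hypothetical, and the 0.6 threshold is the one listed above.

```python
def word_trigrams(text: str) -> set[tuple[str, ...]]:
    """Return the set of word-level trigrams (assumed tokenization: lowercase + split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_likely_false_negative(positive_body: str, negative_body: str,
                             threshold: float = 0.6) -> bool:
    """Flag a mined negative whose body overlaps the positive skill too heavily."""
    sim = jaccard(word_trigrams(positive_body), word_trigrams(negative_body))
    return sim > threshold


def filter_negatives(positive_body: str, negative_bodies: list[str],
                     threshold: float = 0.6) -> list[str]:
    """Keep only negatives that do not look like near-duplicates of the positive."""
    return [
        body for body in negative_bodies
        if not is_likely_false_negative(positive_body, body, threshold)
    ]


if __name__ == "__main__":
    positive = "search the web and summarize the top results for the user"
    near_dup = "search the web and summarize the top results for the user quickly"
    different = "convert a csv file into a formatted excel spreadsheet"
    kept = filter_negatives(positive, [near_dup, different])
    print(kept)  # only the clearly different skill body should remain
```

For deduplication rather than negative filtering, the same `jaccard` check could be run pairwise over skill bodies, with pairs above the threshold flagged for manual review rather than dropped automatically.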