# SkillRouter: Key Takeaways for LLM Agent Skill Routing

- Paper: https://arxiv.org/abs/2603.22455 (Apr 2026, Alibaba)
- Code: https://github.com/zhengyanzhao1997/SkillRouter
- Models: https://huggingface.co/pipizhao/SkillRouter-Embedding-0.6B, SkillRouter-Reranker-0.6B

## Core Finding
At ~80K-skill scale with heavy overlap between skills, exposing only name+description causes a 31-44pp Hit@1 drop vs. exposing the full skill text. The full skill body, not the metadata, is THE critical routing signal.
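
For reference, Hit@1 is read here as the fraction of queries whose gold skill is ranked first. A minimal sketch of the metric, assuming exactly one gold skill per query; the function name and toy skill IDs are illustrative and not taken from the paper's code:

```python
def hit_at_k(ranked_ids, gold_ids, k=1):
    """Fraction of queries whose gold skill appears in the top-k ranked IDs."""
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("need one ranking per gold label")
    hits = sum(gold in ranking[:k] for ranking, gold in zip(ranked_ids, gold_ids))
    return hits / len(gold_ids)


# Toy data: three queries with hypothetical skill IDs.
rankings = [
    ["pdf-merge", "pdf-split"],
    ["git-rebase", "git-merge"],
    ["csv-clean", "csv-plot"],
]
gold = ["pdf-merge", "git-merge", "csv-plot"]

hit1 = hit_at_k(rankings, gold, k=1)  # only the first query hits at rank 1
hit2 = hit_at_k(rankings, gold, k=2)  # every gold skill is inside the top 2
```

A "pp" gap between two configurations is then just the difference of two such fractions multiplied by 100.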
## Architecture (1.2B total)

```
query → SR-Emb-0.6B (bi-encoder) → top-20 from 80K → SR-Rank-0.6B (cross-encoder) → final rank
```
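
The pipeline above is a retrieve-then-rerank cascade. A minimal sketch of that control flow, assuming stand-in scoring functions: `embed` and `cross_score` are hypothetical placeholders for SR-Emb-0.6B and SR-Rank-0.6B, not the released models' API, and the toy implementations below exist only to make the example runnable.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query, skills, embed, cross_score, top_k=20):
    """Stage 1: bi-encoder shortlist. Stage 2: cross-encoder rerank of the shortlist."""
    q_vec = embed(query)
    # In practice skill vectors would be precomputed and indexed;
    # they are recomputed here only to keep the sketch short.
    shortlist = sorted(skills, key=lambda s: cosine(q_vec, embed(s)), reverse=True)[:top_k]
    return sorted(shortlist, key=lambda s: cross_score(query, s), reverse=True)


# Toy stand-ins (assumptions): bag-of-letters embedding, word-overlap reranker.
def toy_embed(text):
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]


def toy_cross_score(query, skill):
    return len(set(query.lower().split()) & set(skill.lower().split()))


skills = ["merge pdf files", "split pdf pages", "rebase git branch", "plot csv data"]
ranked = route("merge two pdf files", skills, toy_embed, toy_cross_score, top_k=2)
```

The expensive cross-encoder only ever sees `top_k` candidates, so its cost is independent of the pool size.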
## Training Recipe

### Data: 37,979 synthetic (query, skill) pairs
- Skills sampled with category stratification from the ~80K pool
- Queries generated by GPT-4o-mini; the prompt forbids revealing the skill name
- Benchmark skills excluded from training
### Hard Negative Mining (10 per query)
- 4 semantic neighbors (embedding nearest neighbors)
- 3 BM25 lexical matches
- 2 same-category distractors
- 1 random cross-category skill
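
The 4/3/2/1 mix above could be assembled per query roughly as follows. This is a sketch under assumptions: the `pools` dictionary and its keys are hypothetical inputs (the first two already ranked by their retrievers), and the paper's exact sampling and tie-breaking rules are not reproduced here.

```python
import random

# Quotas from the recipe: 4 semantic + 3 lexical + 2 same-category + 1 random = 10.
QUOTAS = (("semantic", 4), ("bm25", 3), ("same_category", 2), ("random", 1))


def mine_negatives(positive_id, pools, rng):
    """Pick negatives source by source, skipping the positive and duplicates."""
    chosen, seen = [], {positive_id}
    for source, quota in QUOTAS:
        candidates = list(pools[source])
        if source in ("same_category", "random"):
            rng.shuffle(candidates)  # distractors are sampled, not ranked
        taken = 0
        for skill_id in candidates:
            if taken == quota:
                break
            if skill_id in seen:
                continue
            chosen.append((skill_id, source))
            seen.add(skill_id)
            taken += 1
    return chosen


# Toy candidate pools with hypothetical skill IDs.
pools = {
    "semantic": ["s1", "s2", "gold", "s3", "s4", "s5"],
    "bm25": ["s2", "b1", "b2", "b3"],
    "same_category": ["c1", "c2", "c3", "s1"],
    "random": ["r1", "r2", "r3"],
}
negatives = mine_negatives("gold", pools, random.Random(0))
```

Processing the sources in a fixed order means a skill that shows up in both the semantic and BM25 lists is counted once, under the first source that proposed it.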
### False Negative Filtering (critical: +4.0pp)
A three-layer filter removes ~10% of mined negatives:
1. Name dedup (24,879 pairs)
2. Body trigram Jaccard > 0.6 (13,860 pairs)
3. Embedding cosine > 0.92 (326 pairs)
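
As a sanity check on the counts: 24,879 + 13,860 + 326 = 39,065, which is about 10.3% of 37,979 × 10 = 379,790 mined negatives, consistent with the ~10% figure if each count is the number of pairs removed by that layer. The cascade itself might look like the sketch below. Several details are assumptions: character (rather than word) trigrams, comparison of each negative against the query's positive skill, and the dictionary fields `name`, `body`, and `vec`.

```python
import math


def char_trigrams(text):
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_false_negative(pos, neg, jaccard_thr=0.6, cosine_thr=0.92):
    """True if a mined negative looks like a duplicate of the positive skill."""
    if pos["name"].strip().lower() == neg["name"].strip().lower():
        return True  # layer 1: same name
    if jaccard(char_trigrams(pos["body"]), char_trigrams(neg["body"])) > jaccard_thr:
        return True  # layer 2: near-identical body text
    return cosine(pos["vec"], neg["vec"]) > cosine_thr  # layer 3: embedding match


pos = {"name": "pdf-merge", "body": "Merge several PDF files into one document.", "vec": [1.0, 0.0]}
same_name = {"name": "PDF-Merge", "body": "Totally different text.", "vec": [0.0, 1.0]}
near_copy = {"name": "pdf-join", "body": "Merge several PDF files into one document!", "vec": [0.0, 1.0]}
close_vec = {"name": "pdf-combine", "body": "Combine documents.", "vec": [0.99, 0.05]}
distinct = {"name": "git-rebase", "body": "Rebase a branch onto main.", "vec": [0.0, 1.0]}

flags = [is_false_negative(pos, n) for n in (same_name, near_copy, close_vec, distinct)]
```

Ordering the layers from cheapest to most expensive lets most duplicates be caught before any embedding comparison is needed.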
### Loss: Listwise CE >> Pointwise BCE
- Pointwise: 43.3% Hit@1 (fails because homogeneous candidates get near-identical scores)
- Listwise: 74.0% Hit@1 (candidates are scored relative to each other)
- This is THE key training choice for the reranker
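
To make the contrast concrete, here is a minimal sketch of the two objectives for one query's candidate list. It uses the generic textbook forms of binary cross-entropy and softmax cross-entropy, not the paper's training code; the function names and toy scores are illustrative.

```python
import math


def pointwise_bce(scores, labels):
    """Each candidate is judged independently against its own 0/1 label."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def listwise_ce(scores, positive_index, temperature=1.0):
    """Softmax over the whole list: the positive must outscore its competitors."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_z - scaled[positive_index]


# Three near-identical candidates; index 0 is the correct skill.
scores = [2.0, 1.9, 1.8]
bce = pointwise_bce(scores, [1, 0, 0])
ce = listwise_ce(scores, positive_index=0)
ce_separated = listwise_ce([3.0, 1.9, 1.8], positive_index=0)
```

The listwise loss depends only on score differences within the list, so the only way to reduce it is to separate the positive from its look-alike competitors, whereas the pointwise loss treats each score in isolation.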
### Hyperparams
- Encoder: InfoNCE τ=0.05, LR 2e-5, batch 8, gradient accumulation 4, 1 epoch, max 2048 tokens
- Reranker: Listwise CE τ=1.0, LR 1e-5, 1 epoch, max 4096 tokens
- Both: single GPU, Qwen3-Emb-0.6B / Qwen3-Rank-0.6B base
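
With batch 8 and gradient accumulation 4 on a single GPU, the effective batch is 8 × 4 = 32 queries per optimizer step. The sketch below shows what the encoder temperature does in an InfoNCE loss for a single query. It assumes cosine similarity and an explicit list of negatives; whether in-batch negatives are also used is not stated in these notes.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def info_nce(query_vec, positive_vec, negative_vecs, temperature=0.05):
    """Cross-entropy of picking the positive among positive + negatives."""
    sims = [cosine(query_vec, positive_vec)]
    sims += [cosine(query_vec, n) for n in negative_vecs]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


q, pos, neg = [1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]
sharp = info_nce(q, pos, neg, temperature=0.05)  # similarity gap of 1.0 becomes 20 logits
soft = info_nce(q, pos, neg, temperature=1.0)    # same gap stays at 1 logit
```

A small temperature such as 0.05 amplifies cosine differences, which matters because cosine similarities are confined to [-1, 1].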
### Input Templates
- Encoder query: `Instruct: ...\nQuery: <text>` (1500 char cap)
- Encoder skill: `<name> | <desc:300> | <body:2500>` (no instruction prefix)
- Reranker: `<Instruct>: ...\n<Query>: ...\n<Document>: <name> | <desc:500> | <body:2000>`
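
The templates above can be written as small formatting helpers. This is a sketch under assumptions: the numbers after `desc:` and `body:` are read as character caps, the 1500 cap is applied to the query text only, and the helper names are hypothetical. The instruction text is elided in these notes, so it stays a placeholder here.

```python
INSTRUCTION = "..."  # elided in the notes; left as a placeholder on purpose


def skill_text(name, desc, body, desc_cap, body_cap):
    """Join the three skill fields with ' | ', truncating desc and body."""
    return f"{name} | {desc[:desc_cap]} | {body[:body_cap]}"


def encoder_query(text, instruction=INSTRUCTION, cap=1500):
    return f"Instruct: {instruction}\nQuery: {text[:cap]}"


def encoder_skill(name, desc, body):
    # No instruction prefix on the document side of the bi-encoder.
    return skill_text(name, desc, body, desc_cap=300, body_cap=2500)


def reranker_input(query, name, desc, body, instruction=INSTRUCTION):
    doc = skill_text(name, desc, body, desc_cap=500, body_cap=2000)
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"


query = encoder_query("q" * 2000)
doc = encoder_skill("pdf-merge", "x" * 400, "y" * 3000)
pair = reranker_input("merge pdfs", "pdf-merge", "x" * 600, "y" * 2100)
```

Note that the two stages use different budgets: the encoder keeps more of the body (2500 vs 2000) while the reranker keeps more of the description (500 vs 300).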
## Results
| System | Params | Avg Hit@1 | Speed (QPS) |
|--------|--------|-----------|-------------|
| Qwen3-Emb-8B + Qwen3-Rank-8B | 16B | 68.0% | 0.32 |
| SR-Emb-0.6B + SR-Rank-0.6B | 1.2B | 74.0% | 1.83 |
| SR-Emb-8B + SR-Rank-8B | 16B | 76.0% | - |
## Relevance to Hermes
- Hermes currently exposes ~100 skills via name+desc in the system prompt, with the full SKILL.md loaded on demand
- This works at the current scale; at 1000+ skills, a routing layer becomes necessary
- The false-negative filtering concept applies to Hermes skill deduplication
- Listwise reranking matters when many skills look similar (e.g., multiple research skills)