fix: anchor stopwords - remove generic question patterns causing cross-topic contamination
- Add an ANCHOR_STOPWORDS set in anchor.py containing only truly generic question patterns
- Filter Chinese n-grams against the stopwords in extract()
- Update the content_words extraction in sparse.py to use the stopword-filtered query
- Diagnosis: the 'Git rebase vs merge' query now correctly excludes Redis/asyncio blocks
- Phase1 results: Full CGK averages 42.6 tokens with 0% contamination (vs Last-5 at 67.6 tokens, 100%)
- Phase2 ablation: Gate-only accounts for most of the benefit
- Phase3 sensitivity: the OVERLAP/NEW_RATIO thresholds are insensitive on clean data; RECENT_WINDOW is the primary token budget control

Known limitations:
- The test set is clean 4-topic synthetic data (no real dirty dialogue)
- No strong baselines (the BM25 ablation is incomplete)
- No answer-level evaluation (only retrieved blocks are measured)
- No parameter sensitivity analysis on noisy real-world data
- Zero contamination on 5 queries does not generalize
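The stopword filtering described above can be sketched as follows. This is a minimal illustration, not the code in anchor.py: the contents of ANCHOR_STOPWORDS, the signature of extract(), the bigram size, and the two regular expressions are all assumptions, since the commit message names the set and the function but does not show them.

```python
import re

# Hypothetical stopword set: the real ANCHOR_STOPWORDS in anchor.py is not
# shown in this commit. These are generic Chinese question words
# (what / how / how to / why / difference).
ANCHOR_STOPWORDS = {"什么", "怎么", "如何", "为什么", "区别"}

_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")          # runs of CJK characters
_ASCII_WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # ASCII terms such as "Git"


def extract(query: str, n: int = 2) -> set[str]:
    """Return anchor terms: lowercased ASCII words plus CJK n-grams,
    skipping any n-gram that appears in ANCHOR_STOPWORDS."""
    anchors = {word.lower() for word in _ASCII_WORD.findall(query)}
    for run in _CJK_RUN.findall(query):
        for i in range(len(run) - n + 1):
            gram = run[i:i + n]
            if gram not in ANCHOR_STOPWORDS:
                anchors.add(gram)
    return anchors


anchors = extract("Git rebase 和 merge 有什么区别")
```

In this sketch the generic question bigrams are dropped while the topic terms (git, rebase, merge) survive, so a question about Git no longer shares anchors with a Redis or asyncio question merely because both use the same question phrase. Bigrams that straddle a stopword boundary still pass through here; whether the actual implementation handles those is not stated in the commit.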
157
experiments/phase1_baseline_results.json
Normal file
@@ -0,0 +1,157 @@
{
  "Last-3": {
    "avg_tokens": 43.6,
    "contamination_rate": 100.0,
    "raw": [
      {
        "label": "问PG",
        "pt": 42,
        "cont": true
      },
      {
        "label": "问Git",
        "pt": 44,
        "cont": true
      },
      {
        "label": "问Redis",
        "pt": 43,
        "cont": true
      },
      {
        "label": "问asyncio",
        "pt": 43,
        "cont": true
      },
      {
        "label": "再问Git",
        "pt": 46,
        "cont": true
      }
    ]
  },
  "Last-5": {
    "avg_tokens": 67.6,
    "contamination_rate": 100.0,
    "raw": [
      {
        "label": "问PG",
        "pt": 66,
        "cont": true
      },
      {
        "label": "问Git",
        "pt": 68,
        "cont": true
      },
      {
        "label": "问Redis",
        "pt": 67,
        "cont": true
      },
      {
        "label": "问asyncio",
        "pt": 67,
        "cont": true
      },
      {
        "label": "再问Git",
        "pt": 70,
        "cont": true
      }
    ]
  },
  "Last-10": {
    "avg_tokens": 137.6,
    "contamination_rate": 100.0,
    "raw": [
      {
        "label": "问PG",
        "pt": 136,
        "cont": true
      },
      {
        "label": "问Git",
        "pt": 138,
        "cont": true
      },
      {
        "label": "问Redis",
        "pt": 137,
        "cont": true
      },
      {
        "label": "问asyncio",
        "pt": 137,
        "cont": true
      },
      {
        "label": "再问Git",
        "pt": 140,
        "cont": true
      }
    ]
  },
  "BM25-5": {
    "avg_tokens": 70.6,
    "contamination_rate": 60.0,
    "raw": [
      {
        "label": "问PG",
        "pt": 68,
        "cont": true
      },
      {
        "label": "问Git",
        "pt": 74,
        "cont": true
      },
      {
        "label": "问Redis",
        "pt": 70,
        "cont": false
      },
      {
        "label": "问asyncio",
        "pt": 67,
        "cont": true
      },
      {
        "label": "再问Git",
        "pt": 74,
        "cont": false
      }
    ]
  },
  "Full CGK": {
    "avg_tokens": 42.6,
    "contamination_rate": 0.0,
    "raw": [
      {
        "label": "问PG",
        "pt": 18,
        "cont": false
      },
      {
        "label": "问Git",
        "pt": 59,
        "cont": false
      },
      {
        "label": "问Redis",
        "pt": 19,
        "cont": false
      },
      {
        "label": "问asyncio",
        "pt": 56,
        "cont": false
      },
      {
        "label": "再问Git",
        "pt": 61,
        "cont": false
      }
    ]
  }
}
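The summary fields in this file appear to be derived directly from the "raw" entries: avg_tokens matches the mean of the "pt" values, and contamination_rate matches the percentage of entries with "cont" set to true. The sketch below recomputes both for the BM25-5 entries as a consistency check. The summarize() helper is hypothetical, and reading "pt" as a per-query token count and "cont" as a contamination flag is an inference from the numbers, not something the commit states.

```python
def summarize(raw):
    """Recompute the summary fields from a list of per-query records.

    Assumes "pt" is the token count for one query and "cont" marks whether
    the retrieved context was contaminated by another topic.
    """
    n = len(raw)
    avg_tokens = round(sum(r["pt"] for r in raw) / n, 1)
    contaminated = sum(1 for r in raw if r["cont"])
    contamination_rate = round(100.0 * contaminated / n, 1)
    return {"avg_tokens": avg_tokens, "contamination_rate": contamination_rate}


# The five BM25-5 records, copied from the results file above.
bm25_raw = [
    {"label": "问PG", "pt": 68, "cont": True},
    {"label": "问Git", "pt": 74, "cont": True},
    {"label": "问Redis", "pt": 70, "cont": False},
    {"label": "问asyncio", "pt": 67, "cont": True},
    {"label": "再问Git", "pt": 74, "cont": False},
]

summary = summarize(bm25_raw)
print(summary)  # {'avg_tokens': 70.6, 'contamination_rate': 60.0}
```

With only five queries per method, each contaminated query moves contamination_rate by 20 percentage points, which is consistent with the commit message's caution that the 0% figure does not generalize.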