2.2 KiB
2.2 KiB
MiMo-v2.5-pro API Performance Profile
Empirically tested on https://token-plan-sgp.xiaomimimo.com/v1 (2026-05-29).
Latency by Prompt Size
| Prompt Size | Items | Response Time | Status |
|---|---|---|---|
| ~500 chars | 1-2 | 2-4s | ✅ Reliable |
| ~4,500 chars | 15 | ~73s | ✅ OK |
| ~7,400 chars | 25 | >120s | ❌ Timeout |
| ~10,900 chars | 35 | >120s | ❌ Timeout |
| ~19,000 chars | 65-70 | >150s | ❌ Timeout |
Key Constraints
- Max reliable prompt size: ~5K chars / ~18 items for structured output tasks
- Output token generation is slow (~50-80 tokens/s for large JSON outputs)
- Simple prompts (<1K) are fast and reliable (2-4s)
- Latency is highly variable — same prompt can take 73s or timeout at 150s
- Temperature 0.2 used for structured output consistency
Implications for Cron Jobs
- Pre-filter aggressively before sending to LLM: dedupe + source priority + cap at 18 items
- Cron timeout 300s budget: ~35s data fetch + ~80s LLM = ~115s typical, but retries can push to 250s+
- Set LLM urllib timeout to 150s (not 300s — it won't help, just wastes cron budget)
- Retry 2x max (not 3x) to stay within 300s cron budget
- If LLM consistently times out, check if API is rate-limited (test with simple prompt first)
Workaround: Pre-filter Pattern
def _prefilter_items(raw_items, max_items=18):
"""Dedupe + prioritize before LLM call."""
seen = set()
filtered = []
priority_sources = {'AI HOT': 1, '橘鸦AI早报': 1, 'InfoQ AI': 2, '量子位': 2}
sorted_items = sorted(raw_items, key=lambda r: priority_sources.get(r.get('source_group', ''), 3))
for item in sorted_items:
norm = re.sub(r'[^\w\u4e00-\u9fff]+', '', item['title_raw'].lower())
if not norm or len(norm) < 3 or norm in seen:
continue
seen.add(norm)
filtered.append(item)
if len(filtered) >= max_items:
break
return filtered
Alternative Providers (tested same day)
- Findmini (gpt-5.4):
https://api.findmini.top/gpt/v1— returned 503 - OpenRouter (free models): returned 429 rate limit
- MiMo small prompts: consistently 2-4s, reliable for simple tasks