MiMo-v2.5-pro API Performance Profile

Empirically tested on https://token-plan-sgp.xiaomimimo.com/v1 (2026-05-29).

Latency by Prompt Size

Prompt Size	Items	Response Time	Status
~500 chars	1-2	2-4s	✅ Reliable
~4,500 chars	15	~73s	✅ OK
~7,400 chars	25	>120s	❌ Timeout
~10,900 chars	35	>120s	❌ Timeout
~19,000 chars	65-70	>150s	❌ Timeout

Key Constraints

Max reliable prompt size: ~5K chars / ~18 items for structured output tasks
Output token generation is slow (~50-80 tokens/s for large JSON outputs)
Simple prompts (<1K) are fast and reliable (2-4s)
Latency is highly variable — same prompt can take 73s or timeout at 150s
Temperature 0.2 used for structured output consistency

Implications for Cron Jobs

Pre-filter aggressively before sending to LLM: dedupe + source priority + cap at 18 items
Cron timeout 300s budget: ~35s data fetch + ~80s LLM = ~115s typical, but retries can push to 250s+
Set LLM urllib timeout to 150s (not 300s — it won't help, just wastes cron budget)
Retry 2x max (not 3x) to stay within 300s cron budget
If LLM consistently times out, check if API is rate-limited (test with simple prompt first)

Workaround: Pre-filter Pattern

def _prefilter_items(raw_items, max_items=18):
    """Dedupe + prioritize before LLM call."""
    seen = set()
    filtered = []
    priority_sources = {'AI HOT': 1, '橘鸦AI早报': 1, 'InfoQ AI': 2, '量子位': 2}
    sorted_items = sorted(raw_items, key=lambda r: priority_sources.get(r.get('source_group', ''), 3))
    for item in sorted_items:
        norm = re.sub(r'[^\w\u4e00-\u9fff]+', '', item['title_raw'].lower())
        if not norm or len(norm) < 3 or norm in seen:
            continue
        seen.add(norm)
        filtered.append(item)
        if len(filtered) >= max_items:
            break
    return filtered

Alternative Providers (tested same day)

Findmini (gpt-5.4): https://api.findmini.top/gpt/v1 — returned 503
OpenRouter (free models): returned 429 rate limit
MiMo small prompts: consistently 2-4s, reliable for simple tasks

2.2 KiB Raw Blame History

MiMo-v2.5-pro API Performance Profile

Latency by Prompt Size

Key Constraints

Implications for Cron Jobs

Workaround: Pre-filter Pattern

Alternative Providers (tested same day)

2.2 KiB

Raw Blame History