first commit

2026-05-10 13:52:46 +08:00
commit ccc63d1e70
4583 changed files with 584341 additions and 0 deletions
--- a/research/llm-model-comparison/references/chinese-model-platforms.md
+++ b/research/llm-model-comparison/references/chinese-model-platforms.md
@@ -0,0 +1,40 @@
+# Chinese AI Model Platforms Reference
+
+## Major Providers & Model Families
+
+| Provider | Platform | Model Family | Notes |
+|----------|----------|-------------|-------|
+| 商汤 SenseTime | cloud.sensenova.cn | SenseNova (6.7B, U1, etc.) | `sensenova-*` naming |
+| 深度求索 DeepSeek | platform.deepseek.com | DeepSeek-V3/V4, R1, Coder | `deepseek-*` naming |
+| 阿里 Alibaba | dashscope.aliyun.com | Qwen (通义千问) | `qwen-*` naming |
+| 字节跳动 ByteDance | volcengine.com | Doubao (豆包) | `doubao-*` naming |
+| 月之暗面 Moonshot | platform.moonshot.cn | Kimi | `moonshot-*` naming |
+| 智谱 Zhipu | open.bigmodel.cn | GLM (ChatGLM) | `glm-*` naming |
+| 百度 Baidu | cloud.baidu.com | 文心 ERNIE | `ernie-*` naming |
+| 零一万物 01.AI | platform.lingyiwanwu.com | Yi | `yi-*` naming |
+| MiniMax | platform.minimaxi.com | MiniMax (M2.7, etc.) | `minimax-*` naming |
+| 小米 Xiaomi | mimo.xiaomi.com | MiMo | `mimo-*` naming |
+
+## Common Model Naming Patterns
+
+- `*-flash` / `*-lite` → lightweight/fast inference variants
+- `*-fast` → speed-optimized, may sacrifice some quality
+- `*-instruct` → instruction-tuned for chat
+- `*-coder` / `*-code` → code-specialized
+- `*-v1`, `*-v2`, `*-v3` → version iterations
+- Parameter count often embedded: `6.7B`, `72B`, etc.
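
A minimal Python sketch of how the naming patterns above could be parsed mechanically. `parse_model_name`, the `ParsedName` record, and the token-to-meaning table are hypothetical helpers, not any provider's API. The sketch assumes IDs are split on `-` or `_` and that parameter counts look like `6.7b` or `1t`. The example ID is illustrative only. Providers do not follow these conventions consistently, so treat the output as a hint.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical token table based on the naming patterns above.
VARIANT_TAGS = {
    "flash": "lightweight/fast",
    "lite": "lightweight/fast",
    "fast": "speed-optimized",
    "instruct": "instruction-tuned",
    "coder": "code-specialized",
    "code": "code-specialized",
}

PARAM_RE = re.compile(r"(\d+(?:\.\d+)?)([bt])")  # "6.7b", "72b", "1t" (after lowercasing)
VERSION_RE = re.compile(r"v(\d+(?:\.\d+)*)")     # "v2", "v3.1"


@dataclass
class ParsedName:
    params_b: Optional[float] = None  # parameter count in billions, if embedded
    version: Optional[str] = None     # number taken from a "-vN" token
    variants: List[str] = field(default_factory=list)


def parse_model_name(name: str) -> ParsedName:
    """Split a model ID on '-' or '_' and classify each whole token."""
    parsed = ParsedName()
    for token in re.split(r"[-_]", name.lower()):
        param = PARAM_RE.fullmatch(token)
        version = VERSION_RE.fullmatch(token)
        if param:
            value = float(param.group(1))
            # Convert trillions to billions so sizes are comparable.
            parsed.params_b = value * 1000 if param.group(2) == "t" else value
        elif version:
            parsed.version = version.group(1)
        elif token in VARIANT_TAGS:
            parsed.variants.append(VARIANT_TAGS[token])
    return parsed


print(parse_model_name("deepseek-coder-6.7b-instruct"))
# ParsedName(params_b=6.7, version=None, variants=['code-specialized', 'instruction-tuned'])
```

Matching whole tokens rather than substrings avoids false hits, such as `lite` or `fast` appearing inside an unrelated word.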
+
+## How to Research an Unknown Model
+
+1. **mmx search** with the model name + "评测" (Chinese for "evaluation/review") or "benchmark"
+2. Check the provider's official docs (see table above)
+3. Check the LMSYS Chatbot Arena leaderboard (lmarena.ai)
+4. Check the non-linear Chinese LLM benchmark (github.com/jeinlee1991/chinese-llm-benchmark)
+
+## Quick Classification Heuristics
+
+- If name contains a provider prefix (sensenova, deepseek, qwen...) → look up that provider
+- If name contains parameter count (6.7B, 7B, 72B) → compare against known models of similar size
+- If name contains "flash/lite/fast" → speed variant, likely lower quality than the base model
+- "Lite" models: often 1B-7B range, good for simple tasks
+- "Flash/Fast" models: optimized inference, may use MoE or quantization
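
The quick heuristics above can be sketched as a small rule-based classifier. The prefix table is copied from the provider table in this file. `classify` and its hint strings are hypothetical. The `startswith` prefix test is an assumption: an ID that does not begin with a listed prefix falls through to "unknown". The example IDs are illustrative only.

```python
import re
from typing import Dict, List

# Prefix -> provider, copied from the "Major Providers" table.
PROVIDER_PREFIXES: Dict[str, str] = {
    "sensenova": "SenseTime",
    "deepseek": "DeepSeek",
    "qwen": "Alibaba",
    "doubao": "ByteDance",
    "moonshot": "Moonshot",
    "glm": "Zhipu",
    "ernie": "Baidu",
    "yi": "01.AI",
    "minimax": "MiniMax",
    "mimo": "Xiaomi",
}
SPEED_TAGS = {"flash", "lite", "fast"}
# A size such as "7b" or "6.7b"; the lookbehind stops a match from starting mid-number.
SIZE_RE = re.compile(r"(?<![\d.])(\d+(?:\.\d+)?)b\b")


def classify(name: str) -> List[str]:
    """Apply the quick heuristics and return human-readable hints (not verdicts)."""
    lowered = name.lower()
    hints: List[str] = []

    provider = next(
        (p for prefix, p in PROVIDER_PREFIXES.items() if lowered.startswith(prefix)),
        None,
    )
    hints.append(f"provider: {provider}" if provider else "provider: unknown")

    size = SIZE_RE.search(lowered)
    if size:
        hints.append(f"size: ~{size.group(1)}B, compare with similar-size models")

    # Speed variants are detected as whole "-" or "_" separated tokens.
    if SPEED_TAGS & set(re.split(r"[-_]", lowered)):
        hints.append("speed variant: likely weaker than the base model")
    return hints


print(classify("qwen2.5-72b-instruct"))
# ['provider: Alibaba', 'size: ~72B, compare with similar-size models']
print(classify("glm-4-flash"))
# ['provider: Zhipu', 'speed variant: likely weaker than the base model']
```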
--- a/research/llm-model-comparison/references/model-benchmarks-2026-05.md
+++ b/research/llm-model-comparison/references/model-benchmarks-2026-05.md
@@ -0,0 +1,86 @@
+# Model Benchmark Data — May 2026
+
+## Chinese LLM Benchmark (non-linear ReLE)
+Source: github.com/jeinlee1991/chinese-llm-benchmark
+
+### General Capability
+| Rank | Model | Accuracy | Time | Cost per 1,000 Calls (CNY) |
+|------|------|--------|------|---------------|
+| 28 | MiniMax-M2.7 | 65.1% | 110s | 42.7 |
+| 35 | MiMo-V2.5-Pro | ~71.4%* | 56s | 64.3 |
+
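+*MiMo-V2.5-Pro figures come from a separate evaluation article, which reports its rank rising from 35th to 7th; the table lists the earlier rank.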
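
To compare the two rows on price, one option is to divide the cost column by accuracy. This "CNY per accuracy point" metric is an illustrative assumption and is not reported by the benchmark. The `Row` record and `cost_per_point` are hypothetical names. MiMo's accuracy is approximate (~71.4%).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    model: str
    accuracy_pct: float     # general-capability accuracy, in %
    seconds: float          # "Time" column as reported
    cost_cny_per_1k: float  # cost per 1,000 calls, in CNY


ROWS: List[Row] = [
    Row("MiniMax-M2.7", 65.1, 110, 42.7),
    Row("MiMo-V2.5-Pro", 71.4, 56, 64.3),  # accuracy is approximate (~71.4%)
]


def cost_per_point(row: Row) -> float:
    """Illustrative metric: CNY per 1,000 calls per accuracy point (lower = cheaper)."""
    return row.cost_cny_per_1k / row.accuracy_pct


for row in ROWS:
    print(f"{row.model}: {cost_per_point(row):.3f} CNY per point per 1k calls")
# MiniMax-M2.7: 0.656 CNY per point per 1k calls
# MiMo-V2.5-Pro: 0.901 CNY per point per 1k calls
```

On this metric MiniMax-M2.7 is cheaper per point. MiMo-V2.5-Pro scores higher and finishes in about half the time. Which matters more depends on the workload.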
+
+### Chinese Instruction Following
+| Rank | Model | Accuracy | Time |
+|------|------|--------|------|
+| 30 | MiniMax-M2.7 | 42.9% | 51s |
+
+### BFCL-V3 (Function Calling)
+| Rank | Model | Accuracy |
+|------|------|--------|
+| 2 | MiniMax-M2.7 | 76.5% |
+| 12 | MiniMax-M2.5 | 70.5% |
+
+## MiMo-V2.5-Pro Key Metrics
+Source: Xiaomi official + Artificial Analysis
+
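+- GDPVal-AA (Elo): 1581, ranked first among open-source models worldwide
+- ClawEval: 63.8
+- τ³-Bench: 72.9
+- SWE-bench Pro: close to Claude Opus 4.6 / GPT-5.4 level
+- Token efficiency: 42% higher than Kimi
+- Parameters: 1T (Pro), 310B (standard version)
+- Context: 1M tokens
+- License: MIT (fully open source)
+- Coding: up 8.8 percentage points over the previous generation (53.1% → 61.9%)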
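
The coding figures show why the gain is stated in percentage points. The score rose by 8.8 points, which corresponds to a relative improvement of about 16.6% over 53.1%. The two helper functions below are a hypothetical sketch of that arithmetic.

```python
def point_change(old_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change(old_pct: float, new_pct: float) -> float:
    """Change expressed as a percentage of the old value."""
    return (new_pct - old_pct) / old_pct * 100


old, new = 53.1, 61.9  # MiMo coding scores from the list above
print(f"{point_change(old, new):.1f} points")        # 8.8 points
print(f"{relative_change(old, new):.1f}% relative")  # 16.6% relative
```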
+
+## MiniMax M2.7 Key Metrics
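+Source: MiniMax official
+
+- SWE-bench Pro: 56.22%
+- Self-evolution: participates in its own training through an Agent Harness; the model can take on 30-50% of the R&D workload
+- Positioning: flagship agent model
+- Status: closed-source commercial API
+- Hong Kong stock: HK$886 per share (February 2026)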
+
+## Arcee Trinity Large Key Metrics
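+Source: Arcee AI official + technical report
+
+- Parameters: 400B total, 13B active per token (MoE)
+- Architecture: AFMoE (Attention-First Mixture-of-Experts)
+- Experts: 128 experts, 8 active per token
+- Context: 131K tokens
+- Generation speed: 200+ tokens/s
+- Response latency: under 3 s
+- License: Apache 2.0 (fully open source, commercial use permitted)
+- Performance: comparable to Llama 4 Maverick 400B and GLM-4.5
+- Trained by: Arcee AI + Prime Intellect + DatologyAI
+- Positioning: one of the largest open-source models released by a US company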
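
This is a back-of-envelope sketch of what the Trinity Large figures imply. It relies on common rules of thumb that are assumptions here and do not come from Arcee. All 400B parameters typically have to be held in memory, even though only 13B are used per token. Forward compute is approximated as 2 FLOPs per active parameter per token. The 2 bytes per parameter figure assumes BF16 weights. The response-time estimate treats "under 3 s" and "200+ tokens/s" as bounds, so it is an upper estimate. All function names are hypothetical.

```python
# Figures from the Trinity Large list above.
TOTAL_PARAMS = 400e9   # total parameters across all experts and shared layers
ACTIVE_PARAMS = 13e9   # parameters used for each token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that take part in one token's forward pass."""
    return active / total


def weight_memory_gb(total: float, bytes_per_param: float) -> float:
    """Memory needed just to store the weights (excludes KV cache and activations)."""
    return total * bytes_per_param / 1e9


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active


def response_seconds(tokens: int, tokens_per_s: float, latency_s: float) -> float:
    """Response latency plus decode time for `tokens` output tokens."""
    return latency_s + tokens / tokens_per_s


print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.2%}")               # 3.25%
print(f"BF16 weights: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")                       # 800 GB
print(f"forward cost: {forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs/token")  # 26
print(f"1,000-token reply: <= {response_seconds(1000, 200, 3):.1f} s")                   # 8.0 s
```

The gap between roughly 800 GB of weights and roughly 26 GFLOPs per token is the usual MoE trade-off. Per-token compute follows the active parameters, while the serving hardware must still hold the total.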
+
+## Quick Reference: Model Tier List (May 2026)
+
+### Tier 1: Top Closed-Source
+- GPT-5.4 / GPT-5.5 (OpenAI)
+- Claude Opus 4.6 (Anthropic)
+- Gemini 3.1 Pro (Google)
+
+### Tier 1.5: Near-Top / Strongest Open-Source
+- MiMo-V2.5-Pro (Xiaomi): top open-source tier
+- Kimi-K2-Thinking (Moonshot AI)
+- GLM-5.1 (Zhipu AI)
+
+### Tier 2: Strong Commercial
+- MiniMax M2.7: top-tier in Chinese, strong at agent tasks
+- Qwen3.5-Plus (Alibaba)
+- DeepSeek V4-Pro
+
+### Tier 2.5: Capable Open-Source
+- Trinity Large (Arcee): 400B MoE, optimized for English
+- Qwen3.5-27B / Qwen3.6-35B
+- GLM-4.7 (Zhipu AI)
+
+### Tier 3: Efficient / Lightweight
+- Trinity Mini (26B, 3B active)
+- Gemini 3.1 Flash Lite
+- Qwen3.5-Flash