init: AI日报 (AI Daily) pipeline: full code + skill docs + run logs

2026-06-04 10:38:44 +08:00
commit 94e18ce22d
10 changed files with 1728 additions and 0 deletions


@@ -0,0 +1,65 @@
# Rendering & Guide Formatting Reference
## `clean_guide_text(text)` function (in `blog_markdown()`)
Strips unwanted artifacts from LLM-generated guide text:
```python
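import re


def clean_guide_text(text):
    """Strip reference numbers, the "主线判断" prefix, and extra whitespace."""
    # Strip all [N] reference numbers
    text = re.sub(r'\[\d+\]', '', text)
    text = re.sub(r'\[N\]', '', text).strip()
    # Strip "主线判断:" prefix (accepting the full-width ':' too is an assumption)
    text = re.sub(r'^主线判断[::]\s*', '', text)
    # Collapse runs of whitespace into single spaces
    text = re.sub(r'\s+', ' ', text).strip()
    return text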
```
## Summary section rendering
Type labels map: `{'strong': '强信号', 'medium': '中信号', 'risk': '待验证'}` (strong signal, medium signal, to be verified)
Output format per type group:
```
## 总结
**强信号**
- **<title extracted from the first sentence of text>**
<explanation>...
- **<title>**
<explanation>...
**中信号**
- **<title>**
<explanation>...
**待验证**
- **<title>**
<explanation>...
```
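The grouped layout above can be sketched as a small renderer. This is a minimal illustration, not the pipeline's actual code: the function name `render_summary`, the `(type, title, body)` tuple shape, and the choice to skip empty groups are all assumptions.

```python
from collections import defaultdict

# Label map taken from the text above.
TYPE_LABELS = {'strong': '强信号', 'medium': '中信号', 'risk': '待验证'}


def render_summary(items):
    """Render (type, title, body) tuples into the grouped Markdown layout.

    Hypothetical helper: the tuple shape is assumed for illustration.
    Items whose type is not in TYPE_LABELS are silently dropped.
    """
    groups = defaultdict(list)
    for item_type, title, body in items:
        groups[item_type].append((title, body))

    lines = ['## 总结']
    # Dicts keep insertion order, so groups come out strong, medium, risk.
    for item_type, label in TYPE_LABELS.items():
        if not groups[item_type]:
            continue  # skip empty groups (assumed behaviour)
        lines.append(f'**{label}**')
        for title, body in groups[item_type]:
            lines.append(f'- **{title}**')
            lines.append(body)
    return '\n'.join(lines)
```

Grouping first and iterating over `TYPE_LABELS` second keeps the section order fixed regardless of the order in which items arrive.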
Title extraction logic:
1. Try splitting on `` or `:`; if the prefix is shorter than 60 characters, use it as the title
2. Otherwise, split on `。!?` and use the first sentence as the title
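The two steps above might look like the following sketch. Several details are assumptions rather than confirmed behaviour: the function name `extract_title`, treating the blank delimiter in step 1 as the full-width colon `:`, accepting full-width `!?` alongside the ASCII forms, returning the remainder as the explanation, and falling back to the whole text when no delimiter is found.

```python
import re

MAX_PREFIX_LEN = 60  # threshold from the text above


def extract_title(text):
    """Return (title, rest) following the two-step rule (hypothetical helper)."""
    # Step 1: split on the first colon. Accepting the full-width ':' as well
    # as ASCII ':' is an assumption.
    parts = re.split(r'[::]', text, maxsplit=1)
    if len(parts) == 2 and 0 < len(parts[0].strip()) < MAX_PREFIX_LEN:
        return parts[0].strip(), parts[1].strip()

    # Step 2: fall back to the first sentence. The terminator itself is
    # dropped from the title (assumed).
    match = re.search(r'[。!?!?]', text)
    if match:
        return text[:match.start()].strip(), text[match.end():].strip()

    # No delimiter at all: use the whole text as the title (assumed fallback).
    return text.strip(), ''
```

For example, `extract_title('OpenAI 发布新模型:细节尚未公开。')` splits at the colon, while a text with no colon is cut at its first `。`.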
## Title translation (Stage 2a)
Titles are translated from English to Chinese in Stage 2a. Rules:
- Brand names preserved: GPT-5, Codex, Gemini, OpenAI, Meta, etc.
- Technical terms with no good Chinese equivalent: keep English
- Everything else: translate to natural Chinese
- The LLM prompt states explicitly: "英文品牌名/模型名保留原样,其余翻译为中文" (keep English brand and model names as-is; translate everything else into Chinese)
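One way to check the brand-name rule after translation is a simple substring comparison. The sketch below is hypothetical and not part of the documented pipeline: `dropped_brand_names` and `BRAND_NAMES` are illustrative names, and the list holds only the brands named above.

```python
# Brand names listed in the text above; the real list is longer ("etc.").
BRAND_NAMES = ['GPT-5', 'Codex', 'Gemini', 'OpenAI', 'Meta']


def dropped_brand_names(original, translated, brands=BRAND_NAMES):
    """Return brand names found in the English title but missing from the translation.

    Plain substring matching, so 'Meta' would also match inside 'Metadata';
    a real check may need word boundaries.
    """
    return [b for b in brands if b in original and b not in translated]
```

An empty result means every listed brand that appeared in the English title survived translation unchanged.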
## LLM prompt for guide (as of 2026-05-30)
Key instructions to LLM:
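- No vague summaries (e.g. "行业焦点转向XX", "industry focus is shifting to XX"); point to specific events
- No citation numbers such as [1][3]; readers cannot see what they refer to
- No advice; delete phrasing like "开发者应该..." ("developers should...")
- Keep each item to 2-3 sentences
- Use plain language, not an academic tone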
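Some of these prompt rules can be checked mechanically on the LLM's output. The sketch below is a hypothetical lint pass, not something the source describes: the name `lint_guide_item`, the violation labels, and the sentence-counting heuristic are assumptions, and only the advice marker "开发者应该" comes from the text above.

```python
import re

# Phrases treated as advice. Only this one appears in the text above;
# a real list would need more entries.
ADVICE_MARKERS = ['开发者应该']


def lint_guide_item(text, max_sentences=3):
    """Return a list of rule violations for one guide item (empty if clean)."""
    problems = []
    # Rule: no citation numbers such as [1] or [3].
    if re.search(r'\[\d+\]', text):
        problems.append('citation_number')
    # Rule: no advice phrasing.
    if any(marker in text for marker in ADVICE_MARKERS):
        problems.append('advice')
    # Rule: at most 2-3 sentences. Sentences are counted by terminal
    # punctuation; ASCII '.' is ignored so names like "GPT-5.1" do not split.
    sentences = [s for s in re.split(r'[。!?!?]', text) if s.strip()]
    if len(sentences) > max_sentences:
        problems.append('too_long')
    return problems
```

The vagueness and tone rules are not covered here, since they would need human or model judgment rather than a pattern match.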