first commit
content-ops/blog-review-workflow/SKILL.md
@@ -0,0 +1,96 @@
---
name: blog-review-workflow
description: Iterative blog review using subagents (write, review, fix, re-review) until the quality threshold is met.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [blog, review, subagent, quality, content-ops]
---

# Blog Review Workflow

Use this workflow when publishing a blog post that requires quality assurance. The pattern is: write → subagent review → fix → subagent re-review → micro-adjust → publish.

## When to Use
- Blog posts based on external data/evaluations (factual accuracy must be verified)
- Posts where the user explicitly asks for a quality review
- Any post where accuracy and fairness matter (comparisons, reviews, analyses)

## Workflow

### Step 1: Write Draft
Write the blog post, save it locally, and publish it as a draft via the content-ops-agent API.

### Step 2: First Subagent Review
Delegate the review to a subagent with NO conversation context; it should read only the source data and the blog draft.

**Critical: The subagent must clone/read the original data source independently.** Do not pass the data through context; let the subagent verify facts against the ground truth.

Review dimensions:
1. **Factual accuracy**: do the data, rankings, and conclusions match the source?
2. **Analysis depth**: does the post offer original insights, or does it just rephrase the source?
3. **Logical coherence**: does the argument flow without contradictions?
4. **Technical accuracy**: are the domain concepts correct?
5. **Readability**: is the post accessible to the target audience?
6. **Fairness**: are all subjects treated evenly?
7. **Completeness**: is any important information omitted?

Output format:
- Overall score (1-10)
- Per-dimension scores
- Specific issue list (with line numbers/quotes)
- Actionable fix suggestions

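One way to keep the subagent's report machine-checkable is to parse it into a small structure and validate it before acting on it. The sketch below is only an illustration: the class names, field names, and dimension identifiers are assumptions, not part of the workflow or of any agent API.

```python
from dataclasses import dataclass, field

# Identifiers mirror the seven review dimensions; the names are illustrative.
DIMENSIONS = (
    "factual_accuracy", "analysis_depth", "logical_coherence",
    "technical_accuracy", "readability", "fairness", "completeness",
)


@dataclass
class Issue:
    dimension: str   # one of DIMENSIONS
    location: str    # line number or quoted passage from the draft
    suggestion: str  # actionable fix


@dataclass
class ReviewResult:
    overall: float
    scores: dict[str, float]
    issues: list[Issue] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a dimension is missing or a score is outside 1-10."""
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"missing dimension scores: {sorted(missing)}")
        for name, score in {"overall": self.overall, **self.scores}.items():
            if not 1 <= score <= 10:
                raise ValueError(f"{name} score {score} is outside 1-10")
```

Calling `validate()` right after parsing catches a review that skipped a dimension before any fixes are attempted.
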
### Step 3: Fix Based on Review
Apply fixes. Common patterns:
- Factual errors → correct the data, add caveats
- Depth issues → add original analysis frameworks (taxonomy, cost/perf, etc.)
- Fairness issues → treat all subjects equally (don't soften criticism of one while sharpening it for another)
- Missing content → add overlooked but important findings

### Step 4: Second Subagent Review
Re-review with a focus on:
- Are the N issues from round 1 fixed?
- Were any NEW issues introduced?
- Did overall quality improve?

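The three re-review questions reduce to set arithmetic if each issue carries a stable identifier across rounds. That identifier scheme is an assumption made for this sketch; the workflow itself does not prescribe one, and `compare_rounds` is a hypothetical helper.

```python
def compare_rounds(round1_ids: set[str], round2_ids: set[str]) -> dict[str, set[str]]:
    """Split issues into fixed, still remaining, and newly introduced."""
    return {
        "fixed": round1_ids - round2_ids,      # reported in round 1, gone in round 2
        "remaining": round1_ids & round2_ids,  # reported in both rounds
        "new": round2_ids - round1_ids,        # first seen in round 2
    }
```

For example, `compare_rounds({"A", "B", "C"}, {"C", "D"})` reports `A` and `B` as fixed, `C` as remaining, and `D` as new.
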
### Step 5: Micro-adjustments
Fix any remaining low-priority issues from round 2, then update the draft.

### Step 6: Confirm with User
Present the review results and ask the user whether to publish or make further changes.

## Pitfalls

### sed for content insertion can duplicate
When `sed` inserts content at a pattern match and the pattern matches multiple times, the insertion happens at every match. Use Python for complex content modifications instead:
```python
# Safer approach: insert only when the marker occurs exactly once
content = open("post.md", encoding="utf-8").read()  # hypothetical draft path
marker = "### Target Section"
if content.count(marker) != 1:
    raise ValueError(f"expected exactly one marker, found {content.count(marker)}")
before, after = content.split(marker)
# Insert the new text (a placeholder here) immediately before the marker
content = before + "New paragraph.\n\n" + marker + after
```

### Subagent file access
The subagent needs terminal access to clone repos and use curl. Always include `terminal` and `file` in its toolsets. If the blog uses an API, also include the `web` toolset.

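The toolset rule above is simple enough to encode so it is never forgotten when a task is assembled. This is a minimal sketch; `pick_toolsets` is a hypothetical helper, and only the toolset names `terminal`, `file`, and `web` come from the workflow.

```python
def pick_toolsets(blog_uses_api: bool) -> list[str]:
    """Return the toolsets a review subagent should be given."""
    toolsets = ["terminal", "file"]  # always needed to clone repos and read files
    if blog_uses_api:
        toolsets.append("web")       # only when the blog is reached through an API
    return toolsets
```
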
### Community feedback as review signal
When reviewing blog posts that reference external content, the original source's comment section may be inaccessible (e.g., WeChat requires login). Instead, gather community feedback from:
- **GitHub API**: `curl https://api.github.com/repos/OWNER/REPO` → stars, forks, issues
- **mmx search**: `"topic" 评价 OR 反馈 OR 体验 OR 用过` (the terms mean "review", "feedback", "experience", and "have used") across platforms
- **GitHub issues**: specific bug reports or feature requests that reveal user pain points

This data enriches the Fairness and Completeness review dimensions.

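The GitHub lookup can also be done from Python rather than `curl`. The sketch below uses only the standard library and reads the `stargazers_count`, `forks_count`, and `open_issues_count` fields of the repository response; the function names are illustrative, and note that GitHub's open-issue count includes open pull requests.

```python
import json
import urllib.request


def fetch_repo(owner: str, repo: str) -> dict:
    """Fetch repository metadata from the public GitHub REST API (unauthenticated)."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_repo(payload: dict) -> dict:
    """Keep only the popularity signals relevant to a review."""
    return {
        "stars": payload.get("stargazers_count", 0),
        "forks": payload.get("forks_count", 0),
        "open_issues": payload.get("open_issues_count", 0),  # includes open PRs
    }
```

A typical call would be `summarize_repo(fetch_repo("OWNER", "REPO"))`; unauthenticated requests are rate-limited, so cache the result if several reviews need it.
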
### Token security
Never hardcode the service token in the subagent task description. Instead, tell the subagent to read it from an environment variable or a known file location.

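A minimal sketch of that separation: the task description names the variable, and only the subagent's own process reads the value. The variable name `CONTENT_OPS_TOKEN` and both helper functions are assumptions for illustration, not names defined by the workflow.

```python
import os

TOKEN_ENV_VAR = "CONTENT_OPS_TOKEN"  # hypothetical variable name


def build_task_description(draft_path: str) -> str:
    """Describe the task by naming the variable, never by embedding its value."""
    return (
        f"Review the draft at {draft_path}. "
        f"Authenticate with the token stored in ${TOKEN_ENV_VAR}; do not print it."
    )


def load_token() -> str:
    """Read the token at call time and fail loudly if it is not configured."""
    token = os.environ.get(TOKEN_ENV_VAR)
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    return token
```

Because the description contains only the variable name, it can be logged or stored in conversation history without leaking the secret.
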
## Quality Thresholds
- **≥ 8.0**: Ready to publish
- **7.0-7.9**: Minor fixes needed
- **6.0-6.9**: Significant rework required
- **< 6.0**: Major rewrite needed

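The bands above map directly onto a small lookup. In this sketch, `verdict` is a hypothetical helper, and a score that falls between the listed bands (for example 7.95) is assigned to the lower band, which is an assumption the thresholds do not spell out.

```python
def verdict(overall_score: float) -> str:
    """Map an overall review score (1-10) to the action the thresholds call for."""
    if overall_score >= 8.0:
        return "Ready to publish"
    if overall_score >= 7.0:
        return "Minor fixes needed"
    if overall_score >= 6.0:
        return "Significant rework required"
    return "Major rewrite needed"
```
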
## Reference
This workflow was developed while reviewing a blog post that evaluated 6 AI models' iOS development capabilities. The first review scored 6.5/10 with 21 issues. After fixes, the second review scored 8.2/10 with only 3 remaining low-priority issues.
@@ -0,0 +1,35 @@
# Example: AI Model Evaluation Blog Post Review

## Context
Blog post titled "In-Depth Evaluation of 6 AI Models' iOS Development Capabilities" (original title: "6款AI模型iOS开发能力深度评测"), based on @solidus's evaluation data.

## First Review (6.5/10): Key Issues Found

### Critical Factual Errors
1. **Misleading Opus score**: the 95/100 was based on only 8 core practical questions, while the other models were scored on 84 questions. Both appeared in the same table without a caveat.
2. **"Two evaluation systems" described as three**: the title said "two sets" (两套) but the post listed three.
3. **GLM had the highest main score but was ranked 3rd**: the post gave no explanation of why (its XII pressure test score was only 79 vs Sonnet's 87).

### Fairness Issues
4. **Double standard on API fabrication**: MiMo's fabricated `sending` syntax got bold text plus the label "the most dangerous failure mode" (最危险的失败模式), while Sonnet's fabricated iOS API got only the casual "flopped" (翻车). Fix: equal treatment.
5. **Selective month-end drift comparison**: the post showed only Opus (best) vs Kimi (worst), ignoring that DeepSeek and GLM also solved it correctly.

### Depth Issues
6. **5 "deep analysis" questions were just rephrased** from the source report's summary section.
7. **Scenario recommendations were copied verbatim** from the source report.

### Missing Content
8. Kimi's `fatalError` in production code (critical engineering flaw)
9. GLM's CSV export syntax error (won't compile)
10. Sonnet's TWO failures in the graphics test (API fabrication + ACES formula)

## Second Review (8.2/10): Remaining Low-Priority Issues
1. SE proposal number reference (SE-0371 vs SE-0427)
2. The description of Opus's 95 score could be more precise
3. Missing "legacy Swift 5 project" recommendation scenario

## Lessons Learned
- Always add caveats when comparing scores based on different sample sizes
- Equal treatment: if you harshly criticize one model for X, do the same for every model that did X
- Original analysis frameworks (failure mode taxonomy, cost/perf analysis) add genuine depth
- Subagent review with NO context forces independent verification against the source data