first commit

Hermes Agent
2026-05-10 13:52:46 +08:00
commit ccc63d1e70
4583 changed files with 584341 additions and 0 deletions

@@ -0,0 +1,444 @@
---
name: spike
description: "Throwaway experiments to validate an idea before build."
version: 1.0.0
author: Hermes Agent (adapted from gsd-build/get-shit-done)
license: MIT
metadata:
  hermes:
    tags: [spike, prototype, experiment, feasibility, throwaway, exploration, research, planning, mvp, proof-of-concept]
    related_skills: [sketch, writing-plans, subagent-driven-development, plan]
---
# Spike
Use this skill when the user wants to **feel out an idea** before committing to a real build — validating feasibility, comparing approaches, or surfacing unknowns that no amount of research will answer. Spikes are disposable by design. Throw them away once they've paid their debt.
Load this when the user says things like "let me try this", "I want to see if X works", "spike this out", "before I commit to Y", "quick prototype of Z", "is this even possible?", or "compare A vs B".
## When NOT to use this
- The answer is knowable from docs or reading code — just do research, don't build
- The work is on the production path: use `writing-plans` / `plan` instead
- The idea is already validated — jump straight to implementation
## If the user has the full GSD system installed
If `gsd-spike` shows up as a sibling skill (installed via `npx get-shit-done-cc --hermes`), prefer **`gsd-spike`** when the user wants the full GSD workflow: persistent `.planning/spikes/` state, MANIFEST tracking across sessions, Given/When/Then verdict format, and commit patterns that integrate with the rest of GSD. This skill is the lightweight standalone version for users who don't have (or don't want) the full system.
## Core method
Regardless of scale, every spike follows this loop:
```
decompose → research → build → verdict
↑__________________________________________↓
iterate on findings
```
### 1. Decompose
Break the user's idea into **2-5 independent feasibility questions**. Each question is one spike. Present them as a table with Given/When/Then framing:
| # | Spike | Validates (Given/When/Then) | Risk |
|---|-------|----------------------------|------|
| 001 | websocket-streaming | Given a WS connection, when the LLM streams tokens, then the client receives each chunk in under 100 ms | High |
| 002a | pdf-parse-pdfjs | Given a multi-page PDF, when parsed with pdfjs, then structured text is extractable | Medium |
| 002b | pdf-parse-camelot | Given a multi-page PDF, when parsed with camelot, then structured text is extractable | Medium |
**Spike types:**
- **standard** — one approach answering one question
- **comparison** — same question, different approaches (shared number, letter suffix `a`/`b`/`c`)
**Good spike questions:** specific feasibility with observable output.
**Bad spike questions:** too broad, no observable output, or just "read the docs about X".
**Order by risk.** The spike most likely to kill the idea runs first. No point prototyping the easy parts if the hard part doesn't work.
**Skip decomposition** only if the user already knows exactly what they want to spike and says so. Then take their idea as a single spike.
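The ordering rule and the comparison-suffix convention can be sketched in a few lines of Python. This is only an illustration: the `Spike` dataclass, the `RISK_ORDER` ranking, and the `order_by_risk` helper are hypothetical names, and the "Low" risk label is assumed (the table above only shows High and Medium).

```python
from dataclasses import dataclass

# Hypothetical ranking; "Low" is an assumed third level.
RISK_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass(frozen=True)
class Spike:
    number: int        # shared by every variant of a comparison spike
    name: str          # kebab-case descriptive name
    risk: str          # "High", "Medium", or "Low"
    variant: str = ""  # "" for a standard spike, "a"/"b"/"c" for comparisons

    @property
    def dirname(self) -> str:
        # 1 + "" -> "001-...", 2 + "a" -> "002a-..."
        return f"{self.number:03d}{self.variant}-{self.name}"


def order_by_risk(spikes):
    """Riskiest first; ties fall back to spike number, then variant letter."""
    return sorted(spikes, key=lambda s: (RISK_ORDER[s.risk], s.number, s.variant))


spikes = [
    Spike(2, "pdf-parse-pdfjs", "Medium", "a"),
    Spike(2, "pdf-parse-camelot", "Medium", "b"),
    Spike(1, "websocket-streaming", "High"),
]
for spike in order_by_risk(spikes):
    print(spike.dirname, spike.risk)
```

Sorting on a tuple keeps comparison variants adjacent, so 002a and 002b are still built one after the other even when risk decides the overall order.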
### 2. Align (for multi-spike ideas)
Present the spike table. Ask: "Build all in this order, or adjust?" Let the user drop, reorder, or re-frame before you write any code.
### 3. Research (per spike, before building)
Spikes are not research-free — you research enough to pick the right approach, then you build. Per spike:
1. **Brief it.** 2-3 sentences: what this spike is, why it matters, key risk.
2. **Surface competing approaches** if there's real choice:
| Approach | Tool/Library | Pros | Cons | Status |
|----------|-------------|------|------|--------|
| ... | ... | ... | ... | maintained / abandoned / beta |
3. **Pick one.** State why. If 2+ are credible, build quick variants within the spike.
4. **Skip research** for pure logic with no external dependencies.
Use Hermes tools for the research step:
- `web_search("python websocket streaming libraries 2025")` — find candidates
- `web_extract(urls=["https://websockets.readthedocs.io/..."])` — read the actual docs (returns markdown)
- `terminal("pip show websockets | grep Version")` — check what's installed in the project's venv
For libraries without docs pages, clone and read their `README.md` / `examples/` via `read_file`. Context7 MCP (if the user has it configured) is also a good source — `mcp_*_resolve-library-id` then `mcp_*_query-docs`.
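When the spike itself is Python, the installed-version check can also be done from inside the interpreter instead of through `pip show`. A minimal sketch using the standard library's `importlib.metadata`; the `installed_version` helper name is hypothetical, and the result reflects whichever interpreter runs the code, so run it with the project's venv Python.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if websockets is installed here, otherwise None.
print(installed_version("websockets"))
```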
### 4. Build
One directory per spike. Keep it standalone.
```
spikes/
├── 001-websocket-streaming/
│   ├── README.md
│   └── main.py
├── 002a-pdf-parse-pdfjs/
│   ├── README.md
│   └── parse.js
└── 002b-pdf-parse-camelot/
    ├── README.md
    └── parse.py
```
**Bias toward something the user can interact with.** Spikes fail when the only output is a log line that says "it works." The user wants to *feel* the spike working. Default choices, in order of preference:
1. A runnable CLI that takes input and prints observable output
2. A minimal HTML page that demonstrates the behavior
3. A small web server with one endpoint
4. A unit test that exercises the question with recognizable assertions
**Depth over speed.** Never declare "it works" after one happy-path run. Test edge cases. Follow surprising findings. The verdict is only trustworthy when the investigation was honest.
**Avoid** unless the spike specifically requires it: complex package management, build tools/bundlers, Docker, env files, config systems. Hardcode everything — it's a spike.
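As a rough illustration of the first preference (a runnable CLI with observable output and everything hardcoded), a spike's `main.py` might look like the sketch below. The chunking question, the `CHUNK_SIZE` constant, and the `chunk` helper are invented for the example; they are not part of any particular spike.

```python
import argparse

CHUNK_SIZE = 8  # hardcoded on purpose: no env files, no config system


def chunk(text: str, size: int = CHUNK_SIZE):
    """Split text into fixed-size pieces; the last piece may be shorter."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Spike: split input into chunks")
    parser.add_argument("text", nargs="?", default="hello from the spike")
    # parse_known_args ignores stray arguments instead of exiting on them.
    args, _unknown = parser.parse_known_args(argv)
    pieces = chunk(args.text)
    for index, piece in enumerate(pieces):
        print(f"[{index}] {piece!r}")
    print(f"{len(pieces)} chunk(s) from {len(args.text)} character(s)")
    return 0


if __name__ == "__main__":
    main()
```

Running it with an empty string, a string shorter than one chunk, and a string that is an exact multiple of `CHUNK_SIZE` is the kind of edge-case pass that should happen before a verdict is written.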
**Building one spike** — a typical tool sequence:
```
terminal("mkdir -p spikes/001-websocket-streaming")
write_file("spikes/001-websocket-streaming/README.md", "# 001: websocket-streaming\n\n...")
write_file("spikes/001-websocket-streaming/main.py", "...")
terminal("cd spikes/001-websocket-streaming && python3 main.py")
# Observe output, iterate.
```
**Parallel comparison spikes (002a / 002b) — delegate.** When two approaches can run in parallel and both need real engineering (not 10-line prototypes), fan out with `delegate_task`:
```
delegate_task(tasks=[
    {"goal": "Build 002a-pdf-parse-pdfjs: ...", "toolsets": ["terminal", "file", "web"]},
    {"goal": "Build 002b-pdf-parse-camelot: ...", "toolsets": ["terminal", "file", "web"]},
])
```
Each subagent returns its own verdict; you write the head-to-head.
### 5. Verdict
Each spike's `README.md` closes with:
```markdown
## Verdict: VALIDATED | PARTIAL | INVALIDATED
### What worked
- ...
### What didn't
- ...
### Surprises
- ...
### Recommendation for the real build
- ...
```
**VALIDATED** = the core question was answered yes, with evidence.
**PARTIAL** = it works under constraints X, Y, Z — document them.
**INVALIDATED** = doesn't work, for this reason. This is a successful spike.
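If the verdict block is generated rather than typed by hand, a small helper can enforce the three allowed values. The following is a sketch under that assumption; `render_verdict` and `VERDICTS` are hypothetical names, and the findings passed in are made-up placeholders.

```python
VERDICTS = ("VALIDATED", "PARTIAL", "INVALIDATED")


def _bullets(items):
    # An explicit placeholder keeps an empty section visible in the README.
    return "\n".join(f"- {item}" for item in items) if items else "- (none)"


def render_verdict(verdict, worked, didnt, surprises, recommendations):
    """Render the closing README section in the layout shown above."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {VERDICTS}, got {verdict!r}")
    sections = [
        ("What worked", worked),
        ("What didn't", didnt),
        ("Surprises", surprises),
        ("Recommendation for the real build", recommendations),
    ]
    lines = [f"## Verdict: {verdict}"]
    for title, items in sections:
        lines.append(f"### {title}")
        lines.append(_bullets(items))
    return "\n".join(lines) + "\n"


# Placeholder findings, for illustration only.
text = render_verdict(
    "PARTIAL",
    worked=["chunks arrive in order"],
    didnt=["latency rises after a reconnect"],
    surprises=[],
    recommendations=["add reconnect backoff in the real build"],
)
print(text)
```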
## Comparison spikes
When two approaches answer the same question (002a / 002b), build them **back to back**, then do a head-to-head comparison at the end:
```markdown
## Head-to-head: pdfjs vs camelot
| Dimension | pdfjs (002a) | camelot (002b) |
|-----------|--------------|----------------|
| Extraction quality | 9/10 structured | 7/10 table-only |
| Setup complexity | npm install, 1 line | pip + ghostscript |
| Perf on 100-page PDF | 3s | 18s |
| Handles rotated text | no | yes |
**Winner:** pdfjs for our use case. Camelot if we need table-first extraction later.
```
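A head-to-head table like the one above can be assembled from each spike's findings with a short helper. This sketch assumes the findings are already collected as plain strings; `head_to_head` is a hypothetical name, and the rows reuse two lines from the example table.

```python
def head_to_head(columns, rows):
    """Build a Markdown table: one column per approach, one row per dimension."""
    header = "| Dimension | " + " | ".join(columns) + " |"
    divider = "|" + "---|" * (len(columns) + 1)
    body = []
    for dimension, values in rows:
        if len(values) != len(columns):
            raise ValueError(
                f"{dimension!r} has {len(values)} values for {len(columns)} approaches"
            )
        body.append(f"| {dimension} | " + " | ".join(values) + " |")
    return "\n".join([header, divider, *body])


table = head_to_head(
    ["pdfjs (002a)", "camelot (002b)"],
    [
        ("Setup complexity", ["npm install, 1 line", "pip + ghostscript"]),
        ("Handles rotated text", ["no", "yes"]),
    ],
)
print(table)
```

The length check catches the most common mistake in a comparison write-up: a dimension that was measured for one approach but never for the other.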
## Frontier mode (picking what to spike next)
If spikes already exist and the user says "what should I spike next?", walk the existing directories and look for:
- **Integration risks** — two validated spikes that touch the same resource but were tested independently
- **Data handoffs** — spike A's output was assumed compatible with spike B's input; never proven
- **Gaps in the vision** — capabilities assumed but unproven
- **Alternative approaches** — different angles for PARTIAL or INVALIDATED spikes
Propose 2-4 candidates as Given/When/Then. Let the user pick.
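The mechanical part of the directory walk can be sketched as follows. It only covers the last bullet (finding spikes whose verdict is not VALIDATED, or that have no verdict yet); integration risks and data handoffs still need a human read of the READMEs. The `read_verdict` and `followup_candidates` names are hypothetical, and the demo builds a throwaway directory tree so the example is self-contained.

```python
import re
import tempfile
from pathlib import Path

# Matches a filled-in verdict heading; the unfilled template line does not match.
VERDICT_RE = re.compile(
    r"^## Verdict:[ \t]*(VALIDATED|PARTIAL|INVALIDATED)[ \t]*$", re.MULTILINE
)


def read_verdict(readme_text):
    match = VERDICT_RE.search(readme_text)
    return match.group(1) if match else None


def followup_candidates(root):
    """List (directory, verdict) pairs for every spike that is not VALIDATED."""
    candidates = []
    for spike_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        readme = spike_dir / "README.md"
        verdict = read_verdict(readme.read_text(encoding="utf-8")) if readme.exists() else None
        if verdict != "VALIDATED":
            candidates.append((spike_dir.name, verdict or "NO VERDICT"))
    return candidates


# Demo on a temporary tree that mimics the layout used in this document.
with tempfile.TemporaryDirectory() as tmp:
    for name, verdict in [
        ("001-websocket-streaming", "VALIDATED"),
        ("002a-pdf-parse-pdfjs", "PARTIAL"),
        ("002b-pdf-parse-camelot", "INVALIDATED"),
    ]:
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "README.md").write_text(
            f"# {name}\n\n## Verdict: {verdict}\n", encoding="utf-8"
        )
    (Path(tmp) / "003-unfinished").mkdir()  # no README yet
    demo = followup_candidates(tmp)

print(demo)
```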
## Output
- Create `spikes/` (or `.planning/spikes/` if the user is using GSD conventions) in the repo root
- One dir per spike: `NNN-descriptive-name/`
- `README.md` per spike captures question, approach, results, verdict
- Keep the code throwaway — a spike that takes 2 days to "clean up for production" was a bad spike
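The `NNN-descriptive-name/` convention is easy to check and extend programmatically. A minimal sketch, assuming three-digit numbers, an optional single lowercase variant letter, and a lowercase kebab-case name; the regex and both helper names are illustrative, not a format the skill mandates.

```python
import re

# Assumed shape: 3 digits, optional variant letter, then a kebab-case slug.
SPIKE_DIR_RE = re.compile(r"^(\d{3})([a-z]?)-([a-z0-9]+(?:-[a-z0-9]+)*)$")


def parse_spike_dir(name):
    """Return (number, variant, slug), or None if the name does not fit."""
    match = SPIKE_DIR_RE.match(name)
    if match is None:
        return None
    number, variant, slug = match.groups()
    return int(number), variant, slug


def next_spike_number(existing_names):
    """Next free spike number; names that do not parse are ignored."""
    numbers = [parsed[0] for parsed in map(parse_spike_dir, existing_names) if parsed]
    return max(numbers, default=0) + 1


existing = ["001-websocket-streaming", "002a-pdf-parse-pdfjs", "002b-pdf-parse-camelot", "notes"]
print(parse_spike_dir("002a-pdf-parse-pdfjs"))
print(f"{next_spike_number(existing):03d}")
```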
## Attribution
Adapted from the GSD (Get Shit Done) project's `/gsd-spike` workflow — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)). The full GSD system offers persistent spike state, MANIFEST tracking, and integration with a broader spec-driven development pipeline; install with `npx get-shit-done-cc --hermes --global`.
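## Spike record templates
### Spike README template
````markdown
# [number]: [spike name]
## Question
**Core question:** [one sentence describing what is being validated]
**Given/When/Then:**
- Given: [precondition]
- When: [trigger]
- Then: [expected result]
**Risk level:** High / Medium / Low
---
## Approach
### Option A: [option name]
- **Tool/library:** [name]
- **Pros:** [list]
- **Cons:** [list]
- **Status:** maintained / abandoned / beta
### Option B: [option name]
- **Tool/library:** [name]
- **Pros:** [list]
- **Cons:** [list]
- **Status:** maintained / abandoned / beta
**Choice:** [which option was picked, and why]
---
## Implementation
### File layout
```
spikes/[number]-[name]/
├── README.md
├── main.py
└── requirements.txt
```
### Core code
```python
# Main implementation code
```
### Run command
```bash
cd spikes/[number]-[name]
pip install -r requirements.txt
python main.py
```
---
## Test results
### Functional tests
| Test case | Input | Expected output | Actual output | Result |
|-----------|-------|-----------------|---------------|--------|
| Test 1 | [input] | [expected] | [actual] | ✅/❌ |
| Test 2 | [input] | [expected] | [actual] | ✅/❌ |
### Edge-case tests
| Test case | Input | Expected output | Actual output | Result |
|-----------|-------|-----------------|---------------|--------|
| Edge 1 | [input] | [expected] | [actual] | ✅/❌ |
| Edge 2 | [input] | [expected] | [actual] | ✅/❌ |
### Performance tests
| Scenario | Data size | Duration | Memory use | Result |
|----------|-----------|----------|------------|--------|
| Scenario 1 | [data size] | [duration] | [memory] | ✅/❌ |
| Scenario 2 | [data size] | [duration] | [memory] | ✅/❌ |
---
## Findings
### What worked
- [success 1]
- [success 2]
### What didn't
- [failure 1]
- [failure 2]
### Surprises
- [surprise 1]
- [surprise 2]
---
## Verdict
**Result:** ✅ VALIDATED / ⚠️ PARTIAL / ❌ INVALIDATED
**Constraints:**
- [constraint 1]
- [constraint 2]
**Recommendations:**
- [recommendation 1]
- [recommendation 2]
---
## Next steps
- [ ] [next action 1]
- [ ] [next action 2]
````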
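### Comparison spike report template
```markdown
# Comparison spike: [Option A] vs [Option B]
## Background
**Problem to solve:** [problem description]
**Evaluation dimensions:**
1. [dimension 1]
2. [dimension 2]
3. [dimension 3]
---
## Comparison
| Dimension | Option A: [name] | Option B: [name] |
|-----------|------------------|------------------|
| [dimension 1] | [assessment] | [assessment] |
| [dimension 2] | [assessment] | [assessment] |
| [dimension 3] | [assessment] | [assessment] |
| Ease of use | [assessment] | [assessment] |
| Performance | [assessment] | [assessment] |
| Maintainability | [assessment] | [assessment] |
---
## Detailed analysis
### Option A: [name]
**Pros:**
- [pro 1]
- [pro 2]
**Cons:**
- [con 1]
- [con 2]
**Good fit for:**
- [scenario 1]
- [scenario 2]
### Option B: [name]
**Pros:**
- [pro 1]
- [pro 2]
**Cons:**
- [con 1]
- [con 2]
**Good fit for:**
- [scenario 1]
- [scenario 2]
---
## Performance comparison
| Scenario | Option A | Option B | Difference |
|----------|----------|----------|------------|
| [scenario 1] | [data] | [data] | [difference] |
| [scenario 2] | [data] | [data] | [difference] |
---
## Verdict
**Recommended option:** [Option A / Option B]
**Reasons:**
1. [reason 1]
2. [reason 2]
3. [reason 3]
**Applies when:**
- [condition 1]
- [condition 2]
**Does not apply when:**
- [condition 1]
- [condition 2]
---
## Decision log
| Decision point | Options | Choice | Rationale |
|----------------|---------|--------|-----------|
| [decision 1] | A / B | [choice] | [rationale] |
| [decision 2] | A / B | [choice] | [rationale] |
```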
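### Condensed spike workflow
```markdown
## Spike workflow
### 1. Decompose
Break the user's idea into **2-5 independent feasibility questions**. Each question is one spike.
### 2. Align (for multi-spike ideas)
Present the spike table. Ask: "Build all in this order, or adjust?" Let the user drop, reorder, or re-frame.
### 3. Research (per spike, before building)
- Brief it: 2-3 sentences on what the spike is, why it matters, and the key risk
- List competing approaches (if there's real choice)
- Pick one. State why. If 2+ are credible, build quick variants within the spike
- Skip research for pure logic with no external dependencies
### 4. Build
One directory per spike. Keep it standalone.
**Bias toward something the user can interact with.** Spikes fail when the only output is a log line that says "it works." The user wants to *feel* the spike working.
**Depth over speed.** Never declare "it works" after one happy-path run. Test edge cases. Follow surprising findings.
### 5. Verdict
Each spike's `README.md` closes with a verdict:
- **✅ VALIDATED** = the core question was answered yes, with evidence
- **⚠️ PARTIAL** = it works under constraints X, Y, Z; document them
- **❌ INVALIDATED** = doesn't work, for a stated reason. This is a successful spike
```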