first commit
dogfood/SKILL.md
---
name: dogfood
description: "Exploratory QA of web apps: find bugs, evidence, reports."
version: 1.0.0
metadata:
  hermes:
    tags: [qa, testing, browser, web, dogfood]
    related_skills: []
---

# Dogfood: Systematic Web Application QA Testing

## Overview

This skill guides you through systematic exploratory QA testing of web applications. It supports **two execution modes** depending on context:

1. **Browser-first** (default): Use the browser toolset to navigate, interact, and capture evidence live.
2. **Source-code-first** (fallback): When browser automation is unavailable, slow, or times out, analyze the source code to enumerate all routes/endpoints/pages, then build a comprehensive test plan document. Execute browser tests selectively afterward.

For **multi-service sites** (e.g., separate auth/blog/canvas services), prefer the source-code-first approach; it produces more complete coverage faster than crawling.

## Prerequisites

- Browser toolset: either the built-in tools (`browser_navigate`, `browser_snapshot`, etc.) **or** Playwright Python (see `references/playwright-qa.md`). Optional if using source-code-first mode.
- A target URL and testing scope from the user
- Source code access (repo clone or codebase): strongly recommended for multi-service sites

## Inputs

The user provides:

1. **Target URL**: the entry point for testing
2. **Scope**: what areas/features to focus on (or "full site" for comprehensive testing)
3. **Output directory** (optional): where to save screenshots and the report (default: `./dogfood-output`)

## Workflow

Follow this 5-phase systematic workflow:

### Phase 1: Plan

1. Create the output directory structure:
   ```
   {output_dir}/
   ├── screenshots/       # Evidence screenshots
   └── report.md          # Final report (generated in Phase 5)
   ```
2. Identify the testing scope based on user input.
3. Build a rough sitemap by planning which pages and features to test:
   - Landing/home page
   - Navigation links (header, footer, sidebar)
   - Key user flows (sign up, login, search, checkout, etc.)
   - Forms and interactive elements
   - Edge cases (empty states, error pages, 404s)

### Phase 2: Explore

For each page or feature in your plan:

1. **Navigate** to the page:
   ```
   browser_navigate(url="https://example.com/page")
   ```

2. **Take a snapshot** to understand the DOM structure:
   ```
   browser_snapshot()
   ```

3. **Check the console** for JavaScript errors:
   ```
   browser_console(clear=true)
   ```
   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.

4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
   ```
   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
   ```
   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.

5. **Test interactive elements** systematically:
   - Click buttons and links: `browser_click(ref="@eN")`
   - Fill forms: `browser_type(ref="@eN", text="test input")`
   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
   - Scroll through content: `browser_scroll(direction="down")`
   - Test form validation with invalid inputs
   - Test empty submissions

6. **After each interaction**, check for:
   - Console errors: `browser_console()`
   - Visual changes: `browser_vision(question="What changed after the interaction?")`
   - Expected vs actual behavior

### Phase 3: Collect Evidence

For every issue found:

1. **Take a screenshot** showing the issue:
   ```
   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
   ```
   Save the `screenshot_path` from the response; you will reference it in the report.

2. **Record the details**:
   - URL where the issue occurs
   - Steps to reproduce
   - Expected behavior
   - Actual behavior
   - Console errors (if any)
   - Screenshot path

3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
   - Severity: Critical / High / Medium / Low
   - Category: Functional / Visual / Accessibility / Console / UX / Content

### Phase 4: Categorize

1. Review all collected issues.
2. De-duplicate: merge issues that are the same bug manifesting in different places.
3. Assign final severity and category to each issue.
4. Sort by severity (Critical first, then High, Medium, Low).
5. Count issues by severity and category for the executive summary.
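
The categorize steps above can be sketched in Python. This is a minimal illustration, not part of the skill's tooling: the `Issue` fields and the de-duplication key (normalized title plus category) are assumptions, and a real session may need a looser notion of "same bug".

```python
from collections import Counter
from dataclasses import dataclass

# Severity levels as named in the issue taxonomy; lower rank sorts first.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass(frozen=True)
class Issue:  # hypothetical record shape for one finding
    title: str
    severity: str
    category: str
    url: str


def categorize(issues):
    """De-duplicate, keep the most severe copy, sort by severity, and count."""
    merged = {}
    for issue in issues:
        # Assumed duplicate key: same title (case-insensitive) and category.
        key = (issue.title.strip().lower(), issue.category)
        kept = merged.get(key)
        if kept is None or SEVERITY_ORDER[issue.severity] < SEVERITY_ORDER[kept.severity]:
            merged[key] = issue
    ordered = sorted(merged.values(), key=lambda i: SEVERITY_ORDER[i.severity])
    by_severity = Counter(i.severity for i in ordered)
    by_category = Counter(i.category for i in ordered)
    return ordered, by_severity, by_category
```

The two counters feed the executive summary directly; the ordered list gives the per-issue section order.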

### Phase 5: Report

Generate the final report using the template at `templates/dogfood-report-template.md`.

The report must include:

1. **Executive summary** with total issue count, breakdown by severity, and testing scope
2. **Per-issue sections** with:
   - Issue number and title
   - Severity and category badges
   - URL where observed
   - Description of the issue
   - Steps to reproduce
   - Expected vs actual behavior
   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
   - Console errors if relevant
3. **Summary table** of all issues
4. **Testing notes**: what was tested, what was not, any blockers

Save the report to `{output_dir}/report.md`.

## Alternative Workflow: Source-Code-First (for multi-service / slow-browser sites)

When the target site has source code available, or browser automation is too slow or times out:

### Step 1: Clone and Map the Codebase

```bash
git clone <repo_url> /tmp/qa-target
cd /tmp/qa-target
# List candidate route and entry-point files, sorted by path
find . -type f \( -name "routes*.py" -o -name "main.py" -o -name "pages.py" -o -name "admin.py" \) | sort
```

### Step 2: Enumerate All Routes

Read each route file and extract:

- HTTP method + path (e.g., `GET /posts/{slug}`)
- Required auth/permissions
- Rate limits
- Form fields and validation rules

Build a **complete URL inventory**; this is your test matrix.
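
As a rough sketch of how the inventory could be bootstrapped, the snippet below pulls method, path, and rate limit out of decorator-style route definitions. It assumes FastAPI/Flask-like decorators such as `@router.get("/path")` and a `@limiter.limit("5/minute")` decorator on the same function; routes registered any other way still need to be read by hand, and auth requirements and form fields are not extracted.

```python
import re

# Assumed decorator shapes: @router.get("/posts/{slug}"), @app.post('/login').
ROUTE_RE = re.compile(
    r"@\w+\.(get|post|put|patch|delete)\(\s*[\"']([^\"']+)[\"']",
    re.IGNORECASE,
)
LIMIT_RE = re.compile(r"@limiter\.limit\(\s*[\"']([^\"']+)[\"']")


def extract_routes(source):
    """Return one dict per route decorator found in a Python source string."""
    routes = []
    pending_limit = None   # limit decorator seen before its route decorator
    in_decorators = False  # still inside the decorator stack of the last route
    for raw in source.splitlines():
        line = raw.strip()
        route = ROUTE_RE.search(line)
        limit = LIMIT_RE.search(line)
        if route:
            routes.append({
                "method": route.group(1).upper(),
                "path": route.group(2),
                "rate_limit": pending_limit,
            })
            pending_limit = None
            in_decorators = True
        elif limit:
            if in_decorators:
                routes[-1]["rate_limit"] = limit.group(1)
            else:
                pending_limit = limit.group(1)
        elif line.startswith(("def ", "async def ")):
            # The function definition ends the decorator stack.
            in_decorators = False
            pending_limit = None
    return routes
```

Each returned dict is one row of the test matrix; the remaining columns come from reading the handler body.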

### Step 3: Analyze Static Assets and Templates

Check template files for:

- CSS variable definitions (look for `:root` blocks)
- JS includes (what scripts are loaded vs missing)
- Encoding issues (BOM markers, leading newlines before DOCTYPE)
- Accessibility: `alt` attributes, `user-scalable`, skip links
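
The encoding check in particular is easy to automate. The following sketch inspects the first bytes of a template or a fetched page; the function name and the wording of the findings are illustrative only.

```python
import codecs


def check_html_prefix(data: bytes):
    """Report a UTF-8 BOM or stray whitespace before the DOCTYPE."""
    findings = []
    if data.startswith(codecs.BOM_UTF8):
        findings.append("UTF-8 BOM (ef bb bf) at start of file")
        data = data[len(codecs.BOM_UTF8):]
    stripped = data.lstrip(b" \t\r\n")
    if len(stripped) != len(data):
        findings.append("whitespace before first markup")
    if not stripped[:9].lower().startswith(b"<!doctype"):
        findings.append("document does not start with <!DOCTYPE")
    return findings
```

Pass it the raw bytes (for example `open(path, "rb").read()`), not decoded text, since decoding may hide the BOM. Template fragments that are included into a base layout will legitimately lack a DOCTYPE.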

### Step 4: HTTP-Level Testing (no browser needed)

Use `curl` to test:

- Page loads (HTTP status codes)
- Static asset availability
- Response headers (security headers, CSP)
- Redirect chains (login flows)
- API endpoints (with/without auth cookies)
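
When many URLs need the same header check, a small script can stand in for repeated `curl -sI` calls. This is a sketch using only the standard library; the list of expected headers mirrors the Security Headers item in the dimensions checklist, and `fetch_headers` is a hypothetical helper name.

```python
import urllib.request

# Headers this skill's checklist asks for on every service.
EXPECTED_HEADERS = (
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(headers):
    """Return the expected headers absent from a header mapping (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


def fetch_headers(url, timeout=10.0):
    """HEAD a URL and return (status, headers).

    urlopen follows redirects and raises urllib.error.HTTPError on 4xx/5xx,
    so use curl when the redirect chain itself is what you are testing.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, dict(response.headers.items())
```

A typical use is `missing_security_headers(fetch_headers(url)[1])` for each URL in the inventory, recording non-empty results as candidate findings.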

### Step 5: Generate Structured Test Plan

Output a markdown document with:

- Service architecture table
- Test accounts and auth mechanism
- Per-module test case tables: `ID | Test item | Test steps | Expected result | Priority`
- Known issues found during source analysis
- Cross-cutting concerns (consistency, accessibility, security)

### Step 6: Selective Browser Execution

Only use browser automation for:

- Login/register interactive flows
- Visual verification of known issues
- Console error capture
- Screenshot evidence

## Alternative Workflow: Test Plan Gap Analysis (improving existing plans)

When the user already has a test plan and wants to **improve/complete it** against source code:

### Step 1: Load Both Inputs in Parallel

```
1. Read the existing test-plan.md
2. Clone or pull the target repo
3. Use delegate_task to analyze ALL route files + security mechanisms in one pass
   - Extract: endpoints, form fields, rate limits, cookie params, CSRF mechanism,
     validation rules, ownership models, public APIs
   - Focus on FEATURES not COVERED by the existing plan
```

### Step 2: Systematic Comparison

For each service, compare source code findings against existing test cases:

- **Endpoints**: Are all HTTP methods + paths covered?
- **Validation rules**: Password complexity, username blacklists, email uniqueness, slug format
- **Rate limits**: Are all limiters documented with correct values?
- **Security mechanisms**: CSRF token format/expiry, cookie attributes, redirect validation
- **APIs**: Public JSON APIs, service-to-service APIs, ownership isolation
- **Edge cases**: CRLF normalization, content size limits, cascading deletes

### Step 3: Patch, Don't Rewrite

Use targeted `patch` edits to add missing test cases within existing sections:

- Insert after the related existing case (e.g., `A-015a` after `A-015`)
- Use the sub-numbering convention: `X-NNNa`, `X-NNNb`, ... for insertions between `X-NNN` and the next existing case
- Preserve existing case numbers; never renumber
- Add new subsections only when the entire category is missing (e.g., Service API)
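
The sub-numbering rule can be expressed as a small helper. This is an illustrative sketch: `next_sub_id` is a hypothetical name, and it assumes base IDs of the form `PREFIX-digits` and single lowercase-letter suffixes.

```python
import re
import string


def next_sub_id(base_id, existing_ids):
    """Return the first unused sub-numbered ID after base_id (A-015 -> A-015a)."""
    if not re.fullmatch(r"[A-Z]+-\d+", base_id):
        raise ValueError(f"not a base case ID: {base_id!r}")
    taken = set(existing_ids)
    for suffix in string.ascii_lowercase:
        candidate = base_id + suffix
        if candidate not in taken:
            return candidate
    raise ValueError(f"all 26 sub-IDs of {base_id} are in use")
```

Because existing IDs are never touched, references to them in issue trackers or automation remain valid.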

### Step 4: Update Statistics

After all patches:

```bash
# Count per module (the prefix match also counts sub-numbered IDs such as A-015a)
grep -c '^| H-[0-9]' test-plan.md   # Home
grep -c '^| A-[0-9]' test-plan.md   # Auth
# ... etc for each prefix

# Count total across all modules (sub-numbered IDs included)
grep -c '^| [A-Z]*-[0-9]' test-plan.md
```

Update both the header stats table and the footer summary table.
Bump the version number.
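
If the counts are needed programmatically, for example to fill in the stats tables, the same tally can be done in Python. The sketch assumes test case rows look like `| A-015a | ... |`, with a space after the ID; adjust the pattern if the plan uses a different table layout.

```python
import re
from collections import Counter

# A table row whose first cell is a case ID, optionally sub-numbered (A-015a).
CASE_ROW = re.compile(r"^\| ([A-Z]+)-\d+[a-z]? ")


def count_cases(markdown):
    """Count test case rows per module prefix in a test plan document."""
    counts = Counter()
    for line in markdown.splitlines():
        match = CASE_ROW.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

`sum(counts.values())` gives the total, and it should agree with the total-count `grep` whenever every row uses one of the known prefixes.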

### Step 5: Commit

```bash
git add test-plan.md && git commit -m "vN.M: improve test plan, add X test cases (old→new)" && git push
```

### Pitfalls for Gap Analysis

- **Don't renumber existing cases**: Use `X-NNNa` sub-numbering to insert between existing cases. Renumbering breaks any existing references (issue tracker, test automation).
- **Count carefully**: `grep -c '^| A-[0-9]'` matches on the ID prefix, so sub-numbered entries like `A-015a` are included; a stricter pattern that requires the ID to end right after the digits would miss them. Use `'^| [A-Z]*-[0-9]'` for the total count; the per-module counts should add up to it.
- **Don't duplicate**: Check if a concept is already covered under a different name before adding. "Draft visible" and "draft preview" might be the same test.
- **delegate_task for source analysis**: Don't read 40+ route files manually. A single delegate_task with a well-structured prompt produces a complete analysis in one pass.

## Alternative Workflow: Module-by-Module Testing with Incremental Commits

When the user has an existing test plan (e.g., `test-plan.md` in a repo) and wants to execute it module by module, committing results after each:

### Step 1: Initialize Results Document

Create `test-results.md` with a summary table and placeholder sections for every module. Include: module name, status (⏳), execution time, and empty test result tables.

### Step 2: Test Module → Update → Commit Loop

For each module:

1. Execute tests (curl for HTTP-level, Playwright for browser-level)
2. Update the module's section in `test-results.md` with results
3. Update the summary table (pass/fail/blocked counts)
4. Add a "Module N Summary" section with key findings
5. Add a "💡 Module N Optimization Suggestions" section with prioritized recommendations (the user explicitly wants these persisted in the document, not just in chat)
6. `git add test-results.md && git commit -m "Module N: passed X / failed Y" && git push`
7. Report progress to the user before starting the next module
8. **⚠️ If any test modified content, restore it BEFORE committing**

### Step 3: Parallel Delegation

Use `delegate_task` with 3 parallel tasks for curl-based modules. Each task tests a group of modules and returns JSON results. Browser-based modules must run sequentially.
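
How the modules are grouped is left open above. One simple option is round-robin assignment, combined with a cap on cases per task so that no single delegation runs into the time limit discussed under Pitfalls. The sketch below is only an illustration; the function names and the default cap of 20 are assumptions.

```python
def assign_round_robin(modules, workers=3):
    """Distribute modules across a fixed number of parallel tasks."""
    groups = [[] for _ in range(workers)]
    for index, module in enumerate(modules):
        groups[index % workers].append(module)
    return groups


def split_into_batches(cases, max_per_batch=20):
    """Split a list of test cases into consecutive batches of bounded size."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    return [cases[i:i + max_per_batch] for i in range(0, len(cases), max_per_batch)]
```

Round-robin ignores how many cases each module has; if module sizes differ a lot, balance by case count instead.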

### Pitfalls for Module-by-Module

- **Don't wait until the end to commit**: The session may break, losing all work
- **Restore content after destructive tests**: Save state before, verify after
- **Rate limiting blocks repeated tests**: Test rate-limited endpoints last
- **CSRF token sync**: Use the same cookie jar for GET+POST (see `references/multi-service-qa.md`)
- **Optimization suggestions go in the document, not just the chat**: The user wants them persisted

## Deliverables: Split Test Plan + Issue List

**Always split QA output into two separate documents:**

1. **`test-plan.md`**: Structured test cases with execution steps and expected results
2. **`issue-list.md`**: Known issues found during analysis, with severity and fix suggestions

Do NOT merge them into one report. Users need the test plan for execution assignment and the issue list for bug tracking. Each document should be self-contained.

### Test Plan Structure

- Service architecture table (services, URLs, ports)
- Test accounts and auth mechanism explanation
- Per-module test case tables: `ID | Test item | Test steps | Expected result | Account | Priority`
- Cover both public pages AND admin/management pages
- Include a cross-cutting section for security, consistency, accessibility

### Issue List Structure

- Summary table (count by severity level)
- Per-issue entries: module, page, phenomenon, impact, root cause, source code location, fix suggestion, priority
- Consistency matrix (which pages have which features/assets)
- Fix priority recommendation (immediate / soon / backlog)

## Comprehensive QA Dimensions Checklist

When the user asks for "full" or "complete" testing, cover ALL of these dimensions. If you only covered page navigation and login flows, the plan is incomplete.

### Core (always cover)

1. **Functional**: Page loads, navigation, CRUD operations, form submissions
2. **Auth & Permissions**: Login/logout, RBAC, cross-service cookie propagation, admin vs user access
3. **Input Validation**: Form validation, empty submissions, boundary values, special characters

### Security (cover for any site with user input)

4. **Cookie Security**: HttpOnly, Secure, SameSite attributes; Max-Age; domain scope
5. **CSRF Protection**: Token presence, double-submit pattern, token expiry, replay resistance
6. **Redirect Safety**: Open redirect via `redirect` parameter; validate against allowed domains
7. **Rate Limiting**: Per-endpoint limits; account lockout; IP-based limits
8. **File Upload Safety**: Allowed extensions, size limits, filename sanitization, path traversal prevention
9. **Input Injection**: XSS in user-generated content, SQL injection attempts, path traversal in slugs
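
For the Cookie Security dimension, the attribute check can be scripted against the value of a `Set-Cookie` header captured with `curl -sI`. The sketch below uses the standard library's `http.cookies` parser (the `samesite` attribute needs Python 3.8 or later); the function name and the finding strings are illustrative.

```python
from http.cookies import SimpleCookie


def audit_set_cookie(header_value):
    """Map each cookie name in a Set-Cookie value to its missing attributes."""
    cookie = SimpleCookie()
    cookie.load(header_value)  # expects the header value, without "Set-Cookie: "
    report = {}
    for name, morsel in cookie.items():
        problems = []
        if not morsel["httponly"]:
            problems.append("missing HttpOnly")
        if not morsel["secure"]:
            problems.append("missing Secure")
        if not morsel["samesite"]:
            problems.append("missing SameSite")
        report[name] = problems
    return report
```

A missing attribute is a prompt to investigate, not automatically a bug: a double-submit CSRF cookie, for instance, is intentionally readable by JavaScript and so lacks HttpOnly.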

### Session & State

10. **Token Lifecycle**: Expiration behavior, role changes mid-session (DB role vs token role), token format validation
11. **Concurrent Access**: Race conditions on shared resources, optimistic locking

### Content & Rendering

12. **Edge Case Content**: Empty states, very long text, special characters (CJK, emoji), Markdown/LaTeX rendering
13. **Encoding**: BOM markers, UTF-8 consistency, DOCTYPE prefix cleanliness

### SEO & Metadata

14. **Meta Tags**: `<title>`, `<meta name="description">`, canonical URLs
15. **Open Graph**: `og:title`, `og:description`, `og:image`, `og:url` per page
16. **Structured Data**: robots.txt, sitemap.xml, RSS feed validity

### Accessibility

17. **WCAG Basics**: `alt` attributes, `user-scalable`, color contrast, skip-to-content links
18. **Keyboard Navigation**: Tab order, focus management, ARIA labels

### Performance & Compatibility

19. **Page Load**: Static asset availability, CDN reliability (especially in China), resource count
20. **Responsive Design**: Breakpoints, mobile layout, touch targets
21. **Cross-Browser**: Chrome/Firefox/Safari/Edge rendering differences

### Operations

22. **Health Checks**: `/health` endpoint availability per service
23. **Error Handling**: 404 pages, 500 error responses, graceful degradation
24. **Logging & Audit**: Audit trail for admin actions, login attempts

### Consistency (cross-service)

25. **Asset Inclusion**: Which pages include mobile.css, loader.js, etc.
26. **Navigation**: Which pages have the site-wide nav bar
27. **Security Headers**: X-Content-Type-Options, X-Frame-Options, Referrer-Policy, CSP

## Pitfalls

- **🔴 CRITICAL: Always back up content before write operations**: When testing CRUD endpoints (save, publish, create, update), the test payload (including XSS test strings, dummy data, empty fields) CAN overwrite real production content. Before any write test:
  1. `curl -s -b cookies SITE/admin` → extract current content_json / initialContent → save to `/tmp/backup_<service>.json`
  2. Perform test
  3. Restore original content via Playwright (set form fields + `collectFormData()` + submit)
  - **This is not optional.** A session that deletes user content without restoring it is a failed session.
- **🔴 CRITICAL: Restore content IMMEDIATELY after destructive tests**: Don't wait until the end of the session. If a test modifies content, restore it in the same turn. Session interruptions, timeouts, or context limits can prevent later restoration.
- **🔴 CRITICAL: XSS payloads in form fields persist**: When you fill a form field with `<script>alert(1)</script>` for XSS testing, that value gets saved to the database if the form is submitted. Always use Playwright's `page.evaluate()` to set values directly on form elements, NOT `page.fill()`, which triggers input events that may activate auto-save.
- **⚠️ Do NOT parallelize browser delegate_tasks for QA**: Each browser interaction is slow (navigate + snapshot + screenshot = 10-30s). 3 parallel browser tasks will all time out at 600s. Run browser tests sequentially or use source-code-first mode.
- **⚠️ Curl-only delegate tasks also time out with large batches**: A delegate_task with 30+ curl test cases can hit the 600s limit (each curl call = 1-3s + overhead). Split large test batches into smaller tasks (~15-20 cases each) or use `execute_code` with `from hermes_tools import terminal` for direct in-process execution (faster, no delegation overhead).
- **⚠️ Client-side-only validation is a security finding**: When CSP blocks inline JS (see `script-src-elem` pitfall), any validation that only exists in client JS (password strength, field format, confirmation matching) becomes bypassable. Always test registration/submission with curl to verify server-side validation exists independently.
- **⚠️ API authentication order matters**: Some endpoints validate the request body BEFORE checking authentication, returning 422 (validation error) instead of 401 (unauthenticated). Test: `curl -X POST /api/endpoint -d 'invalid'` without auth should get 401, not 422. A 422 here is a security issue (it leaks endpoint existence and field requirements).
- **⚠️ Fulltext search can silently fail**: Search endpoints with `mode=fulltext` may return 0 results while `mode=simple` works fine. Always test both modes with the same query. Common causes: search index not built, tokenizer (jieba) not installed, BM25 ranking misconfigured.
- **⚠️ Rate limiting blocks subsequent tests**: Registration endpoints with strict limits (e.g., 6/hour) will block all remaining registration-related tests with 429. Strategy: test non-registration endpoints first, registration tests last, and note which tests were blocked.
- **⚠️ Present the test plan BEFORE executing**: Show the user the complete test plan first. If they say "is this really all of it?", the plan is missing dimensions. Refer to the Comprehensive QA Dimensions Checklist above.
- **⚠️ "Add everything" means ALL dimensions**: When the user says to add everything, do not skip any dimension. Write all 25+ categories into the test plan even if some have only 1-2 test cases.
- **Multi-service auth**: Sites with shared cookies (e.g., `.ephron.ren` domain) need login on ONE service first, then verify cookie propagation to others. Don't try to log in on each service independently.
- **Encoding bugs**: Always hex-dump HTML source to check for BOM markers (`ef bb bf`) or leading newlines before DOCTYPE. Use: `xxd file.html | head -5`. For Python source files, also check: `xxd file.py | head -1`.
- **CSRF tokens**: Many form submissions require CSRF tokens. Extract from the page first, then include in POST requests. Don't forget the CSRF cookie (`ephron_csrf`). Note: CSRF cookies are HttpOnly=false (by design, so JS can read them).
- **Rate limits**: Note rate limit values from source code (e.g., `@limiter.limit("5/minute")`). When testing auth failures, stay under the limit or you'll get 429s that mask the real bug.
- **Template vs runtime issues**: Some issues (empty content, missing sections) may be data issues, not code bugs. Verify by checking if the data source (database/content files) actually has content.
- **File delivery fallback**: When sending files via QQ/WeChat fails, push to a Gitea repo as a fallback delivery mechanism.
- **Source code security analysis**: Always check these files when available: `cookie_utils.py` (cookie params), `csrf.py` (CSRF mechanism), `redirect.py` (open redirect validation), `security_headers.py` (CSP/headers), `auth.py` (token format, lockout), `validators.py` (slug/path validation), `limiter.py` (rate limit config).
- **⚠️ CSP `script-src-elem` silently kills inline JS**: When a page has inline `<script>` but buttons call functions defined there (e.g., `onclick="saveDraft()"`), always verify the CSP header. The `script-src-elem` directive **overrides** `script-src` for script elements, so `script-src 'unsafe-inline'` combined with `script-src-elem 'self' https://cdn.example.com` blocks ALL inline scripts.
  - Symptoms: functions report "not defined", buttons do nothing, no network requests on click.
  - Detection: check `typeof fnName` in the browser console, or look for a CSP error in the console: `Executing inline script violates the following Content Security Policy directive 'script-src-elem'`.
  - Fix: add `'unsafe-inline'` to `script-src-elem`, use nonce/hash, or extract inline scripts to external `.js` files.
- **⚠️ CSP `form-action 'self'` blocks cross-origin redirects after form submission**: When a form POSTs to a same-origin endpoint (allowed by `form-action 'self'`), but the server responds with a 303 redirect to a **different origin** (e.g., `auth.example.com` → `www.example.com`), the browser blocks the redirect. CSP `form-action` applies to the **entire redirect chain** resulting from form submission, not just the form's action URL. Enforcement on redirects varies by browser; Chromium-based browsers apply it.
  - Symptoms: the form appears to submit (POST in network tab), the cookie gets set server-side, but the page stays on the form URL with no navigation.
  - Console error: `Sending form data to '...' violates the following Content Security Policy directive: "form-action 'self'"`.
  - Detection: (1) test a same-origin redirect (should work) vs a cross-origin redirect (should fail); (2) `curl -sI` the 303 response; if it carries CSP with `form-action 'self'`, that's the blocker.
  - Fix options: (a) skip the CSP header on 303 redirect responses (empty body, CSP adds no protection); (b) use a JS-based redirect instead of a server-side 303; (c) add allowed origins to `form-action`.
  - Key insight: this breaks any auth flow where the login service is on a different subdomain than the target pages. See `references/session-learnings-ephron-qa.md` for full reproduction steps.
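
The `script-src-elem` pitfall can be checked from a captured header without opening a browser. The sketch below resolves which directive governs `<script>` elements using the fallback order `script-src-elem`, then `script-src`, then `default-src`, and reports whether it lists `'unsafe-inline'`. It is deliberately simplified: it ignores nonces, hashes, and `'strict-dynamic'`, any of which changes how inline scripts are treated, and the function names are illustrative.

```python
def parse_csp(header_value):
    """Parse a CSP header value into {directive: [sources]}; first duplicate wins."""
    directives = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if tokens and tokens[0].lower() not in directives:
            directives[tokens[0].lower()] = tokens[1:]
    return directives


def inline_script_policy(header_value):
    """Return (governing directive, whether it lists 'unsafe-inline')."""
    directives = parse_csp(header_value)
    for name in ("script-src-elem", "script-src", "default-src"):
        if name in directives:
            sources = [source.lower() for source in directives[name]]
            return name, "'unsafe-inline'" in sources
    # No applicable directive: this policy does not restrict scripts.
    return None, True
```

For the header in the pitfall above, the governing directive is `script-src-elem` and the second value is `False`, which matches the symptom of inline functions being "not defined".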

## Scope Ambiguity Pitfall

When the user asks to "inspect a server" **without providing a URL**:

- Clarify whether they mean the **local machine Hermes runs on** (system resources, running processes, disk/memory) or a **remote web service** (HTTP endpoints, app health).
- **Default assumption**: If the user mentions a domain name (e.g., "inspect ephron.ren" or "check blog.ephron.ren"), they mean the remote web service. If they say "your server" or "the machine you're on", they mean the local machine.
- When in doubt, ask: "Should I inspect the local machine or the remote service?"

## Tools Reference

| Tool | Purpose |
|------|---------|
| `browser_navigate` | Go to a URL |
| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
| `browser_click` | Click an element by ref (`@eN`) or text |
| `browser_type` | Type into an input field |
| `browser_scroll` | Scroll up/down on the page |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key |
| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
| `browser_console` | Get JS console output and errors |
| **Playwright Python** | Full browser automation via script; use when built-in tools are unavailable or you need programmatic control (see `references/playwright-qa.md`) |

## Related References

- `references/issue-taxonomy.md`: severity and category classification for issues
- `references/server-inspection.md`: local server inspection checklist: system resources, listening ports, processes, Docker, security services; also covers scope ambiguity (local vs. remote), route file reading strategy, cross-service cookie auth testing, static analysis checks
- `references/qa-dimensions-checklist.md`: comprehensive 25-dimension QA checklist for "full site" testing requests
- `references/playwright-qa.md`: Playwright Python setup, patterns, event monitoring, CSP bug detection
- `references/session-learnings-ephron-qa.md`: concrete findings from ephron.ren QA: CSP override, password validation gaps, fulltext search failure, delegate sizing

## Templates

- `templates/dogfood-report-template.md`: issue list template (the output with bugs found)
- `templates/test-plan-template.md`: test plan template (structured test cases with steps)

## Tips

- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
- **Test with both valid and invalid inputs**: form validation bugs are common.
- **Scroll through long pages**: content below the fold may have rendering issues.
- **Test navigation flows**: click through multi-step processes end-to-end.
- **Check responsive behavior** by noting any layout issues visible in screenshots.
- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.