| name | description | version | metadata |
|---|---|---|---|
| dogfood | Exploratory QA of web apps: find bugs, evidence, reports. | 1.0.0 | |
Dogfood: Systematic Web Application QA Testing
Overview
This skill guides you through systematic exploratory QA testing of web applications. It supports two execution modes depending on context:
- Browser-first (default): Use the browser toolset to navigate, interact, and capture evidence live.
- Source-code-first (fallback): When browser automation is unavailable, slow, or times out, analyze the source code to enumerate all routes/endpoints/pages, then build a comprehensive test plan document. Execute browser tests selectively afterward.
For multi-service sites (e.g., separate auth/blog/canvas services), prefer the source-code-first approach — it produces more complete coverage faster than crawling.
Prerequisites
- Browser toolset: either the built-in tools (`browser_navigate`, `browser_snapshot`, etc.) or Playwright Python (see `references/playwright-qa.md`); optional if using source-code-first mode
- A target URL and testing scope from the user
- Source code access (repo clone or codebase) — strongly recommended for multi-service sites
Inputs
The user provides:
- Target URL — the entry point for testing
- Scope — what areas/features to focus on (or "full site" for comprehensive testing)
- Output directory (optional) — where to save screenshots and the report (default: `./dogfood-output`)
Workflow
Follow this 5-phase systematic workflow:
Phase 1: Plan
- Create the output directory structure:

      {output_dir}/
      ├── screenshots/    # Evidence screenshots
      └── report.md       # Final report (generated in Phase 5)

- Identify the testing scope based on user input.
- Build a rough sitemap by planning which pages and features to test:
  - Landing/home page
  - Navigation links (header, footer, sidebar)
  - Key user flows (sign up, login, search, checkout, etc.)
  - Forms and interactive elements
  - Edge cases (empty states, error pages, 404s)
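The directory setup in Phase 1 can be sketched in a few lines of Python. This is a minimal sketch, not part of the skill's tooling: the helper name `init_output_dir` is hypothetical, and only the layout (`screenshots/` plus a `report.md` written later) comes from the plan above.

```python
from pathlib import Path


def init_output_dir(output_dir: str = "./dogfood-output") -> dict:
    """Create the evidence folder and return the paths used in later phases.

    Hypothetical helper: only the directory layout is taken from the workflow.
    """
    root = Path(output_dir)
    screenshots = root / "screenshots"
    # parents=True creates {output_dir} itself; exist_ok=True makes reruns safe.
    screenshots.mkdir(parents=True, exist_ok=True)
    # report.md is only a path here; the file is written in Phase 5.
    return {"root": root, "screenshots": screenshots, "report": root / "report.md"}
```

Calling `init_output_dir()` with no argument uses the default output directory named in the Inputs section.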
Phase 2: Explore
For each page or feature in your plan:
- Navigate to the page: `browser_navigate(url="https://example.com/page")`
- Take a snapshot to understand the DOM structure: `browser_snapshot()`
- Check the console for JavaScript errors: `browser_console(clear=true)`. Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
- Take an annotated screenshot to visually assess the page and identify interactive elements: `browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)`. The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.
- Test interactive elements systematically:
  - Click buttons and links: `browser_click(ref="@eN")`
  - Fill forms: `browser_type(ref="@eN", text="test input")`
  - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
  - Scroll through content: `browser_scroll(direction="down")`
  - Test form validation with invalid inputs
  - Test empty submissions
- After each interaction, check for:
  - Console errors: `browser_console()`
  - Visual changes: `browser_vision(question="What changed after the interaction?")`
  - Expected vs actual behavior
Phase 3: Collect Evidence
For every issue found:
- Take a screenshot showing the issue: `browser_vision(question="Capture and describe the issue visible on this page", annotate=false)`. Save the `screenshot_path` from the response; you will reference it in the report.
- Record the details:
  - URL where the issue occurs
  - Steps to reproduce
  - Expected behavior
  - Actual behavior
  - Console errors (if any)
  - Screenshot path
- Classify the issue using the issue taxonomy (see `references/issue-taxonomy.md`):
  - Severity: Critical / High / Medium / Low
  - Category: Functional / Visual / Accessibility / Console / UX / Content
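One way to keep the recorded details uniform is a small record type. The sketch below is an assumption about how evidence could be stored, not a structure the skill prescribes: the class name `Issue` and its field names are hypothetical, while the severity and category values are the ones listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SEVERITIES = ("Critical", "High", "Medium", "Low")
CATEGORIES = ("Functional", "Visual", "Accessibility", "Console", "UX", "Content")


@dataclass
class Issue:
    """One finding, with the details Phase 3 asks for (hypothetical schema)."""

    title: str
    url: str
    steps: List[str]
    expected: str
    actual: str
    severity: str
    category: str
    console_errors: List[str] = field(default_factory=list)
    screenshot_path: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject labels outside the taxonomy so typos surface immediately.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
```

Validating at construction time means a mistyped severity fails at the point of recording rather than when the report is assembled.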
Phase 4: Categorize
- Review all collected issues.
- De-duplicate — merge issues that are the same bug manifesting in different places.
- Assign final severity and category to each issue.
- Sort by severity (Critical first, then High, Medium, Low).
- Count issues by severity and category for the executive summary.
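The categorization steps can be sketched as a single pass over the collected issues. This is a minimal sketch under stated assumptions: issues are plain dicts with `title`, `url`, `severity`, and `category` keys (hypothetical names), and two issues count as duplicates when their normalized title and category match, which is a simplification of the judgment a tester would actually apply.

```python
from collections import Counter

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def categorize(issues):
    """De-duplicate, sort by severity, and count issues.

    Assumption: same normalized title + category means "same bug".
    """
    merged = {}
    for issue in issues:
        key = (issue["title"].strip().lower(), issue["category"])
        if key not in merged:
            merged[key] = {**issue, "urls": [issue["url"]]}
            continue
        kept = merged[key]
        if issue["url"] not in kept["urls"]:
            kept["urls"].append(issue["url"])
        # Keep the most severe rating seen for the merged issue.
        if SEVERITY_ORDER[issue["severity"]] < SEVERITY_ORDER[kept["severity"]]:
            kept["severity"] = issue["severity"]

    # sorted() is stable, so issues of equal severity keep discovery order.
    ordered = sorted(merged.values(), key=lambda i: SEVERITY_ORDER[i["severity"]])
    by_severity = Counter(i["severity"] for i in ordered)
    by_category = Counter(i["category"] for i in ordered)
    return ordered, by_severity, by_category
```

The two counters feed the executive summary directly; the `urls` list records every place a merged bug was observed.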
Phase 5: Report
Generate the final report using the template at templates/dogfood-report-template.md.
The report must include:
- Executive summary with total issue count, breakdown by severity, and testing scope
- Per-issue sections with:
  - Issue number and title
  - Severity and category badges
  - URL where observed
  - Description of the issue
  - Steps to reproduce
  - Expected vs actual behavior
  - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
  - Console errors if relevant
- Summary table of all issues
- Testing notes — what was tested, what was not, any blockers
Save the report to {output_dir}/report.md.
Alternative Workflow: Source-Code-First (for multi-service / slow-browser sites)
When the target site has source code available, or browser automation is too slow/times out:
Step 1: Clone and Map the Codebase
git clone <repo_url> /tmp/qa-target
cd /tmp/qa-target
find . -name "routes*.py" -o -name "main.py" -o -name "pages.py" -o -name "admin.py" | sort
Step 2: Enumerate All Routes
Read each route file and extract:
- HTTP method + path (e.g., `GET /posts/{slug}`)
- Required auth/permissions
- Rate limits
- Form fields and validation rules
Build a complete URL inventory — this is your test matrix.
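Route extraction can be partly automated when the handlers use decorator-style routing. The sketch below assumes FastAPI-like decorators such as `@router.get("/path")` and a limiter decorator of the form `@limiter.limit("5/minute")`; those patterns are assumptions about the target codebase, the function name `extract_routes` is hypothetical, and regular expressions will miss routes registered any other way, so treat the output as a starting inventory rather than a complete one.

```python
import re

# Assumed decorator shapes; adjust to the framework actually in use.
ROUTE_RE = re.compile(r'@\w+\.(get|post|put|patch|delete)\(\s*["\']([^"\']+)["\']')
LIMIT_RE = re.compile(r'@\w+\.limit\(\s*["\']([^"\']+)["\']')


def extract_routes(source: str):
    """Collect method, path, rate limit, and handler name per decorated function."""
    routes, pending, limit = [], [], None
    for raw in source.splitlines():
        line = raw.strip()
        route = ROUTE_RE.match(line)
        rate = LIMIT_RE.match(line)
        if route:
            pending.append((route.group(1).upper(), route.group(2)))
        elif rate:
            limit = rate.group(1)
        elif line.startswith(("def ", "async def ")):
            # The function definition closes the decorator stack above it.
            handler = line.split("(")[0].split()[-1]
            for method, path in pending:
                routes.append(
                    {"method": method, "path": path,
                     "rate_limit": limit, "handler": handler}
                )
            pending, limit = [], None
    return routes
```

Auth requirements and form validation rules still need to be read from the handler bodies by hand.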
Step 3: Analyze Static Assets and Templates
Check template files for:
- CSS variable definitions (look for `:root` blocks)
- JS includes (what scripts are loaded vs missing)
- Encoding issues (BOM markers, leading newlines before DOCTYPE)
- Accessibility: `alt` attributes, `user-scalable`, skip links
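The encoding check in particular is easy to script. This is a minimal sketch that inspects the first bytes of a template or a fetched page; the function name `check_html_prefix` and the wording of the messages are hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def check_html_prefix(data: bytes) -> list:
    """Report anything that precedes the DOCTYPE in raw HTML bytes."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("UTF-8 BOM before content")
        data = data[len(UTF8_BOM):]
    stripped = data.lstrip(b" \t\r\n")
    if len(stripped) != len(data):
        problems.append("whitespace before DOCTYPE")
    if not stripped[:9].lower().startswith(b"<!doctype"):
        problems.append("document does not start with DOCTYPE")
    return problems
```

Read the file in binary mode (`open(path, "rb").read()`) so that the BOM is not silently consumed by a text decoder.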
Step 4: HTTP-Level Testing (no browser needed)
Use curl to test:
- Page loads (HTTP status codes)
- Static asset availability
- Response headers (security headers, CSP)
- Redirect chains (login flows)
- API endpoints (with/without auth cookies)
Step 5: Generate Structured Test Plan
Output a markdown document with:
- Service architecture table
- Test accounts and auth mechanism
- Per-module test case tables: `ID | Test item | Test steps | Expected result | Priority`
- Known issues found during source analysis
- Cross-cutting concerns (consistency, accessibility, security)
Step 6: Selective Browser Execution
Only use browser automation for:
- Login/register interactive flows
- Visual verification of known issues
- Console error capture
- Screenshot evidence
Alternative Workflow: Test Plan Gap Analysis (improving existing plans)
When the user already has a test plan and wants to improve/complete it against source code:
Step 1: Load Both Inputs in Parallel
1. Read the existing test-plan.md
2. Clone or pull the target repo
3. Use delegate_task to analyze ALL route files + security mechanisms in one pass
- Extract: endpoints, form fields, rate limits, cookie params, CSRF mechanism,
validation rules, ownership models, public APIs
- Focus on FEATURES not COVERED by the existing plan
Step 2: Systematic Comparison
For each service, compare source code findings against existing test cases:
- Endpoints: Are all HTTP methods + paths covered?
- Validation rules: Password complexity, username blacklists, email uniqueness, slug format
- Rate limits: Are all limiters documented with correct values?
- Security mechanisms: CSRF token format/expiry, cookie attributes, redirect validation
- APIs: Public JSON APIs, service-to-service APIs, ownership isolation
- Edge cases: CRLF normalization, content size limits, cascading deletes
Step 3: Patch, Don't Rewrite
Use targeted patch edits to add missing test cases within existing sections:
- Insert after the related existing case (e.g., `A-015a` after `A-015`)
- Use the sub-numbering convention `X-NNNa` for insertions between `X-NNN` and `X-NNN+1`
- Preserve existing case numbers; never renumber
- Add new subsections only when the entire category is missing (e.g., Service API)
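Picking the next free sub-number can be sketched as follows. The helper name `next_sub_id` is hypothetical, and the sketch assumes IDs of the form `PREFIX-digits` with a single lowercase letter as the suffix, as in the `A-015a` example.

```python
import re
import string


def next_sub_id(base_id: str, existing_ids) -> str:
    """Return the first unused sub-numbered ID after base_id (A-015 -> A-015a)."""
    if not re.fullmatch(r"[A-Z]+-\d+", base_id):
        raise ValueError(f"not a base case ID: {base_id!r}")
    taken = set(existing_ids)
    for letter in string.ascii_lowercase:
        candidate = base_id + letter
        if candidate not in taken:
            return candidate
    raise ValueError(f"no free sub-number left after {base_id}")
```

Because the base ID is never altered, existing references to `A-015` or `A-016` stay valid.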
Step 4: Update Statistics
After all patches:
# Count per-module
grep -c '^| H-[0-9]' test-plan.md # Home
grep -c '^| A-[0-9]' test-plan.md # Auth
# ... etc for each prefix
# Count total (catches sub-numbered too)
grep -c '^| [A-Z]*-[0-9]' test-plan.md
Update both the header stats table and the footer summary table. Bump the version number.
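The same counts can be produced in one pass with Python, which makes it easier to check that the per-module numbers add up to the total. This is a sketch assuming each test case is a table row whose first cell is an ID such as `A-015` or `A-015a`; the function name `count_cases` is hypothetical.

```python
import re
from collections import Counter

# First cell of a table row: prefix, digits, optional sub-number letter.
CASE_ROW = re.compile(r"^\|\s*([A-Z]+)-\d+[a-z]?\s*\|")


def count_cases(markdown: str) -> Counter:
    """Count test-case rows per module prefix, sub-numbered rows included."""
    counts = Counter()
    for line in markdown.splitlines():
        match = CASE_ROW.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

`sum(count_cases(text).values())` gives the total for the header and footer tables.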
Step 5: Commit
git add test-plan.md && git commit -m "vN.M: refine test plan, add X test cases (old→new)" && git push
Pitfalls for Gap Analysis
- Don't renumber existing cases: Use `X-NNNa` sub-numbering to insert between existing cases. Renumbering breaks any existing references (issue tracker, test automation).
- Count carefully: `grep -c '^| A-[0-9]'` and `grep -c '^| [A-Z]*-[0-9]'` both anchor only the start of the ID, so sub-numbered entries like `A-015a` are included in both. Use the prefix filter for per-module counts and `'^| [A-Z]*-[0-9]'` for the total, then confirm that the per-module counts add up to the total.
- Don't duplicate: Check if a concept is already covered under a different name before adding. "Draft visible" (草稿可见) and "draft preview" (草稿预览) might be the same test.
- delegate_task for source analysis: Don't read 40+ route files manually. A single delegate_task with a well-structured prompt produces a complete analysis in one pass.
Alternative Workflow: Module-by-Module Testing with Incremental Commits
When the user has an existing test plan (e.g., test-plan.md in a repo) and wants to execute it module by module, committing results after each:
Step 1: Initialize Results Document
Create test-results.md with a summary table and placeholder sections for every module. Include: module name, status (⏳), execution time, and empty test result tables.
Step 2: Test Module → Update → Commit Loop
For each module:
- Execute tests (curl for HTTP-level, Playwright for browser-level)
- Update the module's section in `test-results.md` with results
- Update the summary table (pass/fail/blocked counts)
- Add a "Module N summary" section with key findings
- Add a "💡 Module N optimization suggestions" section with prioritized recommendations (the user explicitly wants these persisted in the document, not just in chat)
- `git add test-results.md && git commit -m "Module N: passed X / failed Y" && git push`
- Report progress to the user before starting the next module
- ⚠️ If any test modified content, restore it BEFORE committing
Step 3: Parallel Delegation
Use delegate_task with 3 parallel tasks for curl-based modules. Each task tests a group of modules and returns JSON results. Browser-based modules must run sequentially.
Pitfalls for Module-by-Module
- Don't wait until the end to commit: Session may break, losing all work
- Restore content after destructive tests: Save state before, verify after
- Rate limiting blocks repeated tests: Test rate-limited endpoints last
- CSRF token sync: Use the same cookie jar for GET+POST (see `references/multi-service-qa.md`)
- Optimization suggestions go in the document, not just the chat: User wants them persisted
Deliverables: Split Test Plan + Issue List
Always split QA output into two separate documents:
- `test-plan.md`: Structured test cases with execution steps and expected results
- `issue-list.md`: Known issues found during analysis, with severity and fix suggestions
Do NOT merge them into one report. Users need the test plan for execution assignment and the issue list for bug tracking. Each document should be self-contained.
Test Plan Structure
- Service architecture table (services, URLs, ports)
- Test accounts and auth mechanism explanation
- Per-module test case tables: `ID | Test item | Test steps | Expected result | Account | Priority`
- Cover both public pages AND admin/management pages
- Include a cross-cutting section for security, consistency, accessibility
Issue List Structure
- Summary table (count by severity level)
- Per-issue entries: module, page, phenomenon, impact, root cause, source code location, fix suggestion, priority
- Consistency matrix (which pages have which features/assets)
- Fix priority recommendation (immediate / soon / backlog)
Comprehensive QA Dimensions Checklist
When the user asks for "full" or "complete" testing, cover ALL of these dimensions. If you only covered page navigation and login flows, the plan is incomplete.
Core (always cover)
- Functional — Page loads, navigation, CRUD operations, form submissions
- Auth & Permissions — Login/logout, RBAC, cross-service cookie propagation, admin vs user access
- Input Validation — Form validation, empty submissions, boundary values, special characters
Security (cover for any site with user input)
- Cookie Security — HttpOnly, Secure, SameSite attributes; Max-Age; domain scope
- CSRF Protection — Token presence, double-submit pattern, token expiry, replay resistance
- Redirect Safety — Open redirect via `redirect` parameter; validate against allowed domains
- Rate Limiting — Per-endpoint limits; account lockout; IP-based limits
- File Upload Safety — Allowed extensions, size limits, filename sanitization, path traversal prevention
- Input Injection — XSS in user-generated content, SQL injection attempts, path traversal in slugs
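For the redirect-safety dimension, it helps to have a reference check to compare the server's behavior against. The sketch below is one conservative allow-list check, not the target site's actual logic: the function name `is_safe_redirect` and the policy choices (same-site paths allowed, only `http`/`https` to listed hosts) are assumptions.

```python
from urllib.parse import urlsplit


def is_safe_redirect(target: str, allowed_hosts) -> bool:
    """Conservative allow-list check for a `redirect` parameter value."""
    if not target or any(ch in target for ch in "\\\r\n\t"):
        # Backslashes and control characters are common parser-confusion tricks.
        return False
    if target.startswith("//"):
        # Protocol-relative URL: would leave the site.
        return False
    try:
        parts = urlsplit(target)
    except ValueError:
        return False
    if not parts.scheme and not parts.netloc:
        # Relative reference: accept only absolute paths on the same site.
        return target.startswith("/")
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in allowed_hosts
```

Any value this function rejects but the server follows is a candidate open-redirect finding; typical probes are `//evil.example`, `https://allowed.example@evil.example/`, and `javascript:` URLs.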
Session & State
- Token Lifecycle — Expiration behavior, role changes mid-session (DB role vs token role), token format validation
- Concurrent Access — Race conditions on shared resources, optimistic locking
Content & Rendering
- Edge Case Content — Empty states, very long text, special characters (CJK, emoji), Markdown/LaTeX rendering
- Encoding — BOM markers, UTF-8 consistency, DOCTYPE prefix cleanliness
SEO & Metadata
- Meta Tags — `<title>`, `<meta description>`, canonical URLs
- Open Graph — `og:title`, `og:description`, `og:image`, `og:url` per page
- Structured Data — robots.txt, sitemap.xml, RSS feed validity
Accessibility
- WCAG Basics — `alt` attributes, `user-scalable`, color contrast, skip-to-content links
- Keyboard Navigation — Tab order, focus management, ARIA labels
Performance & Compatibility
- Page Load — Static asset availability, CDN reliability (especially in China), resource count
- Responsive Design — Breakpoints, mobile layout, touch targets
- Cross-Browser — Chrome/Firefox/Safari/Edge rendering differences
Operations
- Health Checks — `/health` endpoint availability per service
- Error Handling — 404 pages, 500 error responses, graceful degradation
- Logging & Audit — Audit trail for admin actions, login attempts
Consistency (cross-service)
- Asset Inclusion — Which pages include mobile.css, loader.js, etc.
- Navigation — Which pages have site-wide nav bar
- Security Headers — X-Content-Type-Options, X-Frame-Options, Referrer-Policy, CSP
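The security-header row of the consistency matrix can be built from response headers that were already collected with curl or a browser. This is a minimal sketch: the function names are hypothetical, and the header list is just the four named above, with CSP spelled out as `Content-Security-Policy`.

```python
EXPECTED_HEADERS = (
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
    "Content-Security-Policy",
)


def missing_security_headers(headers: dict) -> list:
    """Return the expected headers absent from one response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name.lower() not in present]


def header_matrix(responses: dict) -> dict:
    """Map each page to its missing headers to expose cross-service gaps."""
    return {page: missing_security_headers(h) for page, h in responses.items()}
```

Pages whose entry is an empty list are consistent; anything else is a row for the issue list.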
Pitfalls
- 🔴 CRITICAL: Always back up content before write operations: When testing CRUD endpoints (save, publish, create, update), the test payload (including XSS test strings, dummy data, empty fields) CAN overwrite real production content. Before any write test:
  - `curl -s -b cookies SITE/admin` → extract the current `content_json` / `initialContent` → save to `/tmp/backup_<service>.json`
  - Perform the test
  - Restore the original content via Playwright (set form fields + `collectFormData()` + submit)
  - This is not optional. A session that deletes user content without restoring it is a failed session.
- 🔴 CRITICAL: Restore content IMMEDIATELY after destructive tests: Don't wait until the end of the session. If a test modifies content, restore it in the same turn. Session interruptions, timeouts, or context limits can prevent later restoration.
- 🔴 CRITICAL: XSS payloads in form fields persist: When you fill a form field with `<script>alert(1)</script>` for XSS testing, that value gets saved to the database if the form is submitted. Always use Playwright's `page.evaluate()` to set values directly on form elements, NOT `page.fill()`, which triggers input events that may activate auto-save.
- ⚠️ Do NOT parallelize browser delegate_tasks for QA: Each browser interaction is slow (navigate + snapshot + screenshot = 10-30s). 3 parallel browser tasks will all time out at 600s. Run browser tests sequentially or use source-code-first mode.
- ⚠️ Curl-only delegate tasks also time out with large batches: A delegate_task with 30+ curl test cases can hit the 600s limit (each curl call = 1-3s + overhead). Split large test batches into smaller tasks (~15-20 cases each) or use `execute_code` with `from hermes_tools import terminal` for direct in-process execution (faster, no delegation overhead).
- ⚠️ Client-side-only validation is a security finding: When CSP blocks inline JS (see the `script-src-elem` pitfall), any validation that only exists in client JS (password strength, field format, confirmation matching) becomes bypassable. Always test registration/submission with curl to verify server-side validation exists independently.
- ⚠️ API authentication order matters: Some endpoints validate the request body BEFORE checking authentication, returning 422 (validation error) instead of 401 (unauthenticated). Test: `curl -X POST /api/endpoint -d 'invalid'` without auth; the response should be 401, not 422. This is a security issue (leaks endpoint existence and field requirements).
- ⚠️ Fulltext search can silently fail: Search endpoints with `mode=fulltext` may return 0 results while `mode=simple` works fine. Always test both modes with the same query. Common causes: search index not built, tokenizer (jieba) not installed, BM25 ranking misconfigured.
- ⚠️ Rate limiting blocks subsequent tests: Registration endpoints with strict limits (e.g., 6/hour) will block all remaining registration-related tests with 429. Strategy: test non-registration endpoints first, registration tests last, and note which tests were blocked.
- ⚠️ Present the test plan BEFORE executing: Show the user the complete test plan first. If they say "is this really all of it?", the plan is missing dimensions. Refer to the Comprehensive QA Dimensions Checklist above.
- ⚠️ "全部加上" ("add everything") means ALL dimensions: When the user says to add everything, do not skip any dimension. Write all 25+ categories into the test plan even if some have only 1-2 test cases.
- Multi-service auth: Sites with shared cookies (e.g., the `.ephron.ren` domain) need login on ONE service first, then verify cookie propagation to others. Don't try to log in on each service independently.
- Encoding bugs: Always hex-dump HTML source to check for BOM markers (`ef bb bf`) or leading newlines before DOCTYPE. Use: `xxd file.html | head -5`. For Python source files, also check: `xxd file.py | head -1`.
- CSRF tokens: Many form submissions require CSRF tokens. Extract the token from the page first, then include it in POST requests. Don't forget the CSRF cookie (`ephron_csrf`). Note: CSRF cookies are HttpOnly=false (by design, so JS can read them).
- Rate limits: Note rate limit values from source code (e.g., `@limiter.limit("5/minute")`). When testing auth failures, stay under the limit or you'll get 429s that mask the real bug.
- Template vs runtime issues: Some issues (empty content, missing sections) may be data issues, not code bugs. Verify by checking if the data source (database/content files) actually has content.
- File delivery fallback: When sending files via QQ/WeChat fails, push to a Gitea repo as a fallback delivery mechanism.
- Source code security analysis: Always check these files when available: `cookie_utils.py` (cookie params), `csrf.py` (CSRF mechanism), `redirect.py` (open redirect validation), `security_headers.py` (CSP/headers), `auth.py` (token format, lockout), `validators.py` (slug/path validation), `limiter.py` (rate limit config).
- ⚠️ CSP `script-src-elem` silently kills inline JS: When a page has inline `<script>` but buttons call functions defined there (e.g., `onclick="saveDraft()"`), always verify the CSP header. The `script-src-elem` directive overrides `script-src` for script elements, so `script-src 'unsafe-inline'` combined with `script-src-elem 'self' https://cdn.example.com` blocks ALL inline scripts. Symptoms: functions report "not defined", buttons do nothing, no network requests on click. Detection: check `typeof fnName` in the browser console, or look for a CSP error in the console: `Executing inline script violates the following Content Security Policy directive 'script-src-elem'`. Fix: add `'unsafe-inline'` to `script-src-elem`, use a nonce/hash, or extract inline scripts to external `.js` files.
- ⚠️ CSP `form-action 'self'` blocks cross-origin redirects after form submission: When a form POSTs to a same-origin endpoint (allowed by `form-action 'self'`), but the server responds with a 303 redirect to a different origin (e.g., `auth.example.com` → `www.example.com`), the browser blocks the redirect. CSP `form-action` applies to the entire redirect chain resulting from form submission, not just the form's action URL. Symptoms: the form appears to submit (POST in network tab), the cookie gets set server-side, but the page stays on the form URL with no navigation. Console error: `Sending form data to '...' violates the following Content Security Policy directive: "form-action 'self'"`. Detection: (1) test a same-origin redirect (should work) vs a cross-origin redirect (should fail); (2) `curl -sI` the 303 response; if it carries CSP with `form-action 'self'`, that's the blocker. Fix options: (a) skip the CSP header on 303 redirect responses (empty body, CSP adds no protection); (b) use a JS-based redirect instead of a server-side 303; (c) add allowed origins to `form-action`. Key insight: this breaks any auth flow where the login service is on a different subdomain than the target pages. See `references/session-learnings-ephron-qa.md` for full reproduction steps.
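The `script-src-elem` pitfall can be screened for directly from a response header. The sketch below follows the fallback order for script elements (`script-src-elem`, then `script-src`, then `default-src`) and reports whether plain inline scripts would be blocked. It is a simplification: it ignores nonces, hashes, and `'strict-dynamic'`, and the function names are hypothetical.

```python
def parse_csp(header: str) -> dict:
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    directives = {}
    for part in header.split(";"):
        tokens = part.split()
        if tokens:
            # When a directive is repeated, only its first occurrence applies.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def inline_scripts_blocked(header: str) -> bool:
    """True if inline <script> elements lacking a nonce/hash would be blocked.

    Simplified check: nonce, hash, and 'strict-dynamic' handling is omitted.
    """
    directives = parse_csp(header)
    for name in ("script-src-elem", "script-src", "default-src"):
        if name in directives:
            sources = [source.lower() for source in directives[name]]
            return "'unsafe-inline'" not in sources
    return False  # No governing directive: inline scripts are allowed.
```

A `True` result on a page that relies on inline handlers such as `saveDraft()` is worth confirming in the browser console before it goes into the issue list.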
Scope Ambiguity Pitfall
When the user asks to "inspect a server" or "巡检服务器" without providing a URL:
- Clarify whether they mean the local machine Hermes runs on (system resources, running processes, disk/memory) or a remote web service (HTTP endpoints, app health).
- Default assumption: If the user mentions a domain name (e.g., "巡检 ephron.ren" (inspect ephron.ren) or "check blog.ephron.ren"), they mean the remote web service. If they say "your server" or "the machine you're on", they mean the local machine.
- When in doubt, ask: "是巡检本机还是远程服务?" ("Should I inspect the local machine or the remote service?")
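The default assumption can be written down as a small heuristic. This is only a sketch of the rule as stated above: the function name `infer_scope`, the domain pattern, and the exact hint phrases (including 本机, "local machine") are assumptions, and anything the heuristic cannot place falls back to asking the user.

```python
import re

# Rough domain matcher: dot-separated labels ending in a 2+ letter TLD.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b")
LOCAL_HINTS = ("your server", "the machine you're on", "本机")


def infer_scope(request: str) -> str:
    """Return 'remote', 'local', or 'ask' for an inspection request."""
    text = request.lower()
    if DOMAIN_RE.search(text):
        return "remote"  # A domain name implies the remote web service.
    if any(hint in text for hint in LOCAL_HINTS):
        return "local"
    return "ask"  # Ambiguous: confirm with the user before doing anything.
```

For example, `infer_scope("check blog.ephron.ren")` returns `"remote"`, while a bare "巡检服务器" returns `"ask"`.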
Tools Reference
| Tool | Purpose |
|---|---|
| `browser_navigate` | Go to a URL |
| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
| `browser_click` | Click an element by ref (`@eN`) or text |
| `browser_type` | Type into an input field |
| `browser_scroll` | Scroll up/down on the page |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key |
| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
| `browser_console` | Get JS console output and errors |
| Playwright Python | Full browser automation via script; use when built-in tools are unavailable or programmatic control is needed (see `references/playwright-qa.md`) |
Related References
- `references/issue-taxonomy.md`: severity and category classification for issues
- `references/server-inspection.md`: local server inspection checklist (system resources, listening ports, processes, Docker, security services); also covers scope ambiguity (local vs. remote), route file reading strategy, cross-service cookie auth testing, static analysis checks
- `references/qa-dimensions-checklist.md`: comprehensive 25-dimension QA checklist for "full site" testing requests
- `references/playwright-qa.md`: Playwright Python setup, patterns, event monitoring, CSP bug detection
- `references/session-learnings-ephron-qa.md`: concrete findings from ephron.ren QA (CSP override, password validation gaps, fulltext search failure, delegate sizing)
Templates
- `templates/dogfood-report-template.md`: issue list template (the output with bugs found)
- `templates/test-plan-template.md`: test plan template (structured test cases with steps)
Tips
- Always check `browser_console()` after navigating and after significant interactions. Silent JS errors are among the most valuable findings.
- Use `annotate=true` with `browser_vision` when you need to reason about interactive element positions or when the snapshot refs are unclear.
- Test with both valid and invalid inputs; form validation bugs are common.
- Scroll through long pages; content below the fold may have rendering issues.
- Test navigation flows by clicking through multi-step processes end-to-end.
- Check responsive behavior by noting any layout issues visible in screenshots.
- Don't forget edge cases: empty states, very long text, special characters, rapid clicking.
- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.