Overview

The writing-skills skill applies Test-Driven Development principles to creating process documentation. Just as TDD requires writing failing tests before code, skill creation requires testing with agents before writing the skill document.
Core Principle: If you didn’t watch an agent fail without the skill, you don’t know if the skill teaches the right thing.

What is a Skill?

A skill is a reference guide for proven techniques, patterns, or tools that helps coding agents find and apply effective approaches. Skills are:
  • Reusable techniques
  • Proven patterns
  • Tool documentation
  • Reference guides
Skills are NOT:
  • Narratives about solving one specific problem
  • Project-specific conventions (use CLAUDE.md for those)
  • Mechanical constraints (automate those instead)

TDD Mapping for Skills

TDD Concept | Skill Creation
Test case | Pressure scenario with subagent
Production code | Skill document (SKILL.md)
Test fails (RED) | Agent violates rule without skill
Test passes (GREEN) | Agent complies with skill present
Refactor | Close loopholes while maintaining compliance
Write test first | Run baseline scenario BEFORE writing skill
Watch it fail | Document exact rationalizations
Minimal code | Write skill addressing specific violations
Watch it pass | Verify agent now complies
Refactor cycle | Find new rationalizations → plug → re-verify

When to Create a Skill

Create a skill when:
  • Technique wasn’t intuitively obvious to you
  • You’d reference this again across projects
  • Pattern applies broadly (not project-specific)
  • Others would benefit from the knowledge
Don’t create a skill for:
  • One-off solutions to specific problems
  • Standard practices well-documented elsewhere
  • Project-specific conventions (use project docs)
  • Mechanical constraints (automate with linters/validation)

Skill Types

Technique

Concrete method with steps to follow. Examples: condition-based-waiting, root-cause-tracing

Pattern

Way of thinking about problems. Examples: flatten-with-flags, test-invariants

Reference

API docs, syntax guides, tool documentation. Examples: Library APIs, command references

The RED-GREEN-REFACTOR Cycle

RED Phase: Write Failing Test

  1. Create Pressure Scenarios - Design scenarios that would trigger the problematic behavior (3+ combined pressures for discipline skills).
  2. Run Without Skill - Test with a subagent that doesn’t have access to the skill. Document exact behavior verbatim.
  3. Identify Patterns - Capture the specific rationalizations and failures that occur.
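One way to keep RED-phase results comparable across runs is to record them in a small structure. The sketch below is only an illustration: the `BaselineRun` class, its field names, and the helper function are hypothetical and not part of any skill tooling. The threshold of three comes from the "3+ combined pressures" guidance for discipline skills.

```python
from dataclasses import dataclass, field

# From the text: discipline skills should combine 3+ pressures.
MIN_PRESSURES = 3


@dataclass
class BaselineRun:
    """One RED-phase run: a scenario executed WITHOUT the skill (hypothetical record)."""
    scenario: str
    pressures: list[str]
    transcript: str  # verbatim agent output, not a paraphrase
    rationalizations: list[str] = field(default_factory=list)

    def has_enough_pressure(self) -> bool:
        # Count distinct pressures so duplicates do not inflate the total.
        return len(set(self.pressures)) >= MIN_PRESSURES


def collect_rationalizations(runs: list[BaselineRun]) -> dict[str, int]:
    """Tally how often each rationalization appears across baseline runs."""
    counts: dict[str, int] = {}
    for run in runs:
        for excuse in run.rationalizations:
            counts[excuse] = counts.get(excuse, 0) + 1
    return counts
```

The tally gives a rough priority order for the GREEN phase: the rationalizations that recur most often are the ones the skill most needs to address.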

GREEN Phase: Write Minimal Skill

  1. Create SKILL.md - Write a skill that addresses the specific failures identified in RED phase.
  2. Test With Skill - Run the same scenarios WITH the skill present. Agent should now comply.

REFACTOR Phase: Close Loopholes

  1. Find New Rationalizations - Test again and capture any new ways agents try to skip the skill.
  2. Add Explicit Counters - Update the skill to explicitly forbid each rationalization.
  3. Re-test - Verify agents comply with the updated skill.
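The three phases can be read as a loop. The following is a minimal sketch under stated assumptions: `run_agent` and `write_skill` are hypothetical callables supplied by the caller (no real agent API is implied), and an agent run is assumed to return the list of rationalizations it used, with an empty list meaning it complied.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces, not part of any real agent API:
#   run_agent(scenario, skill) -> rationalizations the agent used (empty = complied)
#   write_skill(rationalizations) -> skill text that counters each one
RunAgent = Callable[[str, Optional[str]], List[str]]
WriteSkill = Callable[[List[str]], str]


def red_green_refactor(
    scenario: str,
    run_agent: RunAgent,
    write_skill: WriteSkill,
    max_rounds: int = 5,
) -> str:
    # RED: run the scenario with no skill and require an observed failure.
    baseline = run_agent(scenario, None)
    if not baseline:
        raise ValueError("baseline passed; no observed failure to write a skill against")

    # GREEN: write a minimal skill that addresses only the observed failures.
    countered = list(baseline)
    skill = write_skill(countered)

    for _ in range(max_rounds):
        remaining = run_agent(scenario, skill)
        if not remaining:
            return skill  # agent complies with the skill present
        # REFACTOR: add a counter for each new rationalization, then re-test.
        for excuse in remaining:
            if excuse not in countered:
                countered.append(excuse)
        skill = write_skill(countered)

    raise RuntimeError("agent still not compliant after max_rounds")
```

Raising on a passing baseline mirrors the core principle: without a failure observed first, there is nothing to show that the skill teaches the right thing.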

SKILL.md Structure

Every skill needs proper frontmatter and clear sections:
---
name: skill-name-with-hyphens
description: Use when [specific triggering conditions and symptoms]
---
Frontmatter requirements:
  • Only two fields: name and description
  • Max 1024 characters total
  • Name format: Letters, numbers, hyphens only (no special characters)
  • Description format:
    • Start with “Use when…”
    • Include specific triggers and symptoms
    • Do NOT summarize the skill’s workflow
    • Third-person voice
Recommended sections:
  1. Overview - What is this? Core principle in 1-2 sentences
  2. When to Use - Bullet list with symptoms and use cases
  3. Core Pattern (for techniques) - Before/after code comparison
  4. Quick Reference - Table or bullets for common operations
  5. Implementation - Inline code or link to separate file
  6. Common Mistakes - What goes wrong + fixes
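The frontmatter rules are mechanical enough to check automatically. The sketch below is an illustration, not an official validator: `check_frontmatter` is a hypothetical helper, it handles only single-line `key: value` pairs rather than full YAML, and it assumes the 1024-character limit applies to the text between the two `---` lines.

```python
import re

NAME_RE = re.compile(r"^[A-Za-z0-9-]+$")  # letters, numbers, hyphens only
MAX_FRONTMATTER_CHARS = 1024  # assumption: measured over the frontmatter body


def check_frontmatter(skill_md: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening '---' line"]
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return ["missing closing '---' line"]

    body = lines[1:end]
    problems: list[str] = []
    fields: dict[str, str] = {}
    for line in body:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            problems.append(f"unparseable line: {line!r}")
            continue
        fields[key.strip()] = value.strip()

    if set(fields) != {"name", "description"}:
        problems.append("frontmatter must contain exactly 'name' and 'description'")
    if len("\n".join(body)) > MAX_FRONTMATTER_CHARS:
        problems.append("frontmatter exceeds 1024 characters")
    if not NAME_RE.match(fields.get("name", "")):
        problems.append("name may contain only letters, numbers, and hyphens")
    if not fields.get("description", "").startswith("Use when"):
        problems.append("description should start with 'Use when'")
    return problems
```

A check like this catches only format problems. Whether the description lists triggering conditions rather than a workflow summary still needs a human or agent review.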

Claude Search Optimization (CSO)

Future agents need to FIND your skill. Optimize for discovery:

Rich Description Field

Purpose: Agents read descriptions to decide which skills to load.
CRITICAL: Description = When to Use, NOT What the Skill Does
Summarizing the workflow in the description causes agents to follow the description instead of reading the full skill. Keep descriptions to triggering conditions only.
# ❌ BAD: Summarizes workflow
description: Use when executing plans - dispatches subagent per task with code review

# ✅ GOOD: Just triggering conditions
description: Use when executing implementation plans with independent tasks

Keyword Coverage

Use words agents would search for:
  • Error messages: “Hook timed out”, “ENOTEMPTY”
  • Symptoms: “flaky”, “hanging”, “zombie”
  • Synonyms: “timeout/hang/freeze”
  • Tool names: Actual commands, libraries
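Keyword coverage can be spot-checked with a simple substring search. This is a rough sketch: `missing_keywords` is a hypothetical helper, and plain case-insensitive matching is an assumption about how agents search, not a description of any real discovery mechanism.

```python
def missing_keywords(skill_text: str, search_terms: list[str]) -> list[str]:
    """Return the search terms that never appear in the skill text."""
    haystack = skill_text.casefold()
    # Preserve the caller's order so the report is easy to scan.
    return [term for term in search_terms if term.casefold() not in haystack]


# Terms taken from the examples above; the skill text is made up.
terms = ["Hook timed out", "ENOTEMPTY", "flaky", "hanging"]
text = "Use when tests are flaky or a hook timed out during cleanup"
print(missing_keywords(text, terms))  # ['ENOTEMPTY', 'hanging']
```

Any term in the returned list is a candidate to work into the description or body, provided it reflects a symptom the skill actually addresses.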

Descriptive Naming

Use active voice, verb-first:
  • creating-skills not skill-creation
  • condition-based-waiting not async-test-helpers

The Iron Law

NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills. No exceptions:
  • Not for “simple additions”
  • Not for “just adding a section”
  • Not for “documentation updates”
If you write a skill before testing, delete it and start over.

Testing Different Skill Types

Discipline-Enforcing Skills

Test with academic questions and pressure scenarios. Verify compliance under maximum pressure.

Technique Skills

Test with application scenarios and edge cases. Verify correct application to new scenarios.

Pattern Skills

Test with recognition and counter-examples. Verify correct identification of when to apply.

Reference Skills

Test with retrieval and application scenarios. Verify agents find and use information correctly.

Common Testing Excuses

Excuse | Reality
“Skill is obviously clear” | Clear to you ≠ clear to agents. Test it.
“It’s just a reference” | References have gaps. Test retrieval.
“Testing is overkill” | Untested skills always have issues.
“I’ll test if problems emerge” | Test BEFORE deploying.
“Too tedious to test” | Less tedious than debugging later.
“I’m confident it’s good” | Overconfidence guarantees issues.

Quick Reference

Full Guide

Comprehensive skill creation guide with examples

Testing Methodology

Detailed testing process and pressure scenarios