# Writing Good Skills for the Claude Agent SDK: Practical Authoring Tips
Skills are how you teach Claude agents to do specific things well. Think of them as reusable instruction sets - a skill might tell Claude how to process PDFs, run database migrations, or follow your company's code review process. When a skill is well-written, Claude picks the right one at the right time and follows it reliably. When a skill is poorly written, Claude either ignores it, misapplies it, or burns through context window tokens without adding value.
We've built enough agent systems using the Claude Agent SDK to have opinions about what separates a good skill from a bad one. Anthropic's official skill authoring best practices cover the principles in detail. Here's what those principles look like in practice.
## The Context Window Is a Shared Resource
This is the single most important concept to internalise when writing skills. Every token in your skill competes for space with the conversation history, the system prompt, other skills' metadata, and the user's actual request. The context window is finite, and your skill doesn't get to hog it.
Here's the thing people miss: Claude is already very smart. You don't need to explain what a PDF is. You don't need three paragraphs of background on why database migrations are important. You need to tell Claude the specific things it doesn't already know - your particular tool choices, your exact command syntax, your organisation's naming conventions.
A concrete example. This is too verbose:
```markdown
## Extract PDF text

PDF (Portable Document Format) files are a common file format that contains
text, images, and other content. To extract text from a PDF, you'll need to
use a library. There are many libraries available for PDF processing, but
pdfplumber is recommended because it's easy to use and handles most cases well.
```
This is better:
````markdown
## Extract PDF text

Use pdfplumber for text extraction:

```python
import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
````
The second version is about a third of the tokens. It assumes Claude knows what PDFs are and how Python libraries work. It gives the specific tool choice and the exact syntax. That's all Claude needs.
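You can sanity-check comparisons like this with a rough token estimate. A common heuristic for English text is about four characters per token; the sketch below uses it to compare the two versions above (real tokeniser counts will differ, so treat this as a comparison tool, not a billing tool):

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Good enough for comparing skill variants against each other.
    return max(1, len(text) // 4)

verbose = (
    "PDF (Portable Document Format) files are a common file format that "
    "contains text, images, and other content. To extract text from a PDF, "
    "you'll need to use a library. There are many libraries available for "
    "PDF processing, but pdfplumber is recommended because it's easy to use "
    "and handles most cases well."
)
concise = (
    "Use pdfplumber for text extraction:\n"
    "import pdfplumber\n"
    'with pdfplumber.open("file.pdf") as pdf:\n'
    "    text = pdf.pages[0].extract_text()"
)

print(f"verbose: ~{approx_tokens(verbose)} tokens")
print(f"concise: ~{approx_tokens(concise)} tokens")
```

Running this kind of comparison over your whole skill library is a quick way to find the skills that are eating more context than they earn.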
When I'm reviewing a skill, I read every sentence and ask: "Does Claude genuinely need this information?" If the answer is "probably not," I cut it. Aggressive editing makes skills better, not worse.
## Matching Freedom to Fragility
Not every skill should be a rigid script. And not every skill should be a loose set of guidelines. The trick is matching the level of specificity to how fragile the task is.
**High freedom** works for tasks where multiple approaches are valid and Claude can make good contextual decisions. Code review is a good example. You might write:
```markdown
## Code review process
1. Analyse the code structure and organisation
2. Check for potential bugs or edge cases
3. Suggest improvements for readability and maintainability
4. Verify adherence to project conventions
```
Claude can interpret each step differently depending on the codebase, the language, and the PR being reviewed. That's appropriate because there's no single correct way to review code.
**Low freedom** works for tasks that are fragile, error-prone, or must be consistent every time. Database migrations, deployment scripts, anything where the wrong command causes data loss:

````markdown
## Database migration

Run exactly this script:

```bash
python scripts/migrate.py --verify --backup
```

Do not modify the command or add additional flags.
````
No room for interpretation. No "customise as needed." Just do this exact thing. When the consequences of getting it wrong are serious, you want Claude on rails.
**Medium freedom** sits in between - a preferred pattern with parameters Claude can adjust:
````markdown
## Generate report

Use this template and customise as needed:

```python
def generate_report(data, format="markdown", include_charts=True):
    # Process data
    # Generate output in the specified format
    # Optionally include visualisations
    ...
```
````
The analogy I keep coming back to: imagine Claude is walking along a path. If the path crosses a narrow bridge over a ravine, you want guardrails and very specific instructions. If the path crosses an open field, you just need to point Claude in the right direction.
We've found that most organisations over-specify their skills initially. They write three pages of instructions for tasks where a paragraph would do. Then they wonder why Claude is spending so much context window on skill content and losing track of the actual conversation. Start with less. Add detail only where Claude consistently gets things wrong.
## Naming Skills Properly
Skill names matter more than you'd think because they're the first thing Claude sees when deciding which skill to use. With potentially hundreds of skills loaded, the name needs to communicate what the skill does at a glance.
The convention that works best is gerund form - verb plus -ing:
- `processing-pdfs`
- `analysing-spreadsheets`
- `managing-databases`
- `testing-code`
Names must be lowercase, with hyphens as the only separator: no spaces, no special characters, and at most 64 characters. The reserved words "anthropic" and "claude" can't appear anywhere in a name.
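These constraints are easy to enforce mechanically. Here's a minimal sketch of a check you could run over a skill library before committing it (the regex assumes lowercase letters, digits, and single hyphens are allowed; tighten it if your platform's rules are stricter):

```python
import re

RESERVED_WORDS = ("anthropic", "claude")

def is_valid_skill_name(name: str) -> bool:
    """Validate a skill name against the constraints described above."""
    if len(name) > 64:
        return False
    # Lowercase words separated by single hyphens, e.g. "processing-pdfs".
    if not re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)*", name):
        return False
    # Reserved words can't appear anywhere in the name.
    return not any(word in name for word in RESERVED_WORDS)

print(is_valid_skill_name("processing-pdfs"))  # True
print(is_valid_skill_name("Excel Helper"))     # False: uppercase and a space
print(is_valid_skill_name("claude-utils"))     # False: reserved word
```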
Avoid vague names. `helper`, `utils`, `tools` - these tell Claude nothing. `documents`, `data`, `files` - equally unhelpful. If you have a skill for generating financial reports from Excel data, call it `generating-financial-reports` not `excel-helper`.
Consistency matters too. Pick a naming pattern and stick with it across your entire skill library. When all your skills follow the same convention, Claude gets better at finding the right one.
## Writing Descriptions That Actually Work
The description field is how Claude discovers your skill. At startup, Claude loads the name and description of every available skill. The full skill content (SKILL.md) only gets loaded when Claude decides a skill is relevant. So the description is your skill's elevator pitch - it needs to convey both what the skill does and when to use it.
Write in third person. "Processes Excel files and generates reports" not "I can help you process Excel files." The description is injected into Claude's system prompt, and inconsistent point-of-view causes discovery problems.
Include specific trigger terms. If your skill handles .xlsx files, mention ".xlsx" in the description. If it's about pivot tables, say "pivot tables." Claude matches on these terms when deciding relevance.
Good examples:

```yaml
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
```

```yaml
description: Generate descriptive commit messages by analysing git diffs. Use when the user asks for help writing commit messages or reviewing staged changes.
```

Bad examples:

```yaml
description: Helps with documents
```

```yaml
description: Processes data
```
The first set tells Claude exactly when to activate. The second set is so vague that Claude either activates it for everything or nothing.
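Putting name and description together, the frontmatter of a SKILL.md for a hypothetical commit-message skill might look like this (the exact set of frontmatter fields your setup supports may vary):

```yaml
---
name: writing-commit-messages
description: Generate descriptive commit messages by analysing git diffs. Use when the user asks for help writing commit messages or reviewing staged changes.
---
```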
## Progressive Disclosure
Good skills don't dump everything upfront. They structure information so Claude reads the most important parts first and only loads additional detail when needed.
The pattern that works: SKILL.md contains the core instructions - the 80% of cases. For edge cases, reference separate files that Claude can read if needed. This keeps the common path fast and context-efficient while still handling unusual situations.
Think of it like documentation with expandable sections. The summary is always visible. The detail is one click away if needed. Except in this case, "one click" is Claude reading an additional file.
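A sketch of what this looks like inside a SKILL.md (the skill, steps, and file names are hypothetical):

```markdown
## Process invoices

1. Parse the invoice with `scripts/parse_invoice.py`
2. Validate totals against the purchase order
3. Write the result to the `processed/` directory

For multi-currency invoices, read `reference/multi-currency.md` first.
For scanned (image-only) invoices, read `reference/ocr-workflow.md`.
```

The common path stays short; Claude only pays the context cost of the reference files when a task actually hits those edge cases.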
## Testing Across Models
This catches a lot of teams off guard. A skill that works perfectly with Claude Opus might be too vague for Claude Haiku. Opus can infer intent from brief instructions. Haiku needs more explicit guidance.
If you're using skills across different Claude models - maybe Opus for complex tasks and Haiku for quick ones - test your skills with each model you plan to use. You might need different versions of the same skill, or you might need to add a bit more detail to make it work reliably with the faster models.
The sweet spot is instructions detailed enough for Haiku to follow correctly but concise enough that they don't waste context for Opus. It's a balancing act, and the only way to find the right level is to test with real tasks.
## Common Mistakes We See
**Skills that try to do too much.** A skill for "managing the entire codebase" is too broad. Break it into focused skills: one for code review, one for testing, one for documentation, one for deployment.
**Skills that explain the problem instead of the solution.** Three paragraphs about why testing matters, followed by two lines about how to run the tests. Flip the ratio. Claude already knows testing matters.
**Skills that duplicate Claude's built-in knowledge.** You don't need a skill that explains how Python imports work or what a REST API is. Use skills for organisation-specific knowledge that Claude can't infer.
**Skills without clear activation triggers.** If the description doesn't tell Claude when to use the skill, Claude won't use it reliably. Every skill description should answer: "What situation should trigger this skill?"
**Skills that are never updated.** Your codebase evolves, your processes change, but your skills stay frozen in time. Review skills periodically and update them when they no longer match reality.
## Building Better Agent Systems
Skills are one piece of a larger agent architecture. Getting them right makes your agents more reliable, more efficient, and easier to maintain. Getting them wrong means your agents burn context on irrelevant instructions and miss the tasks they should be handling.
If you're building AI agents and want help designing the skill architecture - or the broader agent system - our AI agent development team works with Australian organisations on production agent deployments. We also help teams that are using the Claude Agent SDK and need guidance on structuring their agent pipelines effectively.
For organisations earlier in their AI journey, our AI strategy consulting practice can help you figure out where agents fit in your operations and which tasks are worth automating first. Not everything needs an agent, and knowing where to start is half the battle.