
OpenAI Agent Skills - Versioned Instruction Bundles for Smarter Agents

April 22, 2026 · 8 min read · Michael Ridland

If you've built agents with OpenAI's API, you've probably hit a familiar problem. You write detailed instructions for how an agent should handle a specific task - style guides, data processing steps, coding conventions - and then you copy-paste those same instructions across multiple agent configurations. When something changes, you update one copy and forget the others. Within a few weeks, your agents are all running slightly different versions of the same workflow.

Agent Skills are OpenAI's answer to this. They let you package a bundle of files and a SKILL.md manifest into a versioned, reusable unit that you can attach to any agent. Think of them as shared libraries for agent behaviour - except instead of code, they contain instructions, templates, and reference material.

The OpenAI Skills documentation covers the API. I want to talk about when Skills make sense, how they compare to other approaches, and the security implications that are easy to underestimate.

What a Skill Actually Is

A Skill is a zip file (or multipart upload) containing:

  1. A SKILL.md file with front matter metadata and markdown instructions
  2. Any supporting files the agent might need - templates, config files, example data, reference docs
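As a rough sketch (the exact front matter fields are defined in the OpenAI Skills documentation; the field names and file layout here are assumptions for illustration), a SKILL.md might look like:

```markdown
---
name: report-builder
description: Build quarterly reports using the company template and style rules.
---

# Report Builder

1. Read `templates/quarterly.md` for the required output structure.
2. Validate the input data against the rules in `reference/validation.md`.
3. Render the report as markdown, following the style notes below.
```

The front matter is what the model sees when deciding whether to use the Skill; the markdown body is what it reads once it has committed to using it.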

When you attach a Skill to an agent's shell environment, the platform tells the model what Skills are available (name, description, path). The model decides whether to use a Skill based on this metadata. If it does, it reads the full SKILL.md for detailed instructions.

The key insight is that Skills are versioned. You can upload a new version of a Skill without affecting agents that are pinned to an earlier version. You set a default version that new agents pick up automatically. And you can explicitly reference version numbers or "latest" when attaching Skills.

This versioning is what makes Skills genuinely useful rather than just a fancy way to provide instructions. In a team environment, you can update a workflow without breaking agents that are already in production.

Hosted vs. Local Execution

Skills work in two contexts, and the distinction matters.

Hosted shell runs your agent's code in an OpenAI-managed container. Skills are attached as skill_reference objects with a skill ID:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.4",
    tools=[
        {
            "type": "shell",
            "environment": {
                "type": "container_auto",
                "skills": [
                    # No version specified: resolves to the skill's default version
                    {"type": "skill_reference", "skill_id": "your-skill-id"},
                    # Pinned to an explicit version number
                    {"type": "skill_reference", "skill_id": "another-skill", "version": 2},
                ],
            },
        }
    ],
    input="Process the quarterly report using the report-builder skill.",
)

Local shell runs on your own machine. You can't use skill_reference here - instead, you point to local file paths:

response = client.responses.create(
    model="gpt-5.4",
    tools=[
        {
            "type": "shell",
            "environment": {
                "type": "local",
                "skills": [
                    # Local skills are declared inline: the name and description are
                    # shown to the model, and SKILL.md is read from the path on disk.
                    {
                        "name": "csv-insights",
                        "description": "Summarise CSV files and produce a markdown report.",
                        "path": "/path/to/csv-insights-skill",
                    },
                ],
            },
        }
    ],
    input="Summarise today's CSV reports using the csv-insights skill.",
)

The local mode is interesting for development. You edit the SKILL.md on your filesystem, re-run the agent, and see the changes immediately. No upload step, no version management. Once you're happy with the Skill, you upload it to hosted mode for production use.

When Skills Make Sense

Not every agent needs Skills. Here's where we've found them genuinely valuable:

Standardising processes across teams. One of our clients had five different teams building customer-facing reports. Each team had their own formatting conventions, data validation steps, and output templates. We packaged these into a single "report-standards" Skill. Now every agent that generates reports follows the same rules, and updates propagate centrally.

Encoding domain knowledge. Insurance claim processing, regulatory compliance checks, medical triage workflows - these involve detailed decision trees and rules that change periodically. A Skill captures that knowledge in a format the agent can follow, and version control means you can track exactly which rules were in effect when a particular decision was made.

Multi-step workflows. A Skill can describe a complete workflow with numbered steps, expected inputs/outputs at each stage, and error handling procedures. This is more reliable than cramming everything into a system prompt, because the agent reads the Skill only when it needs it rather than trying to hold the entire workflow in context at all times.

Reusable tool configurations. If your agents frequently interact with specific APIs or databases, a Skill can describe how to authenticate, what endpoints to use, how to handle pagination, and what error codes mean. Attach the same Skill to any agent that needs to talk to that system.

When Skills Don't Make Sense

Simple, one-off instructions. If your agent does one thing and its instructions fit in a paragraph, just put them in the system prompt. A Skill adds complexity for no benefit.

Rapidly iterating prototypes. During early development, you're changing instructions constantly. The upload-version-attach cycle slows you down. Use local mode during development and switch to hosted Skills when things stabilise.

Sensitive credentials. Never put API keys, passwords, or tokens in a Skill. The Skill contents are read by the model and could appear in logs, outputs, or error messages. Use environment variables or a secrets manager for credentials.
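One pattern that works instead: have the Skill instruct the agent's code to read credentials from the environment at runtime. A minimal sketch (the variable name is hypothetical):

```python
import os

def get_api_token() -> str:
    """Read a credential from the environment rather than embedding it in a Skill."""
    # REPORTS_API_TOKEN is a hypothetical variable name for illustration.
    token = os.environ.get("REPORTS_API_TOKEN")
    if not token:
        raise RuntimeError("REPORTS_API_TOKEN is not set; refusing to run.")
    return token
```

The Skill's instructions can safely say "authenticate using the token in REPORTS_API_TOKEN" without the token itself ever appearing in Skill contents, logs, or model output.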

The Security Side - Take This Seriously

OpenAI's documentation includes a section on security that I think deserves more attention than it typically gets.

Skills are treated as user prompt input, not system prompt input, so the model handles Skill instructions with the same priority as any other user-provided text. That design choice has direct security consequences: anything that can influence a Skill's contents can influence the agent.

Prompt injection risk is real. A malicious SKILL.md could contain instructions that override the agent's intended behaviour. If you're using third-party Skills or allowing users to select Skills from a catalogue, you're exposing yourself to prompt injection attacks. An attacker could craft a Skill that instructs the agent to exfiltrate data, call unauthorised APIs, or bypass safety checks.

OpenAI's own recommendation is clear: don't expose an open Skills repository to end-users. Skills should be vetted by developers and then presented to users through controlled product experiences. The user picks from a curated menu, not from an open catalogue.

Network access amplifies the risk. If your agent's shell environment has network access (which it often does), a compromised Skill could instruct the agent to send data to external endpoints. Review every Skill that will run in a network-enabled environment.

Require approval for write operations. Any Skill that involves modifying data, calling external APIs, or making changes to systems should include an explicit approval step. Don't let Skills auto-execute high-impact actions without human review.
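One simple way to implement that gate, outside the Skills API itself, is to route high-impact actions through an approval callback before execution. A sketch, assuming your own action names:

```python
from typing import Callable

# Hypothetical action names - substitute whatever your agent's tools expose.
WRITE_ACTIONS = {"delete_record", "call_external_api", "update_database"}

def execute_action(action: str, payload: dict,
                   approve: Callable[[str, dict], bool]) -> str:
    """Run read-only actions directly; require explicit approval for writes."""
    if action in WRITE_ACTIONS and not approve(action, payload):
        return "rejected"
    return f"executed {action}"
```

In production the `approve` callback would surface the action to a human reviewer; in tests it can simply return False to verify that writes are blocked by default.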

Our approach with clients is to treat Skills like we treat code - they go through review, they get version controlled in Git before uploading, and production Skills are only deployed through a controlled pipeline.

Versioning Strategy

The versioning system is simple but requires some discipline:

  • latest_version always points to the most recently uploaded version.
  • default_version is what agents get when they don't specify a version. This doesn't change automatically when you upload a new version.
  • You can pin agents to specific version numbers or to "latest."

For production agents, pin to specific versions. Update the default version only after testing. This is the same principle as pinning dependency versions in package managers - "latest" is convenient for development but dangerous in production.
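In the attachment syntax shown earlier, the difference is just the version field; the skill ID here is a placeholder:

```python
# Production: pin to a known-good version so later uploads can't change behaviour.
pinned = {"type": "skill_reference", "skill_id": "report-standards", "version": 3}

# Development: always track the most recently uploaded version.
tracking = {"type": "skill_reference", "skill_id": "report-standards", "version": "latest"}
```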

One gotcha - you can't delete the default version. If you need to remove a version, set a different one as default first. Deleting the last remaining version deletes the entire Skill, and deleting a Skill removes all its versions. These cascading deletes can bite you if you're not careful.

Curated and Inline Skills

OpenAI maintains a set of first-party curated Skills (like openai-spreadsheets) that you can reference by ID. These are useful starting points, but don't rely on them for production workflows without testing them thoroughly. First-party Skills can change between versions, and their behaviour might not match your specific requirements.

If you don't want to create a hosted Skill, you can inline a base64-encoded zip bundle directly in the API call. This is useful for one-off or dynamic Skills that are generated programmatically. But it means you lose versioning, you send more data in each API call, and the Skill lives only as long as the request.
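Producing such a bundle is just zip-then-base64 with the standard library; the request field that accepts the encoded payload is defined in the Skills documentation, and the filenames below are placeholders:

```python
import base64
import io
import zipfile

def bundle_skill(files: dict[str, str]) -> str:
    """Zip a set of skill files in memory and return the base64-encoded bundle."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return base64.b64encode(buf.getvalue()).decode("ascii")

encoded = bundle_skill({
    "SKILL.md": "---\nname: csv-insights\n---\nSummarise CSV files.",
    "templates/report.md": "# Report\n",
})
```

Because the bundle is generated in memory per request, this fits the dynamic use case well - but, as noted, you give up versioning and pay the payload cost on every call.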

Building Your Skills Library

If you're investing in OpenAI's agent platform, building a well-maintained Skills library gives you compounding returns. Each new agent can draw on existing Skills rather than starting from scratch.

Start small. Pick one or two workflows that are repeated across agents and package them as Skills. Get the versioning discipline right on those before expanding.

Document your Skills the same way you'd document internal libraries. What does this Skill do? What inputs does it expect? What version should production agents use? Who owns it?

We help organisations build out their AI agent infrastructure, including Skills libraries and the governance around them. If you're working with OpenAI's platform and want to structure your agent development properly, our AI agent development team can help. We also do broader agentic automation work that covers multi-platform agent strategies - because most organisations end up using more than one agent framework.

For the full API reference and code examples, see the OpenAI Skills documentation. And if you're thinking about AI strategy more broadly - where agents fit, what they should do, and how to govern them - our AI strategy consultants work through those questions with leadership teams across Australian organisations.