If you've been spending serious time with AI coding agents (the kind that actually run inside your editor or terminal and do things, not just autocomplete), you've probably bumped into the concept of "skills." Maybe you read about it in some documentation, maybe someone on your team mentioned it, maybe you stumbled onto it while digging through a system prompt. Either way, it's one of those concepts that sounds obvious until you try to explain it to someone else and realize you only half-understood it.
This is my attempt to give a full, honest picture of what skills are, where they shine, where they fall flat, and — critically — when you should just not bother with them.
Let's start concrete. A skill, in the context of an AI agent harness, is a chunk of structured knowledge that gets injected into the agent's context before it works on a task. That's it at the core. It's not magic, it's not a fine-tuned model — it's curated information that tells the agent how to behave in a specific domain.
Think of it as a cheat sheet you staple to the front of the agent's workspace every time you spin it up for a particular kind of job. "Before you touch any .docx file, read this." The skill contains things like: what libraries are available, what edge cases trip you up, what the correct patterns look like, what the wrong patterns look like and why they fail, and what the output should look like when things go right.
In practice, skills look like Markdown files. They live somewhere in your repo or your harness configuration. They have a name, a description (which is what the agent uses to decide whether to load them), and a body full of accumulated tribal knowledge. The trigger — whether the skill gets loaded at all — is usually based on that description matching the task at hand, either through keyword matching or a smarter semantic check.
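To make that shape concrete, here is a minimal sketch of a skill file and a loader for it. The header layout (a block delimited by --- lines with name and description keys) and the parse_skill helper are assumptions for illustration; your harness may store this metadata differently.

```python
from dataclasses import dataclass

# Hypothetical skill file: a small header followed by a Markdown body.
SKILL_TEXT = """\
---
name: docx
description: Creating and editing Word (.docx) documents
---
Always set an explicit page size: docx-js defaults to A4.
Never use Unicode bullet characters; use LevelFormat.BULLET instead.
"""


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill file into header fields and body (assumed layout)."""
    # Expected order: '---', header lines, '---', then the body.
    _, header, body = text.split("---\n", 2)
    fields = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return Skill(fields["name"], fields["description"], body.strip())


skill = parse_skill(SKILL_TEXT)
print(skill.name, "->", skill.description)
```

The description is kept separate from the body on purpose: the harness only needs the short description to decide whether to load the skill, and the body only costs context once it is actually loaded.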
So for example: if you have a skill called docx with a description saying it covers creating and editing Word documents, and someone asks the agent to generate a report in .docx format, the harness pulls that skill into context before the agent starts writing code. The agent now knows things it wouldn't have known cold: for instance, that docx-js defaults to A4 paper instead of US Letter, or that you should never use Unicode bullet characters and should use the LevelFormat.BULLET numbering config instead.
Without the skill, the agent might produce something that looks right but breaks in edge cases. With the skill, it gets those details right on the first try.
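The flow in the docx example can be sketched roughly like this. Everything here is hypothetical: the SKILLS registry, the select_skills and build_context helpers, and the plain word-overlap matching, which stands in for whatever keyword or semantic check a real harness uses.

```python
import re

# Hypothetical skill registry: name -> (description, body).
SKILLS = {
    "docx": ("creating and editing word docx documents",
             "Set page size explicitly; docx-js defaults to A4."),
    "pdf": ("generating pdf files from templates",
            "Use the team's agreed PDF library."),
}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; '.docx' becomes 'docx'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_skills(task: str, min_overlap: int = 1) -> list[str]:
    """Return names of skills whose description shares words with the task."""
    task_words = tokens(task)
    return [
        name for name, (description, _) in SKILLS.items()
        if len(task_words & tokens(description)) >= min_overlap
    ]


def build_context(task: str) -> str:
    """Prepend the bodies of matching skills to the task prompt."""
    bodies = [SKILLS[name][1] for name in select_skills(task)]
    return "\n\n".join(bodies + [f"Task: {task}"])


print(build_context("Generate the quarterly report in .docx format"))
```

Word overlap this naive will misfire on common words, so treat it as a picture of the mechanism rather than a matching strategy to copy.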
Models are trained on enormous datasets, but training data has a funny relationship with reality. It captures what's common, not necessarily what's correct in your specific environment. And it captures things as they were at training time, not as they are now.
Your environment is almost certainly unusual in some way. Maybe you're using a specific version of a library that has different behavior. Maybe your infrastructure has constraints that the model's general knowledge doesn't account for. Maybe your team has established patterns that deviate from what Stack Overflow would suggest. Maybe you've discovered — through pain — that some seemingly reasonable approach breaks in your production setup.
None of that lives in the model's weights. It lives in your team's heads, your runbooks, your post-mortems, your PR comments. Skills are a mechanism for getting that knowledge into the agent's context systematically, rather than hoping someone writes it in the prompt every single time.
There's also a more subtle reason skills matter: they reduce ambiguity. When an agent is working on a task that admits several possible approaches, having a skill that says "in this environment, use approach X, not Y, and here's why" cuts through a ton of hedging. The agent doesn't have to reason through tradeoffs it can't fully see; it just has the answer.
How skills actually improve output is worth being precise about, because there's a temptation to wave hands and say "more context = better output" without explaining the mechanism.
Models can be confidently wrong. If a model learned from documentation that says "you can do X," but your specific version of a library dropped support for X, the model will try to do X and it will fail. Worse, if it can't run the code and see the error, it might not even know it failed.
A skill that says "X is not supported in this environment; use Y instead" completely sidesteps this. The failure mode is known, it's documented, and it's in the agent's context before it starts.
General model knowledge tells you what's possible. Skills tell you what's practical in your situation. A skill for, say, PDF generation in a Go service doesn't need to explain every PDF library in existence — it needs to say "we're using pdfcpu for this, here's how it's integrated, here are the patterns we've settled on." The agent can skip the exploration phase and go straight to implementation.
This is especially valuable in agentic workflows where each tool call or bash invocation has a cost — in time, in tokens, in potential side effects. Narrowing the solution space up front means fewer wrong turns.
Some things you only know because something broke. A skill about table rendering in .docx files might include a note like "tables need BOTH columnWidths on the table AND width on each cell — without both, it breaks in Google Docs but looks fine in Word." That's not in any library readme. That's the kind of thing you discover at 11pm before a demo.
Skills are how that knowledge stops dying with the person who discovered it and starts being available to every agent invocation from that point forward.
Rather than putting constraints in code (which is brittle), skills let you express constraints in language, close to where the reasoning happens. "Always use DXA units for table widths, never use percentage-based widths because they break in Google Docs" is much easier to maintain and adjust than a code-level validation that rejects percentage widths.
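For contrast, here is a sketch of the kind of code-level check that sentence describes. The table spec format and the validate_table_widths name are hypothetical. Notice what the check can and can't do: it rejects a percentage width, but it carries none of the reasoning about Google Docs, and changing the rule means changing and redeploying code.

```python
def validate_table_widths(table: dict) -> list[str]:
    """Return a list of problems found in a hypothetical table spec."""
    problems = []
    for i, width in enumerate(table.get("column_widths", [])):
        # Percentage widths are assumed to arrive as strings such as "50%".
        if isinstance(width, str) and width.endswith("%"):
            problems.append(f"column {i}: percentage width {width!r} not allowed")
        elif not isinstance(width, int) or width <= 0:
            problems.append(f"column {i}: expected a positive DXA integer")
    return problems


print(validate_table_widths({"column_widths": [4680, "50%"]}))
```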
Skills are genuinely useful, but they're not free, and they have real failure modes that don't get talked about enough.
Staleness is the big one. A skill that was accurate six months ago but hasn't been updated since the library it describes had a major version bump is actively dangerous. The agent reads it, trusts it, and does the wrong thing with high confidence.
The discipline required to keep skills current is underestimated. If you add a skill, you need a process for updating it when the underlying reality changes. This is harder than it sounds. It requires whoever makes a change that invalidates a skill to know the skill exists and know to update it. That's a cultural and operational thing, not just a technical one.
Context windows are finite, and they're a shared resource. If you stuff ten skills into an agent's context because "they might be relevant," you've consumed space that could be used for the actual task, the codebase, the conversation history, or the output. You've also potentially introduced conflicting guidance — if two skills both say something about how to handle errors but give different advice, you've created ambiguity instead of removing it.
Good skill systems are selective. The skill description needs to be precise enough that the skill only loads when it's actually needed. "This skill covers creating .docx files" should not trigger for "analyze the contents of this .docx file" — those are different tasks with different needs.
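One way to picture that selectivity is a loader that works within an explicit budget. This is a sketch under assumptions: the relevance scores and token costs are made up, and pick_within_budget is a hypothetical helper, not something a particular harness provides.

```python
def pick_within_budget(candidates, budget_tokens, min_relevance=0.3):
    """Greedily keep the most relevant skills that fit the token budget.

    candidates: list of (name, relevance, token_cost) tuples; both numbers
    are assumed to be supplied by the harness.
    """
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for name, relevance, cost in ranked:
        if relevance < min_relevance:
            break  # everything after this is even less relevant
        if used + cost <= budget_tokens:
            chosen.append(name)
            used += cost
    return chosen, used


candidates = [
    ("docx", 0.9, 1800),
    ("error-handling", 0.4, 1200),
    ("pptx", 0.1, 2500),
]
print(pick_within_budget(candidates, budget_tokens=2500))
```

The relevance floor matters as much as the budget: a "might be relevant" skill stays out even when there is room for it.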
If you write a skill that captures the wrong approach to something — maybe you misunderstood the library at the time, or you found a workaround for a bug that has since been fixed — that bad pattern now gets applied consistently, every time, with no questioning. The agent follows the skill. It doesn't second-guess it.
This is the flip side of the reliability benefit. Skills make the model more consistent, but consistency in the wrong direction is consistently wrong.
If you're using a skill to paper over a fundamental lack of understanding — either yours or the agent's — you'll eventually hit a situation the skill didn't anticipate, and the agent will have no good basis to reason from. Skills work best when they complement understanding, not substitute for it. A skill that explains what WidthType.DXA means is useful context. A skill that just says "use DXA everywhere" without explaining why is a cargo cult waiting to fail outside its happy path.
If the mechanism that decides which skills to load makes a wrong call, you get either a missing skill (agent doesn't have knowledge it needs) or an irrelevant skill (agent is working with noise). Both are bad. A skill about .pptx editing loaded during a Python scripting task doesn't help — it just wastes context. Getting the trigger logic right requires careful writing of skill descriptions, and it requires ongoing attention as the task landscape evolves.
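A cheap way to give trigger logic that ongoing attention is to keep a small table of tasks with the expected load decision and check the trigger against it. The sketch below is illustrative only: naive_trigger is a deliberately simple stand-in for a real matcher, and the cases are invented.

```python
import re


def long_words(text: str) -> set[str]:
    """Lowercase words of four or more letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def naive_trigger(description: str, task: str) -> bool:
    """Hypothetical trigger: fire if description and task share a long word."""
    return bool(long_words(description) & long_words(task))


DESCRIPTION = "Creating and editing .docx files"

# (task, should the skill load?)
CASES = [
    ("Generate a report in .docx format", True),
    ("Analyze the contents of this .docx file", False),
    ("Write a Python script to rename images", False),
]


def misfires(trigger, description, cases):
    """Return the cases where the trigger disagrees with the expected label."""
    return [(task, expected) for task, expected in cases
            if trigger(description, task) != expected]


print(misfires(naive_trigger, DESCRIPTION, CASES))
```

Here the naive matcher loads the skill for the "analyze" task because both strings mention docx, which is exactly the kind of misfire a table like this is meant to surface before it wastes context in a real run.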
When not to use a skill is the part that most documentation glosses over, so let's be direct.
Don't use a skill for general knowledge. If the model already knows something well — standard language syntax, widely-documented library patterns, basic algorithms — a skill adding that same information just wastes context. The model doesn't need a skill that explains how async/await works in TypeScript.
Don't use a skill for one-off tasks. If you're doing something once and have no expectation of repeating it, the investment in writing, reviewing, and maintaining a skill isn't worth it. Just write a thorough prompt for that specific task. Skills pay off through reuse.
Don't use a skill when the task is exploratory. Skills narrow the solution space. When you specifically want the agent to explore, to consider multiple approaches, to not have pre-existing constraints — loading a skill that says "here's how we do this" works against you. Some tasks benefit from the model's broader, unconstrained reasoning.
Don't use a skill when the domain is stable and simple. If a task is clearly within the model's training distribution and has no environment-specific quirks, a skill adds nothing. Not every task needs one. Coding agents work fine without skills on plenty of tasks; the skill mechanism exists for the cases where raw model knowledge isn't enough.
Don't use a skill when the skill itself isn't trustworthy yet. If you've just drafted a skill and haven't validated it against real tasks, don't load it in production agent runs. An untested skill is an untested piece of documentation — it might be wrong, incomplete, or misleading. Write it, test it on representative tasks, fix it, then deploy it.
Since skills are only as useful as the content inside them, it's worth talking about what makes a skill actually good.
Lead with what's environment-specific. The model already knows generic stuff. What it doesn't know is what's true in your environment. Front-load that.
Use negative examples. "Don't do X because Y" is often more useful than "do Z." People (and models) tend to apply a constraint better when they know what it is protecting against.
Keep it actionable. A skill that explains theory without telling the agent what to actually do is philosophy, not guidance. Be specific. Show code patterns where possible.
Include the "why." A skill that says "always set explicit page size" is less useful than one that says "always set explicit page size — docx-js defaults to A4 and most of our users expect US Letter." The why helps the agent generalize to adjacent situations.
Version your skills. Not necessarily with semver, but track when they were last verified against the actual environment. This gives you and anyone maintaining the system a signal for when to revisit.
Keep them focused. One skill per domain, one domain per skill. A skill that covers PDF creation, DOCX editing, and PPTX generation is three skills pretending to be one. It'll be harder to trigger correctly, harder to maintain, and harder to reason about.
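The "track when they were last verified" advice is easy to automate. Here is a minimal sketch, assuming each skill records a last-verified date somewhere; the LAST_VERIFIED mapping, the stale_skills helper, and the 90-day threshold are all hypothetical choices.

```python
from datetime import date, timedelta

# Hypothetical metadata: skill name -> date it was last checked
# against the real environment.
LAST_VERIFIED = {
    "docx": date(2024, 1, 10),
    "pdf": date(2024, 6, 1),
}


def stale_skills(last_verified, today, max_age_days=90):
    """Return the names of skills not verified within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, verified in last_verified.items()
        if verified < cutoff
    )


print(stale_skills(LAST_VERIFIED, today=date(2024, 7, 1)))
```

Run in CI or on a schedule, a check like this won't tell you a skill is wrong, only that nobody has confirmed it recently, which is the signal you want for deciding what to revisit.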
Skills are part of a larger story about how we make AI agents actually reliable in real-world settings. Raw model capability gets you surprisingly far, but the last mile — the environment-specific knowledge, the accumulated lessons, the guardrails that reflect actual experience — has to come from somewhere. Skills are currently one of the better mechanisms for closing that gap.
They're not a silver bullet. They require maintenance. They require discipline around when to use them and when not to. They can go wrong in ways that are subtle and hard to debug. But when they're done right, they're one of the most concrete ways to take an AI agent from "impressive demo" to "actually useful in production."
The teams that get the most out of coding agents tend to treat skills like they treat documentation: something that needs to be accurate, maintained, and specific enough to be genuinely useful rather than vaguely reassuring. The teams that don't take them seriously tend to discover, after enough painful runs, that they needed them all along.
If you're building your first set of skills, start with the tasks that have bitten you the most — the edge cases you had to learn the hard way, the library behaviors that weren't in the docs, the patterns your team has settled on through trial and error. That's exactly the knowledge skills are built to carry.