Codex Skills and AGENTS.md: How to Make AI Coding Work Repeatable

A practical tutorial on using AGENTS.md and Codex skills to make AI coding workflows repeatable, consistent, testable, and easier to review.

The first time Codex fixes a bug for you, it feels like magic. The tenth time, the real problem appears: you keep repeating the same project rules, test commands, file structure, review expectations, and safety warnings.

That is where AGENTS.md and Codex skills become important.

OpenAI’s Codex documentation says Codex reads AGENTS.md files before doing work, so you can give it durable project guidance. The same docs describe skills as reusable workflows that package instructions, resources, and optional scripts so Codex can follow a process reliably.

Those two ideas solve different problems. AGENTS.md tells Codex how this repository works. A skill tells Codex how to perform a repeated task.

This article explains how to use both together. We will cover what to put in AGENTS.md, what belongs in a skill, how to avoid overloading the agent, and how to create practical skills for code review, refactoring, tests, article SEO, and project-specific formatting.

The Short Version

Use AGENTS.md for stable project guidance.

Use Codex skills for repeated procedures.

Use a normal prompt for one-time tasks.

A simple rule:

```txt
Repository rules → AGENTS.md
Repeated workflow → Codex skill
One-off request  → prompt
```

Example:

  • “This repo uses Next.js, TypeScript, npm, strict linting, and these test commands” belongs in AGENTS.md.
  • “Review the current diff for bugs, missing tests, and risky files” belongs in a code-review skill.
  • “Explain why this one component is throwing a hydration error” can be a normal prompt.

This separation keeps Codex useful without giving it a giant pile of instructions every time.

What AGENTS.md Does

AGENTS.md is project guidance for agents.

Codex reads it before work begins. OpenAI’s docs describe it as a way to layer global guidance with project-specific overrides so tasks start with consistent expectations across repositories.

A good AGENTS.md answers the questions a new developer would ask before making a change:

  • What kind of project is this?
  • Where do important files live?
  • How do I run the app?
  • How do I run tests, lint, typecheck, and build?
  • What conventions should I follow?
  • Which files are risky?
  • What does “done” mean?
  • How should I report my changes?

It should be practical, short, and specific. A short accurate file beats a long vague one.

A Practical AGENTS.md Template

Here is a useful starter template for a TypeScript / Next.js project:

```md
# AGENTS.md

## Project

This is a Next.js TypeScript project. Keep changes small, typed, and reviewable.

## Commands

- Install: npm install
- Dev: npm run dev
- Type check: npm run typecheck
- Lint: npm run lint
- Test: npm test
- Build: npm run build

## Repo map

- `app/` — routes, layouts, and page-level components.
- `components/` — reusable React components.
- `components/ui/` — low-level UI primitives. Do not add business logic here.
- `lib/` — reusable utilities and framework-agnostic logic.
- `server/` — server-side integrations and data access.
- `tests/` — test utilities and integration tests.

## Coding rules

- Use strict TypeScript. Do not use `any` to hide type errors.
- Prefer existing utilities and components before adding new ones.
- Do not add dependencies without explaining why.
- Keep changes scoped to the task.
- Follow patterns in nearby files.

## Testing rules

- If business logic changes, add or update tests.
- If fixing a bug, add a regression test when practical.
- Run the narrowest relevant test first, then broader checks.

## Safety rules

- Do not edit `.env*`, auth, billing, permissions, migrations, CI, or deployment files unless explicitly requested.
- Do not print secrets.
- Do not run destructive commands.
- Ask before deleting files or broad refactors.

## Done means

Before finishing, summarize:

1. Files changed.
2. Tests or checks run.
3. Risks or assumptions.
4. Manual verification still needed.
```

This file gives Codex stable repository context. It does not need to describe every workflow. That is what skills are for.

Global, Repo, and Folder-Level Guidance

OpenAI’s AGENTS.md guide describes layered guidance. Codex can use global guidance and more specific project guidance, and more local instructions can override broader ones.

A practical setup might look like this:

```txt
~/.codex/AGENTS.md
my-app/AGENTS.md
my-app/server/billing/AGENTS.md
my-app/components/AGENTS.md
```

Use global guidance for personal defaults:

  • prefer small plans before editing;
  • summarize diffs before finishing;
  • ask before destructive commands.

Use repo guidance for project setup:

  • commands;
  • architecture;
  • test expectations;
  • folder map.

Use folder-level guidance for local rules:

  • billing code requires extra review;
  • UI components must not call server APIs directly;
  • migrations require manual approval.

Do not put everything in the root file. Put rules close to where they apply.
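For example, a folder-level file can be only a few lines. This sketch is illustrative; the path and rules are assumptions, not taken from a real repository:

```md
# components/AGENTS.md

- UI components must not call server APIs directly.
- Keep components presentational; move data access to `server/`.
- Follow the prop and naming patterns in neighboring files.
```

Because it sits inside `components/`, these rules apply only to work in that folder and override any broader guidance they conflict with.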

What Codex Skills Do

Codex skills package repeatable workflows.

OpenAI’s skills documentation says skills extend Codex with task-specific capabilities and can package instructions, resources, and optional scripts so Codex can follow a workflow reliably.

A skill is useful when you keep repeating the same process:

  • review this diff;
  • refactor this component;
  • write tests for this module;
  • check article SEO;
  • generate release notes;
  • validate a migration plan;
  • audit a landing page.

Instead of pasting the same long prompt, create a skill. The skill becomes a reusable procedure the agent can call when the task appears.

AGENTS.md vs Skill

The difference is simple:

| Use case | Put it in AGENTS.md | Put it in a skill |
| --- | --- | --- |
| Project commands | Yes | No |
| Repo layout | Yes | No |
| TypeScript conventions | Yes | Sometimes (reference only) |
| Code review checklist | Maybe (short version) | Yes |
| Refactor procedure | No | Yes |
| Test-writing workflow | No | Yes |
| Article SEO audit | No | Yes |
| Formatting rules for a content object | Maybe (summary) | Yes |
| One-time bug explanation | No | No (use a prompt) |

If the instruction applies to every task in the repository, it belongs in AGENTS.md. If the instruction describes how to perform a specific recurring task, it belongs in a skill.

A Basic Codex Skill Structure

A skill usually starts with a folder and a SKILL.md file.

Example:

```txt
skills/
  code-review/
    SKILL.md
    resources/
      review-rubric.md
  refactor-small-scope/
    SKILL.md
    examples/
      good-plan.md
      bad-plan.md
  write-tests/
    SKILL.md
    resources/
      testing-patterns.md
  article-seo-check/
    SKILL.md
    resources/
      seo-checklist.md
```

The main SKILL.md should say when to use the skill, what process to follow, what resources matter, and what output to return.

Keep the skill focused. A code-review skill should not also rewrite the whole feature. A test-writing skill should not redesign the product.
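Before looking at concrete skills, here is a minimal SKILL.md skeleton, assuming the folder layout above. The name and section headings are illustrative conventions, not a fixed schema:

```md
---
name: my-skill
description: One sentence telling Codex when to use this skill.
---

# My Skill

## When to use

Describe the trigger condition.

## Process

1. Step one.
2. Step two.

## Output

Describe the exact format to return.
```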

Skill 1: Code Review

A code-review skill is one of the most useful first skills because AI-generated code still needs careful review.

Folder:

```txt
skills/code-review/SKILL.md
```

Example:

```md
---
name: code-review
description: Review the current git diff for bugs, missing tests, type safety, security-sensitive changes, and unrelated edits.
---

# Code Review Skill

## When to use

Use this skill when reviewing code written by a human or an AI agent.

## Process

1. Run or request `git diff --stat`.
2. Inspect changed files.
3. Identify the purpose of the change.
4. Look for unrelated edits.
5. Check type safety.
6. Check missing tests.
7. Check security-sensitive changes.
8. Check whether the diff matches the task.

## High-risk files

Flag changes to:

- auth
- billing
- payments
- permissions
- migrations
- environment config
- CI/CD
- secrets

## Output format

Return findings by severity:

- blocker
- major
- minor
- question
- suggested test

Do not modify files unless explicitly asked.
```

This skill should be used after Codex creates a diff. It helps prevent the common mistake of accepting a confident summary without reviewing the actual code.

Skill 2: Small-Scope Refactor

Refactoring is where coding agents can be both helpful and dangerous. They can update many files quickly, but they can also over-refactor.

Folder:

```txt
skills/refactor-small-scope/SKILL.md
```

Example:

```md
---
name: refactor-small-scope
description: Refactor a limited area of the codebase while preserving behavior and keeping the diff reviewable.
---

# Small-Scope Refactor Skill

## When to use

Use this skill for scoped refactors where behavior should stay the same.

## Process

1. Identify the exact files involved.
2. Find an existing pattern to follow.
3. Explain the current behavior.
4. Propose a minimal refactor plan.
5. Wait for approval before editing if the task is broad.
6. Make the smallest useful change.
7. Run relevant checks.
8. Summarize behavior preserved and files changed.

## Rules

- Do not change public API behavior unless requested.
- Do not rename exported symbols without need.
- Do not mix formatting-only changes with logic changes.
- Do not add dependencies.
- Do not touch auth, billing, migrations, or CI.

## Output

Return:

- refactor goal;
- files changed;
- behavior preserved;
- checks run;
- remaining risks.
```

This skill is useful because it forces the agent to treat refactoring as a controlled operation, not a license to rewrite the codebase.

Skill 3: Writing Tests

A test-writing skill helps Codex add tests that protect behavior instead of adding shallow coverage.

Folder:

```txt
skills/write-tests/SKILL.md
```

Example:

```md
---
name: write-tests
description: Add focused tests for changed behavior using existing project test patterns.
---

# Write Tests Skill

## When to use

Use this skill when a bug fix, feature, or refactor needs test coverage.

## Process

1. Inspect existing tests near the changed code.
2. Identify the behavior that should be protected.
3. Prefer regression tests for bugs.
4. Use existing test utilities and mocking patterns.
5. Keep tests focused on behavior, not implementation details.
6. Run the narrowest relevant test command.
7. Report any tests that could not be run.

## Avoid

- snapshot tests unless they already match the project style;
- tests that only check rendering without behavior;
- changing production code just to make a bad test pass;
- broad test rewrites.

## Output

Return:

- behavior covered;
- test files changed;
- command run;
- result;
- remaining gaps.
```

This skill pairs well with AGENTS.md. The project file tells Codex how to run tests. The skill tells it how to decide what tests to write.

Skill 4: Article SEO for a Code-Based Blog

If your blog stores articles in `articles.ts`, you can create a skill that checks metadata, Markdown, image paths, and formatting.

Folder:

```txt
skills/article-seo-check/SKILL.md
```

Example:

```md
---
name: article-seo-check
description: Review a code-based blog article object for SEO metadata, search intent, Markdown rendering, image paths, and TypeScript formatting.
---

# Article SEO Check Skill

## When to use

Use this skill before publishing or rewriting a blog article stored in `articles.ts`.

## Check

- title matches one clear search intent;
- SEO title is specific and not clickbait;
- description explains the practical value;
- slug is readable and stable;
- cover image path matches the article folder;
- body does not start with a duplicate H1;
- tables are valid Markdown;
- code blocks are fenced correctly;
- internal links are relevant;
- claims are cautious and not misleading;
- no AdSense earnings promises;
- TypeScript strings are valid.

## Formatting rules

- `body` must be `string[]`.
- Each array item must be one valid TypeScript string.
- Use single newline escapes inside strings.
- Do not output double-escaped newline sequences.
- Keep technical terms intact: `article-review`, `JSON-LD`, `next.config.ts`.

## Output

Return:

- metadata issues;
- formatting issues;
- content issues;
- suggested fixes;
- final publish checklist.
```

This is a good example of a skill that is specific to your actual project. It prevents repeated formatting bugs and keeps article quality consistent.
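To make the formatting rules concrete, here is a sketch of what a valid article entry might look like. The `Article` type and its field names are assumptions for illustration, not a real schema from any project:

```typescript
// Hypothetical shape of one entry in articles.ts. Field names are
// illustrative; adapt them to your own blog's data model.
type Article = {
  slug: string;
  title: string;
  seoTitle: string;
  description: string;
  coverImage: string;
  body: string[];
};

const article: Article = {
  slug: "codex-skills-and-agents-md",
  title: "Codex Skills and AGENTS.md",
  seoTitle: "Codex Skills and AGENTS.md: Repeatable AI Coding",
  description: "How to use AGENTS.md and Codex skills together.",
  coverImage: "/images/codex-skills-and-agents-md/cover.png",
  body: [
    "## Why repeatability matters\n",
    "The first time Codex fixes a bug, it feels like magic.\n",
  ],
};

// Checks mirroring the skill's formatting rules: body is string[],
// and no item contains a double-escaped newline sequence (a literal
// backslash-n instead of a real newline character).
const bodyIsStringArray = article.body.every((s) => typeof s === "string");
const noDoubleEscapes = article.body.every((s) => !s.includes("\\n"));

console.log(bodyIsStringArray, noDoubleEscapes); // true true
```

A skill like this works well precisely because the checks are mechanical: an agent can verify each rule against the object instead of guessing.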

Skill 5: PR Description

A PR description skill helps Codex explain a change clearly without hiding risk.

Folder:

```txt
skills/pr-description/SKILL.md
```

Example:

```md
---
name: pr-description
description: Generate a clear pull request description from the current diff.
---

# PR Description Skill

## Process

1. Inspect changed files.
2. Identify the user-facing or developer-facing change.
3. Summarize implementation details.
4. List tests run.
5. List risks and manual checks.

## Output format

## Summary
- ...

## Changes
- ...

## Tests
- ...

## Risks / Review Notes
- ...
```

This skill is useful because a good PR description makes human review faster. It should not oversell the change or claim tests were run if they were not.

How Not to Overload the Agent

Too much guidance can make Codex worse. Long instruction files create contradictions, stale rules, and irrelevant context.

Use these rules:

  • keep AGENTS.md short;
  • move repeated procedures into skills;
  • move long examples into resource files;
  • delete outdated rules;
  • avoid vague rules like “write clean code”;
  • avoid repeating the same rule in five places;
  • keep high-risk rules explicit and close to the relevant folder.

Bad instruction:

```txt
Always write perfect production-ready code using best practices.
```

Better instruction:

```txt
For changes to business logic, add or update tests.
Do not use `any` to hide TypeScript errors.
Do not edit auth, billing, migrations, or CI unless the task explicitly asks for it.
```

Good guidance is specific enough to review in a diff.

A Complete Workflow Example

Imagine you want Codex to refactor a dashboard filter and add tests.

Do not write:

```txt
Fix the filter.
```

Write:

```txt
Task: persist the dashboard status filter in the URL query string.

Before editing:
- Read AGENTS.md.
- Inspect `components/dashboard/StatusFilter.tsx`.
- Inspect `lib/url-state.ts`.
- Use the refactor-small-scope skill.

Requirements:
- Selecting a status updates `?status=`.
- Reloading keeps the selected status.
- Invalid values fall back to `all`.
- Do not change dashboard layout.
- Do not add dependencies.

After editing:
- Use the write-tests skill.
- Run relevant tests and typecheck.
- Use the code-review skill on your own diff.
```

This prompt works because it combines stable repo guidance with task-specific skills. AGENTS.md tells Codex how the project works. The skills tell it how to perform the refactor, tests, and review.
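For reference, the fallback requirement in that prompt could be implemented roughly like this. The status values and helper name are hypothetical, not from a real repository:

```typescript
// Sketch of the "invalid values fall back to `all`" requirement:
// parse the raw `?status=` query value into a known status.
const STATUSES = ["all", "active", "archived"] as const;
type Status = (typeof STATUSES)[number];

function parseStatus(raw: string | null): Status {
  // Treat a missing parameter the same as the default.
  const value = raw ?? "all";
  // Unknown values fall back to "all" instead of throwing.
  return (STATUSES as readonly string[]).includes(value) ? (value as Status) : "all";
}

console.log(parseStatus("active")); // active
console.log(parseStatus("bogus"));  // all
console.log(parseStatus(null));    // all
```

Writing the requirement this precisely in the prompt is what lets the write-tests skill turn each bullet into a test case.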

How to Improve AGENTS.md Over Time

Do not try to write the perfect AGENTS.md on day one.

Start with commands, repo map, conventions, and safety rules. Then update the file when Codex makes the same mistake twice.

Examples:

  • Codex keeps adding dependencies → add a dependency rule.
  • Codex changes auth by accident → add auth to high-risk files.
  • Codex writes tests with the wrong framework → document the test pattern.
  • Codex forgets build commands → add command list.
  • Codex changes too many files → add small-diff expectations.

OpenAI’s best practices recommend making guidance reusable with AGENTS.md and keeping it practical. Treat it as a living engineering artifact, not a one-time setup file.

How to Improve Skills Over Time

Improve skills the same way you improve code: based on real failures.

After using a skill, ask:

  • Did the agent choose the right scope?
  • Did it follow the procedure?
  • Did it miss a risk?
  • Did the output format help?
  • Did the skill include too much irrelevant material?
  • Did a resource file become outdated?
  • Should this workflow have a script or checklist?

Keep the main SKILL.md focused. If the skill needs examples, put them in examples/. If it needs long reference material, put it in resources/. If it needs repeatable validation, add a script.
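As a sketch of that last idea, a validation script could mechanically check a SKILL.md against the layout used in this article. The required section names here are an assumption about your own conventions:

```typescript
// Minimal sketch of a validation script a skill could ship. It checks
// that a SKILL.md has frontmatter and the expected section headings.
function validateSkillMd(content: string): string[] {
  const problems: string[] = [];
  if (!content.startsWith("---")) {
    problems.push("missing frontmatter");
  }
  for (const section of ["## When to use", "## Process", "## Output"]) {
    if (!content.includes(section)) {
      problems.push(`missing section: ${section}`);
    }
  }
  return problems;
}

// A well-formed skill file produces no problems.
const skill = [
  "---",
  "name: code-review",
  "description: Review the current git diff.",
  "---",
  "# Code Review Skill",
  "## When to use",
  "## Process",
  "## Output",
].join("\n");

console.log(validateSkillMd(skill)); // []
```

Running a check like this in CI keeps skills honest the same way lint keeps code honest.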

The best skills are small, tested, and easy to update.

Testing a Codex Skill

A skill should be tested on real tasks before you trust it.

For a code-review skill, test it on:

  • a clean small diff;
  • a diff with an unrelated file change;
  • a diff that adds `any`;
  • a diff that changes auth or billing;
  • a diff with missing tests;
  • a diff that updates tests incorrectly.

For a refactor skill, test it on:

  • a tiny refactor;
  • a refactor with multiple files;
  • a task where it should ask for approval;
  • a task that is too broad and should be split.

A simple test table:

| Skill | Test case | Expected behavior | Result |
| --- | --- | --- | --- |
| code-review | Diff changes auth file | Flag as high risk | |
| write-tests | Bug fix has no test | Suggest regression test | |
| refactor-small-scope | Task is too broad | Ask for plan/approval | |
| article-seo-check | Body has bad Markdown | Flag formatting issue | |

If a skill fails a common case, fix the skill before relying on it.

Security and Approval Settings Still Matter

Skills and AGENTS.md do not replace security controls.

Codex documentation also covers configuration, approvals, sandboxing, and security. Those controls matter because a coding agent can read files, edit files, run commands, and interact with tools depending on your setup.

Practical safety rules:

  • start with conservative permissions;
  • require approval for shell commands you do not recognize;
  • avoid giving write access to sensitive folders;
  • never let the agent read or print secrets;
  • keep network and MCP access scoped;
  • review diffs before merging;
  • do not automate destructive actions by default.

A skill can tell Codex to be careful. Security settings enforce boundaries when instructions are not enough.

Common Mistakes

Avoid these mistakes:

  • putting every instruction into AGENTS.md;
  • creating skills for tasks you only do once;
  • writing vague skills with no output format;
  • using skills as a substitute for tests;
  • letting skills become outdated;
  • duplicating contradictory rules across files;
  • allowing refactor skills to touch too many files;
  • accepting Codex summaries without reviewing diffs;
  • letting article-formatting skills output invalid TypeScript strings.

The fix is simple: stable project facts go in AGENTS.md, repeated workflows become skills, and human review remains part of the process.

Final Checklist

Before you consider your Codex setup repeatable, check:

  • Does the repo have a short, accurate AGENTS.md?
  • Are build, lint, typecheck, and test commands documented?
  • Are risky files listed?
  • Do repeated workflows have skills?
  • Does each skill have a clear trigger and output format?
  • Are skills tested on real tasks?
  • Are examples and resources current?
  • Are security approvals and sandbox settings appropriate?
  • Do developers review diffs before merging?
  • Is there a process for updating guidance after repeated mistakes?

If yes, Codex will behave less like a one-off assistant and more like a reusable engineering workflow.

Conclusion: Repeatability Is the Real Productivity Gain

The biggest benefit of Codex is not that it can write code once. It is that you can configure it to follow your project’s rules again and again.

AGENTS.md gives Codex durable project guidance. Skills give Codex repeatable procedures. Prompts give Codex the task-specific goal.

Use all three together:

```txt
AGENTS.md = how this repo works
Skill     = how this workflow should run
Prompt    = what I need right now
```

That is how AI coding becomes repeatable. Not by writing one perfect prompt, but by turning your engineering process into context, skills, checks, and reviewable diffs.