This article explores how software developers can integrate ChatGPT Codex into their development workflows, from initial code generation to pull request creation. Codex is OpenAI’s code-focused large language model, capable of reading and writing code, generating test cases, and interacting with GitHub repositories. We explain how to get the most out of Codex by combining it with unit tests and test-driven development (TDD), ensuring reliable and verifiable results. Drawing on real-world advice from Simon Willison, we emphasize why automated testing is not just a complement but a critical enabler of safe and effective AI-assisted software engineering.
UPDATE: Codex was heavily updated while this post was being written, so some sections may already be out of date. For example, Codex can now update the original PR when you ask for further adjustments, which used to be a manual process.
ChatGPT Codex has introduced a new approach to software development, one where developers write less code and express more intent. Codex can generate entire modules, construct unit tests, scaffold features, and open pull requests through GitHub, all from plain language instructions.
But AI is not magic. Codex does not understand your business logic, maintain long-term context across projects, or validate correctness on its own. To use Codex effectively, you must combine it with repeatable verification tools, especially automated tests and TDD workflows.
This article presents a comprehensive guide to using Codex for real-world software tasks, structured around a modern DevOps flow:
- Define the problem and behavior
- Use Codex to generate or implement the solution
- Write or request unit tests
- Run tests to validate behavior
- Use Codex to open a pull request
- Iterate safely, guided by tests
What is ChatGPT Codex?
Codex is a code-specialized AI model. It is integrated into ChatGPT (for Pro and Teams users), where it can interact with connected GitHub repositories and local project files. Codex understands:
- Multiple programming languages and frameworks (JavaScript, Python, Go, Ruby, etc.)
- Common project patterns (MVC, monorepos, microservices)
- Unit testing libraries (Jest, Mocha, Pytest, RSpec, etc.)
- GitHub operations such as creating branches and submitting pull requests
Codex doesn’t just autocomplete code; it reasons about intent, structure, and behavior. This allows developers to collaborate with the AI as they would with a junior developer who works quickly and iteratively, but still needs direction and verification.
A Basic Workflow
Here is a typical multi-step prompt flow using Codex, assuming you already have a repository set up with the application you are developing:
Step 1: Define the Feature
“Create an Express middleware that rate-limits each IP to 100 requests per minute and returns a 429 if the limit is exceeded.”
Codex will:
- Generate a middleware using memory or Redis
- Explain how it works and where to install it
- Suggest integration points
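As a rough illustration of what Step 1 might produce, here is a minimal in-memory, fixed-window sketch in TypeScript. It is not actual Codex output and it does not import Express: the names createRateLimiter, LimiterRequest, LimiterResponse, LimiterNext, and the injectable now clock are all assumptions made for this example, and the small structural types only stand in for Express's own so the block has no dependencies.

```typescript
// Minimal stand-ins for the Express request/response/next types (assumption:
// only the members used below are modeled).
interface LimiterRequest {
  ip: string;
}
interface LimiterResponse {
  status(code: number): LimiterResponse;
  send(body: string): void;
}
type LimiterNext = () => void;

interface WindowState {
  windowStart: number; // timestamp (ms) when the current window opened
  count: number; // requests seen from this IP in the current window
}

// Hypothetical factory: a fixed-window counter kept in process memory.
function createRateLimiter(
  limit: number = 100,
  windowMs: number = 60000,
  now: () => number = Date.now, // injectable clock so tests need not wait a minute
) {
  const windows = new Map<string, WindowState>();

  return function rateLimit(
    req: LimiterRequest,
    res: LimiterResponse,
    next: LimiterNext,
  ): void {
    const current = now();
    const state = windows.get(req.ip);

    // First request from this IP, or the previous window expired: open a new window.
    if (state === undefined || current - state.windowStart >= windowMs) {
      windows.set(req.ip, { windowStart: current, count: 1 });
      next();
      return;
    }

    // Limit already reached inside the current window: reject with 429.
    if (state.count >= limit) {
      res.status(429).send("Too Many Requests");
      return;
    }

    state.count += 1;
    next();
  };
}
```

The returned function follows the (req, res, next) shape of Express middleware, though wiring it into a real app may need small type adjustments. Because the counters live in one process's memory, a shared store such as Redis would be the likely choice once several server instances are involved.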
Step 2: Ask for Unit Tests
“Write Jest tests for this middleware.”
Codex generates:
- Test coverage for valid requests
- Behavior after exceeding the rate limit
- Optional edge cases like header spoofing
Step 3: Create the Pull Request
Now click the “Create Pull Request” button.
Codex:
- Creates a branch (e.g., codex/rate-limiter)
- Commits the implementation and tests
- Opens a pull request with a summary
This entire flow can happen in a single Codex session, reducing friction and accelerating prototyping.
Codex Is Fast. Tests Make It Safe.
Codex can write working code with remarkable speed. But it cannot “understand” whether that code is correct in your real-world context, only whether it looks structurally right. That’s where automated tests become essential.
Tests enable a shift from inspection to verification. Instead of reading every line Codex generates, you can use a test suite to answer a much more important question: “Does it behave the way we want it to?”
Lessons from Simon Willison on Using LLMs Safely
Developer and educator Simon Willison captures this mindset perfectly in his article, “Automated tests make LLM-assisted coding more powerful”. His key takeaway:
“The better your test coverage, the more productive you can be with LLMs.”
That insight reflects a deep truth: automated tests are not just a complement to LLMs; they are what unlocks their true value.
Key Advantages of Using Tests with Codex
1. Tests Define Correctness for Codex
You don’t need Codex to perfectly understand your business rules. You just need it to pass your tests.
“You don’t have to know what the correct output looks like. You just have to know how to verify that a given output is correct.” — Simon Willison
This changes the game: instead of prompting Codex to generate “the right answer,” you prompt it to generate something that passes the test.
2. Tests Enable Safe Exploration
Codex often generates multiple valid solutions. With good test coverage, you can:
- Try a few different implementations
- Run your tests
- Choose the best one based on performance or clarity
You don’t need to manually verify correctness each time; your test suite does it for you.
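To make the idea concrete, the sketch below runs one shared set of behavioral checks against two interchangeable candidates. The dedupe task, the function names, and the tiny passesAllCases harness are hypothetical stand-ins chosen for brevity; in a real project the checks would be your existing test suite.

```typescript
type Dedupe = (values: number[]) => number[];

// Candidate A: Set-based, keeps first-seen order, roughly linear time.
const dedupeWithSet: Dedupe = (values) => Array.from(new Set(values));

// Candidate B: filter-based, same behavior on these cases but quadratic time.
const dedupeWithFilter: Dedupe = (values) =>
  values.filter((value, index) => values.indexOf(value) === index);

// Shared behavioral checks: every candidate must satisfy all of them.
const dedupeCases: Array<{ input: number[]; expected: number[] }> = [
  { input: [], expected: [] },
  { input: [1, 1, 1], expected: [1] },
  { input: [3, 1, 3, 2, 1], expected: [3, 1, 2] },
];

// Returns true only if the candidate produces the expected output for every case.
function passesAllCases(candidate: Dedupe): boolean {
  return dedupeCases.every(
    ({ input, expected }) =>
      JSON.stringify(candidate(input)) === JSON.stringify(expected),
  );
}
```

Once both candidates pass, the choice between them comes down to the secondary criteria mentioned above, such as performance on large inputs or readability.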
3. Tests Protect Future Work
You might accept Codex’s first suggestion today, but replace it next week. With tests, you know if behavior breaks the moment it happens.
Prompting Strategies for Test-Aware Codex Usage
Test-First Prompting (Conversational TDD)
Start by asking Codex to generate tests.
“Write a test that checks if slugify('Hello World!') returns 'hello-world'.”
Then ask:
“Write the implementation of slugify that passes this test.”
Codex provides a function using a regex and .toLowerCase(), and you’re done.
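One plausible shape for the result of this exchange is sketched below. It is an illustration, not the exact code Codex would return; the two-regex approach and the handling of leading and trailing separators are assumptions that go slightly beyond what the single test requires.

```typescript
// The test, written first (Jest syntax, shown as a comment so the block has
// no dependencies):
//
//   test("slugifies a simple phrase", () => {
//     expect(slugify("Hello World!")).toBe("hello-world");
//   });

// An implementation written afterwards to satisfy that test.
function slugify(input: string): string {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, ""); // drop hyphens left at either end
}
```

Note that the single test leaves many behaviors undefined (accented characters, for instance), so each additional requirement is best expressed as another test before asking for changes.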
Failure-Driven Prompts
If you already have failing tests, use them directly as input:
“Fix this middleware so that this test passes: expected status 429, received 200.”
Codex will identify and repair the condition that fails, then walk you through what changed.
Refactor and Retest
Codex can also improve existing logic if you define what must stay true:
“Refactor this to use a map instead of an object, but make sure all tests still pass.”
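A small before-and-after sketch of such a refactor follows. The topWord function and both of its variants are hypothetical; the point is only that the public behavior stays fixed while the internal storage changes from a plain object to a Map.

```typescript
// Before: counts kept in a plain object used as a dictionary.
function topWordWithObject(words: string[]): string | undefined {
  const counts: Record<string, number> = {};
  let best: string | undefined;
  let bestCount = 0;
  for (const word of words) {
    const next = (counts[word] ?? 0) + 1;
    counts[word] = next;
    // Strict ">" means the first word to reach the highest count wins ties.
    if (next > bestCount) {
      best = word;
      bestCount = next;
    }
  }
  return best;
}

// After: same signature and tie-breaking rule, counts kept in a Map.
function topWordWithMap(words: string[]): string | undefined {
  const counts = new Map<string, number>();
  let best: string | undefined;
  let bestCount = 0;
  for (const word of words) {
    const next = (counts.get(word) ?? 0) + 1;
    counts.set(word, next);
    if (next > bestCount) {
      best = word;
      bestCount = next;
    }
  }
  return best;
}
```

Because the signature is unchanged, the same tests can run against either version. A side benefit of the Map variant is that words such as "constructor" cannot collide with properties inherited by a plain object.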
Tests as Prompt Scaffolding
Another strategy is to provide both the test and interface, then let Codex fill in the rest:
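// test/userService.test.ts
// Import path assumes test/ and services/ sit side by side at the repo root.
import { getUserByEmail } from "../services/userService";

test("gets user by email", async () => {
  const user = await getUserByEmail("test@example.com");
  // The returned record should match the email that was looked up.
  expect(user.email).toBe("test@example.com");
});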
Prompt:
“Implement getUserByEmail in services/userService.ts to pass this test.”
Codex responds with a function using your data layer, often correctly guessing the database abstraction or asking you to clarify.
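For illustration, a response might look roughly like the sketch below. The User shape, the UserRepository interface, and the in-memory repository are assumptions invented for this example, since the article does not describe the project's real data layer; throwing on a missing user is likewise just one possible design.

```typescript
// Hypothetical user shape; a real project would have its own model.
interface User {
  id: number;
  email: string;
}

// Hypothetical data-layer abstraction. A real implementation would wrap the
// project's database client instead of an in-memory array.
interface UserRepository {
  findByEmail(email: string): Promise<User | undefined>;
}

const inMemoryUsers: UserRepository = {
  async findByEmail(email: string): Promise<User | undefined> {
    const users: User[] = [{ id: 1, email: "test@example.com" }];
    return users.find((user) => user.email === email);
  },
};

// services/userService.ts (sketch): the repository is injectable so tests can
// supply a fake without touching a database.
async function getUserByEmail(
  email: string,
  repo: UserRepository = inMemoryUsers,
): Promise<User> {
  const user = await repo.findByEmail(email);
  if (user === undefined) {
    throw new Error(`No user found for email: ${email}`);
  }
  return user;
}
```

Passing the repository as a parameter keeps the function testable in isolation, which fits the test-first flow described here.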
When to Rely on Codex (And When Not To)
Use Codex for:
- Scaffolding or rewriting boilerplate logic
- Refactoring code based on reviewer feedback
- Generating test cases and coverage
- Documenting what changed and why
Avoid using Codex alone for:
- Security-sensitive functionality (e.g., cryptography)
- Schema or API migration logic without manual review
- Business rules where context and nuance matter
Why TDD Works So Well with Codex
If you want Codex to go from “clever assistant” to “trustworthy teammate,” there’s one technique that makes a huge difference: Test-Driven Development (TDD).
TDD isn’t just a coding methodology; it’s a language of verification, and it’s the most effective way to steer Codex toward producing exactly what you want.
Codex is not great at guessing your intent when your prompt is vague. But it is exceptional at completing patterns, especially when given a clear test with inputs and expected outputs.
TDD Does 3 Powerful Things for Codex:
It Defines the Edges of the Problem
When you write a test first, you are telling Codex:
- What the input is
- What the output should be
- What behaviors matter
Now Codex doesn’t have to decide what the function should do; it just has to satisfy your test.
It Reduces Prompt Ambiguity
Instead of saying:
“Build a function to calculate sales tax.”
Try:
“Here’s a test that expects calculateSalesTax(100, 0.07) to return 7. Write a function to pass this.”
Codex now has all the clarity it needs.
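This particular test is more demanding than it looks, which shows why a concrete expectation beats a vague prompt. A minimal sketch of a function that satisfies it is given below; rounding to whole cents is an assumption about the intended behavior, not something the prompt states.

```typescript
// Returns the tax owed on `amount` at `rate` (e.g. 0.07 for 7%), rounded to cents.
function calculateSalesTax(amount: number, rate: number): number {
  // Plain multiplication is not enough: in IEEE 754 floating point,
  // 100 * 0.07 evaluates to 7.000000000000001, so a strict equality check
  // against 7 would fail. Rounding to two decimal places avoids that.
  return Math.round(amount * rate * 100) / 100;
}
```

A naive one-line implementation would fail the test as written, and the failure itself tells Codex (and you) that a rounding rule has to be decided. Real tax calculations often carry jurisdiction-specific rounding rules, so this rule should be confirmed rather than assumed.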
It Lets You Iterate Confidently
Once a test passes, you can safely ask Codex to:
- Refactor the implementation
- Swap out libraries
- Modify the interface
And you’ll immediately know if anything broke, because the tests will fail.
TDD Helps Codex, and Codex Helps You Stick to TDD
Most teams know they should write tests first, but often skip the step. Codex removes the friction:
- Ask it to scaffold tests first
- Use the tests to drive implementation
- Lean on tests to validate changes
By putting the test at the center of your conversation, you make Codex smarter, and you become a better developer in the process.
Building AI-Enhanced Engineering Culture
Codex is fast, adaptable, and great at working within prompts. But its real power emerges when it works with your tests, not in spite of them.
When you pair Codex with TDD, you unlock:
- Faster prototyping with real safeguards
- Higher-quality code with less manual checking
- Confidence in AI-assisted changes
- A workflow that scales to teams, projects, and legacy systems
Treat Codex like a fast-thinking junior dev:
- You write the tests.
- You define the goal.
- Codex writes the code that makes the test go green.
That’s the future of software development, and it’s already here.