ChatGPT Codex: Takeaways

This article explores how software developers can integrate ChatGPT Codex into their development workflows, from initial code generation to pull request creation. Codex is OpenAI’s code-focused large language model, capable of reading and writing code, generating test cases, and interacting with GitHub repositories. We explain how to get the most out of Codex by combining it with unit tests and test-driven development (TDD), so that its output can be verified rather than taken on trust. Drawing on advice from Simon Willison, we argue that automated testing is not just a complement but a critical enabler of safe and effective AI-assisted software engineering.

UPDATE: Codex was heavily updated while this post was being written, so some sections may be out of date. For example, Codex can now update the original PR when you ask for further adjustments, which previously had to be done manually.

ChatGPT Codex introduces a new approach to software development, one in which developers write less code and express more intent. Codex can generate entire modules, write unit tests, scaffold features, and open pull requests on GitHub, all from plain-language instructions.

But AI is not magic. Codex does not understand your business logic, maintain long-term context across projects, or validate correctness on its own. To use Codex effectively, you must combine it with repeatable verification tools, especially automated tests and TDD workflows.

This article presents a comprehensive guide to using Codex for real-world software tasks, structured around a modern DevOps flow:

Define the problem and behavior
Use Codex to generate or implement the solution
Write or request unit tests
Run tests to validate behavior
Use Codex to open a pull request
Iterate safely, guided by tests

What is ChatGPT Codex?

Codex is a code-specialized AI model. It is integrated into ChatGPT (for Pro and Team users), where it can interact with connected GitHub repositories and local project files. Codex understands:

Multiple programming languages and frameworks (JavaScript, Python, Go, Ruby, etc.)
Common project patterns (MVC, monorepos, microservices)
Unit testing libraries (Jest, Mocha, Pytest, RSpec, etc.)
GitHub operations such as creating branches and opening pull requests

Codex doesn’t just autocomplete code; it reasons about intent, structure, and behavior. This lets developers collaborate with it the way they would with a junior developer who works quickly and iteratively but still needs direction and verification.

A Basic Workflow

Here is a typical multi-step prompt flow, assuming you already have a repository set up for the application you are developing:

Step 1: Define the Feature

“Create an Express middleware that rate-limits each IP to 100 requests per minute and returns a 429 if the limit is exceeded.”

Codex will:

Generate the middleware, backed by in-memory storage or Redis
Explain how it works and where to install it
Suggest integration points
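To give a sense of what such a middleware can look like, here is a minimal in-memory sketch. It is not Codex's actual output. It assumes a fixed one-minute window and per-IP counters in a process-local `Map` (so several server instances would each keep their own counts, which is where Redis comes in), and it declares small `Req`/`Res` types in place of Express's own so the snippet runs without dependencies. The `rateLimit` function and its options are hypothetical names.

```typescript
// Minimal request/response shapes standing in for Express's types
// (an assumption so this sketch runs without installing express).
interface Req {
  ip?: string;
}
interface Res {
  status(code: number): Res;
  json(body: unknown): void;
}
type Next = () => void;

interface RateLimitOptions {
  limit: number; // max requests allowed per window, per IP
  windowMs: number; // window length in milliseconds
  now?: () => number; // injectable clock; defaults to Date.now
}

// Fixed-window rate limiter with in-memory, per-process counters.
function rateLimit({ limit, windowMs, now = Date.now }: RateLimitOptions) {
  const windows = new Map<string, { start: number; count: number }>();

  return (req: Req, res: Res, next: Next): void => {
    const key = req.ip ?? "unknown";
    const t = now();
    let entry = windows.get(key);

    // Start a new window if this IP has none yet or its window has expired.
    if (!entry || t - entry.start >= windowMs) {
      entry = { start: t, count: 0 };
      windows.set(key, entry);
    }

    // Count the current request, then reject anything beyond the limit.
    entry.count += 1;
    if (entry.count > limit) {
      res.status(429).json({ error: "Too Many Requests" });
      return;
    }
    next();
  };
}
```

In an Express app you would register it with something like `app.use(rateLimit({ limit: 100, windowMs: 60_000 }))`. The injectable `now` clock is there mainly so tests can advance time without waiting a real minute. Expired entries are never evicted in this sketch, so a long-running process would want periodic cleanup or a store with built-in expiry, such as Redis keys with a TTL.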

Step 2: Ask for Unit Tests

“Write Jest tests for this middleware.”

Codex generates:

Tests for requests within the limit
Behavior after exceeding the rate limit
Optional edge cases like header spoofing

Step 3: Create the Pull Request

Now click the “Create Pull Request” button.

Codex:

Creates a branch (e.g., codex/rate-limiter)
Commits the implementation and tests
Opens a pull request with a summary

This entire flow can happen in a single Codex session, reducing friction and accelerating prototyping.

Codex Is Fast. Tests Make It Safe.

Codex can write working code with remarkable speed. But it cannot “understand” whether that code is correct in your real-world context, only whether it looks structurally right. That’s where automated tests become essential.

Tests enable a shift from inspection to verification. Instead of reading every line Codex generates, you can use a test suite to answer a much more important question: “Does it behave the way we want it to?”

Lessons from Simon Willison on Using LLMs Safely

Developer and educator Simon Willison captures this mindset perfectly in his article, “Automated tests make LLM-assisted coding more powerful”. His key takeaway:

“The better your test coverage, the more productive you can be with LLMs.”

That insight reflects a deeper point: automated tests are not just a complement to LLMs; they are what unlock their real value.

Key Advantages of Using Tests with Codex

1. Tests Define Correctness for Codex

You don’t need Codex to perfectly understand your business rules. You just need it to pass your tests.

“You don’t have to know what the correct output looks like. You just have to know how to verify that a given output is correct.” — Simon Willison

This changes the game: instead of prompting Codex to generate “the right answer,” you prompt it to generate something that passes the test.
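Sorting is a compact illustration of that idea. To check a candidate sort function, you don't need to hand-compute the sorted result for every input; you only need to confirm that the output is in ascending order and contains exactly the same elements as the input. The `isSortedPermutation` helper below is a hypothetical name for that check, sketched for arrays of numbers.

```typescript
// Verifies a sort result without knowing the expected answer in advance:
// the output must be ascending and be a permutation of the input.
function isSortedPermutation(input: number[], output: number[]): boolean {
  if (input.length !== output.length) return false;

  // Ascending-order check.
  for (let i = 1; i < output.length; i++) {
    if (output[i - 1] > output[i]) return false;
  }

  // Same-elements check: count each value in the input, then consume
  // those counts while walking the output.
  const counts = new Map<number, number>();
  for (const x of input) counts.set(x, (counts.get(x) ?? 0) + 1);
  for (const x of output) {
    const c = counts.get(x) ?? 0;
    if (c === 0) return false;
    counts.set(x, c - 1);
  }
  return true;
}
```

The same checker works for any sort implementation Codex proposes, which is what makes it a verification rather than an expected-output test.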

2. Tests Enable Safe Exploration

Codex often generates multiple valid solutions. With good test coverage, you can:

Try a few different implementations
Run your tests
Choose the best one based on performance or clarity

You don’t need to manually verify correctness each time; your test suite does it for you.
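As a minimal sketch of that loop, the snippet below runs one table-driven suite against two candidate implementations of a hypothetical `dedupe` helper (remove duplicates, keep first-occurrence order). Both function names and the test cases are illustrative; in a real project the suite would live in Jest and the candidates would be alternative Codex suggestions.

```typescript
// Two hypothetical candidates that should behave identically.
const dedupeWithSet = (xs: string[]): string[] => Array.from(new Set(xs));
const dedupeWithFilter = (xs: string[]): string[] =>
  xs.filter((x, i) => xs.indexOf(x) === i);

// One shared suite: [input, expected output] pairs.
const cases: Array<[string[], string[]]> = [
  [[], []],
  [["a"], ["a"]],
  [["a", "b", "a", "c", "b"], ["a", "b", "c"]],
];

// Runs every case against an implementation and reports the result.
function runSuite(name: string, impl: (xs: string[]) => string[]): boolean {
  let ok = true;
  for (const [input, expected] of cases) {
    const actual = impl(input);
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      ok = false;
      console.log(`${name}: FAIL for ${JSON.stringify(input)} -> ${JSON.stringify(actual)}`);
    }
  }
  console.log(`${name}: ${ok ? "all tests passed" : "failures found"}`);
  return ok;
}

runSuite("dedupeWithSet", dedupeWithSet);
runSuite("dedupeWithFilter", dedupeWithFilter);
```

Both candidates pass, so the choice comes down to the other criteria: the `Set` version runs in linear time, while `filter` with `indexOf` is quadratic, which matters for large inputs.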

3. Tests Protect Future Work

You might accept Codex’s first suggestion today, but replace it next week. With tests, you know if behavior breaks the moment it happens.

Prompting Strategies for Test-Aware Codex Usage

Test-First Prompting (Conversational TDD)

Start by asking Codex to generate tests.

“Write a test that checks if slugify('Hello World!') returns 'hello-world'.”

Then ask:

“Write the implementation of slugify that passes this test.”

Codex provides a function using a regex and .toLowerCase(); once the test passes, you’re done.
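A minimal version of that exchange might look like the following. The regex approach (lowercase, collapse runs of non-alphanumeric characters into single hyphens, trim hyphens from the ends) is one plausible implementation, not necessarily what Codex returns, and it assumes ASCII input: accented letters are dropped rather than transliterated. The test is written as a plain assertion so the snippet runs without Jest.

```typescript
// Lowercase the text, replace each run of non [a-z0-9] characters with a
// single hyphen, then strip leading and trailing hyphens.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// The test from the prompt as a plain assertion.
// In Jest this would be: expect(slugify("Hello World!")).toBe("hello-world");
if (slugify("Hello World!") !== "hello-world") {
  throw new Error(`unexpected slug: ${slugify("Hello World!")}`);
}
```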

Failure-Driven Prompts

If you already have failing tests, use them directly as input:

“Fix this middleware so that this test passes: expected status 429, received 200.”

Codex will typically locate and repair the failing condition, then walk you through what changed.
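One common root cause of exactly that failure message is an off-by-one in the limit check. The hypothetical functions below contrast a comparison made before the current request is counted with one made after; the names are illustrative, not taken from any library.

```typescript
type Decide = (countBefore: number, limit: number) => number;

// Buggy: compares the count *before* recording the current request.
// The 101st request sees 100, and `100 > 100` is false, so it gets 200.
const statusBuggy: Decide = (countBefore, limit) => (countBefore > limit ? 429 : 200);

// Fixed: include the current request before comparing against the limit.
const statusFixed: Decide = (countBefore, limit) => (countBefore + 1 > limit ? 429 : 200);

// Replays `requests` calls from one IP and returns the final status code,
// mirroring a test that sends limit + 1 requests and checks the last one.
function finalStatus(requests: number, limit: number, decide: Decide): number {
  let count = 0;
  let status = 200;
  for (let i = 0; i < requests; i++) {
    status = decide(count, limit);
    count += 1;
  }
  return status;
}

console.log(finalStatus(101, 100, statusBuggy)); // 200: the reported failure
console.log(finalStatus(101, 100, statusFixed)); // 429: the test now passes
```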

Refactor and Retest

Codex can also improve existing logic if you define what must stay true:

“Refactor this to use a map instead of an object, but make sure all tests still pass.”
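A sketch of what that refactor might look like, using a hypothetical `CounterStore` interface for per-IP request counts. The shared `counterContract` check plays the role of the existing test suite: it must pass for both the object-backed and the `Map`-backed versions. A `Map` also avoids collisions with keys inherited from `Object.prototype`, such as `constructor`.

```typescript
interface CounterStore {
  increment(key: string): number; // returns the new count for `key`
}

// Before: counts kept on a plain object. A key such as "constructor"
// already resolves to an inherited value here, one reason to refactor.
class ObjectCounterStore implements CounterStore {
  private counts: Record<string, number> = {};
  increment(key: string): number {
    this.counts[key] = (this.counts[key] ?? 0) + 1;
    return this.counts[key];
  }
}

// After: the same behavior backed by a Map, which has no inherited keys.
class MapCounterStore implements CounterStore {
  private counts = new Map<string, number>();
  increment(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// The shared check that must keep passing before and after the refactor.
function counterContract(store: CounterStore): boolean {
  const a1 = store.increment("10.0.0.1");
  const a2 = store.increment("10.0.0.1");
  const b1 = store.increment("10.0.0.2");
  return a1 === 1 && a2 === 2 && b1 === 1;
}
```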

Tests as Prompt Scaffolding

Another strategy is to provide both the test and the interface, then let Codex fill in the rest:

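// test/userService.test.ts
// Assumes Jest globals (test, expect), sibling test/ and services/ folders,
// and a seeded user with this email in the test data.
import { getUserByEmail } from "../services/userService";

test("gets user by email", async () => {
  const user = await getUserByEmail("test@example.com");
  expect(user.email).toBe("test@example.com");
});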

Prompt:

“Implement getUserByEmail in services/userService.ts to pass this test.”

Codex responds with a function built on your existing data layer. It often infers the database abstraction correctly; when it can’t, it may ask you to clarify.
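For illustration, here is one shape such an implementation could take. `User`, `UserRepository`, and the in-memory `userRepository` are hypothetical stand-ins for your real data layer, included so the sketch runs standalone. Because the test reads `user.email` directly, this version throws when no user matches; returning `null` instead would also be reasonable, but the test would then need a null check.

```typescript
// Hypothetical user shape; a real model will have more fields.
interface User {
  id: number;
  email: string;
}

// Hypothetical data-layer contract. In a real project this would wrap your
// ORM or query builder; here an in-memory stub keeps the sketch standalone.
interface UserRepository {
  findOneByEmail(email: string): Promise<User | undefined>;
}

const seededUsers: User[] = [{ id: 1, email: "test@example.com" }];

const userRepository: UserRepository = {
  async findOneByEmail(email: string): Promise<User | undefined> {
    return seededUsers.find((u) => u.email === email);
  },
};

// services/userService.ts: export this function in a real module.
// Throws when no user matches, so callers can read `user.email` directly.
async function getUserByEmail(email: string): Promise<User> {
  const user = await userRepository.findOneByEmail(email);
  if (!user) {
    throw new Error(`No user found with email ${email}`);
  }
  return user;
}
```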

When to Rely on Codex (And When Not To)

Use Codex for:

Scaffolding or rewriting boilerplate logic
Refactoring code based on reviewer feedback
Generating test cases and coverage
Documenting what changed and why

Avoid using Codex alone for:

Security-sensitive functionality (e.g., cryptography)
Schema or API migration logic without manual review
Business rules where context and nuance matter

Why TDD Works So Well with Codex

If you want Codex to go from “clever assistant” to “trustworthy teammate,” there’s one technique that makes a huge difference: Test-Driven Development (TDD).

TDD isn’t just a coding methodology; it’s a language of verification, and one of the most effective ways to steer Codex toward producing exactly what you want.

Codex is not great at guessing your intent when your prompt is vague. But it is exceptional at completing patterns, especially when given a clear test with inputs and expected outputs.

TDD Does 3 Powerful Things for Codex:

It Defines the Edges of the Problem

When you write a test first, you are telling Codex:

What the input is
What the output should be
What behaviors matter

Now Codex doesn’t have to decide what the function should do; it just has to satisfy your test.

It Reduces Prompt Ambiguity

Instead of saying:

“Build a function to calculate sales tax.”

Try:

“Here’s a test that expects calculateSalesTax(100, 0.07) to return 7. Write a function to pass this.”

Codex now has a concrete, checkable target.
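A concrete test can also expose details a vague prompt would hide. In JavaScript, `100 * 0.07` evaluates to `7.000000000000001`, so a naive `return amount * rate` fails a strict `toBe(7)` check. The sketch below is one implementation that passes; rounding to whole cents with `Math.round` is an assumption, and real tax rules may require a different rounding mode.

```typescript
// Hypothetical implementation written to satisfy:
//   expect(calculateSalesTax(100, 0.07)).toBe(7)
// Rounds the tax to whole cents (an assumed rule) to absorb floating-point error.
function calculateSalesTax(amount: number, rate: number): number {
  return Math.round(amount * rate * 100) / 100;
}
```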

It Lets You Iterate Confidently

Once a test passes, you can safely ask Codex to:

Refactor the implementation
Swap out libraries
Modify the interface

And you’ll immediately know if anything broke, because the tests will fail.

TDD Helps Codex, and Codex Helps You Stick to TDD

Most teams know they should write tests first but often skip the step. Codex reduces that friction:

Ask it to scaffold tests first
Use the tests to drive implementation
Lean on tests to validate changes

By putting the test at the center of your conversation, you give Codex a clearer target, and you become a better developer in the process.

Building AI-Enhanced Engineering Culture

Codex is fast, adaptable, and good at following clear prompts. But its real power emerges when it works with your tests, not around them.

When you pair Codex with TDD, you unlock:

Faster prototyping with real safeguards
Higher-quality code with less manual checking
Confidence in AI-assisted changes
A workflow that scales to teams, projects, and legacy systems

Treat Codex like a fast-thinking junior dev:

You write the tests.
You define the goal.
Codex writes the code that makes the test go green.

That’s the future of software development, and it’s already here.

References

Simon Willison: Automated tests make LLM-assisted coding more powerful
OpenAI: Codex in ChatGPT
Jest: Documentation
GitHub: API documentation
Replit + LLMs
