I Rebuilt My Site With Two AI Models: Opus for Design, Codex for Execution

For about four and a half months, my site wore the TRANSMISSION skin: dark, cyan, amber, and deliberately technical.

Navy-charcoal background. Cyan accents. Glassy panels. I built it that way on purpose. I had spent eighteen years in advertising. I had taught myself to code at forty. I wanted the site to make that transition visible. To anyone who landed on a blog post, scrolled past the wordmark, and noticed the developer-tool aesthetic, the signal was: this person crossed over.

It did its job. Then it was done.

Over the Memorial Day long weekend I shipped a redesign that looks nothing like what came before. Light mode. Warm paper background. Carmine red instead of cyan. Hairline rules instead of rounded cards. A visible "Vol. 04 · Issue NN" in the corner — the volume counts years since I started building in public, the issue is the current ISO week, both wired to update on their own. The kicker on the day you read this might say Issue 22 or 23 or 41, depending on when you land. There is no Sparkles icon anywhere. The site is live now on chandlernguyen.com.
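A self-updating kicker like the one described above can be computed from the current date alone. What follows is a minimal sketch, not the site's actual code: the function names are hypothetical, and the start year of 2023 is an assumption picked only so that a 2026 date produces "Vol. 04".

```typescript
// Hypothetical helpers; the post does not show how the kicker is built.

// ISO 8601 week number: weeks start on Monday, and week 1 is the week
// that contains the year's first Thursday.
function isoWeek(date: Date): number {
  // Work in UTC so the result does not shift with the server's timezone.
  const d = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate())
  );
  const day = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - day); // jump to this week's Thursday
  const yearStart = new Date(Date.UTC(d.getUTCFullYear(), 0, 1));
  const dayOfYear = (d.getTime() - yearStart.getTime()) / 86_400_000 + 1;
  return Math.ceil(dayOfYear / 7);
}

// Assumption: the volume is "calendar year minus start year, plus one".
function formatKicker(date: Date, startYear: number): string {
  const volume = date.getUTCFullYear() - startYear + 1;
  const pad = (n: number) => String(n).padStart(2, "0");
  return `Vol. ${pad(volume)} · Issue ${pad(isoWeek(date))}`;
}
```

With an assumed start year of 2023, `formatKicker(new Date(Date.UTC(2026, 4, 25)), 2023)` gives "Vol. 04 · Issue 22". One edge worth knowing about: in the first days of January the ISO week can still belong to the previous year (week 52 or 53), so a calendar-year volume and an ISO-week issue can briefly disagree.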

I could write a whole post about those decisions. But the more useful story is the workflow. I split the redesign between two different AI models. Claude Opus 4.7 with extended thinking did the design. Codex on GPT-5.5 with its highest reasoning setting did the execution. And the /goal function in Codex let it run autonomously for close to four hours at a stretch, while I was out for a walk or eating dinner.

Four days. Two models. Almost thirty commits. One site.

Here is what I learned about which model to use for what — and where the boundary still sits.

Why TRANSMISSION had to go

I crossed from advertising into code over the last two years. By early this year the site had caught up: I redesigned it as TRANSMISSION specifically to make that transition visible. Four and a half months later, the signal is done.

The next job is different. Visitors arriving from a blog post, from a LinkedIn thread, from a Google search are not coming to confirm that I write code. They are coming to learn something about AI in media operations, to consider buying the course, or to compare the course, Prova, DIALØGUE, and STRAŦUM as a product ladder. The site needed to do less signaling and more guiding.

The easiest way to describe the shift: the old site was a personality. The new site is an operating model. Same person behind it, different job to do. That meant a different aesthetic — lighter, calmer, editorial, the kind of surface where a long blog post can breathe and a course CTA can feel honest instead of pushy. Warm paper instead of glass.

But the harder problem was not the aesthetic. It was the throughput. The site has thirteen locales, eight or so public surfaces (home, blog, post, products, learn, about, ask, course sales), an authenticated course access flow, a Stripe checkout, and a Sydney RAG endpoint. Rebuilding all of that by hand would have taken weeks. I did not have weeks.

So I split the work between two AI models.

Why split design and execution at all

Design and execution are different cognitive tasks.

Design is slow. It is judgment about composition, about restraint, about whether one shade of warm paper looks too yellow on an iPhone and another shade looks too cold. It is asking "what kind of surface is this trying to be?" before "how do we render this section?". Good design work compresses time — you think for two hours and then make one decision that saves you ten hours of execution.

Execution is fast. It is pattern application. Given a design system, generate the components. Given a component, wire it across thirteen locales. Given a page layout, ship the variants. The work is not less skilled, but it is less ambiguous. The right answer is usually defensible against the spec.

Different AI models are better at different cognitive tasks. From my own experience across the last few months — first with Claude Opus 4.6 and now with 4.7, both on their extended-thinking settings — Opus is the more well-rounded partner. Better at brainstorming. Better at design judgment. Slower and less surgically precise on pure code, but it considers the whole composition before answering and it reasons in a way that catches why one font feels too friendly and another feels right.

Codex on GPT-5.5 with its high-reasoning setting feels like a competent senior software engineer who never went anywhere near the design team. It is fast. It is excellent at multi-step tool use — opening files, reading them, editing them, running tests, opening the next one. It does not have strong opinions about whether a button feels too tall, but it ships the button, and it ships every variant of the button across thirteen locales without complaining. Its /goal function lets you hand it a target — for example, "convert the v3 design tokens into the Next.js app, wire the new shell, ship working BottomNav, MobileAppBar, ReadingProgress, verify the build passes with the flag off" — and walk away. It will plan sub-steps, execute them, run the build, fix what breaks, and keep going. The longest stretch it has run unattended for me, on this project, was just under four hours.

I have to admit I did not plan this split deliberately at the start. I had already had the design conversation with Opus in the days before the weekend. By the time the long weekend started, I had the design direction document and the desktop prototype in hand. The weekend itself was when I handed the execution to Codex, and within an hour of doing that I noticed how much better-suited each model was to its half of the work.

What Opus did: the design phase

The design work happened in a long conversation with Opus over the days before the rebuild. Two artifacts came out of that conversation, both of which now live in the repo:

1. A design direction document. Six markdown files in doc/v3-design/ — the why, the tokens, the mobile patterns, the page specs, the accessibility constraints, the execution plan. More than thirteen thousand words of decisions, mostly written by Opus from a brief I had given it. The directive: "the next site should feel like a small printed magazine, not a SaaS dashboard. Earn attention through restraint. Use the Operating Model 2×2 framework as the visual signature."

2. A desktop HTML prototype. A flat directory at public/design-prototype-v2/ with six pages and a shared style.css. No React, no Tailwind, no framework — just hand-written HTML so I could click through every page on a local server and feel whether the design system actually held together as a system. The prototype was the source of truth for the visual treatment for the rest of the weekend. When in doubt during the rebuild, the answer was "match the prototype."

Opus made the decisions that mattered most. A few specific moments:

The font reversal. An earlier design pass had locked Manrope as the display font on the basis of availability and ease of shipping, committed in a Claude Sonnet 4.6 session. Eighteen hours later I had moved the design conversation to Opus 4.7 with extended thinking, and Opus's first move was to reconsider the lock. After I had spent an evening on the official PP Neue Montreal specimen page comparing letterforms, the switch to Satoshi went in. The commit message reads like a typography crime scene: "Manrope rendered too rounded and friendly... Satoshi has the same condensed-grotesque proportions, the same double-storey a, single-storey g, swept R leg, cleanly-sheared terminals." The interesting wrinkle is that two different Claude models reached different judgments on the same decision. Sonnet picked the safe option. Opus picked the right one.

The carmine choice. Cyan is the AI-default accent. Opus and I landed on carmine red (#c33824), a printer's-ink red, the color of an editor's mark on a galley proof or a Latin volume on a Loeb Classical Library shelf. The reasoning was specific: "the site is trying to feel editorial, not developer-tool. Cyan signals tech. Carmine signals craft. Use the carmine sparingly, always at high contrast, ideally on warm paper." That is not a decision an execution-optimized model would reach for on its own.

The Operating Model as visual signature. The 2×2 framework I teach in the course (Automate / Protect / Deepen / Change) becomes a quiet brand mark on the new site. In the current implementation it shows up in two restrained places: the homepage thesis block and a small thesis marker on long-form posts. The framework itself was mine; the move to make it the visual identity of the site came out of the design conversation with Opus.

Light over dark. Opus argued for light mode as the personality and dark mode as a supported variant. The argument: long-form reading is easier on warm paper than on glass; price tags feel more honest on light backgrounds; almost every AI tool ships dark by default, so light is the differentiator. I disagreed for a few hours, then agreed.
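The "always at high contrast" rule for the carmine can be checked with numbers instead of by eye. Here is a small sketch using the WCAG 2 relative-luminance and contrast-ratio formulas. The helper names are hypothetical, and the post does not give the warm paper hex value, so any background you pass in is your own assumption.

```typescript
// Hypothetical helpers for checking an accent color against a background.

// Convert one 8-bit sRGB channel to linear light (WCAG 2 definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#c33824".
function relativeLuminance(hex: string): number {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const n = parseInt(match[1], 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}
```

Against pure white, `contrastRatio("#c33824", "#ffffff")` comes out at roughly 5.4:1, which clears the 4.5:1 threshold WCAG AA sets for body text. A warm paper background is slightly darker than white, so the real figure on the site would be a little lower and is worth measuring with the actual token.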

Opus was mostly the design and review loop; Codex did the implementation passes. Opus's job was the part of the work where being slow paid off.

What Codex did: the execution phase

Once the design document and prototype were done, the work shifted to Codex.

I opened a new Codex session, gave it the design direction files as context, and started running /goal sessions. Each goal was a multi-step target with a clear end state:

Goal: Convert the v3 design tokens from doc/v3-design/components/tokens.css into src/app/globals.css. Remove the conflicting TRANSMISSION tokens. Wire up the three new fonts via src/lib/fonts.ts and src/app/layout.tsx. Generate the v3 shell behind the feature flag NEXT_PUBLIC_ENABLE_V3_DESIGN. Ship a working BottomNav, MobileAppBar, and ReadingProgress. Verify the build passes and that no existing pages are visually affected when the flag is off.

That kind of multi-stage target is exactly what /goal is designed for. Codex plans the sub-steps, executes them, runs pnpm build to check for type errors, fixes what breaks, runs the build again, and keeps going. I came back from a walk and the v3 shell was live in the codebase behind the flag, with the new fonts loaded and the bottom nav rendering on mobile.
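The "flag off means nothing changes" requirement in that goal comes down to one small decision point. A minimal sketch of what such a gate could look like follows. Only the flag name NEXT_PUBLIC_ENABLE_V3_DESIGN comes from the goal above; the helper names, the accepted values, and the shell labels are assumptions for illustration.

```typescript
// Hypothetical gate; the post names the flag but not how it is read.
type Env = Record<string, string | undefined>;

// Treat only explicit "1" or "true" as on, so a missing or mistyped
// value falls back to the old design.
function isV3Enabled(env: Env): boolean {
  const raw = (env.NEXT_PUBLIC_ENABLE_V3_DESIGN ?? "").trim().toLowerCase();
  return raw === "1" || raw === "true";
}

// Decide which shell a layout should render.
function pickShell(env: Env): "v3" | "transmission" {
  return isV3Enabled(env) ? "v3" : "transmission";
}

// In a Next.js app the value would be passed in explicitly, for example:
//   pickShell({
//     NEXT_PUBLIC_ENABLE_V3_DESIGN: process.env.NEXT_PUBLIC_ENABLE_V3_DESIGN,
//   })
// because NEXT_PUBLIC_ variables are inlined at build time only when they
// are referenced by their full name.
```

Defaulting to the old shell on anything unexpected is the design choice that makes the "no existing pages are visually affected" check cheap to verify: with the variable unset, every page takes the same path it took before.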

Subsequent /goal sessions:

Generate the v3 home and archive pages from the prototype.
Generate the v3 post layout, including the reading progress bar and the v3 share button.
Generate the v3 products and learn pages, matching the ladder structure in the spec.
Align the existing course sales, login, signup, MFA, and access pages to v3 visually, without changing any of the underlying Stripe or Supabase Auth logic.
Localize the v3 course and Ask Sydney surfaces across all thirteen locales, using the existing messages/{locale}.json files as the translation source.
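A localization goal like the last one is easier to trust when there is a mechanical check behind it. The sketch below compares the keys in one source messages object against every other locale and reports what is missing. It is an illustration only: the function names and the nested shape of the messages files are assumptions, and loading the real messages/{locale}.json files is left out.

```typescript
// Hypothetical checker; assumes messages are nested objects of strings.
type Messages = { [key: string]: string | Messages };

// Flatten nested message objects into dotted key paths,
// e.g. { course: { cta: "Buy" } } -> ["course.cta"].
function flattenKeys(messages: Messages, prefix = ""): string[] {
  const keys: string[] = [];
  for (const [key, value] of Object.entries(messages)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "string") {
      keys.push(path);
    } else {
      keys.push(...flattenKeys(value, path));
    }
  }
  return keys;
}

// For each locale, list the source keys it does not define.
// Locales with nothing missing are left out of the report.
function missingKeys(
  source: Messages,
  byLocale: Record<string, Messages>
): Record<string, string[]> {
  const sourceKeys = flattenKeys(source);
  const report: Record<string, string[]> = {};
  for (const [locale, messages] of Object.entries(byLocale)) {
    const present = new Set(flattenKeys(messages));
    const missing = sourceKeys.filter((key) => !present.has(key));
    if (missing.length > 0) report[locale] = missing;
  }
  return report;
}
```

An empty report is a reasonable end state to hand to an autonomous run: it is unambiguous, and it fails loudly when a new surface adds a string to one locale and not the other twelve.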

The longest run completed without intervention was just under four hours. The shortest was around forty-five minutes. Across the four days I shipped almost thirty commits — most of them Codex's first or second pass, with small manual cleanups where the model's output was correct but not quite the right shape. The honest accounting: I think Codex wrote the large majority of the actual code that runs in production today. I wrote or edited the rest.

The single most useful thing about /goal is not the speed. It is the autonomy. Most coding sessions I have run with AI tools require me to stay at the keyboard — confirm a change, accept the next file, review the diff. /goal lets you specify the end state you want and walk away. You come back to either a finished change or a paused session at the place the model needs you to make a judgment call. The shape of the work is different. You become a planner of multi-hour sessions instead of a babysitter of single edits.

Where the boundary still sits

Splitting design and execution across two models is not a magic trick. There are still moments where one model needs the other, and a few moments where neither is enough on its own.

The Sparkles icon. Codex generated a clean v3 mobile shell with a Sparkles ✨ icon on the Ask tab — the universal AI-tool tell. The icon was correct against the spec; the spec did not say "do not use the cliché." I noticed it a couple of days later and went back to Opus to discuss replacements. We went through several iterations: the Operating Model mark, then an italic Newsreader question mark in carmine red — the kind of glyph you might find as marginalia in an old essay. The final solution was Opus's idea. Codex shipped it in fifteen minutes.

The Hinomaru moment. While the Operating Model mark was sitting in the top-left of the mobile app bar, next to my name, I had been staring at it for two or three days. A small carmine dot on a warm paper square. Then one evening I looked at it the way someone else would and felt a small flush of realization. It looked exactly like the Japanese flag. The Hinomaru. A small red sun on a pale field. Neither Opus nor Codex caught this. Opus had designed the mark; Codex had implemented it; both had reviewed it. The visual association was a cultural pattern neither model could see. I deleted the mark from the wordmark and reached for a different solution. (The same italic ? ended up filling its spot on the Ask tab.)

The 13-locale rollout. Opus could have designed each locale's UI nuances thoughtfully but would have taken days to do it. Codex could not have made the design judgment alone but could ship all thirteen locales in an evening once the patterns were set. This is the most literal example of the split: design slow, execute fast.

Picking three fonts that work together. Opus chose Satoshi (display), Newsreader (italic accent), and JetBrains Mono (labels). Codex would not have picked this combination on its own — and I am not sure I would have either without Opus's slowness in the conversation. The three-font composition is a design call, not an execution one.

The conversation between the two systems happens in my head, not between the models. That is the part that is still not automatable in 2026 — and I think it is the part that defines what taste means right now.

Honest accounting

Four days. Almost thirty commits. Production v3 is live on chandlernguyen.com. The new site is doing the new job — helping visitors find what to read next instead of telling them I crossed over.

The site has a remaining problem I am still chasing. At draft time, production Lighthouse mobile showed the blog archive landing at 4.6 seconds while my local Chrome DevTools trace showed the same image painting in 264 milliseconds: same code, very different network model. My current leading suspect is font preload pressure: three preloaded webfonts plus three render-blocking CSS chunks racing the image to the finish on a throttled mobile connection. I deployed the smallest version of a fix for it in the last day and still need a fresh Lighthouse run to confirm it landed. This post was written before I had the post-fix number.

Subscription cost: my Codex subscription on the highest tier, plus the Claude Max subscription I already had. Together a few hundred dollars a month. Cheap, given the throughput.

The fact that v3 happened in four days instead of three weeks is mostly because the design and execution phases were separated and assigned to the models that were better at each. I do not think this is universal. There are projects where one model can do both, and projects where neither model is the right partner. But for a multi-page, multi-locale, multi-surface redesign with a strong design point of view, this split is the workflow I would reach for again.

If you have done your own AI-assisted rebuild recently, I would genuinely love to hear how you split the work. Did you use one model end-to-end? Did you split by phase the way I did, or by some other axis — by component, by file, by task type? I think the next wave of build-in-public craft is going to be about which AI for which part of the work, not just about whether you use AI at all.

If you want to see the result, you are reading it. The home page is at chandlernguyen.com. The product ladder is at /products/. The course that started the whole "operating model" idea is at /learn/.

Cheers, Chandler

Tagged: #claude-opus #codex #ai-workflow #design #build-in-public #personal-site