What Shipping DIALØGUE Taught Me About Multilingual AI Products
DIALØGUE supports 7 languages, but the real multilingual work was not translating strings. It was fixing audience-local dates, TTS consistency, UI language drift, and deciding where quality mattered enough to slow down.
One of the most seductive things AI lets you do now is add languages fast. I know because I keep doing it — I added 5 languages to DIALØGUE in 48 hours, then translated my entire blog archive into 10 languages in 4 days.
Those posts were about the speed side. This one is about everything speed doesn't cover.
Translation is the easy part to demo and the dangerous part to overestimate.
The machine can move text from one language to another astonishingly fast. That does not mean the product now feels native. In my experience, the real multilingual work shows up in four places: time, voice, UI systems, and fallback behavior. That is where the product either starts to feel thoughtful or starts to feel fake.
Here is what that looked like in practice:
| | Translation Work | Product Work |
|---|---|---|
| What it covers | Strings, copy, labels | Dates, layout, voice, fallbacks |
| Who notices when it's wrong | Bilingual reviewers | Every user in that locale |
| How AI helps | Generates translations fast | Can't judge product-level fit |
| Effort to fix | Re-translate the string | Redesign the component |
| When you discover it | Code review | After someone uses the app |
The First Bug Was Not Translation. It Was Time.
One of the fixes that made this obvious had nothing to do with copy.
For recurring shows, DIALØGUE generates episode titles automatically. The straightforward version is obvious: show name plus today's date.
That sounds harmless until your audience is in Tokyo and your server is thinking in California.
If a Japanese listener opens a daily show on what is already March 14 in Tokyo, but the episode title still says March 13 because that is what the server thinks, the product immediately feels off. Not broken. Just careless.
So I ended up changing episode titles to use the audience's local date in the audience's locale and timezone, not the server's default.
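A minimal sketch of that fix, assuming a Python backend. The function name, format tables, and show name here are illustrative, not DIALØGUE's actual code:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Illustrative per-locale date formats; a real app would lean on a full
# i18n library rather than hand-rolled tables like these.
LOCALE_DATE_FORMATS = {
    "ja": "{year}年{month}月{day}日",
    "en": "{month_name} {day}, {year}",
}
MONTH_NAMES_EN = ["January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November", "December"]

def episode_title(show_name: str, locale: str, tz: str, now_utc: datetime) -> str:
    # Convert to the *audience's* wall clock before formatting the date,
    # instead of trusting the server's idea of "today".
    local = now_utc.astimezone(ZoneInfo(tz))
    date_str = LOCALE_DATE_FORMATS[locale].format(
        year=local.year, month=local.month, day=local.day,
        month_name=MONTH_NAMES_EN[local.month - 1],
    )
    return f"{show_name} - {date_str}"

# 02:00 UTC on March 14 is still the evening of March 13 in California,
# but late morning on March 14 in Tokyo.
now = datetime(2025, 3, 14, 2, 0, tzinfo=ZoneInfo("UTC"))
print(episode_title("Morning Brief", "ja", "Asia/Tokyo", now))
# -> Morning Brief - 2025年3月14日
```

Same instant, two different "today"s. The server-local version would have stamped March 13 on an episode a Tokyo listener opens on March 14.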
That is a small implementation detail. It is also exactly the point.
Multilingual products are not just about words. They are about context.
Language without local context is often just a nicer version of being wrong.
Lesson 1: Voice Is Part of the Product
This became much clearer the deeper I went into DIALØGUE.
The product is built around dialogue, pacing, personality, and audio. That raises the quality bar much more than a normal SaaS interface.
When I added new templates, I learned quickly that multilingual support was not just:
- translate the template name
- translate the landing page copy
- ship it
For one template, I had to think through all of this:
- host profiles
- audience profiles
- dialogue guides
- voice instructions for TTS
- localized naming conventions per language
That work existed in English, but it also needed Japanese and Vietnamese versions because the chemistry between hosts is part of the product, not a cosmetic layer on top of it.
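As a rough sketch, those per-language voice assets look less like a strings file and more like structured config. All names and fields below are invented for illustration:

```python
# Hypothetical per-language voice config: host identity, TTS delivery
# notes, and naming conventions all vary by locale, not just the words.
HOST_PROFILES = {
    "en": {
        "host_name": "Alex",
        "tone": "warm, quick banter, light self-deprecation",
        "tts_instructions": "Conversational, mid-energy, natural pauses.",
        "naming_convention": "first name only",
    },
    "ja": {
        "host_name": "さくら",
        "tone": "polite-casual; friendly without stiff keigo",
        "tts_instructions": "Softer energy; avoid an overly formal register.",
        "naming_convention": "given name + さん",
    },
}
```

Translating the English profile word-for-word would miss the point: the Japanese entry is a different editorial decision, not a translation of the English one.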
If your product produces content, voice is not decoration. Voice is infrastructure.
Something can be grammatically correct and still feel dead.
I ran into this sharply with Chinese variants and with spoken content more generally. Standard Mandarin can be technically correct and still feel too formal in one context. A casual conversational rhythm in English can sound bureaucratic when pushed through literal product language in another language. A joke that lands in one market can become flat exposition in another.
That is why I care so much about tone guides and host profiles now. Not because I want every output to sound "branded." Because voice drift is one of the fastest ways to make a multilingual AI product feel generic.
And generic is expensive.
Lesson 2: UI Length Is a Product Problem, Not a Translation Problem
This sounds minor until you ship an actual app.
Then suddenly it is everywhere.
When I localized the DIALØGUE iOS app, it was not a matter of translating a handful of screens. It became 253 strings across 7 languages.
And even then, the work was not done.
I had to rewire the input components so the app language picker controlled the actual UI language consistently. Placeholders, buttons, pricing labels, status text, all of it. Otherwise you get the half-translated app problem:
- one screen respects the selected language
- another still shows the English placeholder
- the status labels stay in the original language
- the app technically supports multiple languages, but the experience does not
A clean English button label runs long in German. A tidy pricing line wraps onto two lines in French. A punchy settings label takes on a different width and visual weight in Japanese.
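One way to avoid the half-translated state is to make fallback per-key rather than per-screen. A minimal sketch, with invented string tables:

```python
# Toy string tables; in the real app these would come from the
# localization bundle selected by the in-app language picker.
STRINGS = {
    "en": {"start": "Start", "pricing.monthly": "per month"},
    "de": {"start": "Starten"},  # "pricing.monthly" not yet translated
}

def t(key: str, lang: str, base: str = "en") -> str:
    # Fall back key by key, so one missing string doesn't silently
    # flip a whole screen back to English.
    return STRINGS.get(lang, {}).get(key, STRINGS[base][key])

print(t("start", "de"))            # Starten
print(t("pricing.monthly", "de"))  # per month (explicit per-key fallback)
```

The important property is that the language picker stays the single source of truth, and every gap is an explicit, observable fallback rather than a screen quietly rendering the wrong language.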
If your localization workflow ends at text generation, you will discover these problems embarrassingly late.
This is why I increasingly think multilingual work has to be treated as a full product system:
- copy
- layout
- spacing
- truncation rules
- component behavior
- screenshot QA
The machine can generate the words. Someone still has to open the Simulator.
Lesson 3: Fallback Behavior Reveals How Mature the Product Really Is
This matters even more in audio products than in text products.
One of the most useful DIALØGUE fixes I made recently looked boring from the outside: TTS thresholding and per-segment consistency.
The original whole-script TTS gate was too optimistic. Long podcasts were crossing the real input limit, wasting multiple failed retries before falling back. On some runs, that meant roughly 12 seconds of avoidable latency before the system corrected itself.
That is annoying in English.
It is worse in multilingual flows because voice consistency is already harder to hold.
The fix was not "use a better model." It was product work:
- lower the threshold for whole-script synthesis
- move long episodes to per-segment mode earlier
- add opening / middle / closing guidance for segment energy
- pass a few lines of previous dialogue as continuity context so the next segment does not feel like a new show
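The routing logic above can be sketched roughly like this. Thresholds, field names, and the splitting heuristic are all invented for illustration; this is the idea, not DIALØGUE's actual implementation:

```python
WHOLE_SCRIPT_CHAR_LIMIT = 3500  # conservative gate, set below the real model limit
CONTINUITY_LINES = 3            # prior dialogue lines passed along as context

def plan_tts(script_lines: list[str]) -> list[dict]:
    """Decide between whole-script and per-segment synthesis."""
    if sum(len(l) for l in script_lines) <= WHOLE_SCRIPT_CHAR_LIMIT:
        return [{"lines": script_lines, "context": [], "position": "opening"}]
    # Script is too long: split into segments under the limit instead of
    # burning retries against the real model limit.
    jobs, current, size = [], [], 0
    for line in script_lines:
        if current and size + len(line) > WHOLE_SCRIPT_CHAR_LIMIT:
            jobs.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    jobs.append(current)

    def position(i: int) -> str:
        # Energy guidance: opening / middle / closing segments get
        # different delivery instructions downstream.
        if i == 0:
            return "opening"
        return "closing" if i == len(jobs) - 1 else "middle"

    return [
        {"lines": seg,
         # A few lines of the previous segment keep the next synthesis
         # call from sounding like the start of a new show.
         "context": jobs[i - 1][-CONTINUITY_LINES:] if i else [],
         "position": position(i)}
        for i, seg in enumerate(jobs)
    ]
```

The design choice worth noting: the gate is deliberately pessimistic. Crossing into per-segment mode a little early is cheap; retrying a failed whole-script call is seconds of user-facing latency.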
That is fallback behavior. That is maturity.
If a language-specific TTS voice sounds wrong, what happens? If a generated answer mixes languages, what happens? If an untranslated error message appears in the middle of a localized flow, what happens? If a long script crosses the model limit, what happens?
Those moments tell you whether multilingual support is a feature or a foundation.
Users are often forgiving when the system is clearly trying to help. They are much less forgiving when the product feels careless.
Lesson 4: Know When Coverage Isn't Worth It
The business question is not "Can I support another language?"
It is:
- will this language unlock real usage or real distribution?
- can I maintain a believable quality bar in both UI and output?
- can I support the failure cases without creating support debt?
If the answer is no, then what you are launching is not multilingual capability. It is multilingual surface area. And surface area without quality quietly weakens trust.
Lesson 5: Multilingual Products Need Their Own Eval Layer
This may be the biggest takeaway from both DIALØGUE and the blog archive.
You cannot judge multilingual quality by instinct alone, especially at scale.
You need explicit checks:
- locale-specific date and timezone behavior
- terminology consistency
- UI overflow
- fallback language rules
- TTS consistency across segments
- host/personality drift
- output quality in real user flows
In other words, multilingual products need evals too.
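One concrete example of such a check, sketched in Python. The heuristic and names are mine, not a DIALØGUE internal:

```python
def untranslated_keys(strings: dict, base: str = "en") -> list[tuple[str, str]]:
    """Flag keys in non-base locales whose value is identical to the base
    string. Identical values aren't always wrong ("OK" is "OK" in many
    languages), so this surfaces review candidates, not hard failures."""
    flagged = []
    for lang, table in strings.items():
        if lang == base:
            continue
        for key, value in table.items():
            if value == strings[base].get(key):
                flagged.append((lang, key))
    return flagged

STRINGS = {
    "en": {"start": "Start", "settings": "Settings"},
    "de": {"start": "Starten", "settings": "Settings"},  # drifted back to English
}
print(untranslated_keys(STRINGS))  # [('de', 'settings')]
```

Checks like this are deliberately dumb. The point is not sophistication; it is that the check runs on every locale on every build, which instinct never does.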
And this is where I think AI builders can get caught. We become fascinated by how much the model can produce and spend less time defining what "good" means in a durable way.
That is manageable at small scale.
At larger scale, it becomes dangerous.
Where This Leaves Me
I am more optimistic than ever about multilingual AI products.
AI absolutely changes the economics. It absolutely expands what one person or one small team can ship. It absolutely makes product ambitions possible that would have been unrealistic not long ago.
But it also raises the standard for product judgment.
Because once language expansion becomes easy, quality becomes the differentiator. And quality in multilingual products does not come from translation alone. It comes from context, voice, interface fit, fallbacks, review loops, and a very honest definition of what "native enough" actually means.
That is what DIALØGUE taught me. The more languages you support, the more product management you are really doing, whether you call it that or not.
If your product is truly multilingual, the work does not end when the strings are translated. That is where the real work starts.
If you want to see the result of that thinking, DIALØGUE is live now in 7 languages. And I suspect the multilingual layer will keep teaching me new lessons the more people use it. Usually the humbling kind.
If you are building multilingual products too, I would love to know: what was the moment you realized the hardest bug had nothing to do with translation at all?
Cheers, Chandler
Frequently Asked Questions
What is the hardest part of building a multilingual AI product?
Not the translation — AI handles that remarkably well now. The hardest part is everything around the translation: timezone-aware formatting, voice consistency across TTS segments, UI components that break at different string lengths, and fallback behavior when something goes wrong in a specific locale. These are product problems, not language problems.
Should I add more languages to my AI product?
Only if you can maintain quality. Each language you add creates ongoing product work: layout QA, voice tuning, locale-specific date and number formatting, and fallback paths. If your product relies on nuance and generated content — like podcasts, long-form writing, or conversational AI — the bar is higher than a simple transactional interface.
How do you test multilingual AI features?
Build an explicit eval layer. For DIALØGUE, that means checking locale-specific dates, TTS segment consistency, UI overflow across all 7 languages, and personality drift in generated host dialogue. You cannot rely on instinct once you support more than two or three languages — the surface area is too large to manually spot-check everything.
Does AI make localization easier or harder?
Both. AI makes the initial translation dramatically faster and cheaper. But it also creates a false sense of completion — you get the words right and assume the product is ready. The real work (context, voice, UI fit, fallbacks) still requires human judgment. AI raised the floor for multilingual coverage, but the ceiling is still product craft.