One AI Parameter Change Cost Me $54/Month
I thought my Cloud Run migration was flawless, until one AI parameter (temperature set to 0.7 instead of 0) caused 30% of API calls to fail and wasted $54 a month in tokens.
I trusted Claude to help me migrate from AWS Lambda to Google Cloud Run. The migration was flawless: services deployed, workflows executed, users were happy. Then I checked our API bills and nearly fell out of my chair.
*This is Part 3 of my DIALØGUE engineering series. If you missed them: [Part 1: Introducing DIALØGUE] and [Part 2: From Advertising to Engineering] lay the groundwork.*
This is the story of what happens when you let AI write production code without being explicit about production constraints. Spoiler: "make it work" and "make it production-ready" are very different requests.
The Migration That Went Too Smoothly
After months of wrestling with AWS Lambda (cold starts, layer limits, and so on), I decided to migrate everything to Google Cloud Run. Being a pragmatic developer (read: lazy), I enlisted Claude as my pair programmer.
"Help me migrate these Lambda functions to Cloud Run," I said. "Here's the existing code."
Claude delivered beautifully. Clean Dockerfiles, proper service configurations, working Cloud Workflows. The migration took one day instead of the weeks I'd expected. I was thrilled!
Everything deployed smoothly. Services started up fast. Users created podcasts without a hitch. Success, right?
Then I noticed something odd in our logs:
```
[ERROR] Segment generation failed: Unexpected token in JSON
[RETRY] Attempting segment generation again...
[SUCCESS] Segment generated successfully
```
About 30% of the AI calls failed on the first attempt but succeeded on retry. Not the end of the world; my retry logic was working! Users never noticed a thing. But those extra API calls add up.
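The retry logic itself isn't shown in this post; here is a minimal sketch of the pattern those log lines imply (the function name and structure are my own, not the actual codebase):

```python
import json
import logging

logger = logging.getLogger(__name__)

def generate_with_retry(generate_fn, max_attempts=3):
    """Re-call the generator whenever the response isn't valid JSON,
    mirroring the [ERROR] -> [RETRY] -> [SUCCESS] log lines above."""
    for attempt in range(1, max_attempts + 1):
        raw = generate_fn()
        try:
            return json.loads(raw)  # the [SUCCESS] path
        except json.JSONDecodeError:
            logger.warning("Segment generation failed (attempt %d)", attempt)
    raise RuntimeError("Segment generation failed after retries")
```

Each failed first attempt here is a full, billed API call, which is exactly where the waste comes from.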
The Investigation: Blame the Migration?
My first thought was that something had gone wrong in the migration. Maybe the new Cloud Run environment behaved differently? Container networking issues? I spent hours comparing the AWS and GCP configurations.
Everything looked identical. Same prompts, same retry logic, same error handling. So why were we suddenly getting malformed JSON responses?
Then I started comparing the actual code. Here's what I found:
The AWS Lambda version (which had worked fine for months):
```python
response = anthropic.messages.create(
    model="claude-3-7-sonnet-20250219",
    temperature=0,  # Deterministic for JSON
    response_format={"type": "json_object"},  # JSON mode
    system="You are a JSON generation assistant. Output only valid JSON.",
    messages=[{"role": "user", "content": prompt}]
)
```
The GCP Cloud Run version (Claude's migration):
```python
response = anthropic.messages.create(
    model="claude-3-7-sonnet-20250219",
    temperature=0.7,  # <-- Wait, what?
    messages=[{"role": "user", "content": prompt}]
)
```
There it was. Temperature 0.7.
"But that's a reasonable default!" you might say. And you'd be right. For creative writing, exploration, or brainstorming, 0.7 makes perfect sense. But for structured JSON generation? It's a disaster.
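Why is 0.7 a disaster here? Temperature rescales the model's token probabilities before sampling; at 0, sampling collapses to always picking the most likely token. A toy sketch of the math (not the real model, just an illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Toy next-token sampler. Temperature rescales logits before the
    softmax: 0 degenerates to argmax (deterministic), while higher
    values flatten the distribution (more 'creative')."""
    if temperature == 0:
        # Deterministic: always the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(logits) - 1
```

At temperature 0 the same prompt always yields the same output, which is exactly what a JSON pipeline needs; at 0.7 you get variety, which is exactly what it doesn't.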
The Smoking Gun: Creative JSON
After adding detailed logging to capture the actual AI responses, I found out exactly what was happening. Here's what Claude was returning at temperature 0.7:
```
Here's the podcast segment you requested:
{
  "title": "The Rise of AI Podcasting",
  "content": "Welcome back, listeners! Today we're diving into something really fascinating...",
  "duration": 120
}
I hope this segment captures what you were looking for!
```
See the problem? Claude was being too _creative_ with the response format. Sometimes pure JSON, sometimes JSON plus helpful commentary, sometimes JSON wrapped in markdown code blocks. At temperature 0.7, it was improvising like a jazz musician when what I needed was a metronome.
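The real fix was temperature 0, but a defensive parser is one possible stopgap for exactly these three response shapes. This sketch is my own, not the project's code:

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Best-effort JSON extraction from a 'creative' model reply:
    handles bare JSON, JSON inside ```json fences, and JSON
    surrounded by conversational commentary."""
    # Strip a markdown code fence if present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Fall back to the outermost {...} span
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    return json.loads(raw[start : end + 1])
```

This papers over the symptom without fixing the cause, and it still fails on truly malformed JSON, which is why deterministic generation is the better answer.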
The Blind Spot in AI Pair Programming
Here's the thing about AI as a pair programmer: it's incredibly good at making code work, but it won't necessarily optimize for production concerns unless you specifically ask.
When I said "migrate this Lambda function to Cloud Run," Claude focused on the migration requirements:
- ✅ Make it run on Cloud Run
- ✅ Handle the same inputs and outputs
- ✅ Maintain the same functionality
- ❌ Optimize for production efficiency
Claude chose temperature 0.7 because it's the "reasonable default" for AI applications. And it is! For most use cases, 0.7 gives you a nice balance of creativity and consistency.
But here's what I learned: the AI doesn't know your specific production constraints unless you tell it.
My original AWS code used temperature 0 plus JSON mode because I had learned, the hard way, that JSON generation needs deterministic outputs and explicit formatting. But during the migration I never explicitly said "this is for structured data generation" or "maintain all the JSON-specific optimizations."
So Claude wrote perfectly functional code with sensible defaults. The problem wasn't the AI; it was my incomplete requirements.
The Fix: Be Specific with AI
Once I'd identified the problem, I went back to Claude with better requirements:
"Help me fix this code. It's for JSON generation in production. I need 100% reliability, zero creativity. Use temperature 0 and any other optimizations for structured data."
Claude immediately suggested:
```python
response = anthropic.messages.create(
    model="claude-3-5-sonnet",
    temperature=0,  # Deterministic outputs
    response_format={"type": "json_object"},  # JSON mode
    system="You are a JSON generation assistant. Output only valid JSON.",
    messages=[{"role": "user", "content": prompt}]
)
```
Notice, too, that Claude's migration had quietly dropped the JSON mode we'd been using all along! (This is what happens when you don't explicitly mention every production requirement during a migration.)
The success rate jumped from 70% to 99.9%. The remaining 0.1%? Network timeouts. Can't blame Claude for those.
The difference? This time I was explicit about the production constraints. I didn't just ask for a migration; I asked for production-optimized, reliability-focused code.
The Real Cost of "Reasonable" Defaults
Let me break down what this "reasonable" temperature setting actually cost us:
- Daily podcast generations: ~200
- Failure rate: 30% needing retries
- Extra API calls per day: 60 failed calls needing retries
- Monthly waste: ~1,800 unnecessary API calls
- Claude 3.7 Sonnet pricing: $3 input + $15 output per million tokens
- Average tokens per call: ~2,000 input + 1,500 output
- Cost per retry: ~$0.03 per failed attempt
- Monthly overspend: ~$54 in wasted API calls
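The arithmetic, spelled out (figures from the list above; token counts are my estimates, and the exact math gives ~$51/month, which rounds up to ~$54 via the $0.03-per-retry figure):

```python
# Cost model for the retry overhead (numbers from the post)
GENERATIONS_PER_DAY = 200
FAILURE_RATE = 0.30
INPUT_TOKENS, OUTPUT_TOKENS = 2_000, 1_500    # avg tokens per call
INPUT_PRICE, OUTPUT_PRICE = 3.0, 15.0         # $ per million tokens

retries_per_day = GENERATIONS_PER_DAY * FAILURE_RATE         # 60
retries_per_month = retries_per_day * 30                     # 1,800
cost_per_retry = (INPUT_TOKENS * INPUT_PRICE +
                  OUTPUT_TOKENS * OUTPUT_PRICE) / 1_000_000  # ~$0.0285
monthly_waste = retries_per_month * cost_per_retry           # ~$51
```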
But the real cost wasn't just $54/month. Every retry added 3-5 seconds of generation time. Users waited longer, my Cloud Run instances spun up more often, and I burned through quota faster.
The most ironic part? All of this happened because I trusted the AI to make production decisions without giving it production context. A classic case of "it works" versus "it works efficiently."
What I Learned About AI Pair Programming
1. Be Explicit About Production Requirements
"Make it work" gets you functional code. "Make it work efficiently in production with these constraints" gets you optimized code. AI is great at solving the problem you describe, not the problem in your head.
2. AI Uses "Reasonable" Defaults, Not Optimal Ones
Temperature 0.7 is reasonable for most AI applications. But production systems often need specific optimizations, like temperature 0 plus JSON mode. The AI won't preserve them unless you explicitly mention them.
3. Code Review Applies to AI-Generated Code Too
Just because an AI wrote it doesn't mean it's production-ready. I should have caught this in code review, but I was so focused on whether the migration worked that I never audited the parameters.
4. Context Matters More Than You Think
My original AWS code used temperature 0 plus JSON mode for good reasons, learned through painful experience. That context was lost during the migration because I never explicitly mentioned it. Now I document the "why" behind every parameter and feature.
My New AI Pair Programming Workflow
Now, when I work with AI on production code, I'm far more explicit about constraints:
Before:
"Migrate this Lambda function to Cloud Run"
Now:
"Migrate this Lambda function to Cloud Run. This is for production JSON generation - prioritize reliability over creativity. Use temperature 0, JSON mode if available, and any other optimizations for structured data output."
Here's the production-optimized config Claude helped me build:
```python
def get_ai_json_response(prompt: str) -> dict:
    """Production-optimized JSON generation with AI"""
    response = anthropic.messages.create(
        model="claude-sonnet-4-20250514",
        temperature=0,  # Zero creativity for structured data
        response_format={"type": "json_object"},  # Force JSON mode
        system="You are a JSON generation assistant. Output only valid JSON.",
        messages=[{
            "role": "user",
            "content": f"{prompt}\n\nRespond with valid JSON only."
        }]
    )
    # Explicit error handling for production
    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError as e:
        logger.error(f"JSON parse failed: {e}")
        logger.error(f"Raw response: {response.content[0].text}")
        raise
```
The difference? I gave the AI the context it needed to optimize for my specific use case.
The Results
API costs dropped by 30%, and generation got 40% faster. All from one conversation in which I was actually specific about what "production-ready" meant.
The funny part? Even our "creative" dialogue generation now uses temperature 0. It turns out deterministic doesn't mean boring; it means reliable. The dialogues still sound natural because the prompts and the content research provide the variety, not random temperature fluctuations.
It turns out AI pair programming works great, as long as you remember it's still programming: precision in your requirements gets you precision in your results.
Have you ever had an AI assistant make a "reasonable" choice that turned out to be completely wrong for your use case? I'd love to hear your war stories; I suspect we've all been bitten by sensible defaults :)
Best,
Chandler
Want to create your own AI podcasts with guaranteed valid JSON? Try DIALØGUE: 2 free credits to get you started! :P
Part 3 of the DIALØGUE Engineering series. Still learning that "make it work" and "make it work efficiently" are completely different requests. Follow along for more AI pair programming adventures at chandlernguyen.com.
Next in the series, in about 7 days: "From 3 Minutes to 500ms: The Signup Bug That Made No Sense"