我第2日Build咗Multi-Tenancy。第67日,我Rebuild咗佢
我以為加org_id到每個table就係bulletproof嘅multi-tenancy。然後我嘅security audit揭示agencies係寫緊去SME tables——唔係因為bug,而係因為design。
2025年10月27日,夜晚11:47。我喺度run我以為係routine嘅security audit on STRAŦUM。所有嘢幾個禮拜以嚟都正常運作——SMEs有佢哋嘅data,agencies有佢哋嘅,multi-tenancy好solid。
咖啡已經凍咗。Audit script喺度churn through logs。然後我見到:Agencies係寫緊去SME tables。
唔係因為bug。唔係因為security hole。係因為architecture本身。
兩個月前,我做咗決定由Day 2開始build multi-tenant architecture。對一個得一個working AI agent嘅solo founder嚟講係bold move。我加咗org_id到每個table,寫咗RLS policies,build咗SMEs同Agencies嘅separate routing。佢work緊——SMEs有佢哋嘅campaigns,agencies有佢哋嘅clients,data flow去正確嘅地方。
或者我以為係咁。
我坐咗大概20分鐘只係盯住個schema。我點解miss咗呢個?我花咗幾個禮拜build multi-tenant architecture,寫咗83個RLS policies,用SME同Agency accounts test過。所有嘢都_work_。但「work緊」同「correct」唔係同一件事。
呢種bug會令你質疑自己係咪應該build software。因為佢唔係typo。唔係missed edge case。佢係architectural naivety。
我犯咗classic mistake:我assume org_id filtering就夠做multi-tenant isolation。佢唔夠。
呢個係發現true multi-tenant isolation需要嘅唔止加org_id到每個table——同埋48小時內33個migrations終於解決佢嘅故事。
---
> **Note**: 呢篇文嘅SQL examples用咗genericized schema同table names(tenant_b、workspace_entities、entity_data)for security。Concepts無論你用咩naming conventions都一樣。
---
問題:唔係所有Tenants都一樣
以下係我最初build嘅:
```sql
-- Brand guidelines table (shared by SMEs and Agencies)
CREATE TABLE brand_guidelines (
id UUID PRIMARY KEY,
org_id UUID REFERENCES organizations(id),
name TEXT,
guidelines JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- RLS policy (seems safe)
CREATE POLICY brand_guidelines_org_isolation ON brand_guidelines
FOR ALL TO authenticated
USING (org_id = get_user_org_id());
```
呢個for SMEs完美work。每個organization有自己嘅brand guidelines。Row-Level Security確保佢哋睇唔到對方嘅data。(至少,我以為係咁。之後有更多。)
但Agencies唔同。
Agencies唔只有一套brand guidelines。佢哋有每個client一套:
- Client A嘅brand guidelines(vibrant color palette、bold typography、innovation-focused messaging)
- Client B嘅brand guidelines(muted color palette、minimal design、quality-focused positioning)
同一個agency,唔同clients,完全唔同嘅brands。
Naive嘅solution(我最初build嘅):
```sql
-- Add entity_id to the shared table
ALTER TABLE brand_guidelines ADD COLUMN entity_id UUID;
-- Update RLS policy
CREATE POLICY brand_guidelines_isolation ON brand_guidelines
FOR ALL TO authenticated
USING (
organization_id = get_user_org_id() AND
(entity_id IS NULL OR entity_id = get_user_entity_id())
);
```
問題:呢個create咗一個有兩個唔同data models嘅table:
```
SME row: organization_id='org-123', entity_id=NULL, guidelines={...}
Agency row: organization_id='org-456', entity_id='entity-a', guidelines={...}
Agency row: organization_id='org-456', entity_id='entity-b', guidelines={...}
```
Queries變成有complex NULL handling嘅mess,每個feature都需要application code入面嘅「if SME, else Agency」logic。
係嘅,我寫咗全部呢啲先意識到佢哋係deeper problem嘅symptoms。幾個禮拜嘅work,全部指向同一個conclusion:我architect咗自己入一個corner。
每個feature都需要custom logic:「If SME, do this. If Agency, do that.」
更差嘅係,architecture做咗incorrect assumptions:
- Agencies寫entity_id=NULL會pollute SME data
- SMEs唔可以有sub-entities即使佢哋想要sub-accounts
- Schema變成有nullable columns嘅「Swiss cheese」
呢個唔係multi-tenant architecture。呢個係一個table試緊serve兩個唔同嘅data models。
---
頓悟:唔同嘅Tenants需要唔同嘅Schemas
到10月底,我意識到真相:SMEs同Agencies唔share同一個data model。
SME data model:
```
organization → campaigns → agent_outputs
```
Agency data model:
```
organization → workspace_entities → campaigns → agent_outputs
↓
entity_data (e.g., brand guidelines, personas)
```
Agencies有成層(workspace entities)係SMEs冇嘅。佢哋亦有entity-specific intelligence唔應該存在喺SME world。
Solution:Separate database schemas。
```sql
-- SME tables (public schema)
public.brand_guidelines
public.campaigns
public.outputs
-- Agency tables (tenant_b schema)
tenant_b.workspace_entities
tenant_b.entity_data -- Includes brand guidelines, personas, etc.
tenant_b.campaigns
tenant_b.outputs
```
而家SMEs同Agencies有完全唔同嘅tables。冇shared schema。冇nullable entity_id pollution。冇「if SME, else Agency」logic。
---
點解呢個重要:Schema Routing嘅Business Case
喺dive入technical implementation之前,我哋講吓點解呢個architectural decision重要——唔止係「佢係cleaner code」。
Future-Proofing for Growth(可能)
Schema routing唔止係解決今日嘅problem。佢係keep doors open for我甚至predict唔到嘅opportunities。
我仲喺private alpha有15個users。我冇enterprise customers。我冇同GDPR律師傾過。但以下係schema routing 可以 enable嘅嘢如果STRAŦUM grow:
International Expansion:
- 如果我哋expand去EU:Separate schemas可以enable data residency(EU client data喺EU servers嘅eu_agency schema)
- Right to deletion變得更簡單:Query一個schema,唔係filter through mixed tables
- Audit trails:「Show me all Client X data」= 一個schema query
Compliance Conversations:
- 當有人eventually問「How do you guarantee data isolation?」
- 用org_id filtering:「We use Row-Level Security policies」(vague,難verify)
- 用schema routing:「Each client's data lives in a separate database schema」(concrete,auditable)
- 我唔知呢個幾時matter。但佢_可以_matter如果我哋有呢啲conversations。
Honest Truth:
我而家唔係為HIPAA或SOC 2 compliance build。我係為需要更好marketing strategy嘅SMEs同small agencies build。
但schema routing代表如果有人問「Can you handle healthcare clients?」或者「Do you support data residency?」嘅時候,答案係「yes, let me show you the architecture」而唔係「let me rebuild everything first。」
Downsides(講老實)
Schema routing唔係全部都好。以下係佢actual cost:
Development Complexity:
- 每個WRITE operation需要一個router function
- 每個READ operation需要一個security view
- Testing需要SME同Agency兩個paths
- 用Claude Code:2日intense work(Oct 27-29, 2025)喺晚上
- 冇AI tools:會係幾個禮拜
Migration Risk:
- 33個sequential migrations = 33個typo嘅機會
- 一個錯誤嘅ALTER TABLE = production data corruption
- 每個migration要喺staging run 3次先掂production
- Paranoia係real嘅
Query Performance Overhead:
- 有UNION ALL嘅Views = 稍微慢啲嘅reads
- Router functions = 多一個function call on writes
- RLS + views = 更complex嘅query plans
- (實際上:我暫時冇notice到slowdowns,但我亦只有15個alpha users)
Operational Complexity:
- Schema migrations而家affect 2+個schemas(public + agency)
- Database backups需要schema-aware restore
- Monitoring queries需要check multiple schemas
- 呢個eventually會bite我,只係我唔知幾時
點解我仲係做咗呢個Trade-Off
Option value可能好大。或者完全唔matter。
Schema routing keep住doors open,我甚至唔確定自己想唔想行過去:
- White-label partnerships:可以俾partner佢哋自己嘅schema,rebrand個UI
- Reseller opportunities:Agencies可以用provable data isolation resell
- 唔同pricing tiers:「Premium」customers可以有dedicated schemas
- Geographic expansion:EU schema、US schema、APAC schema——同一個codebase
問題係:我喺private alpha。我唔知呢啲會唔會matter。可能我永遠唔會收到white-label request。可能geographic expansion仲有幾年。可能成盤生意pivot咗全部都irrelevant。
但以下係我知嘅:有schema routing,呢啲options存在。有org_id filtering,大部分都需要complete rewrite。
呢個就係我嘅bet: 而家多花2日(用Claude Code)嚟keep options open。
呢個bet啱唔啱?一年後問我。
---
Architecture:Schema Routing
Pattern 1:Schema-Specific Tables
有啲tables只為一種tenant type存在:
```sql
-- Specialized tenant schema
CREATE SCHEMA tenant_b;
-- Workspace entities (specific to this tenant type)
CREATE TABLE tenant_b.workspace_entities (
id UUID PRIMARY KEY,
organization_id UUID,
name TEXT,
metadata JSONB
);
-- Entity-specific data
CREATE TABLE tenant_b.entity_data (
id UUID PRIMARY KEY,
organization_id UUID,
entity_id UUID REFERENCES tenant_b.workspace_entities(id),
data_type TEXT,
content JSONB
);
```
SMEs永遠唔會掂呢啲tables。佢哋喺public schema入面唔存在。
Pattern 2:Database Router Functions
你點寫入correct schema?**Router functions**。
以下係concept(simplified):
```sql
CREATE FUNCTION save_resource_routed(params)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER -- Run with elevated privileges
AS $$
BEGIN
-- Step 1: Detect organization type
SELECT type INTO org_type FROM organizations WHERE id = p_org_id;
-- Step 2: Route to correct schema based on type
IF org_type = 'TENANT_B' THEN
INSERT INTO tenant_b.entity_data (...) VALUES (...);
ELSE
INSERT INTO public.brand_guidelines (...) VALUES (...);
END IF;
RETURN result;
END;
$$;
```
點運作:
1. Detect org type:Query organizations table嚟determine tenant type
2. Route to correct schema:根據type寫入appropriate schema
3. Return result:包括用咗邊個schema做debugging
Application code(所有tenant types都一樣):
```typescript
// Just call the router function - no tenant-specific logic
const result = await supabase.rpc('save_resource_routed', {
p_org_id: orgId,
p_entity_id: entityId, // null for simple tenants
p_data: { ... }
});
```
Application code入面冇if/else。Database做routing。
寫我第一個router function用咗4個鐘。Debug點解佢唔work?又6個鐘。問題?我忘記grant EXECUTE permissions。Classic solo founder energy:architectural brilliance,permission oversights。:P
Pattern 3:Security-Invoker Views for Reads
Writing用router functions。Reading用****views****。
```sql
-- Unified view combining both schemas
CREATE VIEW resources_unified
WITH (security_invoker = on) -- Respects RLS policies
AS
SELECT id, organization_id, NULL AS entity_id, data, 'public' AS source
FROM public.brand_guidelines
UNION ALL
SELECT id, organization_id, entity_id, content AS data, 'tenant_b' AS source
FROM tenant_b.entity_data
WHERE data_type = 'brand_guidelines';
```
Application code(unified reads):
```typescript
// Read resources (works for all tenant types)
const { data } = await supabase
.from('resources_unified')
.select('*')
.eq('organization_id', orgId);
// RLS policies filter correctly regardless of source schema
```
Key detail:WITH (security_invoker = on)確保RLS policies被enforce。冇呢個,views會bypass RLS(security disaster)。
---
Migration:48小時內33個Migrations
加schema routing唔係一個migration。佢係一段journey。
你知咩好玩?連續寫33個database migrations同時知道如果有一個typo,你就會corrupt production data。其實「好玩」唔係正確嘅字。「恐怖」先啱。我每個migration喺staging run三次先掂production。
2025年10月27-29日:33個sequential migrations for complete schema routing。
Migration phases:
1. Create specialized schema - Setup tenant_b schema有proper permissions
2. Create schema-specific tables - 喺新schema mirror necessary tables
3. Build router functions - 每種需要routing嘅resource type一個
4. Create security views - 有UNION ALL嘅unified views for reads
5. Update RLS policies - 確保兩個schemas都有proper isolation
6. Data migration - 將現有data搬去正確嘅schemas
7. Application updates - 由direct queries switch去router functions/views
Total effort:33個migrations,用Claude Code 2日,100% worth it。
---
結果:True Multi-Tenant Isolation
Before(Shared Tables有org_id)
Data model:
```sql
public.brand_guidelines (organization_id, entity_id, guidelines)
```
問題:
- ❌ 一種tenant type有nullable entity_id(data model confusion)
- ❌ 有NULL handling嘅complex queries
- ❌ Application logic:if (tenantTypeA) { ... } else { ... }
- ❌ Cross-contamination風險
After(Schema Routing)
Data model:
```sql
public.brand_guidelines (organization_id, guidelines) -- Tenant Type A
tenant_b.entity_data (organization_id, entity_id, data) -- Tenant Type B
```
好處:
- ✅ Clean data models(冇nullable foreign keys)
- ✅ 冇complex NULL handling嘅simple queries
- ✅ 冇application if/else(database handle routing)
- ✅ Cross-schema contamination唔可能(physically separate)
Security Improvements
Before:有nullable columns同shared tables嘅cross-contamination風險
After:Router functions根據organization type自動direct writes去correct schema。Physical schema separation令cross-contamination變得唔可能。
Isolation level:Database-enforced separation。唔係application-level checks。
---
幾時用Schema Routing vs org_id Filtering
唔係每個multi-tenant app都需要schema routing。用呢個decision tree:
用org_id Filtering(Simpler)如果:
✅ 所有tenants有同一個data model(例如todos app)
✅ 冇hierarchical tenancy(organizations入面冇sub-entities)
✅ Simple queries(WHERE org_id = X到處work)
✅ B2C或small B2B(冇enterprise sales,冇compliance requirements)
✅ MVP speed matters(幾個禮拜上市,唔係幾個月)
Business reasoning: 你喺度validate product-market fit,唔係為Fortune 500 compliance requirements build。Ship快,之後refactor如果你有enterprise traction。
例子 Project management tool,每個organization manage自己嘅projects。
```sql
CREATE TABLE projects (
id UUID PRIMARY KEY,
org_id UUID, -- Simple filtering
name TEXT
);
```
用Schema Routing(More Complex)如果:
✅ 唔同tenant types需要唔同data models(Type A vs Type B vs Type C)
✅ Hierarchical tenancy(organizations → workspace_entities → sub-entities)
✅ Enterprise sales喺roadmap上(Fortune 500、healthcare、finance、government)
✅ 需要regulatory compliance(GDPR、HIPAA、SOC 2、FedRAMP)
✅ 有white-label或reseller potential(partners需要complete data isolation)
✅ 計劃international expansion(data residency requirements)
Business reasoning: 如果你target enterprise customers,「data isolation」會變成security questionnaires上嘅checkbox。Schema routing令你可以自信咁answer。Row-level filtering令你要hedge。
例子:Platform入面有啲tenants有hierarchical workspace structures,fundamentally同direct customers唔同。
Cost/Benefit(我嘅經驗):
- Schema routing cost:用Claude Code 2日(冇AI assistance會係幾個禮拜)
- Potential upside:Cleaner architecture、compliance readiness、partnership options
- Actual upside:Unknown——我仲喺private alpha
- Break-even:如果schema routing開到一道我唔能夠行過嘅門,佢就pay for itself
真正嘅問題: 你optimize緊speed-to-market定係optionality?兩個都valid。我揀咗optionality。
---
Alternative Isolation Strategies
Schema routing唔係唯一嘅approach。以下係spectrum:
Level 1:Separate Databases(Highest Isolation)
```
database_tenant_1
database_tenant_2
database_tenant_3
```
Pros:
- ✅ Complete physical isolation
- ✅ Per-tenant backups
- ✅ Independent scaling
- ✅ Regulatory compliance(data residency)
Cons:
- ❌ High operational complexity(manage N databases)
- ❌ 貴(每個tenant一個database instance)
- ❌ Cross-tenant queries唔可能
- ❌ Schema migrations across所有databases
Use case:有regulatory requirements嘅enterprise SaaS,high-value customers($10k+/月)。
Level 2:Separate Schemas(Strong Isolation)
```
database
├── schema_tenant_1
├── schema_tenant_2
└── public (shared tables)
```
Pros:
- ✅ Strong logical isolation
- ✅ Shared infrastructure(一個database)
- ✅ Schema-level permissions
- ✅ 每種tenant type唔同嘅data models
Cons:
- ❌ 比row-level更complex
- ❌ 需要router functions
- ❌ Migration complexity(N個schemas)
Use case:有唔同customer tiers嘅B2B SaaS(STRAŦUM嘅approach)。
Level 3:Row-Level Filtering有RLS(Moderate Isolation)
```
database
└── public
└── table (with org_id column)
```
Pros:
- ✅ 簡單implement
- ✅ 容易migrations(一個schema)
- ✅ 可以做cross-tenant analytics
- ✅ Postgres RLS enforce isolation
Cons:
- ❌ 所有tenants share同一個data model
- ❌ RLS performance overhead
- ❌ RLS misconfigurations嘅風險
Use case:有uniform data models嘅B2B SaaS(project management、CRM)。
---
Implementation Checklist:Schema Routing
如果你喺度implement schema routing,用呢個checklist:
Phase 1:Schema Design
- [ ] Create tenant type discriminator(organizations.type)
- [ ] Design schema-specific tables(咩屬於邊度?)
- [ ] Create specialized schema:CREATE SCHEMA tenant_b;
- [ ] 喺specialized schema mirror necessary tables
- [ ] Document邊個tables住喺邊個schema
Phase 2:Router Functions
- [ ] 為每種resource type寫router function
- [ ] 用SECURITY DEFINER for elevated permissions
- [ ] Set search_path = public, tenant_b for multi-schema access
- [ ] Handle tenant type detection:SELECT type FROM organizations
- [ ] Return schema info for debugging
- [ ] Grant EXECUTE to authenticated role
Phase 3:Security Views
- [ ] Create unified views for reads(UNION ALL across schemas)
- [ ] 用WITH (security_invoker = on) for RLS enforcement
- [ ] 加source_schema column for debugging
- [ ] Test RLS policies work on views
- [ ] Grant SELECT to authenticated role
Phase 4:Application Integration
- [ ] Update writes用router functions:supabase.rpc('save_resource_routed', ...)
- [ ] Update reads用views:supabase.from('resource_unified').select()
- [ ] 移除application-level if/else logic
- [ ] Test tenant type A flow(writes to public)
- [ ] Test tenant type B flow(writes to tenant_b)
- [ ] Verify data isolation(Entity A ≠ Entity B)
Phase 5:Migration & Testing
- [ ] 寫migration scripts(30+個for full coverage)
- [ ] 用real-like data喺staging test
- [ ] Run security audit(cross-schema leakage?)
- [ ] Load test(RLS + views performance)
- [ ] Monitor in production(slow queries?)
---
學到嘅嘢
1. org_id係Necessary,唔係Sufficient
加org_id到每個table俾你row-level filtering。但如果唔同tenant types需要唔同data models,你需要schema routing。
Learning: 「Multi-tenant」唔係binary。有唔同levels嘅isolation。我揀咗比我目前15個users strictly需要嘅更高level。時間會話俾我知呢個係smart定只係extra work。
2. Application Logic → Database Logic
你application入面每個if (tenantType === 'TYPE_B')都係code smell。用router functions將tenant-aware logic搬去database。
Learning: Database functions更難寫但potentially更容易audit。如果我有一日有enterprise customers問「prove your data isolation」,我可以指住stored procedures。但而家,呢個係hypothetical。
3. Views + RLS = Unified Reads
由multiple schemas read係complex。Views + security_invoker = on俾你unified reads有proper isolation。
Learning: Views create abstraction layers可能有一日令compliance更容易。或者佢哋只係加咗我唔需要嘅complexity。We'll see。
4. Security Definer Functions好Powerful
SECURITY DEFINER令functions可以用elevated privileges run同時仲respect RLS policies。對router functions嚟講係essential嘅。
5. Migrations係Worth It(大概)
33個migrations for schema routing感覺好多。但結果?Clean architecture、true isolation、同零cross-tenant bugs。
Learning: Technical debt有price。我揀咗喺有15個alpha users嘅時候早啲pay,而唔係等到有paying customers。呢個係唔係right call?我有100個customers要worry嘅時候就會知。
6. Architectural Flaws比Code Bugs更傷
喺11:47 PM搵到null pointer exception?煩。搵到你成個multi-tenant architecture fundamentally broken?呢種discovery令你夜晚瞓唔著。
但以下係我學到嘅:architectural mistakes係fixable嘅。佢哋expensive,係。Time-consuming,絕對係。但我由「一種tenant type可以accidentally contaminate另一種嘅data」去到「database-enforced isolation唔可能bypass」。
早啲fix,fix啱佢,你會瞓得更好。
7. Architecture可能係Strategy(或者可能只係Over-Engineering)
Schema routing嘅決定唔止係「clean code」。佢係keep future options open——white-label partnerships、enterprise sales、international expansion。
但honest truth係:我喺private alpha有15個users。我冇enterprise customers knocking。我冇收到一個GDPR question。我想像中嘅partnerships可能永遠唔會materialize。
我以為自己喺度做technical decision。可能我係喺度做business strategy decision。或者可能我只係因為覺得database architecture有趣而over-engineering。:)
你有冇試過發現一個唔係bug而係design mistake嘅architectural flaw?你點handle個rebuild——incremental fix定係好似我咁rip the whole thing out?
祝好,
Chandler
STRAŦUM architecture series: 呢個係multi-tenancy journey嘅Part 2。佢由第2日build multi-tenancy開始。Schema rebuild之後,我發現31個blank screens因為lost navigation context同我嘅database係correct但296x太慢。
---
Build緊有complex isolation需求嘅multi-tenant SaaS? STRAŦUM用schema routing嚟serve唔同tenant types有true data isolation。喺https://stratum.chandlernguyen.com/request-invitation申請alpha access
---
仲喺學緊「multi-tenant」有好多levels嘅isolation。仲喺半夜debug RLS policies。仲喺質疑我Day 2嘅architecture decisions(但而家少咗)。更多database adventures喺 https://www.chandlernguyen.com/
---





