我第2日Build咗Multi-Tenancy。第67日，我Rebuild咗佢

2025年10月27日，夜晚11:47。我喺度run我以為係routine嘅security audit on STRAŦUM。所有嘢幾個禮拜以嚟都正常運作——SMEs有佢哋嘅data，agencies有佢哋嘅，multi-tenancy好solid。

咖啡已經凍咗。Audit script喺度churn through logs。然後我見到：Agencies係寫緊去SME tables。

唔係因為bug。唔係因為security hole。係因為architecture本身。

兩個月前，我做咗決定由Day 2開始build multi-tenant architecture。對一個得一個working AI agent嘅solo founder嚟講係bold move。我加咗org_id到每個table，寫咗RLS policies，build咗SMEs同Agencies嘅separate routing。佢work緊——SMEs有佢哋嘅campaigns，agencies有佢哋嘅clients，data flow去正確嘅地方。

或者我以為係咁。

我坐咗大概20分鐘只係盯住個schema。我點解miss咗呢個？我花咗幾個禮拜build multi-tenant architecture，寫咗83個RLS policies，用SME同Agency accounts test過。所有嘢都_work_。但「work緊」同「correct」唔係同一件事。

呢種bug會令你質疑自己係咪應該build software。因為佢唔係typo。唔係missed edge case。佢係architectural naivety。

我犯咗classic mistake：我assume org_id filtering就夠做multi-tenant isolation。佢唔夠。

呢個係發現true multi-tenant isolation需要嘅唔止加org_id到每個table——同埋48小時內33個migrations終於解決佢嘅故事。

---

> **Note**: 呢篇文嘅SQL examples用咗genericized schema同table names（tenant_b、workspace_entities、entity_data）for security。Concepts無論你用咩naming conventions都一樣。

---

問題：唔係所有Tenants都一樣

以下係我最初build嘅：

```sql
-- Brand guidelines table (shared by SMEs and Agencies)
CREATE TABLE brand_guidelines (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  name TEXT,
  guidelines JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- RLS policy (seems safe)
CREATE POLICY brand_guidelines_org_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (org_id = get_user_org_id());
```

呢個for SMEs完美work。每個organization有自己嘅brand guidelines。Row-Level Security確保佢哋睇唔到對方嘅data。（至少，我以為係咁。之後有更多。）

但Agencies唔同。

Agencies唔只有一套brand guidelines。佢哋有每個client一套：

- Client A嘅brand guidelines（vibrant color palette、bold typography、innovation-focused messaging）

- Client B嘅brand guidelines（muted color palette、minimal design、quality-focused positioning）

同一個agency，唔同clients，完全唔同嘅brands。

Naive嘅solution（我最初build嘅）：

```sql
-- Add entity_id to the shared table
ALTER TABLE brand_guidelines ADD COLUMN entity_id UUID;

-- Update RLS policy
CREATE POLICY brand_guidelines_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (
    organization_id = get_user_org_id() AND
    (entity_id IS NULL OR entity_id = get_user_entity_id())
  );
```

問題：呢個create咗一個有兩個唔同data models嘅table：

```
SME row:    organization_id='org-123',  entity_id=NULL,      guidelines={...}
Agency row: organization_id='org-456',  entity_id='entity-a', guidelines={...}
Agency row: organization_id='org-456',  entity_id='entity-b', guidelines={...}
```

Queries變成有complex NULL handling嘅mess，每個feature都需要application code入面嘅「if SME, else Agency」logic。

係嘅，我寫咗全部呢啲先意識到佢哋係deeper problem嘅symptoms。幾個禮拜嘅work，全部指向同一個conclusion：我architect咗自己入一個corner。

每個feature都需要custom logic：「If SME, do this. If Agency, do that.」

更差嘅係，architecture做咗incorrect assumptions：

- Agencies寫entity_id=NULL會pollute SME data

- SMEs唔可以有sub-entities即使佢哋想要sub-accounts

- Schema變成有nullable columns嘅「Swiss cheese」

呢個唔係multi-tenant architecture。呢個係一個table試緊serve兩個唔同嘅data models。

---

頓悟：唔同嘅Tenants需要唔同嘅Schemas

到10月底，我意識到真相：SMEs同Agencies唔share同一個data model。

SME data model：

```
organization → campaigns → agent_outputs
```

Agency data model：

```
organization → workspace_entities → campaigns → agent_outputs
                   ↓
            entity_data (e.g., brand guidelines, personas)
```

Agencies有成層（workspace entities）係SMEs冇嘅。佢哋亦有entity-specific intelligence唔應該存在喺SME world。

Solution：Separate database schemas。

```sql
-- SME tables (public schema)
public.brand_guidelines
public.campaigns
public.outputs

-- Agency tables (tenant_b schema)
tenant_b.workspace_entities
tenant_b.entity_data  -- Includes brand guidelines, personas, etc.
tenant_b.campaigns
tenant_b.outputs
```

而家SMEs同Agencies有完全唔同嘅tables。冇shared schema。冇nullable entity_id pollution。冇「if SME, else Agency」logic。

---

點解呢個重要：Schema Routing嘅Business Case

喺dive入technical implementation之前，我哋講吓點解呢個architectural decision重要——唔止係「佢係cleaner code」。

Future-Proofing for Growth（可能）

Schema routing唔止係解決今日嘅problem。佢係keep doors open for我甚至predict唔到嘅opportunities。

我仲喺private alpha有15個users。我冇enterprise customers。我冇同GDPR律師傾過。但以下係schema routing 可以 enable嘅嘢如果STRAŦUM grow：

International Expansion：

- 如果我哋expand去EU：Separate schemas可以enable data residency（EU client data喺EU servers嘅eu_agency schema）

- Right to deletion變得更簡單：Query一個schema，唔係filter through mixed tables

- Audit trails：「Show me all Client X data」= 一個schema query

Compliance Conversations：

- 當有人eventually問「How do you guarantee data isolation?」

- 用org_id filtering：「We use Row-Level Security policies」（vague，難verify）

- 用schema routing：「Each client's data lives in a separate database schema」（concrete，auditable）

- 我唔知呢個幾時matter。但佢_可以_matter如果我哋有呢啲conversations。

Honest Truth：

我而家唔係為HIPAA或SOC 2 compliance build。我係為需要更好marketing strategy嘅SMEs同small agencies build。

但schema routing代表如果有人問「Can you handle healthcare clients?」或者「Do you support data residency?」嘅時候，答案係「yes, let me show you the architecture」而唔係「let me rebuild everything first。」

Downsides（講老實）

Schema routing唔係全部都好。以下係佢actual cost：

Development Complexity：

- 每個WRITE operation需要一個router function

- 每個READ operation需要一個security view

- Testing需要SME同Agency兩個paths

- 用Claude Code：2日intense work（Oct 27-29, 2025）喺晚上

- 冇AI tools：會係幾個禮拜

Migration Risk：

- 33個sequential migrations = 33個typo嘅機會

- 一個錯誤嘅ALTER TABLE = production data corruption

- 每個migration要喺staging run 3次先掂production

- Paranoia係real嘅

Query Performance Overhead：

- 有UNION ALL嘅Views = 稍微慢啲嘅reads

- Router functions = 多一個function call on writes

- RLS + views = 更complex嘅query plans

- （實際上：我暫時冇notice到slowdowns，但我亦只有15個alpha users）

Operational Complexity：

- Schema migrations而家affect 2+個schemas（public + agency）

- Database backups需要schema-aware restore

- Monitoring queries需要check multiple schemas

- 呢個eventually會bite我，只係我唔知幾時

點解我仲係做咗呢個Trade-Off

Option value可能好大。或者完全唔matter。

Schema routing keep住doors open，我甚至唔確定自己想唔想行過去：

- White-label partnerships：可以俾partner佢哋自己嘅schema，rebrand個UI

- Reseller opportunities：Agencies可以用provable data isolation resell

- 唔同pricing tiers：「Premium」customers可以有dedicated schemas

- Geographic expansion：EU schema、US schema、APAC schema——同一個codebase

問題係：我喺private alpha。我唔知呢啲會唔會matter。可能我永遠唔會收到white-label request。可能geographic expansion仲有幾年。可能成盤生意pivot咗全部都irrelevant。

但以下係我知嘅：有schema routing，呢啲options存在。有org_id filtering，大部分都需要complete rewrite。

呢個就係我嘅bet： 而家多花2日（用Claude Code）嚟keep options open。

呢個bet啱唔啱？一年後問我。

---

Architecture：Schema Routing

Pattern 1：Schema-Specific Tables

有啲tables只為一種tenant type存在：

```sql
-- Specialized tenant schema
CREATE SCHEMA tenant_b;

-- Workspace entities (specific to this tenant type)
CREATE TABLE tenant_b.workspace_entities (
  id UUID PRIMARY KEY,
  organization_id UUID,
  name TEXT,
  metadata JSONB
);

-- Entity-specific data
CREATE TABLE tenant_b.entity_data (
  id UUID PRIMARY KEY,
  organization_id UUID,
  entity_id UUID REFERENCES tenant_b.workspace_entities(id),
  data_type TEXT,
  content JSONB
);
```

SMEs永遠唔會掂呢啲tables。佢哋喺public schema入面唔存在。

Pattern 2：Database Router Functions

你點寫入correct schema？**Router functions**。

以下係concept（simplified）：

```sql
CREATE FUNCTION save_resource_routed(params)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER  -- Run with elevated privileges
AS $$
BEGIN
  -- Step 1: Detect organization type
  SELECT type INTO org_type FROM organizations WHERE id = p_org_id;

  -- Step 2: Route to correct schema based on type
  IF org_type = 'TENANT_B' THEN
    INSERT INTO tenant_b.entity_data (...) VALUES (...);
  ELSE
    INSERT INTO public.brand_guidelines (...) VALUES (...);
  END IF;

  RETURN result;
END;
$$;
```

點運作：

1. Detect org type：Query organizations table嚟determine tenant type

2. Route to correct schema：根據type寫入appropriate schema

3. Return result：包括用咗邊個schema做debugging

Application code（所有tenant types都一樣）：

```typescript
// Just call the router function - no tenant-specific logic
const result = await supabase.rpc('save_resource_routed', {
  p_org_id: orgId,
  p_entity_id: entityId,  // null for simple tenants
  p_data: { ... }
});
```

Application code入面冇if/else。Database做routing。

寫我第一個router function用咗4個鐘。Debug點解佢唔work？又6個鐘。問題？我忘記grant EXECUTE permissions。Classic solo founder energy：architectural brilliance，permission oversights。:P

Pattern 3：Security-Invoker Views for Reads

Writing用router functions。Reading用****views****。

```sql
-- Unified view combining both schemas
CREATE VIEW resources_unified
WITH (security_invoker = on)  -- Respects RLS policies
AS
  SELECT id, organization_id, NULL AS entity_id, data, 'public' AS source
  FROM public.brand_guidelines
UNION ALL
  SELECT id, organization_id, entity_id, content AS data, 'tenant_b' AS source
  FROM tenant_b.entity_data
  WHERE data_type = 'brand_guidelines';
```

Application code（unified reads）：

```typescript
// Read resources (works for all tenant types)
const { data } = await supabase
  .from('resources_unified')
  .select('*')
  .eq('organization_id', orgId);

// RLS policies filter correctly regardless of source schema
```

Key detail：WITH (security_invoker = on)確保RLS policies被enforce。冇呢個，views會bypass RLS（security disaster）。

---

Migration：48小時內33個Migrations

加schema routing唔係一個migration。佢係一段journey。

你知咩好玩？連續寫33個database migrations同時知道如果有一個typo，你就會corrupt production data。其實「好玩」唔係正確嘅字。「恐怖」先啱。我每個migration喺staging run三次先掂production。

2025年10月27-29日：33個sequential migrations for complete schema routing。

Migration phases：

1. Create specialized schema - Setup tenant_b schema有proper permissions

2. Create schema-specific tables - 喺新schema mirror necessary tables

3. Build router functions - 每種需要routing嘅resource type一個

4. Create security views - 有UNION ALL嘅unified views for reads

5. Update RLS policies - 確保兩個schemas都有proper isolation

6. Data migration - 將現有data搬去正確嘅schemas

7. Application updates - 由direct queries switch去router functions/views

Total effort：33個migrations，用Claude Code 2日，100% worth it。

---

結果：True Multi-Tenant Isolation

Before（Shared Tables有org_id）

Data model：

```sql
public.brand_guidelines (organization_id, entity_id, guidelines)
```

問題：

- ❌ 一種tenant type有nullable entity_id（data model confusion）

- ❌ 有NULL handling嘅complex queries

- ❌ Application logic：if (tenantTypeA) { ... } else { ... }

- ❌ Cross-contamination風險

After（Schema Routing）

Data model：

```sql
public.brand_guidelines (organization_id, guidelines)    -- Tenant Type A
tenant_b.entity_data (organization_id, entity_id, data)  -- Tenant Type B
```

好處：

- ✅ Clean data models（冇nullable foreign keys）

- ✅ 冇complex NULL handling嘅simple queries

- ✅ 冇application if/else（database handle routing）

- ✅ Cross-schema contamination唔可能（physically separate）

Security Improvements

Before：有nullable columns同shared tables嘅cross-contamination風險

After：Router functions根據organization type自動direct writes去correct schema。Physical schema separation令cross-contamination變得唔可能。

Isolation level：Database-enforced separation。唔係application-level checks。

---

幾時用Schema Routing vs org_id Filtering

唔係每個multi-tenant app都需要schema routing。用呢個decision tree：

用`org_id` Filtering（Simpler）如果：

✅ 所有tenants有同一個data model（例如todos app）

✅ 冇hierarchical tenancy（organizations入面冇sub-entities）

✅ Simple queries（WHERE org_id = X到處work）

✅ B2C或small B2B（冇enterprise sales，冇compliance requirements）

✅ MVP speed matters（幾個禮拜上市，唔係幾個月）

Business reasoning： 你喺度validate product-market fit，唔係為Fortune 500 compliance requirements build。Ship快，之後refactor如果你有enterprise traction。

例子 Project management tool，每個organization manage自己嘅projects。

```sql
CREATE TABLE projects (
  id UUID PRIMARY KEY,
  org_id UUID,  -- Simple filtering
  name TEXT
);
```

用Schema Routing（More Complex）如果：

✅ 唔同tenant types需要唔同data models（Type A vs Type B vs Type C）

✅ Hierarchical tenancy（organizations → workspace_entities → sub-entities）

✅ Enterprise sales喺roadmap上（Fortune 500、healthcare、finance、government）

✅ 需要regulatory compliance（GDPR、HIPAA、SOC 2、FedRAMP）

✅ 有white-label或reseller potential（partners需要complete data isolation）

✅ 計劃international expansion（data residency requirements）

Business reasoning： 如果你target enterprise customers，「data isolation」會變成security questionnaires上嘅checkbox。Schema routing令你可以自信咁answer。Row-level filtering令你要hedge。

例子：Platform入面有啲tenants有hierarchical workspace structures，fundamentally同direct customers唔同。

Cost/Benefit（我嘅經驗）：

- Schema routing cost：用Claude Code 2日（冇AI assistance會係幾個禮拜）

- Potential upside：Cleaner architecture、compliance readiness、partnership options

- Actual upside：Unknown——我仲喺private alpha

- Break-even：如果schema routing開到一道我唔能夠行過嘅門，佢就pay for itself

真正嘅問題： 你optimize緊speed-to-market定係optionality？兩個都valid。我揀咗optionality。

---

Alternative Isolation Strategies

Schema routing唔係唯一嘅approach。以下係spectrum：

Level 1：Separate Databases（Highest Isolation）

```

database_tenant_1

database_tenant_2

database_tenant_3

```

Pros：

- ✅ Complete physical isolation

- ✅ Per-tenant backups

- ✅ Independent scaling

- ✅ Regulatory compliance（data residency）

Cons：

- ❌ High operational complexity（manage N databases）

- ❌ 貴（每個tenant一個database instance）

- ❌ Cross-tenant queries唔可能

- ❌ Schema migrations across所有databases

Use case：有regulatory requirements嘅enterprise SaaS，high-value customers（$10k+/月）。

Level 2：Separate Schemas（Strong Isolation）

```
database
├── schema_tenant_1
├── schema_tenant_2
└── public (shared tables)
```

Pros：

- ✅ Strong logical isolation

- ✅ Shared infrastructure（一個database）

- ✅ Schema-level permissions

- ✅ 每種tenant type唔同嘅data models

Cons：

- ❌ 比row-level更complex

- ❌ 需要router functions

- ❌ Migration complexity（N個schemas）

Use case：有唔同customer tiers嘅B2B SaaS（STRAŦUM嘅approach）。

Level 3：Row-Level Filtering有RLS（Moderate Isolation）

```
database
└── public
    └── table (with org_id column)
```

Pros：

- ✅ 簡單implement

- ✅ 容易migrations（一個schema）

- ✅ 可以做cross-tenant analytics

- ✅ Postgres RLS enforce isolation

Cons：

- ❌ 所有tenants share同一個data model

- ❌ RLS performance overhead

- ❌ RLS misconfigurations嘅風險

Use case：有uniform data models嘅B2B SaaS（project management、CRM）。

---

Implementation Checklist：Schema Routing

如果你喺度implement schema routing，用呢個checklist：

Phase 1：Schema Design

- [ ] Create tenant type discriminator（organizations.type）

- [ ] Design schema-specific tables（咩屬於邊度？）

- [ ] Create specialized schema：CREATE SCHEMA tenant_b;

- [ ] 喺specialized schema mirror necessary tables

- [ ] Document邊個tables住喺邊個schema

Phase 2：Router Functions

- [ ] 為每種resource type寫router function

- [ ] 用SECURITY DEFINER for elevated permissions

- [ ] Set search_path = public, tenant_b for multi-schema access

- [ ] Handle tenant type detection：SELECT type FROM organizations

- [ ] Return schema info for debugging

- [ ] Grant EXECUTE to authenticated role

Phase 3：Security Views

- [ ] Create unified views for reads（UNION ALL across schemas）

- [ ] 用WITH (security_invoker = on) for RLS enforcement

- [ ] 加source_schema column for debugging

- [ ] Test RLS policies work on views

- [ ] Grant SELECT to authenticated role

Phase 4：Application Integration

- [ ] Update writes用router functions：supabase.rpc('save_resource_routed', ...)

- [ ] Update reads用views：supabase.from('resource_unified').select()

- [ ] 移除application-level if/else logic

- [ ] Test tenant type A flow（writes to public）

- [ ] Test tenant type B flow（writes to tenant_b）

- [ ] Verify data isolation（Entity A ≠ Entity B）

Phase 5：Migration & Testing

- [ ] 寫migration scripts（30+個for full coverage）

- [ ] 用real-like data喺staging test

- [ ] Run security audit（cross-schema leakage？）

- [ ] Load test（RLS + views performance）

- [ ] Monitor in production（slow queries？）

---

學到嘅嘢

1. org_id係Necessary，唔係Sufficient

加org_id到每個table俾你row-level filtering。但如果唔同tenant types需要唔同data models，你需要schema routing。

Learning： 「Multi-tenant」唔係binary。有唔同levels嘅isolation。我揀咗比我目前15個users strictly需要嘅更高level。時間會話俾我知呢個係smart定只係extra work。

2. Application Logic → Database Logic

你application入面每個if (tenantType === 'TYPE_B')都係code smell。用router functions將tenant-aware logic搬去database。

Learning： Database functions更難寫但potentially更容易audit。如果我有一日有enterprise customers問「prove your data isolation」，我可以指住stored procedures。但而家，呢個係hypothetical。

3. Views + RLS = Unified Reads

由multiple schemas read係complex。Views + security_invoker = on俾你unified reads有proper isolation。

Learning： Views create abstraction layers可能有一日令compliance更容易。或者佢哋只係加咗我唔需要嘅complexity。We'll see。

4. Security Definer Functions好Powerful

SECURITY DEFINER令functions可以用elevated privileges run同時仲respect RLS policies。對router functions嚟講係essential嘅。

5. Migrations係Worth It（大概）

33個migrations for schema routing感覺好多。但結果？Clean architecture、true isolation、同零cross-tenant bugs。

Learning： Technical debt有price。我揀咗喺有15個alpha users嘅時候早啲pay，而唔係等到有paying customers。呢個係唔係right call？我有100個customers要worry嘅時候就會知。

6. Architectural Flaws比Code Bugs更傷

喺11:47 PM搵到null pointer exception？煩。搵到你成個multi-tenant architecture fundamentally broken？呢種discovery令你夜晚瞓唔著。

但以下係我學到嘅：architectural mistakes係fixable嘅。佢哋expensive，係。Time-consuming，絕對係。但我由「一種tenant type可以accidentally contaminate另一種嘅data」去到「database-enforced isolation唔可能bypass」。

早啲fix，fix啱佢，你會瞓得更好。

7. Architecture可能係Strategy（或者可能只係Over-Engineering）

Schema routing嘅決定唔止係「clean code」。佢係keep future options open——white-label partnerships、enterprise sales、international expansion。

但honest truth係：我喺private alpha有15個users。我冇enterprise customers knocking。我冇收到一個GDPR question。我想像中嘅partnerships可能永遠唔會materialize。

我以為自己喺度做technical decision。可能我係喺度做business strategy decision。或者可能我只係因為覺得database architecture有趣而over-engineering。:)

你有冇試過發現一個唔係bug而係design mistake嘅architectural flaw？你點handle個rebuild——incremental fix定係好似我咁rip the whole thing out？

祝好，

Chandler

STRAŦUM architecture series： 呢個係multi-tenancy journey嘅Part 2。佢由第2日build multi-tenancy開始。Schema rebuild之後，我發現31個blank screens因為lost navigation context同我嘅database係correct但296x太慢。

---

Build緊有complex isolation需求嘅multi-tenant SaaS？ STRAŦUM用schema routing嚟serve唔同tenant types有true data isolation。喺https://stratum.chandlernguyen.com/request-invitation申請alpha access

---

仲喺學緊「multi-tenant」有好多levels嘅isolation。仲喺半夜debug RLS policies。仲喺質疑我Day 2嘅architecture decisions（但而家少咗）。更多database adventures喺 https://www.chandlernguyen.com/

---

問題：唔係所有Tenants都一樣

頓悟：唔同嘅Tenants需要唔同嘅Schemas

點解呢個重要：Schema Routing嘅Business Case

Future-Proofing for Growth（可能）

Downsides（講老實）

點解我仲係做咗呢個Trade-Off

Architecture：Schema Routing

Pattern 1：Schema-Specific Tables

Pattern 2：Database Router Functions

Pattern 3：Security-Invoker Views for Reads

Migration：48小時內33個Migrations

結果：True Multi-Tenant Isolation

Before（Shared Tables有org_id）

After（Schema Routing）

Security Improvements

幾時用Schema Routing vs org_id Filtering

用org_id Filtering（Simpler）如果：

用Schema Routing（More Complex）如果：

Alternative Isolation Strategies

Level 1：Separate Databases（Highest Isolation）

Level 2：Separate Schemas（Strong Isolation）

Level 3：Row-Level Filtering有RLS（Moderate Isolation）

Implementation Checklist：Schema Routing

Phase 1：Schema Design

Phase 2：Router Functions

Phase 3：Security Views

Phase 4：Application Integration

Phase 5：Migration & Testing

學到嘅嘢

1. org_id係Necessary，唔係Sufficient

2. Application Logic → Database Logic

3. Views + RLS = Unified Reads

4. Security Definer Functions好Powerful

5. Migrations係Worth It（大概）

6. Architectural Flaws比Code Bugs更傷

7. Architecture可能係Strategy（或者可能只係Over-Engineering）

繼續閱讀

Blank Screens Kill Trust。我有31個。

我知Agency需要咩——所以我第2日就Build咗Multi-Tenancy

我嘅Database係「Correct」嘅。佢亦慢咗296倍。

AI 冇記憶，只係一個貴價 Chatbot

Marketing Teams一齊Work。而家STRATUM都係。

STRAŦUM：我75日內Solo Build嘅9-Agent Marketing Application（其中病咗10日）

用`org_id` Filtering（Simpler）如果：