Skip to content
··閱讀時間8分鐘

我第2日Build咗Multi-Tenancy。第67日,我Rebuild咗佢

我以為加org_id到每個table就係bulletproof嘅multi-tenancy。然後我嘅security audit揭示agencies係寫緊去SME tables——唔係因為bug,而係因為design。

2025年10月27日,夜晚11:47。我喺度run我以為係routine嘅security audit on STRAŦUM。所有嘢幾個禮拜以嚟都正常運作——SMEs有佢哋嘅data,agencies有佢哋嘅,multi-tenancy好solid。

咖啡已經凍咗。Audit script喺度churn through logs。然後我見到:Agencies係寫緊去SME tables

唔係因為bug。唔係因為security hole。係因為architecture本身。

兩個月前,我做咗決定由Day 2開始build multi-tenant architecture。對一個得一個working AI agent嘅solo founder嚟講係bold move。我加咗org_id到每個table,寫咗RLS policies,build咗SMEs同Agencies嘅separate routing。佢work緊——SMEs有佢哋嘅campaigns,agencies有佢哋嘅clients,data flow去正確嘅地方。

或者我以為係咁。

我坐咗大概20分鐘只係盯住個schema。我點解miss咗呢個?我花咗幾個禮拜build multi-tenant architecture,寫咗83個RLS policies,用SME同Agency accounts test過。所有嘢都_work_。但「work緊」同「correct」唔係同一件事。

呢種bug會令你質疑自己係咪應該build software。因為佢唔係typo。唔係missed edge case。佢係architectural naivety。

我犯咗classic mistake:我assume org_id filtering就夠做multi-tenant isolation。佢唔夠。

呢個係發現true multi-tenant isolation需要嘅唔止加org_id到每個table——同埋48小時內33個migrations終於解決佢嘅故事。

---

> **Note**: 呢篇文嘅SQL examples用咗genericized schema同table names(tenant_bworkspace_entitiesentity_data)for security。Concepts無論你用咩naming conventions都一樣。

---

問題:唔係所有Tenants都一樣

以下係我最初build嘅:

```sql
-- Brand guidelines table (shared by SMEs and Agencies)
CREATE TABLE brand_guidelines (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  name TEXT,
  guidelines JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- RLS policy (seems safe)
CREATE POLICY brand_guidelines_org_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (org_id = get_user_org_id());
```

呢個for SMEs完美work。每個organization有自己嘅brand guidelines。Row-Level Security確保佢哋睇唔到對方嘅data。(至少,我以為係咁。之後有更多。)

但Agencies唔同。

Agencies唔只有一套brand guidelines。佢哋有每個client一套

- Client A嘅brand guidelines(vibrant color palette、bold typography、innovation-focused messaging)

- Client B嘅brand guidelines(muted color palette、minimal design、quality-focused positioning)

同一個agency,唔同clients,完全唔同嘅brands。

Naive嘅solution(我最初build嘅):

```sql
-- Add entity_id to the shared table
ALTER TABLE brand_guidelines ADD COLUMN entity_id UUID;

-- Update RLS policy
CREATE POLICY brand_guidelines_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (
    organization_id = get_user_org_id() AND
    (entity_id IS NULL OR entity_id = get_user_entity_id())
  );
```

問題:呢個create咗一個有兩個唔同data models嘅table:

```
SME row:    organization_id='org-123',  entity_id=NULL,      guidelines={...}
Agency row: organization_id='org-456',  entity_id='entity-a', guidelines={...}
Agency row: organization_id='org-456',  entity_id='entity-b', guidelines={...}
```

Queries變成有complex NULL handling嘅mess,每個feature都需要application code入面嘅「if SME, else Agency」logic。

係嘅,我寫咗全部呢啲先意識到佢哋係deeper problem嘅symptoms。幾個禮拜嘅work,全部指向同一個conclusion:我architect咗自己入一個corner。

每個feature都需要custom logic:「If SME, do this. If Agency, do that.」

更差嘅係,architecture做咗incorrect assumptions:

- Agencies寫entity_id=NULL會pollute SME data

- SMEs唔可以有sub-entities即使佢哋想要sub-accounts

- Schema變成有nullable columns嘅「Swiss cheese」

呢個唔係multi-tenant architecture。呢個係一個table試緊serve兩個唔同嘅data models

---

頓悟:唔同嘅Tenants需要唔同嘅Schemas

到10月底,我意識到真相:SMEs同Agencies唔share同一個data model

SME data model

```
organization → campaigns → agent_outputs
```

Agency data model

```
organization → workspace_entities → campaigns → agent_outputs
                   ↓
            entity_data (e.g., brand guidelines, personas)
```

Agencies有成層(workspace entities)係SMEs冇嘅。佢哋亦有entity-specific intelligence唔應該存在喺SME world。

Solution:Separate database schemas。

```sql
-- SME tables (public schema)
public.brand_guidelines
public.campaigns
public.outputs

-- Agency tables (tenant_b schema)
tenant_b.workspace_entities
tenant_b.entity_data  -- Includes brand guidelines, personas, etc.
tenant_b.campaigns
tenant_b.outputs
```

而家SMEs同Agencies有完全唔同嘅tables。冇shared schema。冇nullable entity_id pollution。冇「if SME, else Agency」logic。

---

點解呢個重要:Schema Routing嘅Business Case

喺dive入technical implementation之前,我哋講吓點解呢個architectural decision重要——唔止係「佢係cleaner code」。

Future-Proofing for Growth(可能)

Schema routing唔止係解決今日嘅problem。佢係keep doors open for我甚至predict唔到嘅opportunities。

我仲喺private alpha有15個users。我冇enterprise customers。我冇同GDPR律師傾過。但以下係schema routing 可以 enable嘅嘢如果STRAŦUM grow:

International Expansion:

- 如果我哋expand去EU:Separate schemas可以enable data residency(EU client data喺EU servers嘅eu_agency schema)

- Right to deletion變得更簡單:Query一個schema,唔係filter through mixed tables

- Audit trails:「Show me all Client X data」= 一個schema query

Compliance Conversations:

- 當有人eventually問「How do you guarantee data isolation?」

- 用org_id filtering:「We use Row-Level Security policies」(vague,難verify)

- 用schema routing:「Each client's data lives in a separate database schema」(concrete,auditable)

- 我唔知呢個幾時matter。但佢_可以_matter如果我哋有呢啲conversations。

Honest Truth:

我而家唔係為HIPAA或SOC 2 compliance build。我係為需要更好marketing strategy嘅SMEs同small agencies build。

但schema routing代表如果有人問「Can you handle healthcare clients?」或者「Do you support data residency?」嘅時候,答案係「yes, let me show you the architecture」而唔係「let me rebuild everything first。」

Downsides(講老實)

Schema routing唔係全部都好。以下係佢actual cost:

Development Complexity:

- 每個WRITE operation需要一個router function

- 每個READ operation需要一個security view

- Testing需要SME同Agency兩個paths

- 用Claude Code:2日intense work(Oct 27-29, 2025)喺晚上

- 冇AI tools:會係幾個禮拜

Migration Risk:

- 33個sequential migrations = 33個typo嘅機會

- 一個錯誤嘅ALTER TABLE = production data corruption

- 每個migration要喺staging run 3次先掂production

- Paranoia係real嘅

Query Performance Overhead:

- 有UNION ALL嘅Views = 稍微慢啲嘅reads

- Router functions = 多一個function call on writes

- RLS + views = 更complex嘅query plans

- (實際上:我暫時冇notice到slowdowns,但我亦只有15個alpha users)

Operational Complexity:

- Schema migrations而家affect 2+個schemas(public + agency)

- Database backups需要schema-aware restore

- Monitoring queries需要check multiple schemas

- 呢個eventually會bite我,只係我唔知幾時

點解我仲係做咗呢個Trade-Off

Option value可能好大。或者完全唔matter。

Schema routing keep住doors open,我甚至唔確定自己想唔想行過去:

- White-label partnerships:可以俾partner佢哋自己嘅schema,rebrand個UI

- Reseller opportunities:Agencies可以用provable data isolation resell

- 唔同pricing tiers:「Premium」customers可以有dedicated schemas

- Geographic expansion:EU schema、US schema、APAC schema——同一個codebase

問題係:我喺private alpha。我唔知呢啲會唔會matter。可能我永遠唔會收到white-label request。可能geographic expansion仲有幾年。可能成盤生意pivot咗全部都irrelevant。

但以下係我知嘅:有schema routing,呢啲options存在。有org_id filtering,大部分都需要complete rewrite。

呢個就係我嘅bet: 而家多花2日(用Claude Code)嚟keep options open。

呢個bet啱唔啱?一年後問我。

---

Architecture:Schema Routing

Pattern 1:Schema-Specific Tables

有啲tables只為一種tenant type存在:

```sql
-- Specialized tenant schema
CREATE SCHEMA tenant_b;

-- Workspace entities (specific to this tenant type)
CREATE TABLE tenant_b.workspace_entities (
  id UUID PRIMARY KEY,
  organization_id UUID,
  name TEXT,
  metadata JSONB
);

-- Entity-specific data
CREATE TABLE tenant_b.entity_data (
  id UUID PRIMARY KEY,
  organization_id UUID,
  entity_id UUID REFERENCES tenant_b.workspace_entities(id),
  data_type TEXT,
  content JSONB
);
```

SMEs永遠唔會掂呢啲tables。佢哋喺public schema入面唔存在。

Pattern 2:Database Router Functions

你點寫入correct schema?**Router functions**

以下係concept(simplified):

```sql
CREATE FUNCTION save_resource_routed(params)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER  -- Run with elevated privileges
AS $$
BEGIN
  -- Step 1: Detect organization type
  SELECT type INTO org_type FROM organizations WHERE id = p_org_id;

  -- Step 2: Route to correct schema based on type
  IF org_type = 'TENANT_B' THEN
    INSERT INTO tenant_b.entity_data (...) VALUES (...);
  ELSE
    INSERT INTO public.brand_guidelines (...) VALUES (...);
  END IF;

  RETURN result;
END;
$$;
```

點運作

1. Detect org type:Query organizations table嚟determine tenant type

2. Route to correct schema:根據type寫入appropriate schema

3. Return result:包括用咗邊個schema做debugging

Application code(所有tenant types都一樣):

```typescript
// Just call the router function - no tenant-specific logic
const result = await supabase.rpc('save_resource_routed', {
  p_org_id: orgId,
  p_entity_id: entityId,  // null for simple tenants
  p_data: { ... }
});
```

Application code入面冇if/else。Database做routing。

寫我第一個router function用咗4個鐘。Debug點解佢唔work?又6個鐘。問題?我忘記grant EXECUTE permissions。Classic solo founder energy:architectural brilliance,permission oversights。:P

Pattern 3:Security-Invoker Views for Reads

Writing用router functions。Reading用****views****。

```sql
-- Unified view combining both schemas
CREATE VIEW resources_unified
WITH (security_invoker = on)  -- Respects RLS policies
AS
  SELECT id, organization_id, NULL AS entity_id, data, 'public' AS source
  FROM public.brand_guidelines
UNION ALL
  SELECT id, organization_id, entity_id, content AS data, 'tenant_b' AS source
  FROM tenant_b.entity_data
  WHERE data_type = 'brand_guidelines';
```

Application code(unified reads):

```typescript
// Read resources (works for all tenant types)
const { data } = await supabase
  .from('resources_unified')
  .select('*')
  .eq('organization_id', orgId);

// RLS policies filter correctly regardless of source schema
```

Key detailWITH (security_invoker = on)確保RLS policies被enforce。冇呢個,views會bypass RLS(security disaster)。

---

Migration:48小時內33個Migrations

加schema routing唔係一個migration。佢係一段journey。

你知咩好玩?連續寫33個database migrations同時知道如果有一個typo,你就會corrupt production data。其實「好玩」唔係正確嘅字。「恐怖」先啱。我每個migration喺staging run三次先掂production。

2025年10月27-29日:33個sequential migrations for complete schema routing。

Migration phases

1. Create specialized schema - Setup tenant_b schema有proper permissions

2. Create schema-specific tables - 喺新schema mirror necessary tables

3. Build router functions - 每種需要routing嘅resource type一個

4. Create security views - 有UNION ALL嘅unified views for reads

5. Update RLS policies - 確保兩個schemas都有proper isolation

6. Data migration - 將現有data搬去正確嘅schemas

7. Application updates - 由direct queries switch去router functions/views

Total effort:33個migrations,用Claude Code 2日,100% worth it。

---

結果:True Multi-Tenant Isolation

Before(Shared Tables有org_id)

Data model

```sql
public.brand_guidelines (organization_id, entity_id, guidelines)
```

問題

- ❌ 一種tenant type有nullable entity_id(data model confusion)

- ❌ 有NULL handling嘅complex queries

- ❌ Application logic:if (tenantTypeA) { ... } else { ... }

- ❌ Cross-contamination風險

After(Schema Routing)

Data model

```sql
public.brand_guidelines (organization_id, guidelines)    -- Tenant Type A
tenant_b.entity_data (organization_id, entity_id, data)  -- Tenant Type B
```

好處

- ✅ Clean data models(冇nullable foreign keys)

- ✅ 冇complex NULL handling嘅simple queries

- ✅ 冇application if/else(database handle routing)

- ✅ Cross-schema contamination唔可能(physically separate)

Security Improvements

Before:有nullable columns同shared tables嘅cross-contamination風險

After:Router functions根據organization type自動direct writes去correct schema。Physical schema separation令cross-contamination變得唔可能。

Isolation level:Database-enforced separation。唔係application-level checks。

---

幾時用Schema Routing vs org_id Filtering

唔係每個multi-tenant app都需要schema routing。用呢個decision tree:

org_id Filtering(Simpler)如果:

所有tenants有同一個data model(例如todos app)

冇hierarchical tenancy(organizations入面冇sub-entities)

Simple queriesWHERE org_id = X到處work)

B2C或small B2B(冇enterprise sales,冇compliance requirements)

MVP speed matters(幾個禮拜上市,唔係幾個月)

Business reasoning: 你喺度validate product-market fit,唔係為Fortune 500 compliance requirements build。Ship快,之後refactor如果你有enterprise traction。

例子 Project management tool,每個organization manage自己嘅projects。

```sql
CREATE TABLE projects (
  id UUID PRIMARY KEY,
  org_id UUID,  -- Simple filtering
  name TEXT
);
```

用Schema Routing(More Complex)如果:

唔同tenant types需要唔同data models(Type A vs Type B vs Type C)

Hierarchical tenancy(organizations → workspace_entities → sub-entities)

Enterprise sales喺roadmap上(Fortune 500、healthcare、finance、government)

需要regulatory compliance(GDPR、HIPAA、SOC 2、FedRAMP)

有white-label或reseller potential(partners需要complete data isolation)

計劃international expansion(data residency requirements)

Business reasoning: 如果你target enterprise customers,「data isolation」會變成security questionnaires上嘅checkbox。Schema routing令你可以自信咁answer。Row-level filtering令你要hedge。

例子:Platform入面有啲tenants有hierarchical workspace structures,fundamentally同direct customers唔同。

Cost/Benefit(我嘅經驗):

- Schema routing cost:用Claude Code 2日(冇AI assistance會係幾個禮拜)

- Potential upside:Cleaner architecture、compliance readiness、partnership options

- Actual upside:Unknown——我仲喺private alpha

- Break-even:如果schema routing開到一道我唔能夠行過嘅門,佢就pay for itself

真正嘅問題: 你optimize緊speed-to-market定係optionality?兩個都valid。我揀咗optionality。

---

Alternative Isolation Strategies

Schema routing唔係唯一嘅approach。以下係spectrum:

Level 1:Separate Databases(Highest Isolation)

```

database_tenant_1

database_tenant_2

database_tenant_3

```

Pros

- ✅ Complete physical isolation

- ✅ Per-tenant backups

- ✅ Independent scaling

- ✅ Regulatory compliance(data residency)

Cons

- ❌ High operational complexity(manage N databases)

- ❌ 貴(每個tenant一個database instance)

- ❌ Cross-tenant queries唔可能

- ❌ Schema migrations across所有databases

Use case:有regulatory requirements嘅enterprise SaaS,high-value customers($10k+/月)。

Level 2:Separate Schemas(Strong Isolation)

```
database
├── schema_tenant_1
├── schema_tenant_2
└── public (shared tables)
```

Pros

- ✅ Strong logical isolation

- ✅ Shared infrastructure(一個database)

- ✅ Schema-level permissions

- ✅ 每種tenant type唔同嘅data models

Cons

- ❌ 比row-level更complex

- ❌ 需要router functions

- ❌ Migration complexity(N個schemas)

Use case:有唔同customer tiers嘅B2B SaaS(STRAŦUM嘅approach)。

Level 3:Row-Level Filtering有RLS(Moderate Isolation)

```
database
└── public
    └── table (with org_id column)
```

Pros

- ✅ 簡單implement

- ✅ 容易migrations(一個schema)

- ✅ 可以做cross-tenant analytics

- ✅ Postgres RLS enforce isolation

Cons

- ❌ 所有tenants share同一個data model

- ❌ RLS performance overhead

- ❌ RLS misconfigurations嘅風險

Use case:有uniform data models嘅B2B SaaS(project management、CRM)。

---

Implementation Checklist:Schema Routing

如果你喺度implement schema routing,用呢個checklist:

Phase 1:Schema Design

- [ ] Create tenant type discriminator(organizations.type

- [ ] Design schema-specific tables(咩屬於邊度?)

- [ ] Create specialized schema:CREATE SCHEMA tenant_b;

- [ ] 喺specialized schema mirror necessary tables

- [ ] Document邊個tables住喺邊個schema

Phase 2:Router Functions

- [ ] 為每種resource type寫router function

- [ ] 用SECURITY DEFINER for elevated permissions

- [ ] Set search_path = public, tenant_b for multi-schema access

- [ ] Handle tenant type detection:SELECT type FROM organizations

- [ ] Return schema info for debugging

- [ ] Grant EXECUTE to authenticated role

Phase 3:Security Views

- [ ] Create unified views for reads(UNION ALL across schemas)

- [ ] 用WITH (security_invoker = on) for RLS enforcement

- [ ] 加source_schema column for debugging

- [ ] Test RLS policies work on views

- [ ] Grant SELECT to authenticated role

Phase 4:Application Integration

- [ ] Update writes用router functions:supabase.rpc('save_resource_routed', ...)

- [ ] Update reads用views:supabase.from('resource_unified').select()

- [ ] 移除application-level if/else logic

- [ ] Test tenant type A flow(writes to public

- [ ] Test tenant type B flow(writes to tenant_b

- [ ] Verify data isolation(Entity A ≠ Entity B)

Phase 5:Migration & Testing

- [ ] 寫migration scripts(30+個for full coverage)

- [ ] 用real-like data喺staging test

- [ ] Run security audit(cross-schema leakage?)

- [ ] Load test(RLS + views performance)

- [ ] Monitor in production(slow queries?)

---

學到嘅嘢

1. org_id係Necessary,唔係Sufficient

org_id到每個table俾你row-level filtering。但如果唔同tenant types需要唔同data models,你需要schema routing。

Learning: 「Multi-tenant」唔係binary。有唔同levels嘅isolation。我揀咗比我目前15個users strictly需要嘅更高level。時間會話俾我知呢個係smart定只係extra work。

2. Application Logic → Database Logic

你application入面每個if (tenantType === 'TYPE_B')都係code smell。用router functions將tenant-aware logic搬去database。

Learning: Database functions更難寫但potentially更容易audit。如果我有一日有enterprise customers問「prove your data isolation」,我可以指住stored procedures。但而家,呢個係hypothetical。

3. Views + RLS = Unified Reads

由multiple schemas read係complex。Views + security_invoker = on俾你unified reads有proper isolation。

Learning: Views create abstraction layers可能有一日令compliance更容易。或者佢哋只係加咗我唔需要嘅complexity。We'll see。

4. Security Definer Functions好Powerful

SECURITY DEFINER令functions可以用elevated privileges run同時仲respect RLS policies。對router functions嚟講係essential嘅。

5. Migrations係Worth It(大概)

33個migrations for schema routing感覺好多。但結果?Clean architecture、true isolation、同零cross-tenant bugs。

Learning: Technical debt有price。我揀咗喺有15個alpha users嘅時候早啲pay,而唔係等到有paying customers。呢個係唔係right call?我有100個customers要worry嘅時候就會知。

6. Architectural Flaws比Code Bugs更傷

喺11:47 PM搵到null pointer exception?煩。搵到你成個multi-tenant architecture fundamentally broken?呢種discovery令你夜晚瞓唔著。

但以下係我學到嘅:architectural mistakes係fixable嘅。佢哋expensive,係。Time-consuming,絕對係。但我由「一種tenant type可以accidentally contaminate另一種嘅data」去到「database-enforced isolation唔可能bypass」。

早啲fix,fix啱佢,你會瞓得更好。

7. Architecture可能係Strategy(或者可能只係Over-Engineering)

Schema routing嘅決定唔止係「clean code」。佢係keep future options open——white-label partnerships、enterprise sales、international expansion。

但honest truth係:我喺private alpha有15個users。我冇enterprise customers knocking。我冇收到一個GDPR question。我想像中嘅partnerships可能永遠唔會materialize。

我以為自己喺度做technical decision。可能我係喺度做business strategy decision。或者可能我只係因為覺得database architecture有趣而over-engineering。:)

你有冇試過發現一個唔係bug而係design mistake嘅architectural flaw?你點handle個rebuild——incremental fix定係好似我咁rip the whole thing out?

祝好,

Chandler

STRAŦUM architecture series: 呢個係multi-tenancy journey嘅Part 2。佢由第2日build multi-tenancy開始。Schema rebuild之後,我發現31個blank screens因為lost navigation context同我嘅database係correct但296x太慢

---

Build緊有complex isolation需求嘅multi-tenant SaaS? STRAŦUM用schema routing嚟serve唔同tenant types有true data isolation。喺https://stratum.chandlernguyen.com/request-invitation申請alpha access

---

仲喺學緊「multi-tenant」有好多levels嘅isolation。仲喺半夜debug RLS policies。仲喺質疑我Day 2嘅architecture decisions(但而家少咗)。更多database adventures喺 https://www.chandlernguyen.com/

---

繼續閱讀

我嘅旅程
聯繫
語言
偏好設定