I Built Multi-Tenancy on Day 2. On Day 67, I Rebuilt It

October 27, 2025, 11:47 PM. I was running what I assumed would be a routine security audit on STRAŦUM. Everything had been running fine for weeks: SMEs had their own data, agencies had theirs, and multi-tenancy looked solid.

My coffee had gone cold. The audit script was chewing through logs. Then I saw it: agencies were writing to SME tables.

Not because of a bug. Not because of a security hole. Because of the architecture itself.

Two months earlier, I had decided to build a multi-tenant architecture from Day 2. A bold move for a solo founder with a working AI agent. I added `org_id` to every table, wrote RLS policies, and built separate routing for SMEs and Agencies. Everything worked: SMEs had their campaigns, agencies had their clients, and data was landing in the right place.

Or so I thought.

I stared at the schema for maybe 20 minutes. How had I missed this? I had spent weeks building the multi-tenant architecture, writing 83 RLS policies, and testing with both SME and Agency accounts. Everything *worked*. But "working" and "correct" are not the same thing.

This is the kind of bug that makes you wonder whether you should be building software at all. Because it isn't a typo. It isn't a missed edge case. It's architectural naivety.

I had made the classic mistake: I assumed `org_id` filtering was enough for multi-tenant isolation. It wasn't.

This is the story of discovering that true multi-tenant isolation takes more than adding `org_id` to every table, and of the 33 migrations in 48 hours that finally fixed it.

---

> **Note**: The SQL examples in this post use genericized schema and table names (`tenant_b`, `workspace_entities`, `entity_data`) for security. The concepts are the same regardless of your specific naming conventions.

---

The Problem: Not All Tenants Are Equal

Here's what I built at first:

```sql
-- Brand guidelines table (shared by SMEs and Agencies)
CREATE TABLE brand_guidelines (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  name TEXT,
  guidelines JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- RLS policy (seems safe)
CREATE POLICY brand_guidelines_org_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (org_id = get_user_org_id());
```

This works perfectly well for SMEs. Each organization has its own brand guidelines. Row-Level Security ensures they can't see each other's data.

But Agencies are different.

Agencies don't have just one set of brand guidelines. They have one for each client:

- Client A's brand guidelines (vibrant color palette, bold typography, innovation-focused messaging)

- Client B's brand guidelines (muted color palette, minimal design, quality-focused positioning)

Same agency, different clients, completely different brands.

The naive solution (which I built first):

```sql
-- Add entity_id to the shared table
ALTER TABLE brand_guidelines ADD COLUMN entity_id UUID;

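-- Update RLS policy (drop the old one first: permissive policies are OR'ed together)
DROP POLICY brand_guidelines_org_isolation ON brand_guidelines;
CREATE POLICY brand_guidelines_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (
    org_id = get_user_org_id() AND
    (entity_id IS NULL OR entity_id = get_user_entity_id())
  );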
```

The problem: this created one table holding two different data models:

```
SME row:    org_id='org-123',  entity_id=NULL,       guidelines={...}
Agency row: org_id='org-456',  entity_id='entity-a', guidelines={...}
Agency row: org_id='org-456',  entity_id='entity-b', guidelines={...}
```

Queries got messy with complex `NULL` handling, and every feature needed custom logic in application code: "If SME, do this. If Agency, do that."

Worse, the architecture baked in wrong assumptions:

- Agencies would pollute SME data whenever they wrote a row with `entity_id=NULL`

- SMEs could never have sub-entities, even if they wanted sub-accounts

- The schema turned into "Swiss cheese", full of nullable columns

This wasn't a multi-tenant architecture. It was a single table trying to serve two different data models.
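
To see why the nullable column was the weak point, here is a minimal TypeScript sketch that mirrors the naive policy's `USING` clause as a plain predicate. It is an in-memory model under stated assumptions, not the real policy: `Row`, `Session`, and `naivePolicyAllows` are hypothetical names, and the session values stand in for whatever `get_user_org_id()` and `get_user_entity_id()` return.

```typescript
// Hypothetical in-memory model of the naive policy (not the real RLS engine).
type Row = { orgId: string; entityId: string | null };
type Session = { orgId: string; entityId: string | null };

// Mirrors: org_id = get_user_org_id()
//          AND (entity_id IS NULL OR entity_id = get_user_entity_id())
function naivePolicyAllows(session: Session, row: Row): boolean {
  return (
    row.orgId === session.orgId &&
    (row.entityId === null || row.entityId === session.entityId)
  );
}

// An agency user working on client "entity-a".
const agencyUser: Session = { orgId: "org-456", entityId: "entity-a" };

// A correctly scoped agency row passes, as intended.
const scoped: Row = { orgId: "org-456", entityId: "entity-a" };
console.log(naivePolicyAllows(agencyUser, scoped)); // true

// A row for a different client of the same agency is rejected.
const otherClient: Row = { orgId: "org-456", entityId: "entity-b" };
console.log(naivePolicyAllows(agencyUser, otherClient)); // false

// The trap: an agency write that leaves entity_id empty also passes,
// and the resulting row has exactly the shape of an SME row.
const forgotten: Row = { orgId: "org-456", entityId: null };
console.log(naivePolicyAllows(agencyUser, forgotten)); // true
```

The policy itself has no way to tell "SME row" from "agency row that forgot its entity", which is the pollution described above.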

---

The Revelation: Different Tenants Need Different Schemas

By the end of October, the truth sank in: SMEs and Agencies don't share the same data model.

SME data model:

```
organization → campaigns → agent_outputs
```

Agency data model:

```
organization → workspace_entities → campaigns → agent_outputs
                   ↓
            entity_data (e.g., brand guidelines, personas)
```

Agencies have an entire layer (workspace entities) that SMEs don't. They also have entity-specific intelligence that shouldn't exist in the SME world.

The solution: separate database schemas.

```sql
-- SME tables (public schema)
public.brand_guidelines
public.campaigns
public.outputs

-- Agency tables (tenant_b schema)
tenant_b.workspace_entities
tenant_b.entity_data  -- Includes brand guidelines, personas, etc.
tenant_b.campaigns
tenant_b.outputs
```

Now SMEs and Agencies have completely separate tables. No shared schema. No nullable `entity_id` pollution. No "if SME, else Agency" logic.

---

Why This Matters: The Business Case for Schema Routing

Before getting into the technical implementation, let's talk about why this architectural decision matters beyond "cleaner code."

Future-Proofing for Growth (Maybe)

Schema routing isn't just about solving today's problem. It's about keeping doors open for opportunities I can't even predict.

Right now I'm in private alpha with 15 users. I have no enterprise customers. I haven't talked to a GDPR lawyer. But here's what schema routing *could* enable if STRAŦUM grows:

International Expansion:

- If I expand into the EU: separate schemas could enable data residency

- Right to deletion gets simpler: query one schema instead of filtering mixed tables

- Audit trails: "Show me all of Client X's data" = one schema query

Compliance Conversations:

- When someone eventually asks, "How do you guarantee data isolation?"

- With `org_id` filtering: "We use Row-Level Security policies" (vague, hard to verify)

- With schema routing: "Each client's data lives in a separate database schema" (concrete, auditable)

The honest truth:

I'm not building for HIPAA or SOC 2 compliance right now. I'm building for SMEs and small agencies who need better marketing strategy.

But schema routing means that if someone ever asks "Can you handle healthcare clients?" or "Do you support data residency?", the answer is "Yes, let me show you the architecture" instead of "We'd have to rebuild everything first."

The Downsides (Honestly)

Schema routing isn't all upside. Here's what it actually costs:

Development Complexity:

- Every WRITE operation needs a router function

- Every READ operation needs a security view

- Testing has to cover both the SME and Agency paths

- With Claude Code: 2 days of intense evening work (Oct 27-29, 2025)

- Without AI tools: it would have taken weeks

Migration Risk:

- 33 sequential migrations = 33 chances for a typo

- One wrong `ALTER TABLE` = production data corruption

- Every migration had to run three times on staging before touching production

- The paranoia was real

Query Performance Overhead:

- Views with `UNION ALL` = slightly slower reads

- Router functions = an extra function call on writes

- RLS + views = more complex query plans

- (In practice: I haven't noticed any slowdowns yet, but I only have 15 alpha users)

Why I Made This Trade-Off Anyway

The option value could be huge. Or it might not matter at all.

Schema routing keeps doors open that I'm not even sure I want to walk through:

- White-label partnerships: a partner could get its own schema and rebrand the UI

- Reseller opportunities: agencies could resell with provable data isolation

- Different pricing tiers: "Premium" customers could get dedicated schemas

- Geographic expansion: EU schema, US schema, APAC schema, same codebase

Here's the thing: I'm in private alpha. I don't know whether any of this will matter.

The bet I made: spend 2 extra days now (with Claude Code) so that options stay open later.

Is it the right bet? Ask me in a year.

---

Architecture: Schema Routing

Pattern 1: Schema-Specific Tables

Some tables exist for only one tenant type:

```sql
-- Specialized tenant schema
CREATE SCHEMA tenant_b;

-- Workspace entities (specific to this tenant type)
CREATE TABLE tenant_b.workspace_entities (
  id UUID PRIMARY KEY,
  organization_id UUID,
  name TEXT,
  metadata JSONB
);

-- Entity-specific data
CREATE TABLE tenant_b.entity_data (
  id UUID PRIMARY KEY,
  organization_id UUID,
  entity_id UUID REFERENCES tenant_b.workspace_entities(id),
  data_type TEXT,
  content JSONB
);
```

SMEs never touch these tables. They don't exist in the `public` schema.

Pattern 2: Database Router Functions

How do you write to the correct schema? **Router functions**.

Here's the concept (simplified):

```sql
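-- Parameter names mirror the RPC call below; column lists are illustrative
CREATE FUNCTION save_resource_routed(p_org_id UUID, p_entity_id UUID, p_data JSONB)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER       -- Runs with the function owner's privileges, not the caller's
SET search_path = ''   -- Pin the search path; every table below is schema-qualified
AS $$
DECLARE
  org_type TEXT;
  new_id UUID := gen_random_uuid();
BEGIN
  -- Step 1: Detect organization type
  -- (A real function should also verify that the caller belongs to p_org_id.)
  SELECT type INTO org_type FROM public.organizations WHERE id = p_org_id;

  -- Step 2: Route to correct schema based on type
  IF org_type = 'TENANT_B' THEN
    INSERT INTO tenant_b.entity_data (id, organization_id, entity_id, data_type, content)
    VALUES (new_id, p_org_id, p_entity_id, 'brand_guidelines', p_data);
  ELSE
    INSERT INTO public.brand_guidelines (id, org_id, guidelines)
    VALUES (new_id, p_org_id, p_data);
  END IF;

  RETURN jsonb_build_object('id', new_id);
END;
$$;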
```

Application code (the same for all tenant types):

```typescript
// Just call the router function - no tenant-specific logic
const result = await supabase.rpc('save_resource_routed', {
  p_org_id: orgId,
  p_entity_id: entityId,  // null for simple tenants
  p_data: { ... }
});
```

No if/else in application code. The database does the routing.
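
One way to keep that call site tidy is a small typed wrapper around the RPC. This is a minimal sketch, not the code STRAŦUM runs: `RpcClient`, `SaveResourceInput`, and `saveResource` are hypothetical names, and `RpcClient` is a hand-written interface describing only the one method used here (the assumption being that a Supabase-style client's `rpc` call fits this shape).

```typescript
// Assumed result shape: a `data` payload plus a nullable `error`.
interface RpcResult {
  data: unknown;
  error: { message: string } | null;
}

// Minimal structural type for the single client method this sketch relies on.
interface RpcClient {
  rpc(fn: string, args: Record<string, unknown>): PromiseLike<RpcResult>;
}

interface SaveResourceInput {
  orgId: string;
  entityId?: string | null; // omitted or null for simple tenants
  data: Record<string, unknown>;
}

// Calls the router function; the same code path serves every tenant type.
async function saveResource(
  client: RpcClient,
  input: SaveResourceInput
): Promise<unknown> {
  const { data, error } = await client.rpc("save_resource_routed", {
    p_org_id: input.orgId,
    p_entity_id: input.entityId ?? null,
    p_data: input.data,
  });
  if (error) {
    throw new Error(`save_resource_routed failed: ${error.message}`);
  }
  return data;
}
```

Because the client is passed in, the wrapper can be exercised in tests with a fake object that records the arguments it receives, without touching a database.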

Writing my first router function took 4 hours. Debugging why it didn't work? Another 6. The problem? I had forgotten to grant EXECUTE permissions. Classic solo founder energy: architectural brilliance, permission oversights. :P

Pattern 3: Security-Invoker Views for Reads

Writes go through router functions. Reads go through **views**.

```sql
-- Unified view combining both schemas
CREATE VIEW resources_unified
WITH (security_invoker = on)  -- Respects RLS policies
AS
  SELECT id, org_id AS organization_id, NULL::uuid AS entity_id, guidelines AS data, 'public' AS source
  FROM public.brand_guidelines
UNION ALL
  SELECT id, organization_id, entity_id, content AS data, 'tenant_b' AS source
  FROM tenant_b.entity_data
  WHERE data_type = 'brand_guidelines';
```

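Key detail: `WITH (security_invoker = on)` (available in PostgreSQL 15 and later) makes the view run with the querying user's permissions, so RLS policies on the underlying tables are enforced for that user. Without it, the view runs with its owner's permissions, which typically bypasses RLS (a security disaster).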
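
On the application side, rows read from the unified view can be given a discriminated type keyed on the `source` column. The sketch below assumes the column names from the view above; `UnifiedResource` and `groupByEntity` are hypothetical names used only for illustration.

```typescript
// Row shape returned by the unified view (columns as defined above).
type UnifiedResource =
  | { id: string; organization_id: string; entity_id: null; data: unknown; source: "public" }
  | { id: string; organization_id: string; entity_id: string; data: unknown; source: "tenant_b" };

// Hypothetical helper: group rows by the workspace entity they belong to.
// Rows from the public schema have no entity, so they land under `orgLevelKey`.
function groupByEntity(
  rows: UnifiedResource[],
  orgLevelKey: string = "org-level"
): Map<string, UnifiedResource[]> {
  const groups = new Map<string, UnifiedResource[]>();
  for (const row of rows) {
    // Checking `source` narrows the union, so entity_id is a string here.
    const key = row.source === "tenant_b" ? row.entity_id : orgLevelKey;
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }
  return groups;
}
```

The `source` literal does the same job for the type checker that the schema split does for the database: a row from `public` cannot carry an entity ID.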

---

The Migration: 33 Migrations in 48 Hours

Adding schema routing wasn't a single migration. It was a journey.

You know what's fun? Writing 33 database migrations in a row, knowing that a typo in any one of them could corrupt production data. Actually, "fun" isn't the right word. "Terrifying" is closer. I ran every migration three times on staging before touching production.

October 27-29, 2025: 33 sequential migrations for complete schema routing.

Total effort: 33 migrations, 2 days with Claude Code, 100% worth it.

---

Results: True Multi-Tenant Isolation

Before (Shared Tables with org_id)

Problems:

- ❌ Nullable `entity_id` for one tenant type (data model confusion)

- ❌ Complex queries with `NULL` handling

- ❌ Application logic: `if (tenantTypeA) { ... } else { ... }`

- ❌ Risk of cross-contamination

After (Schema Routing)

Benefits:

- ✅ Clean data models (no nullable foreign keys)

- ✅ Simple queries without complex `NULL` handling

- ✅ No application if/else (the database handles routing)

- ✅ No shared tables left to cross-contaminate (each tenant type has its own schema)

Security Improvements

Before: risk of cross-contamination from nullable columns and shared tables

After: router functions automatically direct writes to the correct schema based on organization type. With separate schemas, SME and Agency rows no longer share a table, so one can't pollute the other.

Isolation level: database-enforced separation, not application-level checks.

---

Lessons Learned

1. org_id Is Necessary, Not Sufficient

Adding `org_id` to every table gives you row-level filtering. But if different tenant types need different data models, you need schema routing.

2. Application Logic → Database Logic

Every `if (tenantType === 'TYPE_B')` in your application is a code smell. Move tenant-aware logic into the database with router functions.

3. Views + RLS = Unified Reads

Reading from multiple schemas is complex. Views with `security_invoker = on` give you unified reads with proper isolation.

4. Architectural Flaws Hurt More Than Code Bugs

Finding a null pointer exception at 11:47 PM? Annoying. Finding out your entire multi-tenant architecture is fundamentally broken? That's the kind of discovery that keeps you up at night.

But here's what I learned: architectural mistakes are fixable. Fix them early, fix them right, and you'll sleep better.

5. Architecture Might Be Strategy (or Maybe Just Over-Engineering)

The schema routing decision wasn't only about "clean code." It was about keeping future options open.

But honestly: I'm in private alpha with 15 users. Maybe I was making a technical decision. Maybe a business strategy decision. Or maybe I was just over-engineering because I find database architecture interesting. :)

Have you ever discovered an architectural flaw that wasn't a bug but a design mistake? How did you handle the rebuild: fix it incrementally, or rip the whole thing out like I did?

Best wishes,

Chandler

STRAŦUM architecture series: This is part 2 of the multi-tenancy journey. It began with building multi-tenancy on Day 2. After the schema rebuild, I hit 31 blank screens from lost navigation context and discovered that my database was correct but 296x too slow.

---

*Still learning that "multi-tenant" has many levels of isolation. Still debugging RLS policies at midnight. Still questioning my Day 2 architecture decisions (but less now).* More database adventures at https://www.chandlernguyen.com/.

---