The single most impactful pattern
If you take one architecture from this entire course, this is the one. The gateway pattern is the reason enterprises that were blocked on AI adoption for years are now deploying AI across their organisations. It resolves the fundamental tension — AI needs data, privacy demands restriction — by processing the data locally before it ever leaves your environment.
The architecture is straightforward:
Raw Data → Local PII Detection → Redaction/Pseudonymisation → Cloud AI → Re-hydration → Response
Your user types a prompt containing sensitive data. Before that prompt reaches any cloud AI service, a local gateway intercepts it, detects PII using the pipeline from Module 4, redacts or pseudonymises the PII using the techniques from Module 5, and then forwards the sanitised prompt to the cloud AI. When the response comes back, the gateway replaces pseudonymised tokens with the originals and delivers the complete response to the user.
The cloud AI never sees the sensitive data. Your audit log captures what was detected and how it was handled. Your users get the full power of cloud AI capabilities. Your compliance team gets the privacy guarantees they require.
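The flow above can be sketched end to end in a few lines. This is a minimal illustration, not a production design: it assumes a regex-only detector that handles email addresses, and `fake_cloud_ai` is a hypothetical stand-in for the real provider call. All function names here are invented for the example.

```python
import re

# Illustrative email pattern only; a real gateway uses the Module 4 pipeline.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def sanitise(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email address with a token such as Email_A."""
    mapping: dict[str, str] = {}  # token -> original value
    seen: dict[str, str] = {}     # original value -> token

    def _swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            # Single-letter suffix: this sketch supports up to 26 values.
            token = f"Email_{chr(ord('A') + len(seen))}"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    return EMAIL_RE.sub(_swap, prompt), mapping


def fake_cloud_ai(prompt: str) -> str:
    """Stand-in for the provider; it only ever receives the sanitised prompt."""
    return f"Received: {prompt}"


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the provider's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def handle(prompt: str) -> str:
    sanitised, mapping = sanitise(prompt)
    response = fake_cloud_ai(sanitised)
    return rehydrate(response, mapping)
```

Calling `handle("Contact sarah@example.com about the refund")` returns the full text to the user, while `fake_cloud_ai` only ever sees `Contact Email_A about the refund`.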
An enterprise has been blocking AI adoption for 18 months due to data privacy concerns. The CISO wants zero sensitive data leaving the environment. The CEO wants AI-powered productivity gains. Who does the gateway pattern satisfy?
Building the gateway: component by component
The gateway has five core components. Each one is a distinct engineering concern.
Component 1: Request interceptor
The request interceptor captures AI requests before they leave your environment. Depending on your architecture, this can be:
- API proxy: Your application sends requests to an internal endpoint (e.g., https://ai-gateway.internal/v1/chat/completions) instead of directly to the AI provider. The gateway proxies to the actual provider after sanitisation.
- SDK wrapper: A custom SDK that wraps the AI provider's client library, adding detection and redaction before the underlying HTTP call.
- Network-level proxy: A transparent proxy (e.g., Envoy, NGINX) that intercepts outbound requests to AI provider domains and routes them through the sanitisation pipeline.
The API proxy approach is the most common because it gives you full control over the request lifecycle and does not require network infrastructure changes.
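To make the interception idea concrete, here is a minimal sketch of the SDK wrapper variant. It assumes the provider call and the sanitisation step can each be represented as a plain function; `SanitisingClient`, `send`, and `sanitise` are hypothetical names, not part of any provider's SDK.

```python
from typing import Callable


class SanitisingClient:
    """Wraps a provider call so every prompt is sanitised before it is sent."""

    def __init__(
        self,
        send: Callable[[str], str],      # hypothetical: performs the provider call
        sanitise: Callable[[str], str],  # hypothetical: detection plus redaction
    ) -> None:
        self._send = send
        self._sanitise = sanitise

    def complete(self, prompt: str) -> str:
        # Application code calls complete(); it has no path to send() that
        # bypasses sanitisation.
        return self._send(self._sanitise(prompt))
```

The API proxy approach applies the same idea at the HTTP layer: the application changes only the base URL it talks to, and the sanitisation happens inside the gateway service.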
Component 2: PII detection engine
This is the three-layer pipeline from Module 4: regex patterns, ML NER (spaCy/Presidio), and optionally a local LLM (Gemma 4) for contextual detection. The detection engine takes the raw prompt text and returns a list of PII entities with their positions, types, and confidence scores.
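As a rough illustration of the engine's output shape, the sketch below implements only the regex layer and returns entities with positions, types, and confidence scores. The `Detection` class, the two patterns, and the confidence values are assumptions made for this example; they are not taken from Presidio or from Module 4.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    type: str          # entity type, e.g. "EMAIL"
    start: int         # offset of the first character in the prompt
    end: int           # offset one past the last character
    text: str          # the matched text
    confidence: float  # illustrative fixed score per pattern


# Illustrative patterns and scores; real deployments need far more coverage.
PATTERNS = {
    "EMAIL": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), 0.95),
    "PHONE": (re.compile(r"\+?\d[\d\s-]{7,}\d"), 0.6),
}


def detect_regex(text: str) -> list[Detection]:
    """Run every pattern over the text and return detections in reading order."""
    found = []
    for entity_type, (pattern, confidence) in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(
                Detection(entity_type, m.start(), m.end(), m.group(0), confidence)
            )
    return sorted(found, key=lambda d: d.start)
```

The NER and LLM layers would append to the same list, so downstream components only ever deal with one detection format.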
Component 3: Redaction/pseudonymisation engine
Based on the detected PII and the configured strategy (redaction, typed redaction, or pseudonymisation), this component transforms the prompt. For pseudonymisation, it generates consistent tokens and stores the mapping in session state.
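A minimal sketch of consistent token generation follows. It assumes detections arrive as non-overlapping `(start, end, entity_type)` tuples and that each type has at most 26 distinct values, so a single-letter suffix is enough. The function name and tuple layout are illustrative choices.

```python
def pseudonymise(text: str, detections: list[tuple[int, int, str]]):
    """Replace detected spans with tokens; the same value always gets the same token."""
    mapping: dict[str, str] = {}               # token -> original value
    by_value: dict[tuple[str, str], str] = {}  # (type, value) -> token
    counters: dict[str, int] = {}              # type -> number of tokens issued

    # Assign tokens in reading order so the first person becomes Person_A.
    for start, end, etype in sorted(detections):
        key = (etype, text[start:end])
        if key not in by_value:
            index = counters.get(etype, 0)
            counters[etype] = index + 1
            token = f"{etype}_{chr(ord('A') + index)}"
            by_value[key] = token
            mapping[token] = text[start:end]

    # Replace from the end of the text so earlier offsets stay valid.
    for start, end, etype in sorted(detections, reverse=True):
        token = by_value[(etype, text[start:end])]
        text = text[:start] + token + text[end:]
    return text, mapping
```

The returned `mapping` is what the gateway keeps in session state; the sanitised text is the only thing forwarded to the provider.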
Component 4: Provider forwarder
The sanitised prompt is forwarded to the cloud AI provider using their standard API. The gateway maintains the provider's expected request format — headers, authentication, model parameters — so the sanitised request is indistinguishable from a direct request except for the content.
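One way to keep the outbound request faithful to the original is to copy the incoming payload and swap only the message content. The sketch below assumes an OpenAI-style JSON body with a `messages` field and bearer-token authentication; both are assumptions, so check your provider's API reference. `build_provider_request` is a hypothetical helper.

```python
import copy


def build_provider_request(original_payload: dict, sanitised_messages: list, api_key: str):
    """Return (headers, payload) for the provider, changing only the messages."""
    # Deep copy so the caller's payload, which still holds raw PII, is untouched.
    payload = copy.deepcopy(original_payload)
    payload["messages"] = sanitised_messages
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return headers, payload
```

Model name, temperature, and any other parameters pass through unchanged, which is what makes the sanitised request look like a direct one.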
Component 5: Response re-hydrator
When the AI response comes back, the re-hydrator replaces pseudonymised tokens with the originals. "Person_A reported the issue on Date_A" becomes "John Smith reported the issue on 15 March 2025." This step only applies when pseudonymisation was used — typed redaction and simple redaction do not have a re-hydration step because the originals were not preserved in the prompt.
The re-hydration mapping is stored in the gateway's session state (in memory where latency matters most, or in a short-lived Redis cache for distributed gateways). For a single-turn request, the mapping is destroyed as soon as the response is delivered; in a multi-turn conversation, it lives only until the session ends. In neither case is there a persistent store of the PII-to-token associations.
Your gateway uses pseudonymisation. The cloud AI responds with 'Person_A should contact Person_B at their earliest convenience.' The re-hydration step replaces Person_A with 'Sarah Chen' and Person_B with 'James Wilson.' What should happen to the pseudonymisation mapping after re-hydration?
Re-hydration: when the AI generates new references
Re-hydration is straightforward when the AI response uses the same pseudonymised tokens from the prompt. But AI models generate text — they do not just echo your input. What happens when the model creates new references?
The pronoun problem. You send: "Person_A reported a billing error." The AI responds: "I recommend reaching out to them about the billing error and confirming their account details." The pronouns "them" and "their" refer to Person_A (Sarah Chen), but they are not pseudonymised tokens — they are natural language references. The re-hydrator does not need to replace these because they do not contain PII, but the user needs to understand who "them" refers to. As long as Person_A has been re-hydrated earlier in the response, the pronouns resolve naturally.
The hallucination risk. The AI might generate PII that was not in the prompt. If you ask "Draft a follow-up email to Person_A about the billing error," the AI might generate a plausible email address or phone number for Person_A — one that does not correspond to any real person. The re-hydrator needs to distinguish between pseudonymised tokens (which should be replaced) and model-generated content (which should not).
The practical approach: the re-hydrator maintains a strict token list. It only replaces exact matches of tokens it generated (Person_A, Date_A, Email_A, etc.). Any PII-like content in the response that is not a known token is passed through unchanged, because it was generated by the model rather than taken from your data. However, a post-response PII scan can flag cases where the model appears to have generated realistic PII (a valid-looking email address, a phone number) that might confuse the user.
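The strict token list and the post-response scan can be sketched as follows. This is an illustration under assumptions: tokens are matched on word boundaries, and the scan checks only for email addresses with a simple pattern. `rehydrate` and `flag_generated_pii` are hypothetical names.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # illustrative pattern


def rehydrate(response: str, mapping: dict[str, str]) -> str:
    """Replace only tokens the gateway issued; everything else passes through."""
    if not mapping:
        return response
    # Longest tokens first, and word boundaries, so Person_AB is never
    # mistaken for Person_A followed by "B".
    tokens = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in tokens) + r")\b")
    # A single pass means restored originals are never scanned for tokens again.
    return pattern.sub(lambda m: mapping[m.group(0)], response)


def flag_generated_pii(rehydrated: str, mapping: dict[str, str]) -> list[str]:
    """Return email-like strings that did not come from the user's own data."""
    known = set(mapping.values())
    return [m for m in EMAIL_RE.findall(rehydrated) if m not in known]
```

With the mapping `{"Person_A": "Sarah Chen"}`, an unknown token such as `Person_B` is left as-is, and an email address invented by the model is returned by `flag_generated_pii` so the gateway can warn the user.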
Context loss: when redaction removes information the model needs
This is the most significant challenge with the gateway pattern. Some AI tasks require the actual PII to produce useful results.
- Address-based analysis: "Analyse the geographical distribution of these customers" — redacted addresses make geographical analysis impossible.
- Name disambiguation: "Is this the same John Smith from the previous complaint?" — pseudonymisation preserves consistency within a session but cannot answer questions about identity.
- Compliance checking: "Does this contract correctly reference the parties?" — redacted party names make compliance checking infeasible.
For these use cases, the options are:
- Local inference: Process the data entirely locally where the model has access to the unredacted data. This is covered in Module 7.
- Partial redaction: Redact only PII that is not essential to the task. This requires task-aware redaction policies — more complex to implement but possible when the task type is known in advance.
- Accept the limitation: Some tasks genuinely cannot be done with redacted data and cloud AI. Document these as use cases that require local inference or human review.
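A task-aware redaction policy, as described under partial redaction, could be as simple as a lookup from task type to the entity types that may stay in the prompt. The task names and entity types below are invented for illustration; which entities are safe to keep is a policy decision for your organisation, not something this sketch can decide.

```python
# Hypothetical policy: entity types that may remain unredacted per task type.
KEEP_BY_TASK: dict[str, set[str]] = {
    "geo_analysis": {"CITY", "POSTCODE"},
    "contract_review": {"ORGANISATION"},
}


def entities_to_redact(task_type: str, detected_types: list[str]) -> list[str]:
    """Return the detected entity types that must still be redacted for this task."""
    # Unknown task types keep nothing, so the default is full redaction.
    keep = KEEP_BY_TASK.get(task_type, set())
    return [t for t in detected_types if t not in keep]
```

Defaulting to full redaction for unrecognised tasks keeps the policy fail-safe: a missing entry can reduce usefulness, but it cannot leak data.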
A user sends a prompt through the gateway: 'Person_A lives at Address_A. What is the nearest hospital to their home?' The cloud AI cannot answer because the address has been pseudonymised. What is the best solution?
Gateway API: request flow and implementation
Here is a concrete implementation of the gateway as an API service. This example uses Python with FastAPI, but the architecture is framework-agnostic.
```python
# Gateway API endpoint (simplified).
# ChatCompletionRequest, audit_log and the helper functions called below
# (classify_data, extract_text, detect_pii, pseudonymise, forward_to_provider,
# rehydrate) are assumed to be defined elsewhere in the gateway.
from datetime import datetime, timezone

from fastapi import FastAPI

app = FastAPI()


@app.post("/v1/chat/completions")
async def gateway_chat(request: ChatCompletionRequest):
    # Step 1: Classify the data
    classification = classify_data(request.messages)

    # Step 2: Detect PII (layers depend on classification level)
    detections = detect_pii(
        text=extract_text(request.messages),
        level=classification,  # Level 3+ adds LLM layer
    )

    # Step 3: Pseudonymise (generate mapping)
    sanitised_messages, mapping = pseudonymise(
        messages=request.messages,
        detections=detections,
    )

    # Step 4: Forward to cloud AI
    response = await forward_to_provider(
        provider=request.model_provider,
        messages=sanitised_messages,
        model=request.model,
    )

    # Step 5: Re-hydrate response
    rehydrated_response = rehydrate(response, mapping)

    # Step 6: Audit log (no PII in the log)
    audit_log.record(
        session_id=request.session_id,
        entities_detected=len(detections),
        entity_types=[d.type for d in detections],
        classification_level=classification,
        provider=request.model_provider,
        timestamp=datetime.now(timezone.utc),
    )

    # Step 7: Destroy mapping (drop the reference held by this request)
    del mapping
    return rehydrated_response
```
Latency budget:
For a typical request through the gateway:
- Data classification: ~1ms (rule-based lookup)
- PII detection (regex + NER): ~20-50ms
- PII detection (+ LLM, Level 3+ only): ~1-2s additional
- Pseudonymisation: ~1-5ms
- Cloud AI inference: 1-10s (provider-dependent)
- Re-hydration: ~1ms
- Audit logging: ~1-5ms (async)
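Total gateway overhead: roughly 25-60ms for Level 1-2 data, plus a further 1-2s for Level 3+ data once the LLM layer runs. For Level 1-2 data this is a small share of the total request time, which is dominated by cloud AI inference: about 6% when inference takes 1s and well under 1% when it takes 10s. For Level 3+ data, the local LLM pass can rival the inference time itself.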
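The totals can be reproduced by summing the component figures listed above. The snippet below is only arithmetic over those figures; the dictionary keys and function names are illustrative.

```python
# (low, high) latency in milliseconds for each Level 1-2 component listed above.
LEVEL_1_2_MS = {
    "classification": (1, 1),
    "detection_regex_ner": (20, 50),
    "pseudonymisation": (1, 5),
    "rehydration": (1, 1),
    "audit_logging": (1, 5),
}


def overhead_range_ms(components: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the best-case and worst-case component latencies."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


def share_of_total(overhead_ms: float, inference_ms: float) -> float:
    """Fraction of end-to-end request time spent inside the gateway."""
    return overhead_ms / (overhead_ms + inference_ms)
```

`overhead_range_ms(LEVEL_1_2_MS)` gives 24ms to 62ms, in line with the rounded 25-60ms figure. Against a fast 1s inference call, the worst-case overhead stays in the single-digit percent range, and the share shrinks as inference time grows.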
Streaming support:
Most cloud AI providers support streaming responses (Server-Sent Events). The gateway must handle streaming re-hydration — replacing pseudonymised tokens as they appear in the stream rather than waiting for the complete response. This is more complex because a token like "Person_A" might be split across two stream chunks ("Person_" and "A"). The re-hydrator needs a buffer that accumulates partial tokens until a complete match is found.
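A buffering re-hydrator along these lines might look like the sketch below. It rests on two stated assumptions: no token is a prefix of another token (true for single-letter suffixes such as Person_A and Person_B), and tokens are matched as plain substrings without word-boundary checks. `StreamRehydrator` is a hypothetical class, not part of any library.

```python
import re


class StreamRehydrator:
    """Re-hydrates a chunked stream, holding back text that may be a partial token."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self._mapping = dict(mapping)
        tokens = sorted(self._mapping, key=len, reverse=True)
        self._pattern = (
            re.compile("|".join(re.escape(t) for t in tokens)) if tokens else None
        )
        self._buffer = ""

    def _pending_len(self, text: str) -> int:
        """Length of the longest suffix of text that could still grow into a token."""
        longest = max(len(t) for t in self._mapping) - 1
        for size in range(min(longest, len(text)), 0, -1):
            if any(t.startswith(text[-size:]) for t in self._mapping):
                return size
        return 0

    def feed(self, chunk: str) -> str:
        """Accept one stream chunk and return the text that is safe to emit now."""
        if self._pattern is None:
            return chunk
        self._buffer += chunk
        out, pos = [], 0
        # Replace every complete token currently in the buffer.
        for m in self._pattern.finditer(self._buffer):
            out.append(self._buffer[pos:m.start()])
            out.append(self._mapping[m.group(0)])
            pos = m.end()
        # Only the text after the last complete token can hide a partial token.
        rest = self._buffer[pos:]
        keep = self._pending_len(rest)
        out.append(rest[: len(rest) - keep])
        self._buffer = rest[len(rest) - keep:]
        return "".join(out)

    def flush(self) -> str:
        """Emit whatever is left once the stream has ended."""
        tail, self._buffer = self._buffer, ""
        return tail
```

Feeding the chunks `"Ask Person_"` and `"A today"` emits `"Ask "` first and `"Sarah Chen today"` second: the partial token is held back until the next chunk completes it. Call `flush()` when the stream closes so a held-back fragment that never became a token is still delivered.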
Multi-turn conversations:
In a multi-turn conversation, the pseudonymisation mapping must persist across turns. If "John Smith" is Person_A in turn 1, it must remain Person_A in turns 2, 3, and 4. The mapping is associated with the conversation session and persists until the session ends. For stateless gateway architectures, the mapping can be stored in a short-lived cache (Redis with TTL) keyed by session ID.
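The session-scoped mapping can be illustrated with an in-memory store that mimics a cache with a TTL. This is a stand-in for illustration only: it is not a Redis client, and a distributed gateway would use a shared cache instead. `SessionMappingStore` and its methods are hypothetical names.

```python
import time
from typing import Callable


class SessionMappingStore:
    """Keeps one pseudonymisation mapping per session until it expires or ends."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._sessions: dict[str, tuple[float, dict[str, str]]] = {}

    def get(self, session_id: str) -> dict[str, str]:
        """Return the session's mapping, creating a fresh one if absent or expired."""
        now = self._clock()
        entry = self._sessions.get(session_id)
        mapping = entry[1] if entry is not None and entry[0] > now else {}
        # Each turn refreshes the expiry, so an active conversation keeps its tokens.
        self._sessions[session_id] = (now + self._ttl, mapping)
        return mapping

    def end_session(self, session_id: str) -> None:
        """Discard the mapping as soon as the conversation is over."""
        self._sessions.pop(session_id, None)
```

Each turn calls `get(session_id)` and passes the returned mapping to the pseudonymiser, so "John Smith" keeps the token Person_A for the life of the conversation and the association disappears once the TTL lapses or `end_session` is called.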
The tool landscape: from OSS to commercial
You do not have to build the gateway from scratch. Several open-source and commercial tools implement variations of this pattern.
Open-source:
- Microsoft Presidio (discussed in Modules 4-5) provides the detection and anonymisation engine. It does not provide the gateway proxy layer — you build that around Presidio.
- LiteLLM is an open-source proxy that normalises requests across 100+ LLM providers. While not a privacy gateway, it provides the proxy infrastructure (request interception, provider forwarding, streaming support) that you can extend with PII detection.
- LangChain / LlamaIndex both support middleware pipelines where you can insert PII detection and redaction steps before the LLM call. This is more of a library approach than a gateway approach.
Commercial:
- Nightfall AI provides a DLP platform with AI-specific capabilities: scanning prompts and responses for PII, PHI, PCI data, secrets, and custom patterns. Integrates with major AI providers and chat platforms. Pricing is per-scan.
- Private AI focuses specifically on PII detection and redaction for AI pipelines. Supports 50+ entity types across 52 languages. Provides both cloud and on-premises deployment.
- Protecto offers an AI privacy gateway with PII detection, tokenisation, and re-identification. Positions itself specifically as the gateway between your applications and cloud AI services.
- Skyflow provides a data privacy vault architecture with tokenisation. While not AI-specific, its tokenisation capabilities integrate with AI pipelines for the pseudonymisation-and-re-hydration flow.
- Portal26 focuses on enterprise AI governance, including data privacy monitoring for AI tool usage across the organisation.
Build vs buy decision:
Build the gateway if: you need deep customisation of detection rules, you have specific regulatory requirements that commercial tools do not cover out of the box, or your data classification is complex enough to require custom routing logic.
Buy a commercial tool if: you need to deploy quickly (weeks, not months), you want pre-built integrations with your existing DLP and SIEM tools, or you need vendor support for compliance audits.
A common hybrid approach: use Presidio as the open-source detection engine (customised with your domain-specific recognisers), LiteLLM as the proxy layer, and add a commercial monitoring tool (Nightfall, Portal26) for visibility and alerting.
Your enterprise uses Slack, Microsoft 365 Copilot, and a custom internal AI tool. You need PII protection across all three. What is the most practical approach?
Module 6 — Final Assessment
In the gateway pattern, what is the correct order of operations for a request containing PII?
What should happen to the pseudonymisation mapping (e.g., 'John Smith' = 'Person_A') after the gateway delivers the re-hydrated response to the user?
A gateway adds 40ms of latency for Level 2 data and 1.5 seconds for Level 4 data. Why is the Level 4 latency significantly higher?
A user asks the gateway-protected AI: 'What is the nearest hospital to Address_A?' after the gateway pseudonymised the address. The cloud AI cannot answer because it does not know the real address. What is the architecturally correct solution?