AI Data Privacy & PII Management

Building a Data Privacy Pipeline

End-to-end architecture from ingestion through classification, detection, redaction, routing, and audit — with detection stack choices, threshold configuration, testing, and performance benchmarks.

The end-to-end architecture

The previous modules gave you the individual components: classification (Module 2), detection (Module 4), redaction (Module 5), the gateway pattern (Module 6), and local inference (Module 7). This module wires them together into a production pipeline.

The pipeline has six stages:

Ingestion → Classification → Detection → Redaction → Routing → Audit

Stage 1: Ingestion captures the AI request from whatever source it originates — a chat interface, an API call, an automated workflow, a RAG system retrieving documents.

Stage 2: Classification determines the data sensitivity level using the framework from Module 2. This drives every downstream decision.

Stage 3: Detection identifies PII using the layered approach from Module 4, with detection depth proportional to the classification level.

Stage 4: Redaction transforms detected PII using the appropriate technique from Module 5 — simple redaction, typed redaction, pseudonymisation, or no redaction (for Level 1 data).

Stage 5: Routing directs the request to the appropriate AI endpoint — local inference, gateway-to-cloud, or direct cloud — based on the classification level and task complexity from Module 7.

Stage 6: Audit logs every decision made by the pipeline — what was detected, what was redacted, where the request was routed — without logging the actual PII.
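The six stages above can be sketched as a chain of small functions over a shared request object. This is a minimal illustration, not the course's reference implementation. The source-to-level mapping, the regex patterns, the endpoint names, and every function name are assumptions made for the example. So is the choice to skip detection entirely for Level 1 data and to treat unknown sources as the highest level.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical source-to-level mapping; the real levels come from Module 2.
SOURCE_LEVELS = {"public_docs": 1, "support_ticket": 2, "medical_record": 3}

# Illustrative patterns only; a production stack would use the layered
# detectors from Module 4.
BASE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}
DEEP_PATTERNS = {"DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b")}

# Hypothetical endpoint names keyed by classification level.
ENDPOINTS = {1: "direct-cloud", 2: "gateway-to-cloud", 3: "local-inference"}


@dataclass
class Request:
    text: str
    source: str
    level: int = 3
    findings: list = field(default_factory=list)  # (label, start, end)
    redacted: str = ""
    endpoint: str = ""


def classify(req):
    # Unknown sources fail closed to the highest level (an assumption).
    req.level = SOURCE_LEVELS.get(req.source, 3)


def detect(req):
    if req.level == 1:
        return  # assumption: Level 1 data skips detection in this sketch
    patterns = dict(BASE_PATTERNS)
    if req.level >= 3:
        patterns.update(DEEP_PATTERNS)  # deeper detection at higher levels
    for label, pattern in patterns.items():
        for m in pattern.finditer(req.text):
            req.findings.append((label, m.start(), m.end()))


def redact(req):
    # Typed redaction; replacing from the end keeps earlier offsets valid.
    # Assumes findings do not overlap.
    out = req.text
    ordered = sorted(req.findings, key=lambda f: f[1], reverse=True)
    for label, start, end in ordered:
        out = out[:start] + f"[{label}]" + out[end:]
    req.redacted = out


def route(req):
    req.endpoint = ENDPOINTS[req.level]


def audit(req):
    # Record decisions and counts only, never the matched text.
    return {
        "source": req.source,
        "level": req.level,
        "detected": dict(Counter(label for label, _, _ in req.findings)),
        "endpoint": req.endpoint,
    }


def run_pipeline(text, source):
    req = Request(text=text, source=source)  # Stage 1: ingestion
    for stage in (classify, detect, redact, route):  # Stages 2 to 5
        stage(req)
    return req, audit(req)  # Stage 6: audit
```

Calling `run_pipeline("Contact jane@example.com", "support_ticket")` returns the request with the email replaced by `[EMAIL]`, plus an audit record that holds the detection counts and the chosen endpoint but none of the original text. Because each stage is a separate function, any one of them can be swapped or tested without touching the others.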

Each stage must be independently configurable, testable, and monitorable. If detection accuracy degrades, you need to know. If routing latency increases, you need to know. If a new PII type emerges that your pipeline does not catch, you need to add a recogniser without redesigning the pipeline.
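One way to add a recogniser without redesigning the pipeline is to keep detectors in a registry that the detection stage iterates over. The sketch below assumes a simple callable interface. `RecogniserRegistry`, `regex_recogniser`, and the `EMP-` employee ID format are hypothetical names invented for illustration, not part of any specific library.

```python
import re
from typing import Callable, Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start, end)
Recogniser = Callable[[str], List[Span]]


class RecogniserRegistry:
    """Holds named recognisers so new PII types can be added at runtime."""

    def __init__(self) -> None:
        self._recognisers: Dict[str, Recogniser] = {}

    def register(self, name: str, recogniser: Recogniser) -> None:
        self._recognisers[name] = recogniser

    def detect(self, text: str) -> List[Span]:
        spans: List[Span] = []
        for recogniser in self._recognisers.values():
            spans.extend(recogniser(text))
        # Return spans in document order for downstream redaction.
        return sorted(spans, key=lambda span: span[1])


def regex_recogniser(label: str, pattern: str) -> Recogniser:
    """Build a recogniser from a regular expression."""
    compiled = re.compile(pattern)

    def recognise(text: str) -> List[Span]:
        return [(label, m.start(), m.end()) for m in compiled.finditer(text)]

    return recognise


registry = RecogniserRegistry()
registry.register("email", regex_recogniser("EMAIL", r"[\w.+-]+@[\w-]+\.[\w.-]+"))

# Later, a new PII type emerges: register it without changing detect().
# The EMP-###### format is a made-up example.
registry.register("employee_id", regex_recogniser("EMPLOYEE_ID", r"\bEMP-\d{6}\b"))
```

Each recogniser can be unit-tested in isolation, and the registry is a natural place to attach per-recogniser hit counts or timings for monitoring.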


You are designing the pipeline for an organisation that processes both customer support tickets (medium sensitivity) and medical records (high sensitivity) through AI. What is the most important design principle?