Capstone: Your AI Data Privacy Blueprint

From learning to action

You have spent the previous eleven modules building a comprehensive understanding of AI data privacy: the exposure vectors, the classification framework, the regulatory landscape, the detection and redaction techniques, the gateway architecture, local inference, the pipeline, audit and compliance, vendor assessment, and organisational training.

This capstone converts that knowledge into action. Each exercise produces a concrete artifact — a document, a diagram, a policy, or a mapping — that you can take directly into your organisation and use immediately. Together, the five exercises produce your AI Data Privacy Blueprint: a one-page architecture document backed by detailed supporting materials.

These exercises are designed to be done with your actual organisation's data flows, tools, and regulatory requirements in front of you. Generic answers will not help. The value is in applying the frameworks to your specific situation.

Before starting the exercises, which statement best describes your current AI data privacy posture?

Exercise 1: Audit your current AI data flows

Map every place data leaves your environment for AI processing. Most organisations are surprised by how many AI data flows exist when they actually enumerate them.

Step 1: Identify all AI tools in use

Start with the approved tools:

Corporate AI assistants (ChatGPT Enterprise, Claude for Work, internal tools)
AI coding assistants (GitHub Copilot, Cursor, Tabnine)
AI features embedded in existing tools (Microsoft 365 Copilot, Salesforce Einstein, HubSpot AI)
AI-powered analytics (Tableau AI, PowerBI Copilot)
AI search (Perplexity, Google AI Overviews in Workspace)

Then estimate the unapproved tools (shadow AI):

Consumer AI chatbots (free ChatGPT, free Claude, Gemini)
AI browser extensions
AI features in personal productivity tools
AI APIs accessed by developers on personal accounts

Step 2: For each tool, map the data flow

For every AI tool identified, document:

Field	Description
Tool name	The specific AI tool or service
Provider	The vendor operating the AI service
Data input types	What types of data are sent to this tool (text, code, documents, images)
Data sensitivity range	What classification levels of data are sent (estimate for shadow AI)
PII types potentially included	Names, emails, phone numbers, financial data, health data, etc.
Data flow path	User device → [intermediaries] → AI provider infrastructure
Retention policy	How long the provider retains the data (per its terms or DPA)
Training usage	Whether the provider uses the data for model training (per its terms)
Geographic location	Where the data is processed and stored
DPA in place?	Yes/No
Privacy controls	What controls exist (gateway, DLP, manual classification)
Volume estimate	Approximate requests per day/week
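If you keep the inventory in code or export it to a spreadsheet, a typed record makes it easy to ensure every row carries every field. The sketch below is a minimal illustration, assuming hypothetical field names (`AIDataFlow`, `to_table_row`) that mirror the table above; the example values describe an invented tool, not any real vendor's terms.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class AIDataFlow:
    """One row of the AI data flow inventory (hypothetical field names)."""
    tool_name: str
    provider: str
    data_input_types: List[str]         # e.g. ["text", "code", "documents"]
    sensitivity_levels: List[int]       # classification levels sent (1-5); estimated for shadow AI
    pii_types: List[str]                # e.g. ["names", "emails"]
    flow_path: List[str]                # ordered hops from user device to provider
    retention: str                      # as stated in the provider's terms or DPA
    used_for_training: Optional[bool]   # None when the terms are unclear
    processing_location: str
    dpa_in_place: bool
    privacy_controls: List[str]         # empty list means no controls
    requests_per_day: int               # approximate volume
    shadow_ai: bool = False             # True for unapproved tools


def to_table_row(flow: AIDataFlow) -> dict:
    """Flatten one flow into strings for a spreadsheet or CSV export."""
    row = {}
    for key, value in asdict(flow).items():
        if key == "flow_path":
            row[key] = " -> ".join(value)
        elif isinstance(value, list):
            row[key] = "; ".join(str(v) for v in value)
        elif value is None:
            row[key] = "unknown"
        else:
            row[key] = str(value)
    return row


# Placeholder values for a hypothetical tool, not a real vendor's terms.
example = AIDataFlow(
    tool_name="ExampleAssist",
    provider="Example Vendor",
    data_input_types=["text", "documents"],
    sensitivity_levels=[2, 3],
    pii_types=["names", "emails"],
    flow_path=["User device", "Browser extension", "Provider API"],
    retention="30 days (per DPA)",
    used_for_training=None,
    processing_location="EU",
    dpa_in_place=True,
    privacy_controls=[],
    requests_per_day=400,
)
print(to_table_row(example))
```

Recording unknowns explicitly (here as `None`, exported as "unknown") keeps missing answers visible instead of letting them default to a reassuring value.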

Step 3: Identify gaps

With the data flow map complete, identify:

Which flows have no privacy controls?
Which flows lack DPAs?
Which flows involve data above the classification level the tool is approved for?
Which flows involve cross-border transfers without legal mechanisms?
Which flows are entirely invisible (shadow AI)?
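The five gap questions above can be applied mechanically to each inventory row. This is a minimal sketch, assuming a hypothetical `Flow` record in which an unapproved tool has an approved maximum level of 0; the example rows are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Flow:
    """Minimal inventory row for gap checks (hypothetical field names)."""
    tool_name: str
    max_level_sent: int          # highest classification level observed or estimated
    approved_max_level: int      # highest level the tool is approved for (0 if unapproved)
    privacy_controls: List[str] = field(default_factory=list)
    dpa_in_place: bool = False
    cross_border: bool = False
    transfer_mechanism: str = ""  # e.g. "SCCs"; empty if none
    shadow_ai: bool = False


def find_gaps(flow: Flow) -> List[str]:
    """Return the gap questions this flow fails; an empty list means no gaps found."""
    gaps = []
    if not flow.privacy_controls:
        gaps.append("no privacy controls")
    if not flow.dpa_in_place:
        gaps.append("no DPA")
    if flow.max_level_sent > flow.approved_max_level:
        gaps.append("data above approved classification level")
    if flow.cross_border and not flow.transfer_mechanism:
        gaps.append("cross-border transfer without legal mechanism")
    if flow.shadow_ai:
        gaps.append("invisible (shadow AI)")
    return gaps


inventory = [
    Flow("Approved assistant", max_level_sent=3, approved_max_level=3,
         privacy_controls=["gateway"], dpa_in_place=True),
    Flow("Consumer chatbot", max_level_sent=4, approved_max_level=0,
         cross_border=True, shadow_ai=True),
]

for flow in inventory:
    print(flow.tool_name, find_gaps(flow) or "no gaps")
```

Any row with a non-empty result is a candidate for the red highlighting in the deliverable.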

Deliverable: A data flow inventory table covering all AI tools (approved and estimated shadow AI) with the fields above. Highlight gaps in red.

Your data flow audit reveals that Microsoft 365 Copilot processes data across your entire Microsoft 365 tenant, including SharePoint sites with Restricted-classification documents. What is the appropriate response?

Exercise 2: Classify your data for AI use

Apply the classification framework from Module 2 to your actual data estate. This exercise produces the classification map that drives every technical control in your pipeline.

Step 1: Enumerate your data repositories

List every system that contains data your employees might use with AI tools:

Customer databases (CRM, support systems)
Document management (SharePoint, Google Drive, Confluence)
Code repositories (GitHub, GitLab, Bitbucket)
Communication archives (email, Slack, Teams)
Financial systems (ERP, accounting)
HR systems (HRIS, performance management)
Healthcare systems (EHR, claims)
Legal systems (contract management, case management)

Step 2: Apply the classification decision tree

For each data repository, walk through the decision tree from Module 2. If a repository matches more than one condition, assign the highest level it matches:

Is it public? → Level 1
Does it contain PHI? → Level 4
Does it contain PCI cardholder data? → Level 4
Does it contain direct PII? → Check jurisdiction → Level 3 or 4
Does it contain quasi-identifiers? → Level 3
Is it tagged as trade secret/IP? → Level 4 or 5
Is it under litigation hold or classified? → Level 5
Otherwise → Level 2
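The decision tree translates directly into a small function. The sketch below is illustrative only: the `RepoProfile` fields are hypothetical, and two inputs stand in for judgement calls the tree leaves open, namely whether direct PII falls under a jurisdiction your legal team treats as Level 4 (`pii_in_strict_jurisdiction`) and whether a trade secret is critical enough for Level 5 (`trade_secret_critical`). It also assumes that when several conditions match, the highest level wins.

```python
from dataclasses import dataclass


@dataclass
class RepoProfile:
    """Answers to the decision-tree questions for one repository (hypothetical fields)."""
    is_public: bool = False
    contains_phi: bool = False
    contains_pci: bool = False
    contains_direct_pii: bool = False
    pii_in_strict_jurisdiction: bool = False  # assumption: legal team's "check jurisdiction" answer
    contains_quasi_identifiers: bool = False
    trade_secret: bool = False
    trade_secret_critical: bool = False       # assumption: decides Level 4 vs Level 5
    litigation_hold_or_classified: bool = False


def classify(repo: RepoProfile) -> int:
    """Return an AI classification level (1-5); the highest matching level wins."""
    if repo.is_public:
        return 1
    levels = [2]  # default: internal
    if repo.contains_phi or repo.contains_pci:
        levels.append(4)
    if repo.contains_direct_pii:
        levels.append(4 if repo.pii_in_strict_jurisdiction else 3)
    if repo.contains_quasi_identifiers:
        levels.append(3)
    if repo.trade_secret:
        levels.append(5 if repo.trade_secret_critical else 4)
    if repo.litigation_hold_or_classified:
        levels.append(5)
    return max(levels)


crm = RepoProfile(contains_direct_pii=True)   # jurisdiction assumed not "strict"
source_code = RepoProfile(trade_secret=True)
print(classify(crm), classify(source_code))   # 3 4
```

Collecting every matching level and taking the maximum avoids the trap of a first-match walk, where PHI under litigation hold would stop at Level 4.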

Step 3: Document the classification

Repository	Description	Classification Level	Key PII types	Applicable regulations	Approved AI processing
Salesforce CRM	Customer contacts and deals	Level 3	Names, emails, phone numbers, company names	GDPR (EU customers), CCPA (CA customers)	Gateway + Cloud AI
GitHub (proprietary repos)	Product source code	Level 4	None (IP risk, not PII risk)	Trade secret law	Local inference only
Workday HRIS	Employee records	Level 4	Names, SSNs, salary, health info	GDPR, various employment regulations	Local inference only

Deliverable: A classification table covering every data repository with its AI classification level, key PII types, applicable regulations, and approved AI processing method.

Exercise 3: Design your privacy architecture

Using the data flow audit (Exercise 1) and the classification map (Exercise 2), design the privacy architecture that enforces the classification decisions.

Choose your architecture pattern:

Based on your classification distribution:

If most data is at Levels 1-2 (primarily public and internal data): Lightweight architecture. Use a gateway with regex-only detection to catch basic PII, and focus investment on policy and training rather than deep technical controls.
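To make "regex-only detection" concrete, here is a minimal sketch of a regex layer with typed redaction. The three patterns are illustrative assumptions (a basic email shape, a US SSN shape, and a US-style ten-digit phone number); a production gateway would need tested, locale-specific patterns, and a tool such as Presidio provides maintained recognisers.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; they will miss many real-world formats.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def detect(text: str) -> List[Tuple[str, str]]:
    """Return (entity_type, matched_text) pairs found by the regex layer."""
    findings = []
    for entity_type, pattern in PII_PATTERNS.items():
        findings.extend((entity_type, m.group()) for m in pattern.finditer(text))
    return findings


def redact(text: str) -> str:
    """Typed redaction: replace each match with its entity type, e.g. [EMAIL]."""
    for entity_type, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{entity_type}]", text)
    return text


sample = "Contact jane.doe@example.com or 555-123-4567; SSN 123-45-6789."
print(detect(sample))
print(redact(sample))  # Contact [EMAIL] or [PHONE]; SSN [US_SSN].
```

Regex is cheap and predictable for structured identifiers, but it cannot recognise names or free-text context, which is why the heavier patterns add NER and LLM layers.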

If data spans Levels 2-4 (a mix of internal, confidential, and restricted data): Full gateway architecture. Use the three-layer detection pipeline (regex + NER, with an LLM layer for Level 4), pseudonymisation for Level 3, local inference for Level 4, and direct cloud for Levels 1-2.
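Pseudonymisation differs from redaction in that each value gets a stable token and the gateway keeps the mapping, so the cloud model can still tell that two mentions refer to the same entity and the response can be re-hydrated before it reaches the user. The sketch below handles email addresses only, as a simplifying assumption; the `<EMAIL_n>` token format and function names are hypothetical.

```python
import re
from typing import Dict, Tuple

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def pseudonymise(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct email with a stable token; return the text and token mapping."""
    mapping: Dict[str, str] = {}   # token -> original value; never leaves the gateway
    reverse: Dict[str, str] = {}   # original value -> token, so repeats reuse a token

    def replace(match: re.Match) -> str:
        value = match.group()
        if value not in reverse:
            token = f"<EMAIL_{len(reverse) + 1}>"
            reverse[value] = token
            mapping[token] = value
        return reverse[value]

    return EMAIL.sub(replace, text), mapping


def rehydrate(text: str, mapping: Dict[str, str]) -> str:
    """Restore original values in the model's response before returning it to the user."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


prompt = "Draft a reply to ana@example.com and cc ben@example.org; ana@example.com asked first."
safe, mapping = pseudonymise(prompt)
print(safe)  # Draft a reply to <EMAIL_1> and cc <EMAIL_2>; <EMAIL_1> asked first.
print(rehydrate("Hi <EMAIL_1>, looping in <EMAIL_2>.", mapping))
```

The closing bracket in each token matters: it stops `<EMAIL_1>` from matching inside `<EMAIL_10>` during re-hydration.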

If most data is at Levels 4-5 (primarily restricted or prohibited): Local-first architecture. Make local inference the default for Level 4, keep Level 5 out of AI processing entirely, and use the gateway only for tasks that genuinely require cloud model capabilities, with a strict approval workflow for cloud routing.
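The three rules above can be expressed as a simple function over your classification distribution. This sketch assumes "most" means more than half of AI-processed data and treats everything else as the mixed, full-gateway case; `choose_pattern` is a hypothetical name, and the thresholds are a starting point rather than a rule.

```python
from typing import Dict


def choose_pattern(distribution: Dict[int, float]) -> str:
    """Suggest an architecture pattern from the share of AI-processed data at each level.

    Assumption: "most" means more than half of the total.
    """
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution must contain positive shares")

    def share(*levels: int) -> float:
        return sum(distribution.get(level, 0.0) for level in levels) / total

    if share(1, 2) > 0.5:
        return "lightweight"
    if share(4, 5) > 0.5:
        return "local-first"
    return "full gateway"   # the mixed case: data spans Levels 2-4


print(choose_pattern({1: 40, 2: 30, 3: 20, 4: 10}))  # lightweight
print(choose_pattern({3: 30, 4: 50, 5: 20}))          # local-first
```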

Design decisions to document:

Detection stack: Which layers (regex, NER, LLM) and which specific tools (Presidio, spaCy model version, Gemma 4 variant)?
Redaction strategy per level: Simple redaction, typed redaction, or pseudonymisation? When is re-hydration needed?
Routing rules: What classification levels route to local vs gateway vs cloud?
Local inference deployment: What model, what hardware, what capacity?
Cloud AI providers: Which providers for which use cases? What DPAs and retention policies?
Audit logging: What metadata is captured, what is the retention period, who has access?
Monitoring: What metrics are tracked, what triggers alerts?
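For the audit logging decision, one common approach is to log metadata about each request rather than its content. The sketch below assumes hypothetical field names and uses a keyed hash (HMAC-SHA256) of the prompt, so investigators can match identical prompts during an incident without the log storing them; a plain unkeyed hash of a short prompt could be guessed by brute force. The key shown is a placeholder and would normally come from a secrets manager.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Dict


def audit_record(user_id: str, tool: str, level: int, route: str,
                 entity_counts: Dict[str, int], prompt: str, hash_key: bytes) -> str:
    """Build one JSON audit log line that captures metadata only, never the raw prompt.

    Field names are hypothetical; map them onto your own logging schema.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "tool": tool,
        "classification_level": level,
        "route": route,                        # e.g. "local", "gateway", "blocked"
        "entities_detected": entity_counts,    # counts per type, not the values
        # Keyed hash: lets investigators match identical prompts without storing them.
        "prompt_hmac": hmac.new(hash_key, prompt.encode("utf-8"), hashlib.sha256).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)


line = audit_record("u123", "ExampleAssist", 3, "gateway", {"EMAIL": 2},
                    "Email ana@example.com about the renewal", hash_key=b"placeholder-key")
print(line)
```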

The architecture diagram:

Draw a diagram showing:

User-facing AI interfaces (chat, API, embedded tools)
The gateway (detection, redaction, routing)
Local inference endpoints (model, hardware)
Cloud AI endpoints (providers, via gateway)
Direct cloud endpoints (for Levels 1-2, if applicable)
Audit log collection
Monitoring dashboard
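The routing box in the diagram reduces to a lookup from classification level to destination. The sketch below is a minimal illustration using the level-to-route defaults from this module (direct cloud for Levels 1-2, gateway then cloud for Level 3, local only for Level 4, no AI for Level 5); `Route`, `ROUTING`, and `route_request` are hypothetical names. It routes on the higher of the user-declared level and the level the gateway's detection implies, and fails closed for any level it does not recognise.

```python
from enum import Enum


class Route(Enum):
    DIRECT_CLOUD = "direct cloud AI"
    GATEWAY_THEN_CLOUD = "gateway (detect + redact) -> cloud AI"
    LOCAL_ONLY = "local inference only"
    BLOCKED = "no AI processing"


# Default routing table; adjust to your own classification decisions.
ROUTING = {
    1: Route.DIRECT_CLOUD,
    2: Route.DIRECT_CLOUD,
    3: Route.GATEWAY_THEN_CLOUD,
    4: Route.LOCAL_ONLY,
    5: Route.BLOCKED,
}


def route_request(declared_level: int, detected_level: int = 1) -> Route:
    """Route on the stricter of the declared and detected levels; unknown levels fail closed."""
    level = max(declared_level, detected_level)
    return ROUTING.get(level, Route.BLOCKED)


print(route_request(3).value)                    # gateway (detect + redact) -> cloud AI
print(route_request(3, detected_level=4).value)  # local inference only
```

Taking the maximum means detection can only tighten a route, never loosen one a user has declared.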

Deliverable: An architecture diagram with a supporting document covering the seven design decisions above.

Your classification analysis shows: 20% Level 1-2, 50% Level 3, 25% Level 4, 5% Level 5. Which architecture pattern is most appropriate?

Exercise 4: Draft your AI acceptable use policy

Using the policy template from Module 11, draft an AI acceptable use policy tailored to your organisation.

Customisation points:

Approved tools: Replace the template placeholders with your actual approved tools, URLs, and access instructions. Be specific about which tools are approved for which purposes.
Data classification rules: Map your classification levels (from Exercise 2) to plain-language guidance. Employees should be able to determine the classification of their data within 30 seconds using the simplified decision tree.
Department-specific rules: If certain departments handle disproportionately sensitive data (legal, HR, R&D, clinical), add department-specific sections that address their unique requirements.
Request process: Define the process for requesting new AI tools or use cases. Include the expected response time, who reviews requests, and what information the requester should provide.
Enforcement and consequences: Define proportionate consequences. First-time accidental violations should result in re-training, not disciplinary action. Repeated or deliberate violations warrant escalation.

Review checklist:

Before finalising the policy, verify:

Every approved tool is named with a URL and access instructions
The data classification guidance is understandable by non-technical employees
The prohibited actions list is specific (not "do not misuse AI tools")
The policy explains why, not just what
There is a clear process for reporting concerns without fear of punishment
The policy has a review date (every 6 months)
Legal, HR, and IT have reviewed and approved the policy

Deliverable: A complete AI acceptable use policy ready for organisational review and approval.

Exercise 5: Map your architecture to your regulatory requirements

The final exercise validates that your privacy architecture (Exercise 3) satisfies your regulatory requirements (identified during classification in Exercise 2).

For each applicable regulation, create a control mapping:

GDPR control mapping (if applicable):

GDPR Requirement	Your Control	Evidence
Lawful basis (Art. 6)	Legitimate Interest Assessment for AI processing	LIA document
Data minimisation (Art. 5(1)(c))	Gateway redacts PII not needed for the task	Detection and redaction audit logs
DPIA (Art. 35)	DPIA conducted for AI system	DPIA document (Module 9)
Cross-border transfers (Chapter V)	SCCs with cloud AI providers; local inference for highest-sensitivity data	DPAs with SCCs; architecture diagram showing local routing
Automated decision-making safeguards (Art. 22)	AI decisions logged with rationale; human review available	Audit log schema; escalation process
Data subject rights (Arts. 15-22)	Deletion capability across AI systems and providers	Data deletion procedure

HIPAA control mapping (if applicable):

HIPAA Requirement	Your Control	Evidence
De-identification (Safe Harbor)	18-identifier detection pipeline	Red-team test results showing recall per identifier type
BAAs with business associates	BAAs with cloud AI providers	Signed BAAs
Minimum necessary	Gateway redacts PHI not needed for the task	Redaction configuration per use case
Access controls	RBAC on AI tools; audit log access restricted	Access control documentation
Audit controls	Comprehensive audit logging	Audit log schema and sample logs
Breach notification	Incident response plan that notifies without unreasonable delay and no later than 60 days after discovery	Incident response procedure
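The de-identification evidence calls for recall per identifier type rather than a single overall number, because a pipeline can score well overall while consistently missing one identifier. A minimal sketch, assuming each red-team result is recorded as a hypothetical `(identifier_type, detected)` pair for one planted identifier; the sample results and the 0.99 threshold (borrowed from the blueprint's Level 4 target) are illustrative, not measured data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_per_identifier(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Recall per identifier type: detected plants / total plants of that type."""
    planted: Dict[str, int] = defaultdict(int)
    caught: Dict[str, int] = defaultdict(int)
    for identifier_type, detected in results:
        planted[identifier_type] += 1
        if detected:
            caught[identifier_type] += 1
    return {t: caught[t] / planted[t] for t in planted}


# Hypothetical red-team results for illustration only.
red_team = [
    ("names", True), ("names", True), ("names", False), ("names", True),
    ("dates", True), ("dates", True),
    ("medical record numbers", True), ("medical record numbers", False),
]

recall = recall_per_identifier(red_team)
below_target = sorted(t for t, r in recall.items() if r < 0.99)
print(below_target)  # ['medical record numbers', 'names']
```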

SOC 2 control mapping (if applicable):

Trust Service Criteria	Your Control	Evidence
CC6.1 (Logical Access)	RBAC on gateway and AI tools	Access control documentation
CC6.7 (Data Transmission)	TLS encryption; gateway sanitisation	Architecture diagram; TLS configuration
CC7.2 (System Monitoring)	Pipeline monitoring dashboard; alerting	Dashboard screenshots; alert rules
C1.1 (Confidentiality)	Data classification framework; gateway detection	Classification policy; detection results
P6.1 (Privacy Disclosure)	AI acceptable use policy; privacy notice	Policy document; notice text

Deliverable: A regulatory control mapping table for each applicable regulation, showing the requirement, the control that addresses it, and the evidence that demonstrates the control.
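If you maintain the mapping tables as structured data, a short check can flag any requirement that still lacks a control or evidence before an auditor does. This sketch assumes a hypothetical dictionary layout (requirement mapped to `control` and `evidence` strings); the empty DPIA evidence below is an invented example of a gap.

```python
from typing import Dict, List

# requirement -> {"control": ..., "evidence": ...}; empty strings mark gaps.
ControlMapping = Dict[str, Dict[str, str]]


def mapping_gaps(mapping: ControlMapping) -> List[str]:
    """List requirements that lack a control or lack evidence."""
    gaps = []
    for requirement, row in mapping.items():
        if not row.get("control", "").strip():
            gaps.append(f"{requirement}: no control")
        if not row.get("evidence", "").strip():
            gaps.append(f"{requirement}: no evidence")
    return gaps


gdpr = {
    "Lawful basis (Art. 6)": {"control": "Legitimate Interest Assessment for AI processing",
                              "evidence": "LIA document"},
    "DPIA (Art. 35)": {"control": "DPIA conducted for AI system", "evidence": ""},
}
print(mapping_gaps(gdpr))  # ['DPIA (Art. 35): no evidence']
```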

The one-page AI Data Privacy Blueprint

You now have five supporting documents. Distil them into a single-page blueprint that any stakeholder — CISO, CEO, auditor, board member — can read in five minutes.

AI Data Privacy Blueprint

ORGANISATION: [Your organisation]
DATE: [Date]
OWNER: [CISO / DPO / AI Governance Lead]
REVIEW CYCLE: Every 6 months

━━━ DATA CLASSIFICATION ━━━
Level 1 (Public):       [X]% of AI-processed data → Direct cloud AI
Level 2 (Internal):     [X]% → Cloud AI with enterprise agreement
Level 3 (Confidential): [X]% → Gateway (detect + redact) → Cloud AI
Level 4 (Restricted):   [X]% → Local inference only
Level 5 (Prohibited):   [X]% → No AI processing

━━━ ARCHITECTURE ━━━
Gateway:    [Tool/custom] with [detection stack]
Local AI:   [Model] on [hardware] for Level 4 workloads
Cloud AI:   [Providers] with [DPAs/BAAs] for Levels 2-3 workloads
Audit:      [Logging system] with [retention period]

━━━ KEY METRICS ━━━
Local processing ratio:      [X]% (target: [X]%)
PII interception rate:       [X]% (target: >95%)
Shadow AI rate:              [X]% (target: <10%)
Detection recall (red-team): [X]% (target: >99% for Level 4)

━━━ REGULATORY COMPLIANCE ━━━
GDPR: [Status] — DPIA completed [date], SCCs in place
HIPAA: [Status] — BAAs in place, Safe Harbor pipeline operational
CCPA: [Status] — Consumer rights process operational
SOC 2: [Status] — AI system in scope for next audit [date]

━━━ ORGANISATIONAL CONTROLS ━━━
Policy: AI Acceptable Use Policy v[X] — [date]
Training: [X]% completion rate, role-based
AI Champions: [X] across [X] departments
Review: Next scheduled review [date]

This blueprint becomes the executive summary of your AI data privacy programme. It is backed by the five detailed deliverables from the exercises, any of which can be produced when an auditor, board member, or regulator asks for details.
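The KEY METRICS section of the blueprint needs numbers each review cycle. The sketch below shows one way to compute three of them from period counts; the definitions are assumptions (for example, that PII interception is measured against requests later found, perhaps by sampling, to contain PII, and that shadow AI rate is unapproved uses over all AI uses), so agree your own definitions before reporting. `PeriodCounts` and the example figures are hypothetical, and every denominator must be non-zero.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PeriodCounts:
    """Counts for one reporting period (hypothetical definitions)."""
    total_requests: int       # all AI requests seen by the gateway
    local_requests: int       # requests served by local inference
    pii_requests: int         # requests found (e.g. by sampling) to contain PII
    pii_intercepted: int      # of those, requests where the gateway caught the PII
    shadow_ai_events: int     # estimated AI uses outside approved tools
    approved_ai_events: int   # AI uses through approved tools


def key_metrics(c: PeriodCounts) -> Dict[str, float]:
    """Compute blueprint ratios; raises ZeroDivisionError if a denominator is zero."""
    return {
        "local_processing_ratio": c.local_requests / c.total_requests,
        "pii_interception_rate": c.pii_intercepted / c.pii_requests,
        "shadow_ai_rate": c.shadow_ai_events / (c.shadow_ai_events + c.approved_ai_events),
    }


def meets_targets(metrics: Dict[str, float]) -> Dict[str, bool]:
    """Compare against the blueprint targets: interception >95%, shadow AI <10%."""
    return {
        "pii_interception_rate": metrics["pii_interception_rate"] > 0.95,
        "shadow_ai_rate": metrics["shadow_ai_rate"] < 0.10,
    }


period = PeriodCounts(total_requests=1000, local_requests=250, pii_requests=200,
                      pii_intercepted=194, shadow_ai_events=80, approved_ai_events=920)
metrics = key_metrics(period)
print(metrics)
print(meets_targets(metrics))  # {'pii_interception_rate': True, 'shadow_ai_rate': True}
```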

You have completed all five exercises and produced the blueprint. What is the single most important next step?


Module 12 — Final Assessment

What is the primary purpose of the data flow audit in Exercise 1?

When classifying data repositories for AI use, which factor distinguishes Level 3 (Confidential) from Level 4 (Restricted)?

What is the purpose of the regulatory control mapping in Exercise 5?

The AI Data Privacy Blueprint is a one-page document. Why is brevity important for this artifact?