From learning to action
You have spent the previous eleven modules building a comprehensive understanding of AI data privacy: the exposure vectors, the classification framework, the regulatory landscape, the detection and redaction techniques, the gateway architecture, local inference, the pipeline, audit and compliance, vendor assessment, and organisational training.
This capstone converts that knowledge into action. Each exercise produces a concrete artifact — a document, a diagram, a policy, or a mapping — that you can take directly into your organisation and use immediately. Together, the five exercises produce your AI Data Privacy Blueprint: a one-page architecture document backed by detailed supporting materials.
These exercises are designed to be done with your actual organisation's data flows, tools, and regulatory requirements in front of you. Generic answers will not help. The value is in applying the frameworks to your specific situation.
Before starting the exercises, which statement best describes your current AI data privacy posture?
Exercise 1: Audit your current AI data flows
Map every place data leaves your environment for AI processing. Most organisations are surprised by how many AI data flows exist when they actually enumerate them.
Step 1: Identify all AI tools in use
Start with the approved tools:
- Corporate AI assistants (ChatGPT Enterprise, Claude for Work, internal tools)
- AI coding assistants (GitHub Copilot, Cursor, Tabnine)
- AI features embedded in existing tools (Microsoft 365 Copilot, Salesforce Einstein, HubSpot AI)
- AI-powered analytics (Tableau AI, PowerBI Copilot)
- AI search (Perplexity, Google AI Overviews in Workspace)
Then estimate the unapproved tools (shadow AI):
- Consumer AI chatbots (free ChatGPT, free Claude, Gemini)
- AI browser extensions
- AI features in personal productivity tools
- AI APIs accessed by developers on personal accounts
Step 2: For each tool, map the data flow
For every AI tool identified, document:
| Field | Description |
|---|---|
| Tool name | The specific AI tool or service |
| Provider | The vendor operating the AI service |
| Data input types | What types of data are sent to this tool (text, code, documents, images) |
| Data sensitivity range | What classification levels of data are sent (estimate for shadow AI) |
| PII types potentially included | Names, emails, phone numbers, financial data, health data, etc. |
| Data flow path | User device → [intermediaries] → AI provider infrastructure |
| Retention policy | How long does the provider retain the data (per their terms/DPA) |
| Training usage | Does the provider use data for model training (per their terms) |
| Geographic location | Where is the data processed/stored |
| DPA in place? | Yes/No |
| Privacy controls | What controls exist (gateway, DLP, manual classification) |
| Volume estimate | Approximate requests per day/week |
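If the inventory is going to feed later checks rather than live only in a spreadsheet, the fields above map naturally onto a record type. Below is a minimal sketch in Python. It assumes a hypothetical `AIDataFlow` class whose field names mirror the table, plus an extra `approved` flag that separates sanctioned tools from estimated shadow AI. The example row uses placeholder values, not any real vendor's terms.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class AIDataFlow:
    """One row of the AI data flow inventory (hypothetical schema mirroring the table above)."""
    tool_name: str
    provider: str
    data_input_types: list[str]          # text, code, documents, images
    sensitivity_levels: list[int]        # classification levels sent (1-5); estimate for shadow AI
    pii_types: list[str]                 # names, emails, financial data, health data, ...
    flow_path: list[str]                 # user device -> intermediaries -> provider infrastructure
    retention_policy: str                # as stated in the provider's terms or DPA
    used_for_training: Optional[bool]    # None when the provider's terms are unknown
    processing_location: str
    dpa_in_place: bool
    privacy_controls: list[str] = field(default_factory=list)  # e.g. gateway, DLP
    requests_per_week: int = 0
    approved: bool = True                # False for estimated shadow AI


# Placeholder values for illustration only
example = AIDataFlow(
    tool_name="Example AI assistant",
    provider="Example Vendor",
    data_input_types=["text", "documents"],
    sensitivity_levels=[2, 3],
    pii_types=["names", "emails"],
    flow_path=["user device", "corporate proxy", "provider API"],
    retention_policy="30 days (per DPA)",
    used_for_training=False,
    processing_location="EU",
    dpa_in_place=True,
)
row = asdict(example)  # plain dict, ready for CSV or spreadsheet export
```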
Step 3: Identify gaps
With the data flow map complete, identify:
- Which flows have no privacy controls?
- Which flows lack DPAs?
- Which flows involve data above the classification level the tool is approved for?
- Which flows involve cross-border transfers without legal mechanisms?
- Which flows are entirely invisible (shadow AI)?
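Once the inventory exists, you can apply the five gap questions row by row. Below is a minimal sketch. It assumes each flow is a plain dictionary with the hypothetical keys `privacy_controls`, `dpa_in_place`, `max_level_sent`, `approved_max_level`, `cross_border`, `transfer_mechanism` and `approved`; rename them to match whatever your inventory actually records. The two rows are illustrative.

```python
def find_gaps(flow: dict) -> list[str]:
    """Return the gap labels that apply to one inventory row (hypothetical keys)."""
    gaps = []
    if not flow.get("privacy_controls"):
        gaps.append("no privacy controls")
    if not flow.get("dpa_in_place", False):
        gaps.append("no DPA")
    if flow.get("max_level_sent", 0) > flow.get("approved_max_level", 0):
        gaps.append("data above approved classification level")
    if flow.get("cross_border") and not flow.get("transfer_mechanism"):
        gaps.append("cross-border transfer without legal mechanism")
    if not flow.get("approved", True):
        gaps.append("shadow AI (no visibility)")
    return gaps


flows = [
    {"tool": "Enterprise assistant", "privacy_controls": ["gateway"], "dpa_in_place": True,
     "max_level_sent": 3, "approved_max_level": 3, "cross_border": True,
     "transfer_mechanism": "SCCs", "approved": True},
    {"tool": "Free consumer chatbot", "privacy_controls": [], "dpa_in_place": False,
     "max_level_sent": 4, "approved_max_level": 0, "cross_border": True,
     "transfer_mechanism": None, "approved": False},
]

for f in flows:
    print(f["tool"], find_gaps(f))
```

Each non-empty result is a candidate for the red highlighting in the deliverable.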
Deliverable: A data flow inventory table covering all AI tools (approved and estimated shadow AI) with the fields above. Highlight gaps in red.
Your data flow audit reveals that Microsoft 365 Copilot processes data across your entire Microsoft 365 tenant, including SharePoint sites with Restricted-classification documents. What is the appropriate response?
Exercise 2: Classify your data for AI use
Apply the classification framework from Module 2 to your actual data estate. This exercise produces the classification map that drives every technical control in your pipeline.
Step 1: Enumerate your data repositories
List every system that contains data your employees might use with AI tools:
- Customer databases (CRM, support systems)
- Document management (SharePoint, Google Drive, Confluence)
- Code repositories (GitHub, GitLab, Bitbucket)
- Communication archives (email, Slack, Teams)
- Financial systems (ERP, accounting)
- HR systems (HRIS, performance management)
- Healthcare systems (EHR, claims)
- Legal systems (contract management, case management)
Step 2: Apply the classification decision tree
For each data repository, walk through the decision tree from Module 2:
- Is it public? → Level 1
- Does it contain PHI? → Level 4
- Does it contain PCI cardholder data? → Level 4
- Does it contain direct PII? → Check jurisdiction → Level 3 or 4
- Does it contain quasi-identifiers? → Level 3
- Is it tagged as trade secret/IP? → Level 4 or 5
- Is it under litigation hold or classified? → Level 5
- Otherwise → Level 2
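Encoding the decision tree as a function helps classify every repository the same way. Below is a minimal sketch. It assumes each repository is described by boolean flags you fill in during Step 1; the `Repo` type and its flag names are hypothetical. Two further choices are assumptions, not rules from Module 2. First, the checks run from most to least restrictive, so a repository that matches several branches receives the highest applicable level. Second, the "check jurisdiction" step and the Level 4 versus Level 5 choice for trade secrets are each reduced to a single flag.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """Hypothetical description of a data repository, filled in during Step 1."""
    name: str
    is_public: bool = False
    has_phi: bool = False
    has_pci_cardholder_data: bool = False
    has_direct_pii: bool = False
    pii_in_strict_jurisdiction: bool = False  # stands in for the "check jurisdiction" step
    has_quasi_identifiers: bool = False
    is_trade_secret: bool = False
    trade_secret_critical: bool = False       # hypothetical: IP that warrants Level 5
    litigation_hold_or_classified: bool = False


def classify(repo: Repo) -> int:
    """Return the AI classification level (1-5), checking the most restrictive branches first."""
    if repo.litigation_hold_or_classified:
        return 5
    if repo.is_trade_secret:
        return 5 if repo.trade_secret_critical else 4
    if repo.has_phi or repo.has_pci_cardholder_data:
        return 4
    if repo.has_direct_pii:
        return 4 if repo.pii_in_strict_jurisdiction else 3
    if repo.has_quasi_identifiers:
        return 3
    if repo.is_public:
        return 1
    return 2


# Flags chosen to reproduce the example rows in the Step 3 table
crm = Repo("Salesforce CRM", has_direct_pii=True)
code = Repo("GitHub (proprietary repos)", is_trade_secret=True)
hris = Repo("Workday HRIS", has_direct_pii=True, pii_in_strict_jurisdiction=True)  # health data
for repo in (crm, code, hris):
    print(repo.name, classify(repo))
```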
Step 3: Document the classification
| Repository | Description | Classification Level | Key PII types | Applicable regulations | Approved AI processing |
|---|---|---|---|---|---|
| Salesforce CRM | Customer contacts and deals | Level 3 | Names, emails, phone numbers, company names | GDPR (EU customers), CCPA (CA customers) | Gateway + Cloud AI |
| GitHub (proprietary repos) | Product source code | Level 4 | None (IP risk, not PII risk) | Trade secret law | Local inference only |
| Workday HRIS | Employee records | Level 4 | Names, SSNs, salary, health info | GDPR, various employment regulations | Local inference only |
Deliverable: A classification table covering every data repository with its AI classification level, key PII types, applicable regulations, and approved AI processing method.
Exercise 3: Design your privacy architecture
Using the data flow audit (Exercise 1) and the classification map (Exercise 2), design the privacy architecture that enforces the classification decisions.
Choose your architecture pattern:
Based on your classification distribution:
If most data is Level 1-2 (primarily public and internal data): Lightweight architecture — gateway with regex-only detection for basic PII catching. Focus investment on policy and training rather than deep technical controls.
If data spans Level 2-4 (mix of internal, confidential, and restricted): Full gateway architecture — the three-layer detection pipeline (regex + NER + LLM for Level 4), pseudonymisation for Level 3, local inference for Level 4, direct cloud for Level 1-2.
If most data is Level 4-5 (primarily restricted or prohibited): Local-first architecture — local inference as the default, gateway only for tasks that genuinely require cloud model capabilities, with strict approval workflow for cloud routing.
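The three rules can be restated as a function of your classification distribution. Below is a minimal sketch. The text says "most data", which this sketch interprets as more than half of AI-processed data. That threshold and the function name are assumptions to tune, not part of the framework.

```python
def choose_pattern(share_by_level: dict[int, float]) -> str:
    """Pick an architecture pattern from the share of AI-processed data at each level (1-5).

    "Most" is interpreted as more than 50%; this threshold is an assumption.
    """
    total = sum(share_by_level.values())
    if total <= 0:
        raise ValueError("distribution is empty")
    low = sum(share_by_level.get(lvl, 0.0) for lvl in (1, 2)) / total
    high = sum(share_by_level.get(lvl, 0.0) for lvl in (4, 5)) / total
    if low > 0.5:
        return "lightweight"   # regex-only gateway; invest in policy and training
    if high > 0.5:
        return "local-first"   # local inference by default, approved cloud routing only
    return "full gateway"      # regex + NER + LLM detection, pseudonymisation, local for Level 4


print(choose_pattern({1: 40, 2: 35, 3: 15, 4: 10}))
```

The threshold is the part worth debating. The structure simply restates the three rules above.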
Design decisions to document:
- Detection stack: Which layers (regex, NER, LLM) and which specific tools (Presidio, spaCy model version, Gemma 4 variant)?
- Redaction strategy per level: Simple redaction, typed redaction, or pseudonymisation? When is re-hydration needed?
- Routing rules: What classification levels route to local vs gateway vs cloud?
- Local inference deployment: What model, what hardware, what capacity?
- Cloud AI providers: Which providers for which use cases? What DPAs and retention policies?
- Audit logging: What metadata is captured, what is the retention period, who has access?
- Monitoring: What metrics are tracked, what triggers alerts?
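To make the routing and redaction decisions concrete, here is a minimal sketch of one gateway step. It assumes the level-to-destination mapping summarised in the blueprint template later in this module: Level 1-2 go to cloud AI, Level 3 goes through the gateway and then to cloud AI, Level 4 stays local, and Level 5 is blocked. Only the regex layer and two redaction styles are shown: typed redaction, and reversible pseudonymisation with re-hydration. The two patterns are deliberately simplistic placeholders. A real stack would add an NER layer (for example Presidio or spaCy) and, for Level 4, an LLM pass. All names here are hypothetical.

```python
import re

# Level -> destination, following the blueprint's summary (hypothetical labels)
ROUTES = {1: "direct_cloud", 2: "cloud_enterprise", 3: "gateway_then_cloud",
          4: "local", 5: "blocked"}

# Regex layer only: simple placeholder patterns, not production-grade detection
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def typed_redact(text: str) -> str:
    """Typed redaction: replace each match with its entity type, e.g. [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def pseudonymise(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct match with a stable token; return the token -> original map."""
    token_to_original: dict[str, str] = {}
    original_to_token: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            original = match.group(0)
            if original not in original_to_token:
                token = f"<{label}_{len(original_to_token) + 1}>"
                original_to_token[original] = token
                token_to_original[token] = original
            return original_to_token[original]
        text = pattern.sub(replace, text)
    return text, token_to_original


def rehydrate(text: str, token_to_original: dict[str, str]) -> str:
    """Restore original values in a model response that still contains the tokens."""
    for token, original in token_to_original.items():
        text = text.replace(token, original)
    return text


def route(level: int, text: str) -> tuple[str, str, dict[str, str]]:
    """Return (destination, text to send, re-hydration map) for a prompt of the given level."""
    destination = ROUTES[level]
    if destination == "blocked":
        raise PermissionError("Level 5 data is not approved for any AI processing")
    mapping: dict[str, str] = {}
    if destination == "gateway_then_cloud":
        text, mapping = pseudonymise(text)
    return destination, text, mapping


prompt = "Contact jane.doe@example.com or +44 20 7946 0958."
destination, safe_text, mapping = route(3, prompt)
print(destination, safe_text)
```

Pseudonymisation is used for Level 3 here because it keeps the prompt coherent for the model and lets the gateway restore real values in the response. Typed redaction is simpler when no re-hydration is needed.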
The architecture diagram:
Draw a diagram showing:
- User-facing AI interfaces (chat, API, embedded tools)
- The gateway (detection, redaction, routing)
- Local inference endpoints (model, hardware)
- Cloud AI endpoints (providers, via gateway)
- Direct cloud endpoints (for Level 1-2, if applicable)
- Audit log collection
- Monitoring dashboard
Deliverable: An architecture diagram with a supporting document covering the seven design decisions above.
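For the audit logging decision, one option is to log metadata about each request instead of the prompt itself, so the audit trail does not become a second copy of the sensitive data. Below is a minimal sketch. It assumes a JSON-lines log and a hypothetical field set. The SHA-256 digest lets you correlate repeated prompts without storing them. However, a plain hash of short or predictable text can be reversed by guessing, so treat it as pseudonymous data rather than anonymous.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user_id: str, tool: str, level: int, destination: str,
                 entities_found: dict[str, int], prompt: str) -> str:
    """Build one JSON-lines audit entry containing metadata only (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "tool": tool,
        "classification_level": level,
        "destination": destination,
        "entities_found": entities_found,  # counts per entity type, never the values
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }
    return json.dumps(record, sort_keys=True)


line = audit_record("u-123", "enterprise-assistant", 3, "gateway_then_cloud",
                    {"EMAIL": 1, "PHONE": 1},
                    "Contact jane.doe@example.com or +44 20 7946 0958.")
print(line)
```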
Your classification analysis shows: 20% Level 1-2, 50% Level 3, 25% Level 4, 5% Level 5. Which architecture pattern is most appropriate?
Exercise 4: Draft your AI acceptable use policy
Using the policy template from Module 11, draft an AI acceptable use policy tailored to your organisation.
Customisation points:
- Approved tools: Replace the template placeholders with your actual approved tools, URLs, and access instructions. Be specific about which tools are approved for which purposes.
- Data classification rules: Map your classification levels (from Exercise 2) to plain-language guidance. Employees should be able to determine the classification of their data within 30 seconds using the simplified decision tree.
- Department-specific rules: If certain departments handle disproportionately sensitive data (legal, HR, R&D, clinical), add department-specific sections that address their unique requirements.
- Request process: Define the process for requesting new AI tools or use cases. Include the expected response time, who reviews requests, and what information the requester should provide.
- Enforcement and consequences: Define proportionate consequences. First-time accidental violations should result in re-training, not disciplinary action. Repeated or deliberate violations warrant escalation.
Review checklist:
Before finalising the policy, verify:
- Every approved tool is named with a URL and access instructions
- The data classification guidance is understandable by non-technical employees
- The prohibited actions list is specific (not "do not misuse AI tools")
- The policy explains why, not just what
- There is a clear process for reporting concerns without fear of punishment
- The policy has a review date (every 6 months)
- Legal, HR, and IT have reviewed and approved the policy
Deliverable: A complete AI acceptable use policy ready for organisational review and approval.
Exercise 5: Map your architecture to your regulatory requirements
The final exercise validates that your privacy architecture (Exercise 3) satisfies your regulatory requirements (identified during classification in Exercise 2).
For each applicable regulation, create a control mapping:
GDPR control mapping (if applicable):
| GDPR Requirement | Your Control | Evidence |
|---|---|---|
| Lawful basis (Art. 6) | Legitimate Interest Assessment for AI processing | LIA document |
| Data minimisation (Art. 5(1)(c)) | Gateway redacts PII not needed for the task | Detection and redaction audit logs |
| DPIA (Art. 35) | DPIA conducted for AI system | DPIA document (Module 9) |
| Cross-border transfers (Chapter V) | SCCs with cloud AI providers; local inference for highest-sensitivity data | DPAs with SCCs; architecture diagram showing local routing |
| Automated decision-making safeguards (Art. 22) | AI decisions logged with rationale; human review available | Audit log schema; escalation process |
| Data subject rights (Arts. 15-22) | Deletion capability across AI systems and providers | Data deletion procedure |
HIPAA control mapping (if applicable):
| HIPAA Requirement | Your Control | Evidence |
|---|---|---|
| De-identification (Safe Harbor) | 18-identifier detection pipeline | Red-team test results showing recall per identifier type |
| BAAs with business associates | BAAs with cloud AI providers | Signed BAAs |
| Minimum necessary | Gateway redacts PHI not needed for the task | Redaction configuration per use case |
| Access controls | RBAC on AI tools; audit log access restricted | Access control documentation |
| Audit controls | Comprehensive audit logging | Audit log schema and sample logs |
| Breach notification | Incident response plan with notification without unreasonable delay and no later than 60 days after discovery | Incident response procedure |
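The Safe Harbor evidence row calls for recall per identifier type, meaning the share of seeded identifiers of each type that the pipeline actually caught. Below is a minimal sketch. It assumes red-team results are recorded as (identifier type, detected?) pairs. The identifier labels and counts are made up for illustration.

```python
from collections import defaultdict


def recall_by_type(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Recall = detected / seeded, computed separately for each identifier type."""
    seeded: dict[str, int] = defaultdict(int)
    detected: dict[str, int] = defaultdict(int)
    for identifier_type, was_detected in results:
        seeded[identifier_type] += 1
        if was_detected:
            detected[identifier_type] += 1
    return {t: detected[t] / seeded[t] for t in seeded}


# Illustrative red-team run: 4 seeded names (1 missed), 2 seeded SSNs (both caught)
results = [("name", True), ("name", True), ("name", False), ("name", True),
           ("ssn", True), ("ssn", True)]
recall = recall_by_type(results)
weakest = min(recall, key=recall.get)
print(recall, weakest)
```

A per-type view matters because an aggregate recall figure can look healthy while one identifier type is consistently missed.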
SOC 2 control mapping (if applicable):
| Trust Service Criteria | Your Control | Evidence |
|---|---|---|
| CC6.1 (Logical Access) | RBAC on gateway and AI tools | Access control documentation |
| CC6.7 (Data Transmission) | TLS encryption; gateway sanitisation | Architecture diagram; TLS configuration |
| CC7.2 (System Monitoring) | Pipeline monitoring dashboard; alerting | Dashboard screenshots; alert rules |
| C1.1 (Confidentiality) | Data classification framework; gateway detection | Classification policy; detection results |
| P6.1 (Privacy Disclosure) | AI acceptable use policy; privacy notice | Policy document; notice text |
Deliverable: A regulatory control mapping table for each applicable regulation, showing the requirement, the control that addresses it, and the evidence that demonstrates the control.
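Before handing the tables to an auditor, it is worth checking that every requirement has both a control and evidence. Below is a minimal sketch. It assumes each mapping row is a (requirement, control, evidence) tuple of strings, with an empty string for anything not yet in place. The rows are abbreviated from the GDPR table above, with one evidence cell deliberately blanked to show the check.

```python
def incomplete_rows(mapping: list[tuple[str, str, str]]) -> list[str]:
    """Return requirements whose control or evidence is missing."""
    problems = []
    for requirement, control, evidence in mapping:
        if not control.strip():
            problems.append(f"{requirement}: no control")
        elif not evidence.strip():
            problems.append(f"{requirement}: control has no evidence")
    return problems


gdpr = [
    ("Lawful basis (Art. 6)", "Legitimate Interest Assessment", "LIA document"),
    ("DPIA (Art. 35)", "DPIA conducted for AI system", ""),
]
print(incomplete_rows(gdpr))
```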
The one-page AI Data Privacy Blueprint
You now have five supporting documents. Distil them into a single-page blueprint that any stakeholder — CISO, CEO, auditor, board member — can read in five minutes.
```
AI Data Privacy Blueprint
ORGANISATION: [Your organisation]
DATE: [Date]
OWNER: [CISO / DPO / AI Governance Lead]
REVIEW CYCLE: Every 6 months
━━━ DATA CLASSIFICATION ━━━
Level 1 (Public): [X]% of AI-processed data → Direct cloud AI
Level 2 (Internal): [X]% → Cloud AI with enterprise agreement
Level 3 (Confidential): [X]% → Gateway (detect + redact) → Cloud AI
Level 4 (Restricted): [X]% → Local inference only
Level 5 (Prohibited): [X]% → No AI processing
━━━ ARCHITECTURE ━━━
Gateway: [Tool/custom] with [detection stack]
Local AI: [Model] on [hardware] for Level 4 workloads
Cloud AI: [Providers] with [DPAs/BAAs] for Level 2-3 workloads
Audit: [Logging system] with [retention period]
━━━ KEY METRICS ━━━
Local processing ratio: [X]% (target: [X]%)
PII interception rate: [X]% (target: >95%)
Shadow AI rate: [X]% (target: under 10%)
Detection recall (red-team): [X]% (target: >99% for Level 4)
━━━ REGULATORY COMPLIANCE ━━━
GDPR: [Status] — DPIA completed [date], SCCs in place
HIPAA: [Status] — BAAs in place, Safe Harbor pipeline operational
CCPA: [Status] — Consumer rights process operational
SOC 2: [Status] — AI system in scope for next audit [date]
━━━ ORGANISATIONAL CONTROLS ━━━
Policy: AI Acceptable Use Policy v[X] — [date]
Training: [X]% completion rate, role-based
AI Champions: [X] across [X] departments
Review: Next scheduled review [date]
```
This blueprint becomes the executive summary of your AI data privacy programme. It is backed by the five detailed deliverables from the exercises, any of which can be produced when an auditor, board member, or regulator asks for details.
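Each key metric in the blueprint needs an agreed definition before anyone can fill in the percentages. Below is a minimal sketch. It assumes the gateway, proxy logs and sampled manual reviews expose simple counters. These definitions and counter names are hypothetical, so replace them with whatever your monitoring actually measures.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical counters for one reporting period."""
    local_requests: int          # approved AI requests served by local inference
    total_ai_requests: int       # all approved AI requests seen by the gateway
    pii_items_intercepted: int   # PII items detected and redacted before leaving
    pii_items_total: int         # PII items present, estimated from sampled manual review
    shadow_ai_requests: int      # requests to unapproved AI services seen in proxy logs


def blueprint_metrics(c: Counters) -> dict[str, float]:
    """Return the blueprint's key metrics as percentages rounded to one decimal place."""
    def pct(num: int, den: int) -> float:
        return round(100 * num / den, 1) if den else 0.0  # 0.0 when there is no data
    return {
        "local_processing_ratio": pct(c.local_requests, c.total_ai_requests),
        "pii_interception_rate": pct(c.pii_items_intercepted, c.pii_items_total),
        "shadow_ai_rate": pct(c.shadow_ai_requests,
                              c.total_ai_requests + c.shadow_ai_requests),
    }


m = blueprint_metrics(Counters(local_requests=3_000, total_ai_requests=12_000,
                               pii_items_intercepted=970, pii_items_total=1_000,
                               shadow_ai_requests=1_000))
print(m)
```

Comparing each value against its target (for example >95% interception, under 10% shadow AI) is then a direct check.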
You have completed all five exercises and produced the blueprint. What is the single most important next step?
Module 12 — Final Assessment
What is the primary purpose of the data flow audit in Exercise 1?
When classifying data repositories for AI use, which factor distinguishes Level 3 (Confidential) from Level 4 (Restricted)?
What is the purpose of the regulatory control mapping in Exercise 5?
The AI Data Privacy Blueprint is a one-page document. Why is brevity important for this artifact?