The Regulatory Landscape for AI and Data

Regulations are not a reason to stop — they are a design constraint

The regulatory landscape for AI and data is frequently cited as "uncertain" or "evolving." This framing is often used to justify inaction. In reality, the core regulations are well-established, their requirements for AI data handling are largely clear, and the remaining uncertainties are at the margins, not the centre.

GDPR has been enforceable since 2018. HIPAA has been law since 1996. CCPA took effect in 2020. PCI DSS has been around since 2004. These are not new or ambiguous frameworks. What is relatively new is the application of these frameworks to AI systems, and even here, regulators have provided substantial guidance.

This module maps each major regulation to its specific requirements for AI data handling. The goal is precision: what does each regulation actually require, what does it not require, and where are the genuine grey areas? By the end, you should be able to assess your AI data handling against each applicable regulation without conflating one regulation's requirements with another's.

Which regulation are you most concerned about for your AI systems?

GDPR: the global baseline

The General Data Protection Regulation applies to organisations established in the European Economic Area (EEA) and to organisations based elsewhere that offer goods or services to, or monitor the behaviour of, individuals in the EEA. For AI systems, GDPR creates several specific requirements.

Lawful basis for processing (Article 6). Every use of personal data in an AI system requires a lawful basis. The six lawful bases are: consent, contractual necessity, legal obligation, vital interests, public task, and legitimate interest. For most enterprise AI use cases, legitimate interest (Article 6(1)(f)) is the practical basis, but it requires a Legitimate Interest Assessment (LIA) that balances the organisation's interest against the data subject's rights.

Consent is often assumed to be required but is frequently not the best basis for AI processing. Consent must be freely given, specific, informed, and revocable — and revocation means you must be able to stop processing that individual's data, which is technically challenging for data already used in model training.

Data Protection Impact Assessment (Article 35). A DPIA is required for processing that is "likely to result in a high risk to the rights and freedoms of natural persons." Article 35(3) explicitly names systematic and extensive evaluation of individuals based on automated processing, including profiling, as a case that requires a DPIA when it feeds decisions with legal or similarly significant effects. If your AI system makes or informs decisions about individuals (credit scoring, hiring, insurance underwriting, medical diagnosis), a DPIA is almost certainly required.

A DPIA must describe: the processing operations and their purposes, the necessity and proportionality of the processing, the risks to data subjects' rights, and the measures to mitigate those risks. This is covered in detail in Module 9.
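The four required elements can be tracked as a simple completeness check. The following is a minimal sketch, not a DPIA template: the `DpiaRecord` class, its field names, and `missing_sections` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class DpiaRecord:
    """Hypothetical record holding the four elements a DPIA must describe."""
    processing_and_purposes: str = ""
    necessity_and_proportionality: str = ""
    risks_to_data_subjects: str = ""
    mitigation_measures: str = ""


def missing_sections(record: DpiaRecord) -> list[str]:
    """Return the names of required sections that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


draft = DpiaRecord(
    processing_and_purposes="Score loan applications with an ML model.",
    risks_to_data_subjects="Unfair denial of credit; opaque decisions.",
)
print(missing_sections(draft))
```

A check like this only confirms that each section has been written; it says nothing about whether the content is adequate.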

Data minimisation (Article 5(1)(c)). You may only process personal data that is "adequate, relevant and limited to what is necessary." For AI systems, this means you should not send more personal data to the model than is required for the specific task. Sending an entire customer record when you only need the customer's industry and company size violates data minimisation.
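The customer-record example can be sketched as an allow-list applied before anything is sent to the model. This is a minimal illustration; `minimise`, `TASK_FIELDS`, and the sample record are hypothetical.

```python
def minimise(record: dict, allowed_fields: set[str]) -> dict:
    """Keep only the fields the specific task needs; drop everything else."""
    return {key: value for key, value in record.items() if key in allowed_fields}


# Hypothetical customer record; only two fields are needed for the task.
customer = {
    "name": "Jane Doe",
    "email": "jane.doe@example.org",
    "industry": "Logistics",
    "company_size": "50-200",
}
TASK_FIELDS = {"industry", "company_size"}

payload = minimise(customer, TASK_FIELDS)
print(payload)
```

An allow-list is used rather than a block-list so that a newly added field is excluded by default until someone decides the task needs it.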

Cross-border transfers (Chapter V). Transferring personal data outside the EEA requires a legal mechanism: an adequacy decision (the destination country is deemed adequate by the European Commission), Standard Contractual Clauses (SCCs), Binding Corporate Rules, or a derogation. The EU-US Data Privacy Framework, adopted in July 2023, provides an adequacy mechanism for certified US organisations. For AI systems, the critical question is: where is the inference infrastructure located? If you send personal data to an AI provider whose inference runs on US servers, that is a cross-border transfer requiring a legal mechanism.
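The inference-location question can be expressed as a pre-flight check. The sketch below assumes you know the ISO country code where inference runs and which mechanism, if any, your organisation has documented; `transfer_check` and `Mechanism` are hypothetical names, and the check does not validate that the mechanism is actually in place.

```python
from enum import Enum
from typing import Optional

# EU member states plus Iceland, Liechtenstein, and Norway (ISO 3166-1 alpha-2).
EEA = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "IS", "LI", "NO",
})


class Mechanism(Enum):
    ADEQUACY_DECISION = "an adequacy decision"
    SCC = "Standard Contractual Clauses"
    BCR = "Binding Corporate Rules"
    DEROGATION = "a derogation"


def transfer_check(inference_country: str, mechanism: Optional[Mechanism]) -> str:
    """Classify a request by where inference runs and what mechanism is documented."""
    if inference_country.upper() in EEA:
        return "no transfer: inference stays in the EEA"
    if mechanism is None:
        return "blocked: cross-border transfer with no legal mechanism"
    return f"transfer relies on {mechanism.value}"


print(transfer_check("US", None))
print(transfer_check("US", Mechanism.SCC))
```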

The right to explanation (Article 22 and Recital 71). Article 22 gives data subjects the right not to be subject to a decision based solely on automated processing that produces legal or similarly significant effects. When such processing occurs, the data subject has the right to obtain meaningful information about the logic involved; the operative text for this sits in the transparency and access provisions (Articles 13(2)(f), 14(2)(g) and 15(1)(h)). This does not require you to explain how a neural network works. It requires you to explain the factors considered, the data used, and the general logic of the decision process in terms the data subject can understand.

Your organisation uses a cloud AI service (hosted in the US by a certified EU-US Data Privacy Framework participant) to summarise EU customer feedback. The summaries inform product decisions but do not directly affect individual customers. Do you need a DPIA?

CCPA/CPRA: California's AI-relevant requirements

The California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA), creates requirements distinct from GDPR. Do not conflate them — the rights, obligations, and enforcement mechanisms are different.

Scope. CCPA applies to for-profit businesses that: (a) have gross annual revenue over $25 million, or (b) buy, sell, or share the personal information of 100,000+ California residents/households, or (c) derive 50%+ of revenue from selling or sharing California consumers' personal information.
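The three thresholds are alternatives, so a business is in scope if it is for-profit and meets any one of them. A minimal sketch using only the figures stated above; the function and parameter names are hypothetical, and the check ignores any other applicability conditions.

```python
def ccpa_applies(
    is_for_profit: bool,
    gross_annual_revenue_usd: float,
    ca_consumers_or_households: int,
    revenue_share_from_selling_or_sharing: float,
) -> bool:
    """Apply the three alternative thresholds listed in the text above."""
    if not is_for_profit:
        return False
    return (
        gross_annual_revenue_usd > 25_000_000                # threshold (a)
        or ca_consumers_or_households >= 100_000             # threshold (b)
        or revenue_share_from_selling_or_sharing >= 0.5      # threshold (c)
    )


# Small revenue, but handles data on 150,000 California residents: in scope via (b).
print(ccpa_applies(True, 10_000_000, 150_000, 0.0))
```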

Personal information definition. CCPA defines personal information broadly as "information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household." This is broader than GDPR's "personal data" in some respects — it explicitly includes household-level data, not just individual-level.

AI-specific provisions. The CPRA amendments created the California Privacy Protection Agency (CPPA) and directed it to issue regulations on automated decision-making technology (ADMT). The CPPA's proposed ADMT regulations (expected to be finalised in 2026) would:

Require businesses to provide consumers with pre-use notice before using ADMT to make significant decisions about them
Grant consumers the right to opt out of ADMT for significant decisions
Require businesses to provide access to the logic of ADMT that produces significant decisions
Define "significant decisions" to include decisions about employment, insurance, credit, housing, education, and access to essential services

Key differences from GDPR:

CCPA is opt-out, not opt-in. You can process personal information without consent, but consumers can opt out of sale/sharing.
CCPA does not require a lawful basis for processing. The restrictions are on sale, sharing, and specific uses, not on processing generally.
CCPA does not have an equivalent to GDPR's DPIA requirement, though the proposed ADMT regulations would create impact assessment requirements for automated decision-making.
CCPA's enforcement is through the California Attorney General and the CPPA, not a network of supervisory authorities.

For AI data handling, the practical implications of CCPA are: ensure consumers can opt out of having their data used in AI processing that constitutes "sale" or "sharing," provide transparency about how AI is used to make significant decisions, and honour deletion requests that would require removing data from AI training sets or retrieval databases.
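Opt-out and deletion handling can be sketched against a retrieval store. The `RetrievalStore` class below is a hypothetical in-memory stand-in, not a real vector database API, and it does not address data already baked into trained model weights.

```python
class RetrievalStore:
    """Hypothetical in-memory stand-in for a retrieval database keyed by consumer."""

    def __init__(self) -> None:
        self._docs: dict[str, list[str]] = {}
        self._opted_out: set[str] = set()

    def add(self, consumer_id: str, text: str) -> None:
        self._docs.setdefault(consumer_id, []).append(text)

    def opt_out(self, consumer_id: str) -> None:
        """Record an opt-out of sale/sharing; the data is kept but not shared."""
        self._opted_out.add(consumer_id)

    def delete(self, consumer_id: str) -> int:
        """Honour a deletion request; return how many documents were removed."""
        return len(self._docs.pop(consumer_id, []))

    def shareable(self) -> list[str]:
        """Documents that may flow into processing that counts as sale or sharing."""
        return [
            text
            for consumer_id, docs in self._docs.items()
            if consumer_id not in self._opted_out
            for text in docs
        ]


store = RetrievalStore()
store.add("c1", "feedback from c1")
store.add("c2", "feedback from c2")
store.add("c3", "feedback from c3")
store.opt_out("c2")
removed = store.delete("c3")
print(removed, store.shareable())
```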

HIPAA: the two paths to de-identification

HIPAA's requirements for AI are concentrated in the de-identification standard: how do you strip PHI from data so it can be processed without violating the Privacy Rule? There are exactly two methods.

Method 1: Safe Harbor (45 CFR 164.514(b)(2))

Remove all 18 identifier categories listed in Module 2. After removal, you must also have no actual knowledge that the remaining information could identify an individual. This is the method most organisations use because it is deterministic — you either removed all 18 categories or you did not.

For AI systems, Safe Harbor is the foundation of a PHI de-identification pipeline. Your detection system must catch all 18 identifier types with extremely high recall (missed identifiers = HIPAA violation), and you must validate that the de-identified dataset does not enable re-identification.
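The recall requirement can be made concrete with a small test harness. The sketch below is illustrative only: the three regular expressions cover a small subset of identifier types, not all 18 categories, and a medical record number is deliberately left undetected to show what a miss looks like. `PATTERNS`, `redact`, and `recall` are hypothetical names.

```python
import re

# Illustrative subset only; a real pipeline needs coverage for all 18 categories.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def recall(known_identifiers: set[str], text: str) -> float:
    """Fraction of known identifiers that no longer appear after redaction."""
    redacted = redact(text)
    caught = sum(1 for item in known_identifiers if item not in redacted)
    return caught / len(known_identifiers)


note = "Contact jane.doe@example.org or 555-867-5309. SSN 123-45-6789. MRN 0042917."
labelled = {"jane.doe@example.org", "555-867-5309", "123-45-6789", "0042917"}

print(redact(note))
print(recall(labelled, note))  # 0.75: the medical record number was missed
```

Measuring recall against a hand-labelled sample like this is one way to find gaps before they reach production.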

Method 2: Expert Determination (45 CFR 164.514(b)(1))

A qualified statistical or scientific expert determines that the risk of identifying any individual in the dataset is "very small." The expert must document the methods and results. This method allows more data to remain in the dataset — you might retain partial dates, partial geographic data, or other quasi-identifiers if the expert determines that re-identification risk is sufficiently low.

Expert Determination is more flexible but more expensive and less scalable. It requires a qualified expert (typically a statistician or data scientist with privacy expertise) to analyse each dataset. For one-off research datasets, it may be appropriate. For a pipeline processing data continuously, Safe Harbor with automated detection is more practical.
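One statistic an expert might examine, among others, is how many records share each combination of quasi-identifiers. The sketch below computes the smallest such group. It is an assumption that this measure is relevant to a given determination; it is not the regulatory test, and the names and sample rows are hypothetical.

```python
from collections import Counter


def smallest_group(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    return min(Counter(keys).values())


# Hypothetical dataset with generalised age bands and three-digit ZIP prefixes.
rows = [
    {"age_band": "60-69", "zip3": "941", "admit_year": 2024},
    {"age_band": "60-69", "zip3": "941", "admit_year": 2024},
    {"age_band": "60-69", "zip3": "941", "admit_year": 2024},
    {"age_band": "30-39", "zip3": "100", "admit_year": 2024},
]

# A result of 1 means at least one record is unique on these attributes.
print(smallest_group(rows, ["age_band", "zip3", "admit_year"]))
```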

Business Associate Agreements. If a cloud AI provider processes PHI on your behalf, they are a Business Associate under HIPAA and must sign a BAA. As of early 2026, the major cloud AI providers that offer BAAs include:

Microsoft (Azure OpenAI) — BAA available through the Azure BAA programme
Google (Vertex AI) — BAA available through Google Cloud's HIPAA compliance programme
Amazon (AWS Bedrock) — BAA available through the AWS BAA programme
OpenAI — BAA available for ChatGPT Enterprise with specific configurations
Anthropic — BAA available for Claude for Work Enterprise plans with specific configurations

A BAA does not authorise you to send raw PHI without controls. It establishes the legal framework for the provider to process PHI in compliance with HIPAA. You still need technical safeguards (encryption, access controls, audit logging) and should minimise the PHI sent to what is necessary for the task.

A hospital wants to use a cloud AI service to analyse patient readmission patterns. The dataset contains patient ages, admission dates, discharge dates, diagnosis codes, and treating hospital name. Which de-identification path is most appropriate?

EU AI Act: risk classification for data handling

The EU AI Act, which entered into force in August 2024 with phased compliance deadlines through 2027, introduces a risk-based classification system for AI that has direct implications for data handling.

Risk levels and data requirements:

Unacceptable risk (banned). AI systems for social scoring, real-time biometric identification in public spaces (with limited exceptions), and manipulation through subliminal techniques. If your AI system falls here, data handling is moot — the system is prohibited.

High risk (strict requirements). AI systems used in: biometric identification, critical infrastructure, education and vocational training, employment and worker management, essential services (credit, insurance), law enforcement, migration and border control, and administration of justice. High-risk systems must meet requirements including:

Data governance: training, validation, and testing datasets must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete for the intended purpose
Documentation: detailed technical documentation including data sources, design choices, and validation methodology
Record-keeping: automatic logging of system operation
Transparency: clear information to deployers about capabilities and limitations
Human oversight: designed to allow effective human oversight

Limited risk (transparency obligations). AI systems that interact with people (chatbots), generate synthetic content (deepfakes), or are used for emotion recognition or biometric categorisation. The primary requirement is transparency — users must be informed that they are interacting with an AI system.

Minimal risk (no specific requirements). The majority of AI systems. Spam filters, AI-powered search, content recommendation. No specific data handling requirements beyond existing regulations.

For enterprise AI data privacy, the EU AI Act's main impact is the data governance requirement for high-risk systems. If your AI system is classified as high-risk, you must demonstrate that the data used for training, validation, and testing meets quality standards, and you must document your data practices in technical documentation. This does not replace GDPR — it adds to it.
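The four tiers and their obligations, as summarised above, can be held in a lookup that an internal inventory tool might consult. This is a sketch under stated assumptions: the use-case names and their tier assignments are hypothetical examples drawn from the categories in the text, and real classification needs legal review.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Obligations per tier, as summarised in this section.
OBLIGATIONS = {
    RiskTier.UNACCEPTABLE: ["prohibited"],
    RiskTier.HIGH: [
        "data governance",
        "technical documentation",
        "record-keeping",
        "transparency to deployers",
        "human oversight",
    ],
    RiskTier.LIMITED: ["inform users they are interacting with an AI system"],
    RiskTier.MINIMAL: [],
}

# Hypothetical inventory of internal use cases.
USE_CASE_TIERS = {
    "social_scoring": RiskTier.UNACCEPTABLE,
    "credit_scoring": RiskTier.HIGH,
    "employment_screening": RiskTier.HIGH,
    "customer_chatbot": RiskTier.LIMITED,
    "spam_filter": RiskTier.MINIMAL,
}


def obligations_for(use_case: str) -> list[str]:
    """Look up obligations; unclassified use cases fail loudly instead of defaulting."""
    tier = USE_CASE_TIERS.get(use_case)
    if tier is None:
        raise LookupError(f"{use_case!r} is unclassified and needs review")
    return OBLIGATIONS[tier]


print(obligations_for("credit_scoring"))
```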

ITAR and EAR: when AI processing requires export licences

The International Traffic in Arms Regulations (ITAR) and Export Administration Regulations (EAR) apply when AI systems process controlled technical data.

ITAR governs defence articles and services. Technical data related to items on the US Munitions List (USML) is controlled. Sending ITAR-controlled data to a cloud AI service constitutes an export if the service is hosted outside the US or if non-US persons can access the data. This means: no foreign-hosted cloud AI for ITAR data, no AI providers that use non-US persons for data handling or model training, and strict access controls even within US-hosted environments.

EAR governs dual-use items and technology. The Commerce Control List (CCL) identifies controlled technologies. AI processing of EAR-controlled technical data may constitute a "deemed export" if non-US persons access the data, even within the United States.

For organisations handling ITAR or EAR data, the practical guidance is: use only US-hosted, US-person-operated AI infrastructure. Cloud AI providers must be able to demonstrate that no non-US persons have access to the processing environment. In many cases, local inference on air-gapped infrastructure is the only compliant option.
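The practical guidance reduces to a gate on which endpoints may receive controlled data. A minimal sketch follows; the `Endpoint` fields are hypothetical attributes you would need to establish through provider attestations, not values any API reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of an AI inference environment."""
    name: str
    hosted_in_us: bool
    us_persons_only: bool
    air_gapped: bool = False


def allowed_for_controlled_data(endpoint: Endpoint) -> bool:
    """Permit controlled data only on US-hosted, US-person-operated infrastructure."""
    return endpoint.hosted_in_us and endpoint.us_persons_only


endpoints = [
    Endpoint("commercial-cloud-eu", hosted_in_us=False, us_persons_only=False),
    Endpoint("commercial-cloud-us", hosted_in_us=True, us_persons_only=False),
    Endpoint("on-prem-lab", hosted_in_us=True, us_persons_only=True, air_gapped=True),
]

eligible = [e.name for e in endpoints if allowed_for_controlled_data(e)]
print(eligible)
```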

SOC 2 and FedRAMP: audit frameworks for AI

SOC 2: what auditors look for in AI systems

SOC 2 is not a regulation — it is an audit framework developed by the AICPA. But most enterprise customers require SOC 2 Type II reports from their vendors, and AI systems are increasingly within audit scope.

SOC 2 evaluates controls across five Trust Services Criteria: Security, Availability, Processing Integrity, Confidentiality, and Privacy. For AI systems, auditors focus on:

Security: How is access to AI systems controlled? Are API keys managed securely? Is data encrypted in transit and at rest? Are there network controls separating AI processing environments?

Confidentiality: How is confidential data identified and protected within AI workflows? Are there controls preventing data leakage between tenants in multi-tenant AI systems? What are the data retention and destruction policies?

Processing Integrity: Are AI outputs accurate and complete? Are there controls to detect and correct AI errors? Is there monitoring for model drift or degradation?

Privacy: How is personal data collected, used, retained, disclosed, and disposed of within AI systems? Are privacy notices accurate? Are data subject access requests handled correctly?

The key difference between SOC 2 Type I and Type II: Type I evaluates the design of controls at a point in time. Type II evaluates the operating effectiveness of controls over a period (typically 6-12 months). For AI data privacy, Type II is what matters — it demonstrates that your controls actually work in practice, not just that they exist on paper.

FedRAMP: AI in government contexts

The Federal Risk and Authorization Management Program (FedRAMP) provides a standardised approach to security assessment for cloud services used by US federal agencies. If your AI system processes federal data, FedRAMP authorisation is likely required.

FedRAMP impact levels:

Low: Loss of confidentiality, integrity, or availability would have limited adverse effect
Moderate: Serious adverse effect (most government data falls here)
High: Severe or catastrophic adverse effect (law enforcement, healthcare, financial regulatory data)

For AI systems, FedRAMP adds requirements around: data location (must be within authorised boundaries), personnel security (background investigations for personnel with data access), incident response (federal agency notification requirements), and continuous monitoring (ongoing security assessment, not just point-in-time).

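AWS GovCloud, Azure Government, and Google Cloud Government are FedRAMP-authorised at Moderate and High impact levels. AI services running within these environments can inherit the FedRAMP authorisation, but only where the specific AI service is itself inside the authorised boundary, so check the service-level listing rather than assuming the whole environment is covered. Running AI workloads on standard commercial cloud and claiming FedRAMP compliance is not permissible.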

The cross-border problem

Data residency requirements create perhaps the most complex challenge for enterprise AI. The issue is that AI inference must happen somewhere, and that somewhere may not comply with every data residency requirement you face.

Consider an enterprise with: EU customers (GDPR — transfer restrictions for data leaving the EEA), Chinese operations (PIPL — data localisation requirements for data leaving China), Australian employees (Privacy Act — restrictions on cross-border disclosure), and US government contracts (FedRAMP — data must stay within authorised boundaries).

No single cloud AI provider can serve all of these requirements from a single infrastructure. You need either: regional AI deployments (different providers or regions for different data), local inference (data never leaves the relevant jurisdiction), or a routing layer that directs data to the correct AI infrastructure based on the data's jurisdiction and classification.

This routing layer is the enterprise extension of the gateway pattern described in Module 6. The gateway does not just detect and redact PII — it also determines where the sanitised data should be sent based on the data subject's jurisdiction and the applicable regulatory requirements.
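A routing decision of this kind can be sketched as a lookup that fails closed. The endpoint names, jurisdiction codes, and classification labels below are hypothetical placeholders, and the request is assumed to have already passed through detection and redaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    jurisdiction: str     # e.g. "EU", "CN", "AU", "US_FED"
    classification: str   # e.g. "personal", "export_controlled"
    sanitised_text: str


# Hypothetical mapping from jurisdiction to a compliant inference target.
ROUTES = {
    "EU": "eu-region-endpoint",
    "CN": "cn-local-inference",
    "AU": "au-region-endpoint",
    "US_FED": "fedramp-boundary-endpoint",
}


def route(request: Request) -> str:
    """Pick an inference target; refuse to route when no rule applies."""
    if request.classification == "export_controlled":
        return "air-gapped-local-inference"
    endpoint = ROUTES.get(request.jurisdiction)
    if endpoint is None:
        raise ValueError(f"no compliant route for jurisdiction {request.jurisdiction!r}")
    return endpoint


print(route(Request("EU", "personal", "Customer reported a billing issue.")))
```

Raising an error for an unknown jurisdiction, rather than falling back to a default provider, keeps unrouted data from leaving by accident.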

Your organisation holds a SOC 2 Type II report. A customer asks whether your AI system is 'SOC 2 compliant.' What is the accurate response?

Module 3 — Final Assessment

Under GDPR, which of the following is a lawful basis for processing personal data in an AI system that analyses customer behaviour to improve product features?

What is the key difference between HIPAA's Safe Harbor and Expert Determination de-identification methods?

Under the EU AI Act, which requirement applies specifically to high-risk AI systems but not to minimal-risk systems?

An enterprise has EU customers, US government contracts, and operations in China. What is the most architecturally sound approach to AI data compliance?