Vendor trust is not binary
The question is not "should we trust this AI vendor?" The question is "what specific commitments does this vendor make about data handling, how are those commitments enforced, and what residual risk remains even with those commitments?"
Every major AI provider publishes data handling policies. Most enterprises read the marketing summary and skip the actual terms. This module teaches you to read the terms, identify the commitments that matter, spot the gaps, and ask the questions your vendor hopes you will not ask.
The distinction between what a vendor says ("Your data is safe with us") and what a vendor commits to in a legally binding agreement ("We will not use Customer Data to train models, we will retain Customer Data for no more than 30 days for abuse monitoring, and we will delete Customer Data upon written request within 30 days") is the difference between marketing and risk management.
Your procurement team says: 'We have reviewed the vendor's website and their data handling seems fine.' What is the appropriate response?
The actual data policies: provider by provider
Here is what each major AI provider commits to in their enterprise agreements as of early 2026. These are based on publicly available terms and published policies. Your specific contract may differ — always verify against your actual agreement.
OpenAI (ChatGPT Enterprise, API)
- Training: API and Enterprise data is not used for model training.
- Retention: Default 30-day retention for abuse monitoring. Zero data retention (ZDR) is available for eligible API customers.
- Encryption: Data encrypted in transit (TLS 1.2+) and at rest (AES-256).
- SOC 2: Type II report available. Covers Security and Confidentiality trust service criteria.
- Sub-processors: OpenAI uses Microsoft Azure for cloud infrastructure. Sub-processor list is published and updated.
- DPA: Available and includes Standard Contractual Clauses for international transfers.
- Key limitation: The 30-day default retention means your data exists on OpenAI's infrastructure for up to a month unless you have a ZDR agreement.
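A retention window is easier to reason about as arithmetic. The sketch below makes two assumptions: the vendor deletes data exactly when a fixed window ends, and request volume is steady. The function names and the 5,000 requests/day workload are hypothetical. Under those assumptions, roughly (requests per day × retention days) requests' worth of prompts and completions sit on the vendor's infrastructure at any moment.

```python
from datetime import date, timedelta

def latest_deletion_date(sent_on: date, retention_days: int) -> date:
    """Latest date data sent on `sent_on` may remain, assuming deletion at the
    end of a fixed retention window."""
    return sent_on + timedelta(days=retention_days)

def steady_state_backlog(requests_per_day: int, retention_days: int) -> int:
    """Approximate number of requests held by the vendor at any moment once
    traffic is steady: arrival rate multiplied by the retention window."""
    return requests_per_day * retention_days

print(latest_deletion_date(date(2026, 1, 15), 30))  # 2026-02-14
print(steady_state_backlog(5_000, 30))              # 150000
print(steady_state_backlog(5_000, 0))               # 0 (ZDR: no stored content)
```

The point of the second function is that the exposure scales with volume, not just with time. A 30-day window on a busy integration means a large standing archive of your data, not a single month-old request.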
Anthropic (Claude API, Claude for Work)
- Training: API and Claude for Work data is not used for model training.
- Retention: Up to 30 days for trust and safety. Zero-retention option available for eligible customers.
- Encryption: Data encrypted in transit and at rest.
- SOC 2: Type II report available.
- Sub-processors: Anthropic uses AWS and GCP for cloud infrastructure. Sub-processor list is available upon request.
- DPA: Available with SCCs.
- Key limitation: A 30-day default retention window, similar to OpenAI's. The consumer Claude product is covered by different terms and may use data for training.
Google (Vertex AI)
- Training: Customer data is not used for model training under the Cloud Data Processing Addendum.
- Retention: Configurable logging retention. Vertex AI prediction logs can be disabled.
- Encryption: Data encrypted in transit and at rest. Customer-managed encryption keys (CMEK) available.
- SOC 2: Type II report available as part of Google Cloud's compliance programme.
- Sub-processors: Published Google Cloud sub-processor list. Infrastructure is Google-owned.
- DPA: Cloud Data Processing Addendum is comprehensive and GDPR-compliant.
- Key advantage: CMEK gives you control over encryption keys, adding a layer of protection. Regional endpoint selection allows data residency control.
Microsoft (Azure OpenAI Service)
- Training: Customer data is not used for model training or improvement.
- Retention: Prompts and completions are stored for 30 days for abuse monitoring by default. This can be disabled by applying for modified abuse monitoring.
- Encryption: Data encrypted in transit and at rest. CMEK available.
- SOC 2: Type II as part of Azure's compliance portfolio. Also ISO 27001, FedRAMP High (in Azure Government), HIPAA BAA available.
- Sub-processors: Microsoft's published sub-processor list. Microsoft hosts the OpenAI models in Azure and states that prompts and completions are not available to OpenAI.
- DPA: Microsoft's Data Protection Addendum (DPA) is one of the most comprehensive in the industry.
- Key advantage: The broadest compliance portfolio (FedRAMP, HIPAA, ITAR via Azure Government). Content filtering is built in by default.
AWS (Amazon Bedrock)
- Training: Customer data is not used for model training.
- Retention: Configurable CloudWatch logging. Model invocation logs are opt-in, not default.
- Encryption: Data encrypted in transit and at rest. CMEK via AWS KMS.
- SOC 2: Type II as part of AWS's compliance programme. FedRAMP (via GovCloud), HIPAA, PCI DSS.
- Sub-processors: AWS-owned infrastructure. Third-party models (Anthropic, Meta, Mistral, etc.) run in AWS-operated accounts, and AWS states that model providers do not have access to customer prompts or completions.
- DPA: AWS Data Processing Addendum with SCCs.
- Key advantage: Bedrock processes data in the AWS Region you select. VPC interface endpoints (AWS PrivateLink) provide private connectivity, so traffic stays on the AWS network rather than crossing the public internet. The broadest model selection across multiple providers.
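To act on the advice to verify summaries like these against your actual agreement, record what your signed contract commits to, field by field, and compare it with your requirements. This is a minimal sketch. The field names, the `ExampleAI` vendor, and the thresholds are all hypothetical. Populate the values from your contract, not from the summaries above.

```python
from dataclasses import dataclass

@dataclass
class ContractTerms:
    """Commitments as written in your signed agreement, not the vendor's web pages."""
    vendor: str
    no_training: bool                # contract bars training on customer data
    retention_days: int              # 0 = zero-retention agreement in place
    cmek: bool                       # customer-managed encryption keys available
    hipaa_baa: bool                  # signed Business Associate Agreement
    subprocessors_in_contract: bool  # list or change process bound to the DPA
    region_pinning: bool             # processing region fixed contractually

@dataclass
class Requirements:
    max_retention_days: int
    need_cmek: bool = False
    need_hipaa_baa: bool = False
    need_region_pinning: bool = False

def gaps(terms: ContractTerms, req: Requirements) -> list[str]:
    """List every requirement the contract does not meet."""
    found = []
    if not terms.no_training:
        found.append("no contractual bar on training use")
    if terms.retention_days > req.max_retention_days:
        found.append(f"retention {terms.retention_days}d exceeds {req.max_retention_days}d")
    if req.need_cmek and not terms.cmek:
        found.append("customer-managed keys not available")
    if req.need_hipaa_baa and not terms.hipaa_baa:
        found.append("no signed HIPAA BAA")
    if req.need_region_pinning and not terms.region_pinning:
        found.append("data residency not contractually fixed")
    if not terms.subprocessors_in_contract:
        found.append("sub-processor list not bound to the contract")
    return found

# Hypothetical vendor and requirements, for illustration only.
terms = ContractTerms(vendor="ExampleAI", no_training=True, retention_days=30,
                      cmek=False, hipaa_baa=False,
                      subprocessors_in_contract=False, region_pinning=True)
req = Requirements(max_retention_days=0, need_hipaa_baa=True)
for gap in gaps(terms, req):
    print(gap)
```

Each check is a contractual question, not a technical one. A vendor that offers CMEK in its product but does not commit to it in your agreement should still show up as a gap.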
You need to process HIPAA-regulated PHI through a cloud AI service. Which vendor configuration provides the strongest compliance posture?
Reading DPAs and spotting red flags
A Data Processing Agreement (DPA) is the legal document that governs how a processor (the AI vendor) handles your data. Under GDPR, a DPA is required whenever a controller (your organisation) shares personal data with a processor. Even outside GDPR scope, a DPA is best practice for any AI vendor relationship.
What a DPA must contain (GDPR Article 28(3)):
- Processing only on documented instructions from the controller
- Confidentiality obligations for persons authorised to process the data
- Appropriate technical and organisational security measures
- Conditions for engaging sub-processors (including your right to object)
- Assistance with data subject rights requests (access, deletion, portability)
- Assistance with breach notification
- Deletion or return of data upon termination of the contract
- Availability of information for audits, including inspections by the controller
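A reviewer can track the Article 28(3) elements as a checklist and report what is missing. This is a minimal sketch. The element keys are hypothetical labels that paraphrase the list above. Deciding whether a clause actually satisfies an element is a legal judgement, and the code does not make it.

```python
# Keys are hypothetical labels; values paraphrase the Article 28(3) list above.
ARTICLE_28_3 = {
    "documented_instructions": "Processing only on documented instructions",
    "confidentiality": "Confidentiality obligations for authorised persons",
    "security_measures": "Technical and organisational security measures",
    "subprocessor_conditions": "Conditions for engaging sub-processors",
    "data_subject_rights": "Assistance with data subject rights requests",
    "breach_assistance": "Assistance with breach notification",
    "deletion_or_return": "Deletion or return of data on termination",
    "audit_rights": "Information for audits, including inspections",
}

def missing_elements(found: set[str]) -> list[str]:
    """Return the Article 28(3) elements the reviewer did not find in the DPA."""
    unknown = found - set(ARTICLE_28_3)
    if unknown:
        raise ValueError(f"unrecognised element keys: {sorted(unknown)}")
    return [desc for key, desc in ARTICLE_28_3.items() if key not in found]

# Elements a reviewer ticked off while reading a (hypothetical) vendor DPA.
found = {"documented_instructions", "confidentiality", "security_measures",
         "subprocessor_conditions", "deletion_or_return"}
for item in missing_elements(found):
    print("Missing:", item)
```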
Red flags in AI vendor DPAs:
"We may use aggregated or anonymised data for service improvement." The word "anonymised" is doing heavy lifting here. If the vendor determines what constitutes "anonymised," they might consider pseudonymised data (which is still personal data under GDPR) to be anonymised. Insist on a clear definition of anonymisation that aligns with GDPR Recital 26.
"Data may be processed by our sub-processors as listed on our website." If the sub-processor list is on the website rather than in the contract, the vendor can change sub-processors without amending the DPA. Insist on a right to be notified of sub-processor changes and a right to object.
"We retain data for a reasonable period for service operation." "Reasonable" is undefined. Insist on specific retention periods (e.g., 30 days for abuse monitoring) with a mechanism to request earlier deletion.
"Our obligations under this DPA are limited to data processing within the scope of the Service." This may exclude incidental processing — for example, if your data appears in error logs, monitoring systems, or backup systems that are "outside the scope of the Service." Clarify that the DPA covers all processing of your data, not just the primary service function.
Absence of audit rights. GDPR requires that the DPA gives you the right to audit the processor's compliance. Some vendors limit this to reviewing their SOC 2 report. While a SOC 2 report is valuable, it is the vendor's chosen audit scope, not yours. Negotiate for the right to conduct your own audit (or engage a third party) if the SOC 2 report does not cover your specific concerns.
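Most of these red flags are phrasing patterns. A first pass over a long DPA can therefore be automated to point reviewers at the clauses that need attention. This is a minimal sketch. The regular expressions are hypothetical and will miss reworded clauses. A match only means that a human, ideally counsel, should read that passage.

```python
import re

# Hypothetical patterns for the red flags described above. A match only means
# "a human should read this clause"; it is not a legal assessment.
RED_FLAGS = {
    "undefined anonymisation": re.compile(r"\banonymi[sz]ed\b", re.IGNORECASE),
    "sub-processors listed off-contract": re.compile(
        r"\bsub-?processors?\b.*\b(website|web site|online)\b", re.IGNORECASE),
    "vague retention period": re.compile(r"\breasonable (period|time)\b", re.IGNORECASE),
    "scope-limited obligations": re.compile(
        r"\bwithin the scope of the Service\b", re.IGNORECASE),
}
# Absence check: GDPR Article 28(3)(h) expects audit and inspection rights.
AUDIT_TERMS = re.compile(r"\b(audit(s|ing)?|inspections?)\b", re.IGNORECASE)

def scan_dpa(text: str) -> list[str]:
    """Return the labels of red flags found in `text`."""
    hits = [label for label, pattern in RED_FLAGS.items() if pattern.search(text)]
    if not AUDIT_TERMS.search(text):
        hits.append("no audit or inspection language")
    return hits

sample = ("We may use aggregated or anonymised data for service improvement. "
          "Data may be processed by our sub-processors as listed on our website. "
          "We retain data for a reasonable period for service operation.")
print(scan_dpa(sample))
```

The audit check works the other way round from the others. It flags the absence of language rather than its presence, because the fifth red flag is something missing from the DPA.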
A vendor's DPA states: 'Customer Data may be used in anonymised form to improve the quality and safety of our AI services.' You are processing EU customer data through this vendor. What should you do?
Sub-processor chains and the 'zero retention' claim
Sub-processor chains: who else touches your data?
When you send data to an AI vendor, that vendor may use sub-processors — other companies that provide infrastructure, support, or processing services. Each sub-processor in the chain has access to some or all of your data and introduces additional risk.
A typical AI vendor sub-processor chain:
- Cloud infrastructure provider (AWS, GCP, Azure) — hosts the vendor's servers and stores data
- CDN provider (Cloudflare, Fastly) — may process data in transit
- Monitoring provider (Datadog, New Relic) — may receive telemetry that includes request metadata
- Support tools (Zendesk, Intercom) — may receive data if you file a support ticket that includes examples
- Trust and safety contractor — may review flagged content for policy compliance
Each sub-processor should be covered by data processing obligations that flow down from the vendor's DPA. Ask the vendor:
- What sub-processors have access to customer data?
- What data does each sub-processor access?
- Are sub-processor changes notified to you in advance?
- Do you have the right to object to a new sub-processor?
- Are sub-processor agreements at least as protective as your DPA?
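Recording the vendor's answers for each sub-processor makes gaps in the chain visible. This is a minimal sketch that assumes you have collected answers to the questions above. The `SubProcessor` fields and the example entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubProcessor:
    """Vendor answers for one sub-processor (all fields hypothetical)."""
    name: str
    role: str
    sees_content: bool     # prompts, completions, or uploaded files
    sees_metadata: bool    # timestamps, IP addresses, token counts
    flow_down_terms: bool  # bound by terms at least as protective as your DPA
    advance_notice: bool   # changes notified before they take effect

def review_chain(chain: list[SubProcessor]) -> dict[str, list[str]]:
    """Map each sub-processor name to the findings a reviewer should follow up."""
    findings: dict[str, list[str]] = {}
    for sp in chain:
        notes = []
        if sp.sees_content:
            notes.append("accesses prompt/completion content")
        if (sp.sees_content or sp.sees_metadata) and not sp.flow_down_terms:
            notes.append("no flow-down data protection terms")
        if not sp.advance_notice:
            notes.append("changes not notified in advance")
        if notes:
            findings[sp.name] = notes
    return findings

chain = [
    SubProcessor("cloud-host", "infrastructure", True, True, True, True),
    SubProcessor("metrics-vendor", "monitoring", False, True, False, True),
    SubProcessor("review-contractor", "trust and safety", True, False, True, False),
]
for name, notes in review_chain(chain).items():
    print(name, "->", "; ".join(notes))
```

Content access is reported even when everything else is in order. Those are the sub-processors whose agreements deserve the closest reading.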
The "zero retention" claim
Several vendors offer "zero retention" agreements. This sounds like it means your data is never stored. What it actually means varies by vendor:
What zero retention typically means:
- Prompt and completion data is not persisted to disk after the request completes
- Data exists in memory only during processing
- No logs of prompt/completion content are created
What zero retention typically does NOT mean:
- That your data was never on the vendor's infrastructure (it was — in memory, during processing)
- That metadata (request timestamp, model used, token count, IP address) is not retained
- That no trace of the request exists anywhere in the vendor's systems
- That sub-processors have the same zero-retention commitment
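The gap between content and metadata is easier to see on a concrete record. This is a minimal sketch that assumes a hypothetical vendor log schema. `RequestRecord` and its fields are illustrative and do not match any real vendor's format. A zero-retention commitment that covers only content removes the prompt and completion. The timestamp, model, token counts, and client IP can all remain.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class RequestRecord:
    """Hypothetical vendor-side log entry for one API request."""
    timestamp: datetime
    model: str
    input_tokens: int
    output_tokens: int
    client_ip: str
    prompt: Optional[str]
    completion: Optional[str]

def content_only_zero_retention(record: RequestRecord) -> dict:
    """Fields that could still be persisted if 'zero retention' covers only
    prompt and completion content: content is dropped, metadata is kept."""
    kept = asdict(record)
    del kept["prompt"]
    del kept["completion"]
    return kept

rec = RequestRecord(
    timestamp=datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc),
    model="example-model",
    input_tokens=812,
    output_tokens=240,
    client_ip="203.0.113.7",  # documentation-range address
    prompt="Summarise the attached supplier contract.",
    completion="The contract sets a 12-month term.",
)
print(sorted(content_only_zero_retention(rec)))
# ['client_ip', 'input_tokens', 'model', 'output_tokens', 'timestamp']
```

Whether any of these metadata fields are kept, and for how long, is something to ask the vendor directly. A record like this still shows who called which model, when, and how much.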
Questions to ask about zero retention:
- Does zero retention apply to all processing, including abuse monitoring and content filtering?
- Are there any exceptions where data may be retained (e.g., if flagged by automated content policy systems)?
- Does zero retention extend to sub-processors?
- How is zero retention verified? (Is there an audit mechanism?)
- What happens to data in flight if the system crashes during processing?
Zero retention is a valuable commitment, but it is not the same as "data never leaves your environment." The strongest guarantee remains local inference. Zero retention is the next best option for data that must be processed by cloud AI.
SOC 2 reports and your vendor assessment questionnaire
How to read a SOC 2 report for AI-specific risks
Most enterprises request SOC 2 reports from AI vendors but struggle to extract useful information. Here is what to look for:
Section III: System Description. This describes the system in scope. Verify that the AI service you use is explicitly included. If the report covers "Company X's cloud platform" but does not mention the AI API specifically, the AI service may be outside scope.
Section IV: Trust Service Criteria and Controls. For each relevant criterion, check:
- CC6.1 (Logical Access): How is access to customer data controlled? Who can view your prompts and completions?
- CC6.7 (Data Transmission): Is data encrypted in transit? What protocol and cipher suites?
- CC7.2 (System Monitoring): Is there monitoring for unauthorised access to customer data?
- C1.1 (Confidentiality: Identification and Maintenance): How is confidential data identified, classified, and protected from unauthorised access or disclosure?
- C1.2 (Confidentiality: Disposal): How and when is confidential data disposed of? Does disposal match the stated retention period?
- P6.1 (Privacy: Disclosure): To whom is personal data disclosed? Under what conditions?
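Section I: Auditor's Opinion. In most reports the opinion comes first, in the independent service auditor's report. Read it carefully. "Unqualified" means the auditor concluded that controls were suitably designed and operated effectively throughout the examination period. "Qualified" means the auditor found deficiencies serious enough to modify that conclusion; read the basis for the qualification. An unqualified opinion can still coexist with individual exceptions in the Section IV test results.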
Exceptions and Management Responses. If the auditor identified exceptions (controls that did not operate effectively), read the vendor's management response. Did they remediate? Is the remediation verified?
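The scope checks above come down to comparing a few facts from the report with the service you actually use. This is a minimal sketch. The `Soc2Report` fields are a hypothetical summary that you would transcribe by hand from the report; they are not a standard format.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Soc2Report:
    """Facts transcribed by hand from a SOC 2 Type II report (hypothetical schema)."""
    period_start: date
    period_end: date
    services_in_scope: set[str]  # services named in the system description
    opinion: str                 # "unqualified" or "qualified"
    exceptions: int              # exceptions noted in the control test results

def soc2_concerns(report: Soc2Report, service: str, launched: date) -> list[str]:
    """List reasons the report may not give assurance for `service`."""
    concerns = []
    if service not in report.services_in_scope:
        concerns.append(f"{service} not named in the system description")
    if launched > report.period_end:
        concerns.append(f"{service} launched after the audit period ended")
    elif launched > report.period_start:
        concerns.append(f"{service} was live for only part of the audit period")
    if report.opinion != "unqualified":
        concerns.append("qualified opinion: read the basis for qualification")
    if report.exceptions:
        concerns.append(f"{report.exceptions} exception(s): check management responses")
    return concerns

report = Soc2Report(date(2025, 1, 1), date(2025, 6, 30), {"Cloud Platform"},
                    "unqualified", 2)
for concern in soc2_concerns(report, "AI API", launched=date(2025, 12, 15)):
    print(concern)
```

A service that launched after the period ended cannot have had its controls tested in that report, whatever the platform-level opinion says.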
Your AI vendor assessment questionnaire
Here is a comprehensive questionnaire for evaluating AI vendors. Distribute this before signing any agreement.
Data handling:
1. Is customer prompt and completion data used for model training or improvement? Under any circumstances?
2. What is the default data retention period? Can it be reduced to zero?
3. Are there any exceptions to the retention policy (e.g., content flagged by safety systems)?
4. Where is data processed geographically? Can we specify the region?
5. Is data encrypted in transit and at rest? What encryption standards? Can we use our own keys (CMEK)?

Access and control:
6. Who within your organisation can access customer prompt and completion data? Under what conditions?
7. What access controls and logging exist for internal data access?
8. Do you use any automated content review systems? If so, what data do they access?

Sub-processors:
9. What sub-processors have access to customer data?
10. How are we notified of sub-processor changes?
11. Do sub-processor agreements include data protection obligations equivalent to the DPA?

Compliance:
12. What compliance certifications do you hold? (SOC 2, ISO 27001, FedRAMP, HIPAA BAA, PCI DSS)
13. Is the specific AI service we are using within the scope of these certifications?
14. Can we review your most recent SOC 2 Type II report?
15. Do you support GDPR data subject access and deletion requests? What is the response timeline?

Incident response:
16. What is your breach notification timeline? Does it meet our regulatory requirements?
17. Have you experienced any data breaches affecting customer AI data? If so, what was the resolution?
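Vendors often return questionnaires with items skipped or left blank. This minimal sketch tracks which of the 17 questions still lack an answer, assuming responses are keyed by question number. It checks only that an answer is present. A human still has to judge whether each answer is a binding commitment or a pointer to a marketing page. The placeholder answer text is hypothetical.

```python
# Question numbers per section, following the questionnaire above.
SECTIONS = {
    "Data handling": range(1, 6),
    "Access and control": range(6, 9),
    "Sub-processors": range(9, 12),
    "Compliance": range(12, 16),
    "Incident response": range(16, 18),
}

def open_items(answers: dict[int, str]) -> dict[str, list[int]]:
    """Group unanswered or blank questions by section."""
    result = {}
    for section, numbers in SECTIONS.items():
        missing = [n for n in numbers if not answers.get(n, "").strip()]
        if missing:
            result[section] = missing
    return result

# Hypothetical vendor response: placeholder text for most questions,
# one blank answer and two questions skipped entirely.
answers = {n: "See attached DPA, section 4" for n in range(1, 18)}
answers[3] = "   "
del answers[13], answers[17]
print(open_items(answers))
# {'Data handling': [3], 'Compliance': [13], 'Incident response': [17]}
```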
A vendor's SOC 2 Type II report covers their 'cloud platform' but their AI API was launched six months after the audit period ended. What can you conclude about the AI API's SOC 2 compliance?
Module 10 — Final Assessment
A vendor's DPA says 'data may be used in anonymised form for service improvement.' What is the key risk?
What does 'zero retention' typically NOT mean when offered by an AI vendor?
When reviewing a vendor's SOC 2 Type II report, what should you verify first about the scope?
When evaluating sub-processors for an AI vendor, which question is most critical for data privacy?