Every API call is a data leak
Let's start with an uncomfortable truth that most enterprise AI strategies quietly ignore.
Every time an employee pastes a customer contract into a cloud-hosted LLM, that contract text travels across the public internet to someone else's data centre. It is processed on someone else's GPUs. It is logged -- at minimum for abuse monitoring, often for longer. The response travels back across the internet. Even with TLS encryption in transit, the data is plaintext at the provider's inference endpoint.
This is not a hypothetical risk. This is the normal operating mode for every cloud AI API. OpenAI, Anthropic, Google -- all of them process your data on their infrastructure. Enterprise agreements and data processing addendums reduce the contractual risk, but they do not change the physics. The data leaves your environment.
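The mechanics are easy to see in code. Here is a minimal sketch of the request body for a typical chat-completion call (shown in the OpenAI API's payload shape; other providers are structurally similar, and the contract text is invented for illustration). TLS encrypts the body on the wire, but the provider must decrypt it to run inference:

```python
import json

# Hypothetical internal document an employee pastes into a chat prompt.
contract_text = "CONFIDENTIAL: Acme Corp agrees to pay a licence fee of..."

# The JSON body of a chat-completion request. The document is embedded
# verbatim: TLS protects it in transit, but the inference endpoint
# decrypts and reads it -- and may log it.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": f"Summarise this contract:\n{contract_text}"}
    ],
}

body = json.dumps(payload)

# The confidential text sits in plaintext inside the request payload.
assert contract_text in body
```

Nothing here is an exploit or a misconfiguration; it is simply what "calling an API" means. The encryption boundary ends where the provider's infrastructure begins.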
For many organisations, this is acceptable. For many others, it is not. And the gap between those two groups is widening, not shrinking.
Consider what enterprise employees actually want to do with AI: summarise internal legal documents, analyse financial reports, draft responses to customer complaints that reference account details, generate code that touches proprietary algorithms, review HR documents containing employee personal data. Every one of these use cases involves data that has no business being on someone else's infrastructure.
The conventional response is to negotiate a zero-data-retention agreement with a cloud provider, deploy inside a VPC, or use a provider's "private" offering. These help. They do not eliminate the fundamental problem: the data still leaves your environment during inference.