From knowledge to action
You have spent twelve modules building the technical foundation for edge AI deployment. You understand the model landscape, the deployment targets, the privacy architectures, the economics, and the operational patterns.
Now it is time to apply all of that to your organisation.
This capstone is not a quiz. It is a structured planning exercise. Each exercise builds on the previous one, and the output is a one-page Edge AI deployment blueprint that you can take to your leadership team.
Work through these exercises with your actual data: real project names, real query volumes, real regulatory requirements, and real hardware inventory. The blueprint is only as useful as the specificity you put into it.
Before we start, what is the primary driver for edge AI in your organisation?
Exercise 1: Map your sensitive data flows
Objective: Identify every place where your organisation's data currently leaves your environment for AI processing, and classify the sensitivity of that data.
Step 1: Inventory current AI usage
List every AI tool, API, or service your organisation currently uses. For each, document:
| Tool/Service | Provider | Data type processed | Sensitivity level | Volume (queries/day) | Users |
|---|---|---|---|---|---|
| Example: ChatGPT | OpenAI | Customer emails, internal docs | CONFIDENTIAL | ~2,000 | ~200 |
| Example: Copilot | Microsoft | Source code | INTERNAL | ~5,000 | ~50 |
Include both sanctioned (IT-approved) and unsanctioned (shadow AI) usage. Survey a sample of departments to discover shadow usage -- in most enterprises, the unsanctioned usage exceeds the sanctioned usage by 2-5x.
Step 2: Classify data sensitivity
For each data type, apply your organisation's data classification scheme. If you do not have one, use this simple framework:
- PUBLIC: Data that is already publicly available or intended for public release
- INTERNAL: Data for internal use that would not cause significant harm if disclosed
- CONFIDENTIAL: Data whose disclosure would cause material harm -- customer PII, financial data, trade secrets, legal matters
- RESTRICTED: Data subject to specific regulatory requirements (HIPAA PHI, ITAR-controlled, classified) or whose disclosure would cause severe harm
Step 3: Identify the edge AI candidates
Data classified as CONFIDENTIAL or RESTRICTED that is currently processed by cloud AI services is your primary candidate for edge AI migration. This is where the data sovereignty value proposition is strongest.
Data classified as INTERNAL is your secondary candidate -- the cost reduction value proposition applies here.
Data classified as PUBLIC can stay on cloud APIs unless cost is a concern.
Output: A table of AI data flows with sensitivity classifications and migration priority.
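The Step 3 rules can be sketched as a small lookup over the inventory table. This is a minimal illustration, not a prescribed tool: the `DataFlow` fields and the priority labels are hypothetical names, and the sensitivity values given to the two example rows are assumed mappings onto the four-level framework above.

```python
from dataclasses import dataclass

# Migration priority per sensitivity level, following Step 3.
# The label strings are illustrative, not part of the course material.
PRIORITY = {
    "RESTRICTED": "primary",
    "CONFIDENTIAL": "primary",
    "INTERNAL": "secondary",
    "PUBLIC": "stay on cloud",
}
RANK = {"primary": 0, "secondary": 1, "stay on cloud": 2}


@dataclass
class DataFlow:
    tool: str
    provider: str
    sensitivity: str       # one of the four framework levels
    queries_per_day: int


def migration_priority(flow: DataFlow) -> str:
    """Return the migration priority for one inventory row."""
    return PRIORITY[flow.sensitivity.upper()]


def rank_flows(flows: list[DataFlow]) -> list[DataFlow]:
    """Order rows by priority first, then by daily volume (largest first)."""
    return sorted(
        flows,
        key=lambda f: (RANK[migration_priority(f)], -f.queries_per_day),
    )


# Example rows; sensitivity values here are assumptions for illustration.
flows = [
    DataFlow("Copilot", "Microsoft", "INTERNAL", 5000),
    DataFlow("ChatGPT", "OpenAI", "CONFIDENTIAL", 2000),
]

for f in rank_flows(flows):
    print(f"{f.tool:10} {f.sensitivity:13} {migration_priority(f)}")
```

Sorting by priority before volume keeps a low-volume RESTRICTED flow ahead of a high-volume INTERNAL one, which matches the ordering of the value propositions in Step 3.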
Exercise 2: Select your deployment targets
Objective: For each edge AI candidate identified in Exercise 1, determine where the AI should run.
Decision matrix:
For each use case, answer these questions:
| Question | Browser | Desktop/Mobile | On-Premises |
|---|---|---|---|
| Users need it on personal/varied devices? | Yes | Maybe | No |
| Data must stay on the individual's device? | Yes | Yes | No (stays in your DC) |
| Needs to work offline? | Partial | Yes | No (needs network to DC) |
| Requires >4B parameter model? | No | Maybe (12B) | Yes (27B+) |
| Needs to serve many users concurrently? | No | No | Yes |
| Users in regulated environment (HIPAA, ITAR)? | Case-by-case | Case-by-case | Yes (easiest compliance) |
Map your use cases:
| Use case | Users | Devices | Connectivity | Model need | Best target |
|---|---|---|---|---|---|
| Example: Contract review | 50 lawyers | Laptops | Office Wi-Fi | 27B for quality | On-premises vLLM |
| Example: Field inspection | 200 technicians | Phones | Intermittent | 2-4B sufficient | Mobile (offline) |
| Example: Help desk | 30 agents | Desktops + browser | Reliable | 2-4B sufficient | Browser (WebGPU) |
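As a rough aid, the decision matrix can be encoded as a first-pass rule. The sketch below is a simplification under stated assumptions: the field names are hypothetical, the 4B and 12B thresholds are read off the matrix, and "Partial" offline support in the browser is treated as insufficient whenever offline use is a hard requirement. Regulatory fit is left to Exercise 4.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    needs_offline: bool
    model_params_b: float           # required model size, billions of parameters
    many_concurrent_users: bool
    data_must_stay_on_device: bool


def suggest_target(uc: UseCase) -> str:
    """First-pass deployment target; a starting point, not a final decision."""
    on_device_required = uc.needs_offline or uc.data_must_stay_on_device
    needs_server = uc.model_params_b > 12 or uc.many_concurrent_users
    if on_device_required and needs_server:
        # The matrix has no column that satisfies both constraints.
        return "conflict: revisit requirements"
    if needs_server:
        return "on-premises"
    if uc.needs_offline or uc.model_params_b > 4:
        return "desktop/mobile"
    return "browser"


# The three example rows from the table above.
examples = [
    UseCase("Contract review", False, 27, True, False),
    UseCase("Field inspection", True, 4, False, False),
    UseCase("Help desk", False, 4, False, False),
]

for uc in examples:
    print(f"{uc.name}: {suggest_target(uc)}")
```

Treat the result as a prompt for discussion. A use case that returns a conflict usually means one requirement has to be relaxed, for example by accepting a smaller model on the device.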
Your organisation has 500 office workers (reliable connectivity, company laptops) and 2,000 field workers (intermittent connectivity, personal and company phones). Both groups need AI for document search and Q&A. What deployment architecture covers both?
Exercise 3: Choose models for each target
Objective: Select the specific model, quantisation level, and inference engine for each deployment target.
Model selection checklist:
For each deployment target from Exercise 2:
- What is the maximum model size that fits? (Use the memory tables from Module 3)
- What licence is acceptable? (Apache 2.0 preferred for enterprise)
- What quality level is required? (Run your own benchmarks from Module 2)
- What languages must be supported? (CJK needs → evaluate Qwen; European → Gemma or Llama)
- What inference engine will you use? (Browser → WebLLM or Transformers.js; Desktop → llama.cpp or MLX; Server → vLLM)
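For the first checklist item, a back-of-the-envelope estimate can complement the Module 3 tables: weight size in GB is roughly parameters (in billions) times bits per weight, divided by 8. The sketch below assumes a 20% headroom for KV cache and runtime overhead. That figure is a placeholder, not a measured value, and real usage depends on context length and engine.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.2) -> bool:
    """Check whether weights plus an assumed headroom fit in memory_gb."""
    needed = weight_size_gb(params_billion, bits_per_weight) * (1 + headroom)
    return needed <= memory_gb


# A 27B model at 4 bits and at 8 bits against a 24 GB accelerator.
print(weight_size_gb(27, 4))   # weights only, before overhead
print(fits(27, 4, 24))
print(fits(27, 8, 24))
```

Mixed-precision formats such as Q4_K_M average somewhat more than 4 bits per weight, so pass an effective value (an assumption of around 4.5 to 5) rather than the nominal 4 when estimating those.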
Reference stack:
| Target | Model | Quantisation | Size | Engine | Use case |
|---|---|---|---|---|---|
| Browser | Gemma 4 E2B | Q4_K_M | ~1.5GB | WebLLM | Quick Q&A, summarisation |
| Browser | Gemma 4 E4B | Q4_K_M | ~3GB | WebLLM | Higher quality Q&A, extraction |
| Browser (embedding) | Nomic Embed v1.5 | FP16 | ~270MB | Transformers.js | RAG embedding |
| Mobile | Gemma 4 E2B | Q4 | ~1.5GB | MediaPipe LLM | Offline field AI |
| Desktop (Mac) | Gemma 4 12B | Q4_K_M | ~7GB | MLX | Professional tools |
| Desktop (Windows) | Gemma 4 12B | Q4_K_M | ~7GB | llama.cpp (CUDA) | Professional tools |
| On-prem (single GPU) | Gemma 4 27B | AWQ INT4 | ~15GB | vLLM | Enterprise service |
| On-prem (dual GPU) | Gemma 4 27B | INT8 | ~27GB | vLLM | Quality-critical tasks |
Adjust based on your Exercise 2 findings. If your field workers need CJK language support, substitute Qwen 3 4B for Gemma 4 E4B. If your on-premises use case requires maximum quality, consider a larger model with tensor parallelism.
Output: A model manifest listing every model, format, quantisation, and engine you will deploy.
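One way to keep the manifest reviewable is to store it as structured data rather than only as a table. The sketch below is an assumption about format, not a required schema: the `ModelEntry` field names are hypothetical, and the three entries are copied from the reference stack for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ModelEntry:
    target: str
    model: str
    quantisation: str
    approx_size_gb: float
    engine: str


# Entries taken from the reference stack; replace with your own selections.
manifest = [
    ModelEntry("browser", "Gemma 4 E2B", "Q4_K_M", 1.5, "WebLLM"),
    ModelEntry("mobile", "Gemma 4 E2B", "Q4", 1.5, "MediaPipe LLM"),
    ModelEntry("on-prem (single GPU)", "Gemma 4 27B", "AWQ INT4", 15.0, "vLLM"),
]


def to_json(entries: list[ModelEntry]) -> str:
    """Serialise the manifest so it can be versioned alongside deployment config."""
    return json.dumps([asdict(e) for e in entries], indent=2)


print(to_json(manifest))
```

Keeping the manifest in version control gives you a change history when models or quantisation levels are upgraded in later phases.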
Exercise 4: Design your privacy architecture
Objective: Select the privacy architecture pattern for each deployment and map it to your regulatory requirements.
Step 1: Regulatory inventory
List every regulation that applies to your AI data processing:
| Regulation | Applies to | Key requirement | Impact on AI architecture |
|---|---|---|---|
| Example: GDPR | EU customer data | Restricted transfers outside the EU/EEA; data minimisation; right to erasure | On-prem in EU data centre or browser-local |
| Example: HIPAA | Patient records | PHI requires BAA; audit trails | On-prem only; metadata logging |
Step 2: Map use cases to patterns
From Module 10, the four patterns:
- Fully Local (maximum privacy)
- On-Premises Centralised (high privacy)
- VPC-Isolated Cloud (moderate privacy)
- Hybrid with Sanitisation (pragmatic privacy)
| Use case | Data classification | Regulation | Pattern | PII detection needed? |
|---|---|---|---|---|
| Example: Contract review | CONFIDENTIAL | GDPR | Pattern 2 (on-prem) | No (no cloud component) |
| Example: Customer support | CONFIDENTIAL | GDPR, CCPA | Pattern 4 (hybrid) | Yes (PII gateway) |
| Example: Code assistance | INTERNAL | None specific | Pattern 3 (VPC cloud) | No |
Step 3: Define your PII detection requirements
If any use case requires Pattern 4 (hybrid with sanitisation):
- What PII types must be detected? (Names, emails, addresses, account numbers, health data, etc.)
- What detection layers will you use? (Regex, NER model, LLM-based)
- What is your tolerance for false negatives? (Zero tolerance for PHI, some tolerance for non-regulated PII)
- How will you audit PII detection effectiveness?
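The regex layer mentioned above can be sketched as follows. This is a minimal illustration of the first detection layer only: the email pattern is deliberately simple, the `ACC-` account number format is hypothetical, and a real gateway would add NER or LLM-based layers for names, addresses, and health data that regexes cannot catch reliably.

```python
import re

# Pattern names and formats are illustrative assumptions.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ACCOUNT": re.compile(r"\bACC-\d{8}\b"),   # hypothetical account format
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a placeholder and count hits per PII type."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


clean, hits = redact("Contact jane.doe@example.com about ACC-12345678")
print(clean)
print(hits)
```

The per-type counts are useful for the audit question: run the gateway over a labelled sample and compare the counts with the known number of PII items to estimate the false-negative rate.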
Step 4: Define your audit logging
For each deployment:
- What metadata will be logged?
- Where will logs be stored?
- What retention period applies?
- Who has access to logs?
- Do any regulations require content logging (not just metadata)?
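A metadata-only audit record might look like the sketch below. All field names are hypothetical, the model identifier is a made-up example, and the salted hash is one possible way to pseudonymise user identifiers. Whether that is sufficient under your regulations is a question for your compliance team.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(user_id: str, use_case: str, model: str,
                       prompt_tokens: int, completion_tokens: int,
                       latency_ms: int, salt: str = "replace-me") -> dict:
    """Build a log entry that records who, what, and when, but no content."""
    # Pseudonymise the user so logs do not carry raw identifiers.
    user_hash = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user_hash,
        "use_case": use_case,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
    }


record = build_audit_record(
    "alice@example.com", "contract-review", "example-27b-int4", 812, 240, 1350
)
print(json.dumps(record))
```

If a regulation does require content logging, that content belongs in a separate store with stricter access controls rather than in this record.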
Output: A privacy architecture diagram showing data flows, PII detection boundaries, and compliance mapping.
Having completed Exercises 1-4, what is the most common gap you have identified in your current AI data handling?
Exercise 5: Build the ROI case with your actual numbers
Objective: Produce a defensible ROI analysis using your organisation's actual data.
Step 1: Quantify current costs
From your Exercise 1 inventory:
Total AI queries per day: _________
Average cost per query (cloud): $_________
Monthly cloud AI API spend: $_________
Monthly cloud storage/compute: $_________
Annual compliance overhead: $_________
Total annual cloud AI cost: $_________
Step 2: Size your edge infrastructure
From your Exercise 2 and 3 selections:
Browser deployments: _________ (no infrastructure cost)
Mobile deployments: _________ (no infrastructure cost)
Desktop deployments: _________ (no infrastructure cost)
On-premises GPUs needed: _________ x _________ (type)
Total hardware cost: $_________
Annual operating cost: $_________
Staff requirement: _________% of an FTE
Annual staff cost for AI ops: $_________
Step 3: Calculate the numbers
Annual cloud cost (current): $_________ (A)
Annual edge cost (projected): $_________ (B)
Annual savings: $_________ (A - B)
Hardware investment: $_________ (C)
Payback period: _________ months (C / ((A-B)/12))
3-year net savings: $_________ ((A-B) x 3 - C)
First-year ROI: _________% ((A-B-C) / C x 100)
Step 4: Quantify risk reduction (non-financial)
Queries with sensitive data currently sent to cloud: _________/day
After migration, queries with sensitive data on cloud: _________/day
Reduction in external data exposure: _________%
Regulations now fully satisfied: _________
Vendor dependency eliminated for: _________% of queries
Step 5: Assemble the one-page blueprint
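Before the figures go into the blueprint, the arithmetic from Steps 3 and 4 can be checked with a short script. The formulas follow the worksheet exactly (A is the annual cloud cost, B the annual edge cost, C the hardware investment). The input numbers are purely illustrative and should be replaced with your own.

```python
def edge_roi(annual_cloud_cost: float, annual_edge_cost: float,
             hardware_cost: float) -> dict:
    """Compute the Step 3 figures from A, B, and C."""
    savings = annual_cloud_cost - annual_edge_cost            # A - B
    # Payback is undefined if the edge deployment does not save money.
    payback_months = hardware_cost / (savings / 12) if savings > 0 else None
    three_year_net = savings * 3 - hardware_cost              # (A-B) x 3 - C
    first_year_roi_pct = (
        (savings - hardware_cost) / hardware_cost * 100       # (A-B-C) / C x 100
        if hardware_cost > 0 else None
    )
    return {
        "annual_savings": savings,
        "payback_months": payback_months,
        "three_year_net": three_year_net,
        "first_year_roi_pct": first_year_roi_pct,
    }


def exposure_reduction_pct(before_per_day: float, after_per_day: float) -> float:
    """Step 4: percentage drop in sensitive queries sent to cloud."""
    return (before_per_day - after_per_day) / before_per_day * 100


# Illustrative inputs only: A = 600k, B = 150k, C = 300k.
result = edge_roi(600_000, 150_000, 300_000)
print(result)
print(exposure_reduction_pct(2000, 200))
```

With these sample inputs the payback period is 8 months and the first-year ROI is 50%. Rerun the script with pessimistic values for query volume and staff cost to see how sensitive the case is to those assumptions.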
Use this template to produce the document you take to leadership:
EDGE AI DEPLOYMENT BLUEPRINT
[Your Organisation Name]
[Date]
─── CURRENT STATE ───
[X] employees using AI across [Y] use cases
[Z] AI queries/day, [W]% processing sensitive data via cloud APIs
Annual cloud AI cost: $[A]
Compliance gap: [specific issue]
─── PROPOSED ARCHITECTURE ───
Tier 1 -- Browser (WebGPU)
Model: [name, size]
Use cases: [list]
Users: [count]
Tier 2 -- Mobile/Desktop (native)
Model: [name, size]
Use cases: [list]
Users: [count]
Tier 3 -- On-Premises (vLLM)
Model: [name, size]
Hardware: [GPU count and type]
Use cases: [list]
Users: [count]
Tier 4 -- Cloud (retained)
Provider: [name]
Use cases: [list, with justification for why edge is insufficient]
Expected volume: [X]% of total queries
─── FINANCIAL IMPACT ───
Hardware investment: $[C]
Annual savings: $[A-B]
Payback period: [N] months
3-year net savings: $[3-year figure]
─── RISK REDUCTION ───
Data exposure reduction: [X]%
Regulations satisfied: [list]
Vendor dependency reduction: [Y]%
─── IMPLEMENTATION PLAN ───
Phase 1 (Month 1-2): Pilot with [team], [use case], single GPU
Success metrics: quality >= [threshold], latency <= [target]
Phase 2 (Month 3-4): Expand to [departments], add browser deployment
Phase 3 (Month 5-6): Full rollout, mobile/offline deployment
Phase 4 (Month 7+): Optimise, expand use cases, evaluate model upgrades
─── DECISION REQUESTED ───
Approve Phase 1 pilot: $[pilot cost] investment, [N]-week timeline
Go/no-go decision for Phase 2 based on pilot results
Module 13 -- Final Assessment
When building an Edge AI deployment plan, what should be the first step?
Your deployment blueprint includes browser-based AI for office workers and mobile-native AI for field workers. What connects these two deployment targets?
What is the most critical assumption in any Edge AI ROI calculation?
What is the recommended approach for presenting the Edge AI business case to leadership?