Capstone: Your Edge AI Deployment Plan

From knowledge to action

You have spent twelve modules building the technical foundation for edge AI deployment. You understand the model landscape, the deployment targets, the privacy architectures, the economics, and the operational patterns.

Now it is time to apply all of that to your organisation.

This capstone is not a quiz. It is a structured planning exercise. Each exercise builds on the previous one, and the output is a one-page Edge AI deployment blueprint that you can take to your leadership team.

Work through these exercises with your actual data: real project names, real query volumes, real regulatory requirements, and real hardware inventory. The blueprint is only as useful as the specificity you put into it.

Before we start, what is the primary driver for edge AI in your organisation?

Exercise 1: Map your sensitive data flows

Objective: Identify every place where your organisation's data currently leaves your environment for AI processing, and classify the sensitivity of that data.

Step 1: Inventory current AI usage

List every AI tool, API, or service your organisation currently uses. For each, document:

Tool/Service	Provider	Data type processed	Sensitivity level	Volume (queries/day)	Users
Example: ChatGPT	OpenAI	Customer emails, internal docs	CONFIDENTIAL	~2,000	~200
Example: Copilot	Microsoft	Source code	INTERNAL	~5,000	~50

Include both sanctioned (IT-approved) and unsanctioned (shadow AI) usage. Survey a sample of departments to discover shadow usage -- in most enterprises, unsanctioned usage is 2-5x the sanctioned usage.

Step 2: Classify data sensitivity

For each data type, apply your organisation's data classification scheme. If you do not have one, use this simple framework:

PUBLIC: Data that is already publicly available or intended for public release
INTERNAL: Data for internal use that would not cause significant harm if disclosed
CONFIDENTIAL: Data whose disclosure would cause material harm -- customer PII, financial data, trade secrets, legal matters
RESTRICTED: Data subject to specific regulatory requirements (HIPAA PHI, ITAR-controlled, classified) or whose disclosure would cause severe harm

Step 3: Identify the edge AI candidates

Data classified as CONFIDENTIAL or RESTRICTED that is currently processed by cloud AI services is your primary candidate for edge AI migration. This is where the data sovereignty value proposition is strongest.

Data classified as INTERNAL is your secondary candidate -- the cost reduction value proposition applies here.

Data classified as PUBLIC can stay on cloud APIs unless cost is a concern.

Output: A table of AI data flows with sensitivity classifications and migration priority.
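
The inventory and priority rules above can be captured in a small script. This is a minimal sketch, not a prescribed tool: the `DataFlow` record, the numeric priority ranks, and the classifications assigned to the two example rows are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Migration priority per classification, following Step 3.
# The numeric ranks are an illustrative convention (assumption).
PRIORITY = {
    "RESTRICTED": 1,    # primary candidate (data sovereignty)
    "CONFIDENTIAL": 1,  # primary candidate (data sovereignty)
    "INTERNAL": 2,      # secondary candidate (cost reduction)
    "PUBLIC": 3,        # can stay on cloud APIs unless cost is a concern
}


@dataclass
class DataFlow:
    tool: str
    provider: str
    data_type: str
    classification: str
    queries_per_day: int
    users: int

    @property
    def priority(self) -> int:
        return PRIORITY[self.classification]


def migration_table(flows):
    """Sort flows by priority, then by daily volume (largest first)."""
    return sorted(flows, key=lambda f: (f.priority, -f.queries_per_day))


# Example rows from Step 1; the classifications are assumed for illustration.
flows = [
    DataFlow("Copilot", "Microsoft", "Source code", "INTERNAL", 5000, 50),
    DataFlow("ChatGPT", "OpenAI", "Customer emails, internal docs",
             "CONFIDENTIAL", 2000, 200),
]

for f in migration_table(flows):
    print(f"P{f.priority}  {f.tool:<10} {f.classification:<13} {f.queries_per_day}/day")
```

Sorting by volume within each priority band is one possible tie-breaker; you may prefer to rank by user count or by regulatory deadline instead.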

Exercise 2: Select your deployment targets

Objective: For each edge AI candidate identified in Exercise 1, determine where the AI should run.

Decision matrix:

For each use case, answer these questions:

Question	Browser	Desktop/Mobile	On-Premises
Users need it on personal/varied devices?	Yes	Maybe	No
Data must stay on the individual's device?	Yes	Yes	No (stays in your DC)
Needs to work offline?	Partial	Yes	No (needs network to DC)
Requires >4B parameter model?	No	Maybe (12B)	Yes (27B+)
Needs to serve many users concurrently?	No	No	Yes
Users in regulated environment (HIPAA, ITAR)?	Case-by-case	Case-by-case	Yes (easiest compliance)

Map your use cases:

Use case	Users	Devices	Connectivity	Model need	Best target
Example: Contract review	50 lawyers	Laptops	Office Wi-Fi	27B for quality	On-premises vLLM
Example: Field inspection	200 technicians	Phones	Intermittent	2-4B sufficient	Mobile (offline)
Example: Help desk	30 agents	Desktops + browser	Reliable	2-4B sufficient	Browser (WebGPU)
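
As a first pass, the decision matrix can be expressed as a short rule chain. The sketch below is a simplification under stated assumptions: the `UseCase` fields, the order in which the rules are checked, and the flags assigned to the three example use cases are illustrative choices. The "Maybe" and "Case-by-case" cells in the matrix still need human judgement.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    needs_offline: bool
    needs_model_over_4b: bool
    many_concurrent_users: bool
    data_must_stay_on_device: bool


def suggest_target(uc: UseCase) -> str:
    """Return a first-pass deployment target (rule order is an assumption)."""
    if uc.needs_offline:
        # On-premises needs a network path to the data centre.
        return "Desktop/Mobile"
    if uc.data_must_stay_on_device:
        # Browser models top out around 4B parameters in the matrix.
        return "Desktop/Mobile" if uc.needs_model_over_4b else "Browser"
    if uc.many_concurrent_users or uc.needs_model_over_4b:
        return "On-Premises"
    return "Browser"


# Flags below are assumed readings of the three example rows.
contract_review = UseCase("Contract review", False, True, True, False)
field_inspection = UseCase("Field inspection", True, False, False, True)
help_desk = UseCase("Help desk", False, False, False, False)

for uc in (contract_review, field_inspection, help_desk):
    print(f"{uc.name}: {suggest_target(uc)}")
```

Treat the result as a starting point for discussion rather than a final answer, particularly for regulated workloads.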

Your organisation has 500 office workers (reliable connectivity, company laptops) and 2,000 field workers (intermittent connectivity, personal and company phones). Both groups need AI for document search and Q&A. What deployment architecture covers both?

Exercise 3: Choose models for each target

Objective: Select the specific model, quantisation level, and inference engine for each deployment target.

Model selection checklist:

For each deployment target from Exercise 2:

What is the maximum model size that fits? (Use the memory tables from Module 3)
What licence is acceptable? (Apache 2.0 preferred for enterprise)
What quality level is required? (Run your own benchmarks from Module 2)
What languages must be supported? (CJK needs → evaluate Qwen; European → Gemma or Llama)
What inference engine will you use? (Browser → WebLLM or Transformers.js; Desktop → llama.cpp or MLX; Server → vLLM)

Reference stack:

Target	Model	Quantisation	Size	Engine	Use case
Browser	Gemma 4 E2B	Q4_K_M	~1.5GB	WebLLM	Quick Q&A, summarisation
Browser	Gemma 4 E4B	Q4_K_M	~3GB	WebLLM	Higher quality Q&A, extraction
Browser (embedding)	Nomic Embed v1.5	FP16	~270MB	Transformers.js	RAG embedding
Mobile	Gemma 4 E2B	Q4	~1.5GB	MediaPipe LLM	Offline field AI
Desktop (Mac)	Gemma 4 12B	Q4_K_M	~7GB	MLX	Professional tools
Desktop (Windows)	Gemma 4 12B	Q4_K_M	~7GB	llama.cpp (CUDA)	Professional tools
On-prem (single GPU)	Gemma 4 27B	AWQ INT4	~15GB	vLLM	Enterprise service
On-prem (dual GPU)	Gemma 4 27B	INT8	~27GB	vLLM	Quality-critical tasks

Adjust based on your Exercise 2 findings. If your field workers need CJK language support, substitute Qwen 3 4B for Gemma 4 E4B. If your on-premises use case requires maximum quality, consider a larger model with tensor parallelism.

Output: A model manifest listing every model, format, quantisation, and engine you will deploy.
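
To sanity-check whether a model fits a target before consulting the Module 3 tables, a rough estimate of weight size is parameters times bits per weight divided by eight. The sketch below assumes a flat 20% overhead for KV cache and activations and a hypothetical 24 GB GPU. Real overhead varies widely with context length and concurrency, so treat the result as a screening check only.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead margin fit in memory_gb."""
    needed = weight_footprint_gb(params_billion, bits_per_weight)
    return needed * (1 + overhead_fraction) <= memory_gb


# 27B parameters at 4-bit and 8-bit on a hypothetical 24 GB GPU.
print(weight_footprint_gb(27, 4))  # 13.5
print(weight_footprint_gb(27, 8))  # 27.0
print(fits(27, 4, 24))             # True
print(fits(27, 8, 24))             # False
```

The 4-bit and 8-bit estimates are in line with the ~15GB and ~27GB figures in the reference stack, and they show why the INT8 row is listed against a dual-GPU target.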

Exercise 4: Design your privacy architecture

Objective: Select the privacy architecture pattern for each deployment and map it to your regulatory requirements.

Step 1: Regulatory inventory

List every regulation that applies to your AI data processing:

Regulation	Applies to	Key requirement	Impact on AI architecture
Example: GDPR	EU customer data	Transfers outside the EU/EEA restricted; minimisation; right to erasure	On-prem in EU data centre or browser-local
Example: HIPAA	Patient records	BAA required for any vendor handling PHI; audit trails	On-prem only; metadata logging

Step 2: Map use cases to patterns

From Module 10, the four patterns:

Pattern 1: Fully Local (maximum privacy)
Pattern 2: On-Premises Centralised (high privacy)
Pattern 3: VPC-Isolated Cloud (moderate privacy)
Pattern 4: Hybrid with Sanitisation (pragmatic privacy)

Use case	Data classification	Regulation	Pattern	PII detection needed?
Example: Contract review	CONFIDENTIAL	GDPR	Pattern 2 (on-prem)	No (no cloud component)
Example: Customer support	CONFIDENTIAL	GDPR, CCPA	Pattern 4 (hybrid)	Yes (PII gateway)
Example: Code assistance	INTERNAL	None specific	Pattern 3 (VPC cloud)	No

Step 3: Define your PII detection requirements

If any use case requires Pattern 4 (hybrid with sanitisation):

What PII types must be detected? (Names, emails, addresses, account numbers, health data, etc.)
What detection layers will you use? (Regex, NER model, LLM-based)
What is your tolerance for false negatives? (Zero tolerance for PHI, some tolerance for non-regulated PII)
How will you audit PII detection effectiveness?
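
The regex layer is usually the simplest of the detection layers to prototype. The following is a minimal sketch of that layer only: the two patterns and the 8-12 digit account format are hypothetical, and names, addresses, and health data would need an NER or LLM-based layer on top.

```python
import re

# Illustrative patterns only; real deployments need locale-specific rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ACCOUNT": re.compile(r"\b\d{8,12}\b"),  # hypothetical account number format
}


def redact(text: str):
    """Replace matches with typed placeholders; return (text, counts per type)."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


clean, counts = redact("Contact jane.doe@example.com about account 12345678.")
print(clean)   # Contact [EMAIL] about account [ACCOUNT].
print(counts)  # {'EMAIL': 1, 'ACCOUNT': 1}
```

The per-type counts give you a starting point for the audit question: run the layer against a labelled sample and compare the counts with the known PII to estimate the false negative rate.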

Step 4: Define your audit logging

For each deployment:

What metadata will be logged?
Where will logs be stored?
What retention period applies?
Who has access to logs?
Do any regulations require content logging (not just metadata)?
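
A metadata-only log entry might look like the sketch below. The field names, the salted hash used to pseudonymise the user ID, and the choice to record prompt length instead of prompt text are all assumptions. Your regulations may require more than this, including content logging, or may restrict even these fields.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user_id: str, use_case: str, model: str,
                 prompt: str, latency_ms: int, salt: str) -> str:
    """Build one JSON log line holding metadata only, never the prompt text."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Pseudonymised user reference (assumed approach, truncated for brevity).
        "user": hashlib.sha256((salt + user_id).encode()).hexdigest()[:16],
        "use_case": use_case,
        "model": model,
        "prompt_chars": len(prompt),
        "latency_ms": latency_ms,
    }
    return json.dumps(record, sort_keys=True)


line = audit_record("alice", "contract-review", "example-model",
                    "secret text", 120, salt="example-salt")
print(line)
```

Writing one JSON object per line keeps the log easy to ship to whatever store and retention policy you define in answer to the questions above.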

Output: A privacy architecture diagram showing data flows, PII detection boundaries, and compliance mapping.

Having completed Exercises 1-4, what is the most common gap you have identified in your current AI data handling?

Exercise 5: Build the ROI case with your actual numbers

Objective: Produce a defensible ROI analysis using your organisation's actual data.

Step 1: Quantify current costs

From your Exercise 1 inventory:

Total AI queries per day:        _________
Average cost per query (cloud):  $_________
Monthly cloud AI API spend:      $_________
Monthly cloud storage/compute:   $_________
Annual compliance overhead:      $_________
Total annual cloud AI cost:      $_________
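
If it helps to fill in the worksheet programmatically, the roll-up can be sketched as follows. The formula assumes the annual total is twelve times the monthly items plus the annual compliance overhead, and that a month has 30 billable days. Both are assumptions, and the figures in the usage lines are placeholders, not benchmarks.

```python
def monthly_api_spend(queries_per_day: float, cost_per_query: float,
                      days_per_month: int = 30) -> float:
    """Estimate monthly API spend from daily volume and average cost per query."""
    return queries_per_day * cost_per_query * days_per_month


def annual_cloud_cost(monthly_api: float, monthly_storage_compute: float,
                      annual_compliance: float) -> float:
    """Annualise the monthly items and add the annual compliance overhead."""
    return 12 * (monthly_api + monthly_storage_compute) + annual_compliance


# Placeholder inputs for illustration only.
api = monthly_api_spend(queries_per_day=1000, cost_per_query=2)
print(api)
print(annual_cloud_cost(api, monthly_storage_compute=500, annual_compliance=10000))
```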

Step 2: Size your edge infrastructure

From your Exercise 2 and 3 selections:

Browser deployments:             _________ (no infrastructure cost)
Mobile deployments:              _________ (no infrastructure cost)
Desktop deployments:             _________ (no infrastructure cost)
On-premises GPUs needed:         _________ x _________ (type)
Total hardware cost:             $_________
Annual operating cost:           $_________
Staff requirement:               _________% of an FTE
Annual staff cost for AI ops:    $_________

Step 3: Calculate the numbers

Annual cloud cost (current):     $_________ (A)
Annual edge cost (projected):    $_________ (B)
Annual savings:                  $_________ (A - B)
Hardware investment:             $_________ (C)
Payback period:                  _________ months  (C / ((A-B)/12))
3-year net savings:              $_________ ((A-B) x 3 - C)
First-year ROI:                  _________% ((A-B-C) / C x 100)
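
The worksheet formulas translate directly into code. The sketch below uses the same A, B, and C definitions as the worksheet; the function name and the input figures are placeholders chosen for illustration, not real results.

```python
def roi_summary(annual_cloud_cost: float, annual_edge_cost: float,
                hardware_investment: float) -> dict:
    """Apply the worksheet formulas: A = cloud, B = edge, C = hardware."""
    savings = annual_cloud_cost - annual_edge_cost  # A - B
    if savings <= 0:
        raise ValueError("no annual savings, so payback is undefined")
    if hardware_investment <= 0:
        raise ValueError("hardware investment must be positive")
    return {
        "annual_savings": savings,
        "payback_months": hardware_investment / (savings / 12),
        "three_year_net_savings": savings * 3 - hardware_investment,
        "first_year_roi_pct": (savings - hardware_investment)
                              / hardware_investment * 100,
    }


# Placeholder inputs: A = 300,000, B = 120,000, C = 90,000.
summary = roi_summary(300_000, 120_000, 90_000)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these placeholder inputs the payback period is 6 months and the first-year ROI is 100%. Rerun the calculation with pessimistic query volumes and staff costs to see how sensitive the payback period is to each assumption.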

Step 4: Quantify risk reduction (non-financial)

Queries with sensitive data currently sent to cloud:  _________/day
After migration, queries with sensitive data on cloud: _________/day
Reduction in external data exposure:                   _________%
Regulations now fully satisfied:                       _________
Vendor dependency eliminated for:                      _________%  of queries
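
The reduction in external data exposure is the relative drop in sensitive queries sent to the cloud. A minimal sketch, with a hypothetical function name and placeholder volumes:

```python
def exposure_reduction_pct(before_per_day: float, after_per_day: float) -> float:
    """Percentage drop in sensitive queries sent to cloud services per day."""
    if before_per_day <= 0:
        raise ValueError("before_per_day must be positive")
    return 100 * (before_per_day - after_per_day) / before_per_day


# Placeholder volumes: 2,000 sensitive queries/day before, 500/day after.
print(exposure_reduction_pct(2000, 500))  # 75.0
```

The same ratio works for the vendor dependency line if you substitute total cloud queries before and after migration.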

Step 5: Assemble the one-page blueprint

Use this template to produce the document you take to leadership:

EDGE AI DEPLOYMENT BLUEPRINT
[Your Organisation Name]
[Date]

─── CURRENT STATE ───
[X] employees using AI across [Y] use cases
[Z] AI queries/day, [W]% processing sensitive data via cloud APIs
Annual cloud AI cost: $[A]
Compliance gap: [specific issue]

─── PROPOSED ARCHITECTURE ───
Tier 1 -- Browser (WebGPU)
  Model: [name, size]
  Use cases: [list]
  Users: [count]

Tier 2 -- Mobile/Desktop (native)
  Model: [name, size]
  Use cases: [list]
  Users: [count]

Tier 3 -- On-Premises (vLLM)
  Model: [name, size]
  Hardware: [GPU count and type]
  Use cases: [list]
  Users: [count]

Tier 4 -- Cloud (retained)
  Provider: [name]
  Use cases: [list, with justification for why edge is insufficient]
  Expected volume: [X]% of total queries

─── FINANCIAL IMPACT ───
Hardware investment: $[C]
Annual savings: $[A-B]
Payback period: [N] months
3-year net savings: $[3-year figure]

─── RISK REDUCTION ───
Data exposure reduction: [X]%
Regulations satisfied: [list]
Vendor dependency reduction: [Y]%

─── IMPLEMENTATION PLAN ───
Phase 1 (Month 1-2): Pilot with [team], [use case], single GPU
  Success metrics: quality >= [threshold], latency <= [target]
Phase 2 (Month 3-4): Expand to [departments], add browser deployment
Phase 3 (Month 5-6): Full rollout, mobile/offline deployment
Phase 4 (Month 7+): Optimise, expand use cases, evaluate model upgrades

─── DECISION REQUESTED ───
Approve Phase 1 pilot: $[pilot cost] investment, [N]-week timeline
Go/no-go decision for Phase 2 based on pilot results


Module 13 -- Final Assessment

When building an Edge AI deployment plan, what should be the first step?

Your deployment blueprint includes browser-based AI for office workers and mobile-native AI for field workers. What connects these two deployment targets?

What is the most critical assumption in any Edge AI ROI calculation?

What is the recommended approach for presenting the Edge AI business case to leadership?