Detection is the foundation of everything else
You cannot redact what you cannot find. Every privacy architecture in this course — the gateway pattern, the data privacy pipeline, the audit system — depends on a PII detection layer that is accurate, fast, and comprehensive. If your detection misses a Social Security Number, your redaction pipeline passes it through to the cloud model. If your detection flags every occurrence of "John" including the word "john" in "john_doe_table," your users will abandon the system within a week.
PII detection is a precision-recall tradeoff, and the right balance depends on your risk tolerance. A healthcare organisation processing PHI under HIPAA needs recall above 99% — missing even one identifier is a compliance violation. A marketing team using AI to analyse customer feedback might accept 95% recall if it means fewer false positives disrupting their workflow.
This module covers the three layers of PII detection — rule-based, ML-based, and LLM-based — and how to combine them into a pipeline that achieves both high recall and acceptable precision.