Question 1

What is PII (Personally Identifiable Information)?

Accepted Answer

PII is any data that can be used to identify a specific individual, either alone or in combination with other information. Common examples include names, email addresses, Social Security numbers, phone numbers, dates of birth, medical record numbers, and IP addresses. In research contexts, PII must be removed or de-identified before datasets can be shared, published, or deposited in repositories.

Question 2

What is HIPAA de-identification?

Accepted Answer

HIPAA de-identification is the process of removing identifiers from health data so that it no longer qualifies as protected health information (PHI) and individuals cannot reasonably be identified. The HIPAA Privacy Rule provides two methods: the Safe Harbor method, which requires removal of 18 specific identifier types, and the Expert Determination method, which requires a person with appropriate statistical and scientific expertise to determine and document that the risk of re-identification is very small. This tool checks for text patterns matching the 18 Safe Harbor identifiers.

Question 3

What is the Safe Harbor method?

Accepted Answer

The Safe Harbor method (45 CFR 164.514(b)(2)) requires removal of 18 categories of identifiers of the individual and of the individual's relatives, employers, or household members: names, geographic subdivisions smaller than a state, all elements of dates (except year) directly related to an individual, phone numbers, fax numbers, email addresses, SSNs, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, URLs, IP addresses, biometric identifiers, full-face photographs and comparable images, and any other unique identifying number, characteristic, or code. After removal, the covered entity must also have no actual knowledge that the remaining information could be used, alone or in combination, to identify an individual.
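As a rough illustration of how the date category is commonly handled, the sketch below reduces an ISO-formatted date to its year and top-codes ages. The function names `toYearOnly` and `topCodeAge` are hypothetical and are not part of this tool, which only detects patterns. The age rule is taken from the Safe Harbor text itself, which aggregates all ages over 89 into a single "90 or older" category.

```typescript
// Hypothetical helpers illustrating Safe Harbor date and age handling.
// They are not part of the scanner described on this page.

// Keep only the year from an ISO 8601 calendar date (YYYY-MM-DD).
// Returns null when the input is not in that format.
function toYearOnly(isoDate: string): string | null {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate.trim());
  return m ? (m[1] ?? null) : null;
}

// Safe Harbor aggregates all ages over 89 into one "90 or older" category.
function topCodeAge(age: number): string {
  return age > 89 ? "90+" : String(age);
}

console.log(toYearOnly("1987-03-14")); // 1987
console.log(topCodeAge(93));           // 90+
```

Dates in other layouts (for example `03/14/1987`) return `null` here, so a real pipeline would need to parse each format it expects before generalizing.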

Question 4

Does this tool store or transmit my data?

Accepted Answer

No. All file scanning happens entirely in your browser using client-side JavaScript. Your files are never uploaded to any server. The text content is read into memory, scanned against regex patterns, and results are generated locally. When you close or refresh the page, all data is discarded.
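The local scanning step described above might look roughly like the following. This is a minimal sketch, not the tool's actual code: the `Finding` type, the `PATTERNS` table, and `scanText` are assumptions made for illustration, and the three regexes are simplified examples rather than the patterns the tool ships with.

```typescript
// A minimal sketch of line-by-line regex scanning. The pattern table is
// illustrative only; the tool's real patterns are not shown in the source.
interface Finding {
  line: number;  // 1-based line number in the input text
  kind: string;  // label of the pattern that matched
  match: string; // the matched substring
}

const PATTERNS: { kind: string; regex: RegExp }[] = [
  { kind: "email", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { kind: "ssn", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { kind: "ipv4", regex: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

function scanText(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split(/\r?\n/).forEach((content, index) => {
    PATTERNS.forEach(({ kind, regex }) => {
      // String.match with a global regex returns every match, or null.
      const matches: string[] = content.match(regex) || [];
      matches.forEach((match) => {
        findings.push({ line: index + 1, kind, match });
      });
    });
  });
  return findings;
}

const sample = "id,contact\n1,jane@example.org\n2,123-45-6789";
console.log(scanText(sample));
```

In a browser, the input string would presumably come from `await file.text()` on a dropped or selected `File`, which reads the contents into memory without any network request.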

Question 5

What file formats are supported?

Accepted Answer

The tool accepts CSV, TSV, JSON, XML, and plain text (TXT) files. Any text-based format will work since the scanner reads the file as plain text and applies pattern matching across all lines. Binary formats (XLSX, DOCX, PDF) are not supported -- convert them to CSV or text first.
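The source does not say how the tool handles an unsupported binary file. One possible pre-check, sketched below under that caveat, inspects the leading bytes for well-known signatures: XLSX and DOCX are ZIP archives and PDF files start with `%PDF-`. The function name `detectBinaryFormat` is hypothetical.

```typescript
// Hypothetical pre-check: detect common binary containers by their leading
// bytes before attempting to scan a file as text.
function detectBinaryFormat(bytes: Uint8Array): string | null {
  const startsWith = (signature: number[]): boolean =>
    signature.every((value, i) => bytes[i] === value);

  // XLSX and DOCX are ZIP archives, which begin with "PK\x03\x04".
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip (e.g. XLSX, DOCX)";
  // PDF files begin with "%PDF-".
  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf";
  return null; // no known binary signature; treat as text
}

// "PK\x03\x04" followed by one more byte, as at the start of a ZIP archive.
console.log(detectBinaryFormat(new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x14])));
```

In a browser, the bytes could be obtained with `new Uint8Array(await file.slice(0, 8).arrayBuffer())`, so only the first few bytes are read for the check.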

Question 6

Can this tool guarantee my data is fully de-identified?

Accepted Answer

No. This tool uses pattern matching (regex) to detect common PII formats and is intended as a screening aid, not a certification tool. It cannot detect all forms of PII -- for example, it cannot identify indirect identifiers, rare name formats, or context-dependent information that could be used for re-identification. For HIPAA compliance, consult a qualified privacy officer or use the Expert Determination method with a statistician.
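To make the limitation concrete, the sketch below runs a simplified SSN pattern over three rows. The pattern and sample rows are assumptions for illustration and are not the tool's actual rules. Only the conventionally formatted number is flagged: the variant with spaces and the indirect identifier both pass through unnoticed.

```typescript
// Illustrative pattern only; the tool's actual SSN pattern is not shown.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/;

const rows = [
  "SSN: 123-45-6789",                     // standard format: detected
  "SSN: 123 45 6789",                     // spaces instead of hyphens: missed
  "the only 94-year-old dentist in town", // indirect identifier: missed
];

// Keep only the rows in which the pattern finds a match.
const flagged = rows.filter((row) => SSN_PATTERN.test(row));
console.log(flagged); // only the first row is flagged
```

Loosening the pattern would catch the second row but would also raise false positives, and no pattern can catch the third, which is why a clean scan is not evidence of de-identification.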

Metadata Anonymization Checker.

PII can hide in free-text fields

Dates of birth vs. study dates

Column headers reveal data structure

Pseudonymization is not anonymization

Data Anonymization Fundamentals

The 18 HIPAA Safe Harbor Identifiers
