The HIPAA Privacy Rule describes patient rights and protections pertaining to healthcare information. The Privacy Rule applies to all forms of protected health information (PHI): verbal, written, electronic, and pantomime (not tested in court). Note, however, that the Privacy Rule applies only to persons and organizations that meet the definition of a Covered Entity or Business Associate.

To meet the definition of Protected Health Information (PHI), there must be:

  1. An individual’s identifier, also known as personally identifiable information (PII), and
  2. Any kind of information pertaining to the individual’s health or health care, including billing and appointment information.


When it comes to HIPAA, the term identifier means any information that provides “a reasonable basis to identify an individual.” Identifiers are not limited to the individual patient; identifying information about the patient’s relatives, employers, and household members is ALSO considered an identifier of the patient.

The following are the identifiable elements [1]:

  • Names
  • All geographic subdivisions smaller than a state, except for the initial three digits of the ZIP code if the area formed by all ZIP codes sharing those three digits contains more than 20,000 people
  • All elements of dates (except year) and all ages over 89
  • Telephone numbers
  • Vehicle identifiers and serial numbers, including license plate numbers
  • Fax numbers
  • Device identifiers and serial numbers
  • Email addresses
  • Web Universal Resource Locators (URLs)
  • Social security numbers
  • Internet Protocol (IP) addresses
  • Medical record numbers
  • Biometric identifiers, including finger and voice prints
  • Health plan beneficiary numbers
  • Full-face photographs and any comparable images
  • Account numbers
  • Any other unique identifying number, characteristic, or code
  • Certificate and license numbers
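
Three items on the list (ZIP codes, dates, and ages) are generalized rather than deleted outright. The sketch below shows one way those rules might look in Python. The function names and the `ZIP3_POPULATION` lookup are hypothetical, and the population figures are invented for illustration; real values would have to come from Census data. Replacing a low-population prefix with "000" follows the regulation's wording, but treat this as an illustration, not a compliance tool.

```python
from datetime import date

# Hypothetical lookup: population of each 3-digit ZIP prefix area.
# These numbers are invented for illustration only.
ZIP3_POPULATION = {"100": 1_500_000, "036": 15_000}


def generalize_zip(zip_code: str, populations: dict = ZIP3_POPULATION) -> str:
    """Keep the first three digits only if the prefix area has more than 20,000 people."""
    prefix = zip_code[:3]
    if populations.get(prefix, 0) > 20_000:
        return prefix
    # Unknown or low-population prefixes are replaced with "000".
    return "000"


def generalize_date(d: date) -> int:
    """Drop every element of a date except the year."""
    return d.year


def generalize_age(age: int) -> str:
    """Collapse all ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


print(generalize_zip("10001"))              # 100
print(generalize_zip("03601"))              # 000
print(generalize_date(date(1984, 7, 12)))   # 1984
print(generalize_age(93))                   # 90+
```

Unknown prefixes fall through to "000" on purpose: when the population is not known, the sketch assumes the more private outcome.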


De-Identification is like pulling the pantyhose over the data

Careless treatment of PHI can lead to fines and even criminal charges. Fortunately, HIPAA describes two ways to de-identify PHI, rendering it NOT PHI and safe to share. Section 164.514(a) of the HIPAA Privacy Rule provides the standard for de-identification of protected health information. The goal of the standard is to manipulate PHI until it does not identify an individual and the covered entity has no reasonable basis to believe it can be used to identify an individual. The two methods are 1) the Safe Harbor Method and 2) the Expert Determination Method.

The Safe Harbor Method

A Safe Harbor

The Safe Harbor Method is the most straightforward way to convert your PHI into just ‘HI’ that won’t get you into trouble later. Why the nautical name? A ‘safe harbor’ is a legal concept that essentially means: “If you follow these rules, you won’t get in trouble.”

Step 1 is simple: remember that list of identifiers (above)? Remove. Them. All.

Step 2 requires the covered entity to act in good faith. After de-identifying the data, the covered entity must not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.
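
Step 1 can be sketched in a few lines of Python for a simple, flat record. The field names in `DIRECT_IDENTIFIER_FIELDS` are hypothetical; a real schema will use different names, and free-text fields (clinical notes, comments) can hide identifiers that a field-name filter like this will never catch. Step 2 is a good-faith judgment about what the covered entity actually knows, so no code can do it for you.

```python
# Hypothetical field names for a flat patient record; adjust to your own schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_plate", "device_serial", "url",
    "ip_address", "fingerprint", "photo",
}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without any of the listed identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}


record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "email": "jane@example.com",
    "diagnosis": "asthma",
    "visit_year": 2019,
}

print(strip_identifiers(record))  # {'diagnosis': 'asthma', 'visit_year': 2019}
```

The function builds a new dictionary instead of deleting keys in place, so the original record stays intact for whoever is still authorized to see it.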

The Expert Determination Method

You need this stamp

The Expert Determination Method is less common. I suspect this is either because it requires an expert, requires documentation, or because someone has to take responsibility for it.

Step 1 – Find an expert. This person needs knowledge of, and experience with, generally accepted statistical and scientific principles and methods for de-identifying data. Think statistician, data scientist, etc. The bona fides are part of the method, so don’t skimp and ask the intern to do it.

Step 2 – The expert does her thing using data methods and science-based manipulations that result in data that could not reasonably be used to identify the individuals.

Step 3 – All those methods and manipulations in Step 2 must be clearly documented, and the PHI-free determination justified.
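
HIPAA does not prescribe a particular technique for Step 2, so the following is only one example of the kind of measurement an expert might include. It computes k for k-anonymity, a technique not named in the rule itself: the size of the smallest group of records that share the same values for a chosen set of quasi-identifiers. The function name, column names, and sample rows are all hypothetical.

```python
from collections import Counter


def smallest_group_size(rows: list, quasi_identifiers: list) -> int:
    """Return k: the size of the smallest group of rows that share the same
    values for every quasi-identifier column. Raises ValueError on empty input."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Invented sample data for illustration.
rows = [
    {"zip3": "100", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip3": "100", "birth_year": 1980, "diagnosis": "diabetes"},
    {"zip3": "100", "birth_year": 1975, "diagnosis": "asthma"},
]

k = smallest_group_size(rows, ["zip3", "birth_year"])
print(k)  # 1, because the 1975 row is unique on (zip3, birth_year)
```

A k of 1 means at least one person is unique on those columns, so the data would need more generalization or suppression. A real determination weighs far more than this single number, which is why the method asks for an expert and for documentation.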

Which De-Identification Method Should You Use?

I has a method for u

I know what you are thinking: What if you hire an expert to do the Expert Determination, and then she simply deletes all the identifiers like the Safe Harbor Method? Well, I would say two things. First, if your ‘expert’ was Roy, tell him I said “hi”.  Second, you probably paid the expert too much for the work, because deleting identifiers isn’t super difficult.

So how do you decide which method to use? It should be an easy decision if you have no skills and no expert: choose to dock your data ship in the Safe Harbor.

In general, the Safe Harbor should be the first method you consider. It is the safest and simplest to implement. The most common problem with the Safe Harbor Method is that removing all those identifiers results in information loss. This might be OK for what you need to do, but maybe you absolutely need those rare diseases, detailed geography, or old people in your data? In that case, Safe Harbor won’t get you there. You will need fancy statistics and data manipulation that result in bullet-proof privacy while still preserving the information you need. It’s not easy, hence the whole ‘expert’ thing.