Good information is the foundation of healthcare. Data measures quality, revenue, and efficiency. Data informs providers and patients about diagnoses and treatments. Accrediting authorities and regulators use data to ensure compliance. Data comes from many internal and external sources. Knowing where the data you use comes from helps you find it faster and understand how best to use it. In this lesson we will learn where health data comes from and how it got there in the first place.

This lesson contains six sections: Medical Records, Claims Data, Other Internal Data, Patient Reported Outcomes, Vital Records, and Surveillance. Although imperfect, categorizing data like this will help when it comes to finding the data you need.

Medical Records are data that reflect an individual patient’s health status, the treatments they received, medications, laboratory results, office and hospital visits, and all other healthcare.

Medical records are used to track events and transactions between patients and health care providers as documentation to support care. Medical records are also used for billing, legal and compliance purposes, research, and as a communication platform among providers and between providers and patients. Medical record data helps us measure and analyze trends in health care use, patient characteristics, provider behaviors, and quality of care.

Medical records are typically accurate and detailed because they come from health care providers, but data entry mistakes do happen. Sometimes medical record data is reported by the patient (e.g. allergies) but entered into the EHR by staff. Most medical record data reflects the provider’s judgment and contains information that patients might not think to add or feel comfortable sharing. However, because it is written down in a specific context, it can be misinterpreted if taken out of that context. Medical records also exist (by definition) only for people who are receiving medical care, so keep that context in mind when interpreting them.

Medical records are most often found within the EHR system, but may be stored in other systems as well. The most common data model for linking (arranging) medical record data is to key off of encounters. See figure on right. An encounter is a patient visit. A patient and a provider are associated with each encounter. A provider may order multiple medications and labs that are associated with an encounter. Because of this data model, a little bit of care is needed to extract and use medical record data correctly. You can link the tables, or pull data from them one at a time. For example, if you wanted to pull Medication and Laboratory data for a specific PATIENT_ID, you could not do this directly because these tables don’t have PATIENT_ID. You would first need to get the ENCOUNTER_IDs for that patient, and then use those ENCOUNTER_IDs to pull the Medication and Laboratory data.

Encounter Table

Encounter ID | Patient ID | Arrival Date Time | Departure Date Time | Attending Provider | Visit Type
11111 | 122345 | 11/22/2019 08:40 | 11/22/2019 09:55 | Dr. Jones’s ID | Urgent

Patient Table

Patient ID | First Name | Last Name | Date of Birth | Sex | Race
122345 | Garfield | Cat | 06/19/1981 | M | Orange

Diagnosis Table

Encounter ID | Diagnosis | Present on Admit | Entered By | Date Time Assigned | ….
11111 | E55.0 (Rickets) | Yes | Dr. Jones | 11/22/2019 08:22
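The two-step lookup described above (patient to encounters, then encounters to medications or labs) can be sketched in a few lines of Python. This is only an illustration: the in-memory lists stand in for database tables, and the field names and sample rows are made up for the example rather than taken from any particular EHR.

```python
# Hypothetical in-memory "tables". Note that the medication rows
# carry an encounter_id but no patient_id, as in the data model above.
encounters = [
    {"encounter_id": 11111, "patient_id": 122345},
    {"encounter_id": 22222, "patient_id": 122345},
    {"encounter_id": 33333, "patient_id": 999999},
]
medications = [
    {"encounter_id": 11111, "medication": "MED_A"},
    {"encounter_id": 22222, "medication": "MED_B"},
    {"encounter_id": 33333, "medication": "MED_C"},
]


def rows_for_patient(patient_id, encounters, child_rows):
    """Return the child rows (medications, labs, ...) for one patient."""
    # Step 1: collect the ENCOUNTER_IDs that belong to this patient.
    encounter_ids = {
        e["encounter_id"] for e in encounters if e["patient_id"] == patient_id
    }
    # Step 2: keep only the child rows linked to those encounters.
    return [r for r in child_rows if r["encounter_id"] in encounter_ids]


patient_meds = rows_for_patient(122345, encounters, medications)
```

The same function works for a laboratory table, because it relies only on the shared encounter ID. In a real database the two steps would normally be written as a single join on the encounter ID.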


Common Data Elements Found in the Encounter Table

Encounters can be inpatient or outpatient
  • Patient ID
  • Attending Provider(s) ID
  • Insurance Used
  • Type of Visit (urgent, elective, etc.)
  • Visit/Admit Start Date & Time
  • Visit/Admit End Date & Time
  • Disposition of Patient (Discharged, AMA, expired, etc.)
  • Where they were admitted from (e.g. Home, long term care facility, etc.)
  • Where patient was discharged to (Home, transfer, etc.)

NOTE:  Admission, Discharge, and Transfer (called “ADT”) data are often not considered clinical and may be managed in a separate ADT-dedicated computer system.  ADT data is enormously important administratively and legally because the organization must know who it is responsible for at all times.

Common Data Elements Found in the Patient Table

Some data elements in this table change over time, which can make them tricky to handle
  • First, Last, and Middle Names
  • Birthday
  • Sex
  • Race
  • Ethnicity
  • Marital Status
  • Social Security Number
  • Address
  • Phone(s)
  • Guarantor
  • Parent or Guardian (if minor)
  • Insurance

Common Data Elements Found in the Diagnosis Table

Most diagnoses are assigned by a professional medical coder based on provider notes
  • Encounter ID
  • Diagnosis Code Version (ICD-10, SNOMED, etc.)
  • Date/Time Assigned
  • Purpose (billing, problem list, etc.)
  • Assigned by Staff ID
  • Order of diagnosis (1,2,3…)
  • Primary or secondary diagnosis
  • Present on admission (Y/N)

Common Data Elements Found in the Laboratory Table

Lab data includes both what tests are ordered, and what tests come back
  • Encounter ID
  • Laboratory System Unique ID
  • Ordering Provider ID
  • Date/Time Ordered
  • Laboratory Code Ordered (LOINC, etc.)
  • Date/Time Completed
  • Numeric Results (“15”, etc)
  • Textual Results (“Normal”, etc.)
  • Reference Ranges
  • Sample Collection Data
  • Additional Microbiology or Pathology Data

Common Data Elements Found in the Medication Table

Pharmacy systems contain dispensing, administration, and other data
  • Encounter ID
  • Prescriber (Ordering) Provider ID
  • Ordering Date/Time
  • Medication Code (RxNorm, NDC, etc.)
  • Dosage
  • Route
  • Packaging
  • Frequency
  • Amount
  • Start Date Time
  • Stop Date Time
  • Dispensed By Provider ID
  • Administered By Provider ID
  • Infusion and titration data

Other Common Data Elements in Medical Record Data

Radiology data includes images as well as notes from the provider about what they see in the images. Most EHRs also have genetic laboratory test information that may be separate from the other labs. This data may include sequence data as well as mutations observed. Surgical procedure cases, bedside treatments, patient assessments, progress notes, and many other kinds of data can be found in the Medical Records data.

Advantages and Disadvantages of Using Medical Record Data

Most of the limitations of medical data are a result of the fragmented nature of storing the records. A ‘complete’ view of the care given to one patient may be spread across a dozen systems and provider organizations. Sensitive records such as behavioral health may also be kept separate. In addition, a large part of the medical record (up to 75%) is entered and stored as narrative text, such as notes. Surgical reports, pathology, radiology, progress notes, and family history are all often stored as free text in the database. Because computers are not great at reading text yet, it is difficult to use all of the data in the medical record for analysis.

Other limitations of medical data are more nuanced. For example, providers document within the clinical record either to support billing, or to make a record that they and their coworkers can view and understand later. Now that patients can view their own documentation, providers are documenting with this in mind, using friendlier words and omitting some diagnoses. For example, non-compliant and obese patients may not be described as such. Laboratory equipment and assay preparations may also differ from one provider to the next, resulting in differences that are clinically negligible but noticeable in large numbers.

Advantages of Medical Record Data

    • Rich in clinical detail.
    • Viewed by providers as credible.

Challenges of Medical Record Data

    • The complexity and time required to compile data across different sites, particularly if different record formats are used.
    • Current use of paper for some records, which means that trained staff must manually abstract information.

Claims Data

CMS 1500 Form

Claims Data is used primarily in the financing of healthcare. A medical claims record is generated every time a patient sees a doctor, pharmacy, or any other healthcare provider. Because healthcare organizations often track their fees more diligently than Medical Records data, and billing was computerized long before clinical care, Claims Data has historically been more available than Clinical Data. The data elements in health provider billing systems reflect the standard billing forms for inpatient and outpatient care. The UB-04 (CMS 1450) is a claim form used by hospitals, nursing facilities, and other inpatient facility providers. For outpatient services, the HCFA-1500 (CMS 1500) is a claim form used by individual doctors and practices, nurses, therapists, and other professionals. Review the two forms above to see the various data captured for claims.

Claims data has diagnoses (usually up to 12 of them) and procedures, but lacks many of the Clinical Record patient data elements. The data it does have could even be slightly different from data entered by physicians. This is because billing data is entered by professional medical coders, who are trained to assign the most appropriate level of diagnosis for billing. So it might be slightly off, or at the wrong level of granularity, when compared to the problem lists that providers keep on patients.

The following table lists example data elements of Claims Data. More detail can be found at the Virginia Patient Level Data Data Directory.

Hospital ID Number | Medicare Provider Number | Provider’s NPI | Age in Days
Admit Source | Age in Years | Sex | Race
Admit Type | Patient Zip | LOS | Patient Status
Diagnosis 1-18 | Diagnosis 1-18 Present on Admit | Procedure 1-6 | Procedure Length
DRG | MDC | APRG | Total Charges
Room and Board Charges | Routine Care Charges | Intensive Care Charges | Anesthesiology Charges
Pharmacy Charges | Radiology Charges | MRI/CT Charges | Nuclear Medicine Charges
Clinical Lab Charges | Labor & Delivery Charges | Operating Room Charges | Oncology Charges
Med/Surg Supply Charges | Other Charges | Payer | County
Health Planning District | State | External Injury Codes | Infant Birth Weight
Attending Physician | Operating Physicians 1-3

As you can see from the table, the charges are fairly detailed in claims data. Charges indicate the resources used and can be useful for various analyses. However, a major limitation of charges is that they do not represent actual payment. Also, note that provider information (attending and operating physicians) is included in this publicly accessible data. Privacy regulations apply only to patients and their families.

Claims data has its limitations. Claims Data from one provider may be fragmented and not represent all of the patient’s care. For example, if you rely on billing data from one integrated delivery network (IDN), you will have the diagnoses, procedures, and encounters performed by those providers at those facilities within the network. However, if the patient sees a doctor, fills a prescription, or receives other services outside of the IDN, that care will not be present in the data. If you want a more complete view of claims data, you must get the comprehensive data from the insurer or from an all-payer claims database. Still, this data source will not include healthcare not covered by insurance, such as some dermatology, elective procedures, etc.

  • Advantages of Administrative Claims Data
    • Available electronically.
    • Less expensive than obtaining medical record data.
    • Available for an entire population of patients and across payers.
    • Fairly uniform (and improving) coding systems and practices.
  • Challenges of Administrative Data
    • Limited clinical information.
    • Questionable accuracy for public reporting because the primary purpose is billing.
    • Completeness.
    • Timeliness.




Other Internal Data

Aside from Medical Record Data and Claims Data, there may be other valuable internal data found within your organization. Payroll and staff scheduling systems will contain data on when and where your staff are working. This is useful when you need outcomes to be associated with staff members, or when you are evaluating processes related to scheduling. For example, you can link patient satisfaction to specific providers, or nurse-sensitive outcomes to specific nursing schedules. If you linked scheduling data with -usually (but not always) on your IT systems. Internal data can come from a variety of systems, including the EHR, Human Resources systems, Billing, and others. Whether it is organization costs and revenues, patient information, or payroll figures, internal data is nearly always sensitive and kept strictly confidential, sometimes creating sharing challenges even within the organization.

Log files constantly capture who does what

Log Files and Clickstream Data are generated automatically by computer systems while they are being used. Typical logs are a running record of actions, including who does what and when they do it. Logs are often simple lists contained in a text file (like a cash register receipt), but can also be stored in databases if they are intended to be used regularly. The most common use of logs within healthcare is auditing related to compliance and security. For example, analyzing database log data helps protect patient confidentiality by checking that only the right staff access patient data, and analyzing network logs could identify security breaches. Clickstream data is a type of log data that represents how people use software. Clickstream data shows what buttons and links are clicked, and what features (and pages) are used. In healthcare, Clickstream data is commonly used to analyze providers’ responses to an alert message. If an alert is frequently ignored or overruled without reason, it should be addressed. At best it is an inconvenience, but it could possibly be a safety risk.
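As a rough illustration of the alert analysis described above, the following Python sketch computes how often each alert is overridden. The event records, the alert names, and the "accepted"/"overridden" labels are all hypothetical; real clickstream logs will have their own fields and usually need cleaning first.

```python
from collections import Counter

# Hypothetical clickstream events: one row per alert shown to a provider.
events = [
    {"alert": "drug_interaction", "action": "accepted"},
    {"alert": "drug_interaction", "action": "overridden"},
    {"alert": "duplicate_order", "action": "overridden"},
    {"alert": "duplicate_order", "action": "overridden"},
    {"alert": "duplicate_order", "action": "overridden"},
]


def override_rates(events):
    """Return the fraction of times each alert was overridden."""
    shown = Counter(e["alert"] for e in events)
    overridden = Counter(
        e["alert"] for e in events if e["action"] == "overridden"
    )
    # A Counter returns 0 for alerts that were never overridden.
    return {alert: overridden[alert] / shown[alert] for alert in shown}


rates = override_rates(events)
```

An alert whose rate is close to 1.0 is being dismissed almost every time it fires, which makes it a candidate for review.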

Supply inventory, equipment maintenance, and housekeeping systems track how many supplies are kept on hand, how quickly rooms get cleaned (and turned over), and how long medical devices are out of service while being maintained. Some critical supplies such as blood products and implantable devices (like various stents) are difficult to keep in large quantities, and run out sometimes. Analytics could help with this supply chain challenge.

RTLS Systems Know where people and things are in real time.

Real Time Location Tracking Systems (RTLS) are described in more detail in another lesson. Using RFID and other technologies, RTLS systems track where staff, equipment, and patients are located over time. Joining this spatial data with Medical Record data could support many different important analyses, such as uncovering root causes and transmission of hospital acquired infections, identifying facility-specific contributions to patient falls, etc.
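To make the idea of joining location data with patient data concrete, here is a minimal Python sketch that finds which tagged staff or equipment shared a room with a given patient at the same time. The tag names, room numbers, and record layout are assumptions made for this example; an actual RTLS export will look different.

```python
from datetime import datetime

# Hypothetical RTLS sightings: each row says a tag was in a room for a time span.
sightings = [
    {"tag": "PATIENT_1", "room": "101",
     "start": datetime(2019, 11, 22, 9, 0), "end": datetime(2019, 11, 22, 10, 0)},
    {"tag": "NURSE_1", "room": "101",
     "start": datetime(2019, 11, 22, 9, 30), "end": datetime(2019, 11, 22, 9, 45)},
    {"tag": "PUMP_1", "room": "101",
     "start": datetime(2019, 11, 22, 10, 30), "end": datetime(2019, 11, 22, 11, 0)},
    {"tag": "NURSE_2", "room": "102",
     "start": datetime(2019, 11, 22, 9, 0), "end": datetime(2019, 11, 22, 10, 0)},
]


def overlaps(a_start, a_end, b_start, b_end):
    """True when two time spans share at least one moment."""
    return a_start < b_end and b_start < a_end


def contacts(patient_tag, sightings):
    """Return the tags seen in the same room as the patient at the same time."""
    patient_rows = [s for s in sightings if s["tag"] == patient_tag]
    found = set()
    for p in patient_rows:
        for s in sightings:
            same_room = s["room"] == p["room"]
            if s["tag"] != patient_tag and same_room and overlaps(
                p["start"], p["end"], s["start"], s["end"]
            ):
                found.add(s["tag"])
    return sorted(found)


patient_contacts = contacts("PATIENT_1", sightings)
```

Here only NURSE_1 counts as a contact: PUMP_1 entered the room after the patient left, and NURSE_2 was in a different room. Linking the patient tag to an encounter would then let you bring in diagnoses or lab results for an infection investigation.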

Patient Reported Outcomes

The patient’s preferences, feelings, experience, and perceptions are very important and useful data. This self-reported information is called patient reported outcomes (PROs) and can be almost anything, including pain levels, depression, and satisfaction with care. Patient reported outcomes are captured during the visit as well as outside of the patient encounter. Survey-like forms are often used to administer the PROs, either self-administered on tablets or paper, or given verbally by a staff member.

PROMIS Patient Reported Outcomes

Patient reported outcomes do not have to be scientifically validated, but untested patient reported outcomes might be unreliable, give you biased results, or not measure what you think you are measuring. Validated patient reported outcomes are considered trustworthy and dependable. PROMIS is an NIH-funded repository of validated measures, maintained by Northwestern University, that are available for free and can be administered in multiple ways. PROMIS is a great resource, but has a relatively small number of Patient Reported Outcomes. You may have to search PubMed to identify condition-specific PROs. Still, these may require permission and come with conditions of use.

The PHQ-2 is an easy to use depression screening tool.

The US Agency for Healthcare Research and Quality hosts a list of valuable PRO tools and calculators on its Electronic Preventive Services Selector page. A two-question depression screening tool called the PHQ-2 is among several options for depression screening. It comes with instructions for use and scientific information regarding its development and validation. You might include it in a questionnaire you give to patients each time they visit, or send it via the patient portal to patients considered at higher risk for depression.
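Scoring a short PRO like the PHQ-2 is simple enough to sketch in code. In the commonly published scoring, each of the two items is answered on a scale from 0 (not at all) to 3 (nearly every day), the items are added for a total of 0 to 6, and a total of 3 or more is often used as the cutoff for a positive screen. The function names below are made up for this example, and the cutoff is left as a parameter because your organization may choose a different one.

```python
def phq2_score(item1, item2):
    """Add the two PHQ-2 item responses (each 0 to 3) into a total of 0 to 6."""
    for item in (item1, item2):
        if item not in (0, 1, 2, 3):
            raise ValueError("each PHQ-2 item must be an integer from 0 to 3")
    return item1 + item2


def phq2_positive(item1, item2, cutoff=3):
    """True when the total meets the screening cutoff (3 by default)."""
    return phq2_score(item1, item2) >= cutoff


example_total = phq2_score(1, 2)
example_flag = phq2_positive(1, 2)
```

A positive screen is a prompt for further evaluation, not a diagnosis, so the flag would typically route the patient to a longer assessment.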

You can often find some PRO data in the EHR and Patient Portal, and EHRs are getting better at supporting patient reported data. However, many organizations have dedicated survey systems, such as REDCap and Qualtrics, that offer a higher degree of sophistication for collecting data from patients.


  • Advantages of PRO data
    • Captures types of information for which patients are the best source
    • Well-established methods for survey design and administration
    • Easy to understand and relate to survey results
  • Challenges of PRO data
    • Cost of survey administration
    • Possibility of misleading results if questions are worded poorly, survey administration procedures are not standardized, the population sampled is not representative of the population as a whole (sampling bias), or the population is not represented in the responses (response bias).

Vital Records

Vital Records Data typically consist of Birth Records and Death Records, as well as Marriage and Divorce records. These records are maintained in local jurisdictions and can often be found through state health departments. In addition to local death certificate data, the National Death Index (NDI) and the Social Security Administration’s Death Master File (SSDMF) also maintain mortality records, but at a national level. Healthcare delivery organizations analyze death rates for compliance reporting, research, and quality improvement. Not all patients die within the health care system, so observed mortality does not sufficiently capture actual deaths in your patient population. Governmental death data is often imported and used to supplement observed mortality within patient records. Death data files have name, birth date, death date, social security number, sex, race, and even father’s surname if available.
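Supplementing observed mortality with an external death file can be sketched as a simple lookup. The example below is deliberately simplified: it matches only on an exact social security number and birth date, the field names are hypothetical, and the records are fabricated placeholders. Real linkage usually needs fuzzier matching on names and dates, plus manual review of uncertain matches.

```python
# Fabricated patient records; death_date is None when no death was observed internally.
patients = [
    {"patient_id": 1, "ssn": "000-00-0001", "birth_date": "1940-02-01",
     "death_date": "2019-05-01"},
    {"patient_id": 2, "ssn": "000-00-0002", "birth_date": "1938-07-15",
     "death_date": None},
    {"patient_id": 3, "ssn": "000-00-0003", "birth_date": "1951-11-30",
     "death_date": None},
]
# Fabricated rows standing in for an external (governmental) death file.
death_file = [
    {"ssn": "000-00-0002", "birth_date": "1938-07-15", "death_date": "2020-01-09"},
]


def add_external_deaths(patients, death_file):
    """Fill in missing death dates using exact SSN + birth date matches."""
    deaths = {(d["ssn"], d["birth_date"]): d["death_date"] for d in death_file}
    result = []
    for p in patients:
        updated = dict(p)  # copy so the original record is not modified
        if updated.get("death_date") is None:
            updated["death_date"] = deaths.get((p["ssn"], p["birth_date"]))
        result.append(updated)
    return result


supplemented = add_external_deaths(patients, death_file)
```

Internally observed death dates are left untouched, and patients with no match keep a missing death date. A missing date means only that no death was found, not that the patient is known to be alive.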

Advantages of Vital Records Data

  • You can’t get it anywhere else
  • Information required to properly maintain patient records

Challenges of Vital Records Data

  • Can be costly and data requests burdensome
  • Does not have all deaths

Medicare Data

Medicare provides extensive data that can (usually) be obtained and used free of charge. All publicly available Medicare data have the patient identifiers removed, but the physician and hospital identifiers are retained. Medicare provides data on:

  • Physicians
  • Hospitals
  • Nursing Homes
  • Dialysis Centers
  • Hospice
  • Long term care
  • Inpatient Rehabilitation

Within these provider types, the data may include procedures, beneficiary (patient) diagnoses, prescriptions, payments, patient experience, demographics, and other valuable information. For example, you could download Medicare Provider Utilization and Payment data for your region and see which local physicians treat the most patients of a certain type. You could look at opioid prescribing rates by provider, and run an almost limitless number of other useful analyses.
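The kind of provider ranking mentioned above might look like the following Python sketch. The rows and the column names ("provider_id", "service", "beneficiaries") are invented for illustration and are not the actual layout of any Medicare file, so check the data dictionary of the file you download before adapting it.

```python
from collections import defaultdict

# Made-up rows: one row per provider and service line.
rows = [
    {"provider_id": "A", "service": "dialysis", "beneficiaries": 40},
    {"provider_id": "B", "service": "dialysis", "beneficiaries": 75},
    {"provider_id": "A", "service": "office_visit", "beneficiaries": 300},
    {"provider_id": "B", "service": "dialysis", "beneficiaries": 10},
]


def top_providers(rows, service, n=10):
    """Rank providers by total beneficiary count for one service."""
    totals = defaultdict(int)
    for row in rows:
        if row["service"] == service:
            totals[row["provider_id"]] += row["beneficiaries"]
    # Largest totals first, limited to the top n providers.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


dialysis_ranking = top_providers(rows, "dialysis")
```

Adding counts across rows can count the same patient more than once, so treat the totals as a rough ranking rather than an exact number of unique patients.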

Advantages of Medicare data

  • Freely available to download and use
  • Identifies providers and hospitals
  • Complete for older population over many years

Challenges of Medicare data

  • Same as all administrative claim data (see above)
  • Only older population is represented in data
