Baby Toys Hold the Key to Understanding Healthcare Data Types

August 5, 2013

Babies learn to recognize and differentiate objects, so why can't today's EHRs recognize and differentiate data?

The Short Version

One of a baby's first tasks is to learn to recognize and differentiate objects. Some taste good or are fun; others are unpleasant. Before long, most children become proficient at recognizing objects and their subtle differences. They come to learn that there are different types of objects. You may recall the "Sesame Street" segment designed to reinforce this skill: "One of these things is not like the others…" Even as the ability to form abstractions develops, objects remain the subject (data) of much of the brain's "computing."  

Every object has characteristics (properties). Toy blocks have properties such as shape, size, and color. The differences between blocks are reflected in the values of the properties. Babies do not need to attend "CS101: Introduction to Object Oriented Programming" to learn that objects have properties. They learn it in a tangible, non-verbal way, only rarely grasping the abstract concept. By the time they "grow up to be doctors and lawyers and such…" they forget that they ever knew it. The baby toy shown below encapsulates most of what there is to know about object data types:

Photo courtesy: Fisher-Price

The objects are "data" and their shape determines their type. The storage device (toy box or computer) can only store and manipulate the types (shapes) it was designed to accommodate. Other types can't be stored unless they are damaged by squeezing, bending, trimming, beating with a hammer, etc. Alternatively, one could take each object (toy block) apart into smaller pieces of types that can be stored (the Lego approach) and hope that, unlike Humpty-Dumpty, it can be put back together again. Lego's basic shapes are 2x2 bricks and plates and 4x4 bricks and plates. The typical computer's basic shapes are unadorned integers, decimal point numbers, date/time values, and character strings. Neither a disassembled Lego construction nor a disassembled complex data object such as an encounter note gives any indication of what it was or how to reassemble it.

A computer can't store actual blocks; it can only store numbers and letters. A block could be represented as {orange, star} or {3, 2}. Properties considered irrelevant or difficult to represent are ignored or drastically simplified (with an attendant loss of detail). Later, if it is realized that other properties such as composition, size, and taste (in the case of blocks) were important, they will not be available because they were never captured. The "system" will have to be redesigned.
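The two encodings mentioned above can be sketched in a few lines of Python. This is only an illustration: the names Color, Shape, and Block are hypothetical, and the numeric codes are arbitrary, chosen so that the orange star comes out as {3, 2}.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Color(Enum):
    # Arbitrary codes (an assumption), picked so that ORANGE is 3 as in the text.
    RED = 1
    BLUE = 2
    ORANGE = 3


class Shape(Enum):
    # Arbitrary codes (an assumption), picked so that STAR is 2 as in the text.
    CIRCLE = 1
    STAR = 2
    SQUARE = 3


@dataclass(frozen=True)
class Block:
    """Stores only the properties the designer thought to include."""
    color: Color
    shape: Shape

    def as_codes(self):
        # The same block as bare integers: compact, but meaningless without the enums.
        return (self.color.value, self.shape.value)


block = Block(Color.ORANGE, Shape.STAR)
codes = block.as_codes()
captured = [f.name for f in fields(Block)]
```

Size, composition, and taste appear nowhere in Block. Adding them later means changing the type itself, and blocks already stored will never have those values.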

So there you have it. Medicine is a source of complex information that requires a rich representation. Each encounter note can be thought of as an information-bearing object - an instance of the encounter-note data-type. Today, narrative is not directly usable as data. Anything not captured faithfully at the point of care will be forever lost. To remedy this, it will be necessary to build computers that understand (have built-in support for) the encounter-note and other complex data-types in addition to integer, float, date, and string.

The Long Version

One of the best developed human skills is the ability to recognize objects by simply observing and interacting with them. People naturally think in terms of objects. A major use of computers is to help people do things to or with everyday objects, or with information about them, but before the computer can do its work, people must supply it with the objects' descriptions. Only then can inventory and orders be tracked, the design of buildings and cars automated, production lines controlled, etc.

Computer programs concerned with real-world objects must adopt some convention for representing those objects using the (limited) palette of "built-in" data types that the typical computer provides. Some programming languages operate on separate data elements and depend on the programmers to keep track of all the pieces of each computation. Other languages use representations that group together those object properties that are relevant to the program. Programmers can leverage their innate ability to understand physical objects if a language allows them to think in terms of "manipulating" data objects rather than performing a series of computations.

"Object-oriented programming made its first appearance at MIT in the late 1950s and early 1960s… Smalltalk… developed at Xerox PARC (by Alan Kay and others) in the 1970s, introduced the term object-oriented programming to represent the pervasive use of objects and messages as the basis for computation" and went on to influence the design of Apple's Macintosh. Each of the dozens of object-oriented languages that have been developed is an effort to coax "meaningful" behavior out of computers that would otherwise be little more than giant ENIACs performing math computations.

Object-oriented design begins by identifying and isolating those object properties that are directly related to the goals (specifications) of the program. Any other properties are ignored. This materially simplifies the job at the expense of generality - should other object properties become interesting or necessary in the future, the application will need to be redesigned.

Object-oriented programming has not proven to be the "silver bullet" that will make it easy to create a robust computerized medical record. The techniques and technology needed to produce a computerized record differ substantially from traditional programming (object-oriented or not) as it is commonly practiced, a point that seems to have escaped almost everyone.

In addition to being a collection of paper or computer bits, "medical record" is an abstract concept. In order to fulfill its promises of longevity, authenticity, accuracy, security, and privacy, a medical record must manifest specific performance characteristics. These characteristics are "theoretical" in the sense that they can be established and validated by reasoning from first principles, as opposed to the current certification and "meaningless use" criteria that are a capricious manifestation of opinion and belief.

Rather than being a product derived from a theoretical foundation, today's EHRs, and almost all other software, are the product of one of two development approaches: market-driven and bespoke.

The market-driven approach involves the interplay of three forces: features, schedule, and budget. Microsoft (MS) Word, for example, originated as a market-driven activity. It began with a business decision to create a word processor that people might buy. The marketing department identified and prioritized the features. A feature might be "cut and paste," "save as," or spell checking. The executives set the schedule - it must be on the shelves by Thanksgiving. The CEO and the board set the budget - we are willing to allocate $X to this venture. At some point in market-driven development, schedule or budget constraints come to predominate and features that have not been finished or tested are dropped. They may or may not make an appearance in a subsequent version.

On the other hand, the FAA's Air Traffic Control System (ATCS) and the avionics for the Boeing 787 are examples of bespoke development. There is a long list of requirements that must be met. Most are not negotiable. Many are interdependent. All are spelled out in minute detail. Some avionics requirements are, as it turns out, based on the expectation that the ATCS will behave as specified. Since the feature list (requirements) is not negotiable, bespoke systems are notorious for being delivered late and/or over budget. Obviously, applications like these are not Commercially Available Off-the-Shelf (COTS).

In either model, application developers generally do not consider, or are not allowed to consider, building a radically new kind of computer or creating a new programming language before writing their code.*

In market-driven development there is no external, preexisting object that must be faithfully represented. Data structures are devised only to ensure that the program functions as specified. If the initial choice of a data structure interferes with a desired function, it can be replaced with another. Bespoke development is similar in that most specifications focus on desired behavior.

Developers are free to choose whatever they consider to be the "best" representations - best may mean faster processing, smaller, more compatible with constraints imposed by the tools and hardware, etc. In neither approach are the data structures devised for any reason other than to satisfy the specifications. Obstacles that arise during development are rarely allowed to derail a project; a workaround (compromise) will be found, using the tools and data types that are available.

Many workarounds exploit the fact that computers excel at performing repetitive sequences of simple operations. Since computers lack complex native (built-in) data types, developers simulate complex data types by doing simple things repetitively. One common approach is to retrieve an assortment of basic data elements from non-volatile memory (disk), construct a data object (a complex representation held in volatile memory) using a repetitive series of operations, and later disassemble the object and save its parts, piece by piece.
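The assemble-and-disassemble cycle just described might look something like the following Python sketch. Everything in it is hypothetical: the row layout, the EncounterNote fields, and the sample values are invented for illustration and do not come from any actual EHR.

```python
from dataclasses import dataclass

# Hypothetical flat storage: every row holds one basic value (a string or a number).
# The identifiers and values are made up for illustration.
ROWS = [
    ("note-1", "patient", "Jane Doe"),
    ("note-1", "weight", 12),
    ("note-1", "complaint", "cough"),
]


@dataclass
class EncounterNote:
    note_id: str
    patient: str
    weight: int  # pounds? kilograms? Nothing in the stored rows says.
    complaint: str


def assemble(note_id, rows):
    """Build the in-memory object one field at a time from the flat rows."""
    values = {}
    for row_id, field_name, value in rows:
        if row_id == note_id:
            values[field_name] = value
    return EncounterNote(note_id=note_id, **values)


def disassemble(note):
    """Take the object apart again and return its pieces as flat rows."""
    return [
        (note.note_id, "patient", note.patient),
        (note.note_id, "weight", note.weight),
        (note.note_id, "complaint", note.complaint),
    ]


note = assemble("note-1", ROWS)
round_trip = disassemble(note)
```

The rows survive the round trip, but the knowledge of which rows belong together, and what each one means, exists only in the assemble and disassemble functions.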

As I have pointed out before, this leaves the knowledge about how the parts interrelate nowhere except in the mind of the developer, (perhaps) in a manual, and implicitly in the program code. Should the data on disk outlive the computer itself and the developers, its meaning will be difficult or impossible to reconstruct. There is nothing equivalent to the Rosetta Stone for the data stored in an EHR. This highlights the need for data types capable of representing complex information that are also self-documenting.
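As a rough illustration of what "self-documenting" could mean in practice, the Python sketch below stores each value together with its unit and a plain-language description. The record layout and field names are assumptions made up for this example, not an existing standard or the format of any real EHR.

```python
import json

# Hypothetical self-describing record: each value travels with its unit and meaning.
record = {
    "type": "encounter-note",
    "fields": {
        "weight": {
            "value": 12,
            "unit": "lb",
            "meaning": "body weight measured at the visit",
        },
        "complaint": {
            "value": "cough",
            "unit": None,
            "meaning": "reason for the visit, in the patient's words",
        },
    },
}


def describe(rec):
    """Render every field using only what the record itself contains."""
    lines = []
    for name, field in rec["fields"].items():
        unit = f" {field['unit']}" if field["unit"] else ""
        lines.append(f"{name}: {field['value']}{unit} ({field['meaning']})")
    return lines


stored = json.dumps(record)      # what would be written to disk
restored = json.loads(stored)    # what a future reader would get back
summary = describe(restored)
```

A reader who has only the stored text, and none of the original program, can still tell what each value is. This is a workaround built from strings and numbers, not the built-in data type the article argues for.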

Encounter notes are a distinct data-type. Narrative notes not only present details about a patient and their care in words that are expected to accurately convey the author's thoughts to others; they also have formatting and internal structure, similar in some ways to an outline. Notes created with the aid of a computer can contain contextual information (such as the units in which the weight was recorded) that is hidden from the reader's view but accessible for computer processing. While today's EHRs accept narrative notes, not much use is made of them to satisfy an organization's need for data. Instead, data is collected as a separate activity, duplicating much of what was included in the narrative.

Duplication is not only costly; it also increases the risk that patients will be injured because data has been separated from its context. Consider simple numerical data. People recognize that 12 mph is not the same as 12 lbs., yet to the typical computer they are both just 12; the data-type for representing an integer makes no provision for carrying the units along with the value.

Current programming practice lacks a built-in mechanism for ensuring that each 12 is used in only the appropriate context or for detecting situations in which it is not. Although the 12 itself provides programmers no clue as to its meaning, they are expected to always get it right. If you doubt the importance of this sort of contextual information, consider the following: "[O]n September 23, 1999, communication with the [Mars Climate Orbiter] was lost as the spacecraft went into orbital insertion, due to ground-based computer software which produced output in non-SI units of pound-seconds (lbf×s) instead of the metric units of newton-seconds (N×s) specified in the contract between NASA and Lockheed." If a data collection procedure causes a loss of detail or causes the meaning of what was captured to be distorted, the error can never be corrected, nor will it ever be detected. All will appear to be well until calamity strikes.
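The difference between a bare 12 and a 12 that carries its units can be shown with a small Python sketch. The Quantity class here is a hypothetical, programmer-built workaround, not a built-in type, and it makes no attempt at unit conversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number that cannot be separated from its unit (hypothetical helper)."""
    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if other.unit != self.unit:
            # Mixing units is reported immediately instead of silently producing 24.
            raise ValueError(f"cannot add {other.unit} to {self.unit}")
        return Quantity(self.value + other.value, self.unit)


speed = Quantity(12, "mph")
weight = Quantity(12, "lb")

same_number = speed.value == weight.value  # the bare integers are indistinguishable
same_quantity = speed == weight            # the quantities are not

try:
    speed + weight
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With plain integers, 12 + 12 is 24 regardless of what the numbers stood for. Here the mismatch surfaces as an error at the moment it occurs, but only because every programmer who touches the data agrees to use Quantity, which is exactly the discipline a built-in type would not have to rely on.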

In order to function optimally, a computerized medical record should be built on a platform that provides a built-in data type tailored to the needs of an encounter note. Without such support, it is difficult and costly to create a general framework that allows the documentation of everything a practitioner, the standard of practice, or local policy considers important. Once the encounter has concluded, that which was not captured because a specific "field" was not provided is lost forever.

For EHRs to meet the goal of creating meaningful medical records capable of remaining usable during a patient's entire lifetime, two things must change: 1) health information technologists must come to understand and accept that their current methodologies and computers, while suitable for many applications, are completely unsuited to the needs of a computerized medical record; and 2) computer scientists and computer manufacturers must recognize that the computerized medical record is a qualitatively different type of information than that to which they are accustomed.

New computer architectures and programming languages must be developed that understand (have built-in support for) the encounter-note and other complex data-types. While awaiting the new technology, effective workarounds are only possible if developers recognize and accept that they have a problem. Failure to address this crucial issue will mean that EHRs will continue to be expensive disappointments.

* For an example of a case where this was done, read Frederick P. Brooks, The Mythical Man-Month (anniversary ed.), Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, ©1995, ISBN 0-201-83595-9.