Why Data Trumps Essential Narrative Notes in Today's EHRs

July 22, 2013

The government has committed over $20 billion to incentivize the use of today's EHRs and … not one dollar to explore an alternative.

Some of you may find my articles provocative and yet may wonder, "Where's the beef?" Others may prefer something short and pithy. In an effort to satisfy both interests, the next few articles will include both a short version that concentrates on the conclusions and a longer, more detailed discussion that is more technical. Please do not feel obliged to wade through the details if they are not your cup of tea.

The Short Version

Traditional narrative notes are inherently superior to discrete data elements for the purpose of patient care, but many objectives of healthcare cannot be satisfied without discrete data. Narrative notes are neither an efficient nor a reliable source of such data using today's technology. The very aspects of today's EHRs that make them quite good at data collection make them ill-suited to computerizing medical records. The reasons for this can be traced to decisions made when the first digital computers were designed and built in the 1940s.

Several factors were influential: the available technology, the nature of the computations the device was designed to carry out, urgency, and cost. The available technology consisted of relays and, later, vacuum tubes, both of which act as switches and can be used as "bits" to represent numeric data. The first general-purpose electronic computer, the Electronic Numerical Integrator and Computer (ENIAC), was built to perform ballistic calculations on relatively small numbers. The design of both the hardware and the software was tailored to this requirement. ENIAC was begun before the end of WWII; it was urgently needed. And finally, relays and vacuum tubes were large, expensive and power hungry - ENIAC consumed 174 kilowatts when running - hence the need to use a minimum number of components.

Most computers designed since then have been targeted at the same sort of computational problems, albeit ones more complex. Cost and power consumption have remained factors that have been optimized at the expense of generality. Crucially, when a decision is taken to build a system, urgency becomes a factor. This fosters a willingness to utilize whatever is on hand, as a compromise, rather than to develop from scratch a computer and software tailored to the nature of the specific problem.

So there you have it. Most of today's computers are conceptual progeny of ENIAC and share its traits and goals, which relate to data collection and calculation rather than to creating faithful in-computer embodiments of the complex information that reflects a practitioner's thoughts. With no obvious way to get usable data out of narrative, EHRs have relied heavily on the built-in data-related features of today's computers to build pretty good data systems that are mediocre medical records.

It is possible to design information representations to capture a practitioner's findings and thoughts more effectively than those that are routinely used. It is possible to design and build processors that are optimized to process those representations efficiently and effectively. Nevertheless, the government has committed over $20 billion to incentivize the use of today's demonstrably ineffective computers and not, as far as I can determine, one dollar to explore an alternative.

The Long Version

Last week, I emphasized the inherent superiority of narrative over discrete data elements for the purpose of accurately representing information that describes a patient's condition and care. I also noted that today's computers are ill-suited to computerizing this sort of information in a meaningful way. The obvious question is: How can this be and why is it so?

The answer can be found in the history of computing and the history of electrical and electronic devices. First, the history of computing: "Ballistics computing, a man's job during World War I, was feminized by World War II," according to "When Women Were Computers" by Jennifer S. Light. At that time "computers" were women and the computing that they were doing was of the following sort:

[Formula image not reproduced. Courtesy: Wikipedia]

Because the work was tedious and time-consuming, there was great interest in "automating" the computations. "ENIAC (Electronic Numerical Integrator and Computer) was the first electronic general-purpose computer. [Announced in 1946], it was… designed to calculate artillery firing tables for the United States Army's Ballistic Research Laboratory." The women "computers" who had performed these calculations during the war became the first programmers of ENIAC.

As you can see from the formula above, the variables (data) needed to perform these calculations are all numeric and consist of distances, altitudes, angles, velocities and mass. These were the data with which the designers of ENIAC were concerned. Because each type of calculation was performed by a separate program, the context was not in doubt and the output was expressed in the appropriate units.

In order to perform calculations, the computer must store the program itself, the input variables, any intermediate results that accumulate and the final results.

Two design questions must be answered:

1.) What electromechanical devices are available that could be used to build a computer capable of performing these functions?

2.) How is each of the data items and program instructions to be represented while the program is running, given the choice of devices?

The electrical components that were available initially were relays. A relay is an electromagnetically controlled switch that can be either open or closed, and it remains in that state until changed. Later, the relays were replaced by vacuum tubes and then by transistors, all functioning as switches. Switches, being either open or closed, lend themselves to representing things expressed in binary notation (base 2). Numbers can be represented as a sequence of binary digits (switches, aka bits). Letters can each be assigned a numeric code (such as ASCII) and so can each of the instructions that the computer is capable of carrying out.

The second question relates to the number of bits (switches) necessary to represent each number with sufficient precision and how many will be needed to store the program instructions themselves. The number of bits that are needed to accommodate the largest number that is expected will determine the size of a computer word. Today bits are usually manipulated in groups (8 bits = one byte; 8 bytes = one 64-bit word). The total number of words required to hold the program and the data determines the minimum amount of storage (memory) that is required.
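As a rough illustration of the last two paragraphs, the Python sketch below shows a number written as bits, a letter turned into its ASCII code, and the number of bits needed to hold the largest expected value. The helper name bits_needed and the 1,000-word program size are illustrative assumptions, not details of any particular machine.

```python
def bits_needed(largest: int) -> int:
    """Binary digits (switches) needed to hold any value from 0 to `largest`."""
    return max(1, largest.bit_length())

# A number as a sequence of bits.
print(format(13, "b"))            # 1101

# A letter as a numeric code (ASCII), which is then stored as bits.
print(ord("A"))                   # 65
print(format(ord("A"), "08b"))    # 01000001

# The largest expected number determines the word size.
# Example: the largest 10-digit decimal number.
largest = 9_999_999_999
print(bits_needed(largest))       # 34

# Total memory = number of words x word size.
WORD_BITS = 64                    # 8 bytes = one 64-bit word
words = 1_000                     # hypothetical program-plus-data size
print(words * WORD_BITS // 8)     # 8000 (bytes)
```

Note that a 10-digit decimal number needs 34 bits, so it would not fit in a 32-bit word but fits comfortably in a 64-bit one.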

There was a strong incentive to make both the word size and the memory size small. According to "Electronic Computers within the Ordnance Corps":
"War circumstances had made it imperative to construct the ENIAC out of conventional electronic circuits and elements with a minimum of redesign. This fact, together with the requirements for capacity, speed, and accuracy, led to an extremely large machine. Its 30 separate units, plus power supply and forced air cooling, weighed over thirty tons and occupied 1,800 square feet. Its 19,000 vacuum tubes (16 different types), 1,500 relays, and many thousands of resistors, capacitors, and inductors consumed about 174 kilowatts of electrical power.

"The design of the Memory System itself had to take into consideration the characteristics already built into the ENIAC. They were not particularly adaptable to the new techniques that had been built up since the development and design of the ENIAC. The language problem was of paramount importance. The ENIAC used a pulse-count code so that, to represent a single decimal digit, ten signals were applied to a single wire. Therefore, to store 100 words of 10 decimal digits, plus its sign, 9,100 bits of storage would be required. Economy dictated that code converters should be used to translate the pulse-count code of the ENIAC to a four-bit binary expression to represent a single decimal digit. This code conversion reduced the number of storage bits from 9,100 to 4,100."

ENIAC eventually cost $486,804.
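The storage figures in the passage quoted above can be reproduced with simple arithmetic. The sketch below is one reading that matches the quoted totals: it assumes nine stored bits per decimal digit under the pulse-count code (an inference from the 9,100 figure, not something the quote states) and four bits per digit after conversion, plus one sign bit per word in both cases.

```python
WORDS = 100            # from the quoted passage
DIGITS_PER_WORD = 10   # from the quoted passage
SIGN_BITS = 1          # one sign per word

def storage_bits(bits_per_digit: int) -> int:
    """Total bits to store WORDS signed words at the given bits per digit."""
    return WORDS * (DIGITS_PER_WORD * bits_per_digit + SIGN_BITS)

# Assumption: nine bits per digit, inferred from the quoted total.
pulse_count = storage_bits(9)
# Four-bit binary expression per decimal digit, as the quote describes.
binary_coded = storage_bits(4)

print(pulse_count)    # 9100
print(binary_coded)   # 4100
```

Under these assumptions, the code conversion cuts the storage requirement by more than half, which is the economy the quoted authors describe.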

From this history, one can appreciate the extent to which the design of today's "modern" computers has been influenced by workaround and compromise - workarounds to achieve the desired functionality and save power, and compromises to save money and to settle how work would be divided between the computer and people.

A computer today costs 1,000 times less than ENIAC and typically has 64,000,000,000 bits of memory - 7 million times as much as ENIAC's original 9,100 bits. ENIAC had no disk. A typical modern computer, instead of occupying 1,800 ft², consists of a single 263 mm² chip. An ENIAC would require a chip only 0.07 x 0.07 mm. Today 4 terabytes of disk costs roughly $150 and 4 terabytes of computer memory can be had for less than $30,000.
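The memory and cost comparisons above can be checked in a few lines of Python. This is only a sanity check on the figures already quoted in this article; the variable names are illustrative.

```python
ENIAC_COST_USD = 486_804             # cost quoted earlier in this article
ENIAC_MEMORY_BITS = 9_100            # storage figure quoted earlier
MODERN_MEMORY_BITS = 64_000_000_000  # typical memory size cited above

memory_ratio = MODERN_MEMORY_BITS / ENIAC_MEMORY_BITS
implied_modern_cost = ENIAC_COST_USD / 1_000   # "1,000 times less"

print(f"{memory_ratio:,.0f}")             # roughly 7 million
print(f"${implied_modern_cost:,.0f}")     # a little under $500
print(MODERN_MEMORY_BITS // 8 // 10**9)   # 8 (gigabytes)
```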

In spite of this explosion in hardware capacity, we are still using the memory of the computer and the languages that manipulate it as if the problems we have to solve today were ballistic trajectories. Instead of allocating memory in units large enough to accommodate entire notes or web pages, memory is still segmented into small chunks, each of which, it is still assumed, will hold a single number or character. Reflecting this arrangement of the memory, the majority of "data types" that are predefined in popular programming languages are numeric types. With workarounds, these types can be suitable for banking and retailing but they impose near-fatal restrictions on the ability to faithfully represent and manipulate medical information.
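A small Python sketch may make the contrast concrete. A discrete reading drops neatly into fixed-size numeric fields, while a narrative note is held as a run of one-byte character codes with no built-in structure. The field layout, the blood pressure values, and the sample note are all hypothetical, chosen only for illustration.

```python
import struct

# Hypothetical discrete data element: systolic and diastolic pressure,
# each packed into one unsigned 16-bit field (little-endian).
reading = struct.pack("<HH", 142, 88)
print(len(reading))                    # 4 (bytes)
print(struct.unpack("<HH", reading))   # (142, 88)

# Hypothetical narrative note. To the machine it is just a sequence of
# character codes, one small chunk of memory per character.
note = "BP elevated today; patient admits skipping doses while traveling."
note_bytes = note.encode("ascii")
print(list(note_bytes[:2]))            # [66, 80], the codes for "B" and "P"
print(len(note_bytes) == len(note))    # True: one byte per character
```

The numeric fields can be compared, averaged, and charted directly. Nothing in the byte sequence of the note tells the machine that the patient skipped doses or why.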

This raises the interesting question: If today's commodity computers are ill-suited to the needs of medical record capture and processing, why continue to use them? Are there other options?

Two examples are sufficient to demonstrate that there are other possible approaches. The first example is the advanced video cards found in most personal computers. Driven by the needs of "gamers," each video card (there may be more than one) contains a very high speed computer called a Graphics Processing Unit (GPU) with an internal architecture that differs radically from that of the main CPU. The entire arrangement of a GPU is optimized to render images at high speed in 3D and move them around the screen efficiently. These chips are so powerful that they now are used preferentially by hackers (using as many as 25 running in parallel) who steal and decrypt passwords.

The second example is best described by a quote from a 1992 paper - "Text Retrieval with the TRW Fast Data Finder" by Matt Mettler: "TRW has been building high performance text processing and retrieval systems for a number of years. Most of these systems have involved the application of the TRW Fast Data Finder (FDF) text search hardware and have been designed to meet the requirements of specific government customers." … "Our experience… has left us encouraged about the ability of a text scanning approach to be competitive with the more [traditional] information retrieval techniques."

I personally witnessed an application in which the FDF was fed news stories about South American guerrilla groups, extracted information about their activities (blowing things up), and plotted those activities on a map, making it easy to predict where the groups would strike next. While not much has been heard about the FDF lately, text retrieval continues to receive attention from the National Institute of Standards and Technology. I think it is safe to assume that processors akin to the FDF are being developed and used by the NSA in the PRISM project.

The important points are:

1. Computer processors with internal architectures that are optimized for healthcare information are possible.

2. Healthcare reform is costing the government and the private sector untold billions. The success of the effort is highly dependent on computers which have, so far, been a major disappointment.

3. If gaming and national security deserve, and have gotten, specially designed processors and software, healthcare deserves them as well.

Next time, more on data types.