Misconceptions about 'Big Data' in Healthcare Can Be Risky

July 1, 2013

Those in healthcare get so excited about the data, especially "big data," that they come to believe that data elements have some intrinsic value, but they don't.

Miscommunication can lead to medical errors. Misinterpretation of what was said or written (the data) because of insufficient detail (context) results in faulty assumptions about what was intended. People get so excited about the data, especially "big data," that they come to believe that data elements have some intrinsic value, but they don't.

Without context, data are meaningless. Stated another way, data are not informative to people unless accompanied by context. The typical EHR does not capture and store context. Therefore, the data an EHR collects - that stuff that you are expected to extract from your system and send elsewhere - can never be truly meaningful: not now, once removed from the applications that collected it, and especially not in the future, when the EHR has been replaced. Though the problems caused by inadequate context are of both immediate and long-term practical concern to every practitioner, few study the problem academically.

"Semiotics [is] the study not only of what we refer to as a 'sign' in everyday speech, but of anything which 'stands for' something else. In a semiotic sense, signs take the form of words, [numbers,] images, sounds, gestures and objects. … [Signs] have no intrinsic meaning and only become signs when [people] invest them with meaning. Anything can be a sign as long as long as someone [a person] interprets it as signifying something - referring or standing for something other than itself. It is this meaningful use of signs which is at the heart of"  understanding the difference between data and information, according to Daniel Chandler's 2007 book, "Semiotics: The Basics" (second edition).

Data are those things which a person may choose to use as a sign. Information is what results from the meaningful use of the data. Meaningful does not refer to some abstruse government definition. It refers to the act of ascribing meaning to particular signs, not in a static, absolute way but dynamically, depending on the current context. Signs are to people what data are to a computer: raw material. People can mentally combine data with context, memory, cultural background, and reasoning and derive information. Computers cannot.

An example may help. Some EHRs store what they collect in a relational database where table-like structures are used to store letters, numbers and bits as rows with pre-defined columns. Every column has metadata associated with it, including the column name and its type (string, integer, float, etc.). Let's assume that one table holds nurse-taken bedside vital signs (including BP), one contains automated blood pressure readings (including SystolicBP) in the ICU and one records triage events in the ER (capturing vital signs including blood pressure). [Note the column names. The computer doesn't "know" that these may be equivalent.] A practitioner (me, for example) might like a report of all the blood pressures taken on a patient last year. It can be difficult to know which tables to query because the decision depends on information that is difficult to discover or that may not be stored explicitly anywhere in the database.
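To make the example concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table names (bedside_vitals, icu_monitor, er_triage), the patient_id and taken_at columns, the DiastolicBP column and every sample value are assumptions invented for illustration; only the column names BP and SystolicBP come from the description above. The sketch suggests that a combined report is possible only when a person who already knows which columns mean "blood pressure" writes each one into the query by hand.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bedside_vitals (patient_id INTEGER, taken_at TEXT, BP TEXT);
    CREATE TABLE icu_monitor (patient_id INTEGER, taken_at TEXT,
                              SystolicBP INTEGER, DiastolicBP INTEGER);
    CREATE TABLE er_triage (patient_id INTEGER, taken_at TEXT,
                            blood_pressure TEXT);
    INSERT INTO bedside_vitals VALUES (1345, '2012-03-01', '128/82');
    INSERT INTO icu_monitor VALUES (1345, '2012-03-02', 141, 90);
    INSERT INTO er_triage VALUES (1345, '2012-02-27', '150/95');
    INSERT INTO er_triage VALUES (2001, '2012-02-28', '118/76');
""")

# The person writing this query supplies the context the database lacks:
# that BP, SystolicBP/DiastolicBP and blood_pressure all mean the same thing.
query = """
    SELECT 'bedside_vitals' AS source, taken_at, BP AS reading
      FROM bedside_vitals WHERE patient_id = ?
    UNION ALL
    SELECT 'icu_monitor', taken_at, SystolicBP || '/' || DiastolicBP
      FROM icu_monitor WHERE patient_id = ?
    UNION ALL
    SELECT 'er_triage', taken_at, blood_pressure
      FROM er_triage WHERE patient_id = ?
    ORDER BY taken_at
"""
rows = conn.execute(query, (1345, 1345, 1345)).fetchall()
for source, taken_at, reading in rows:
    print(source, taken_at, reading)
```

If a fourth table holding blood pressures were added later, this query would silently miss it; nothing in the database would flag the omission.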

Why can't I simply ask the database to find all the records in any and all tables if they contain blood pressure measurements made on patient number 1345? Why? Because it is not technically (mathematically) possible. The algebraic nature of Structured Query Language (SQL) does not allow a single request to query both the metadata describing tables (to discover the column names) and the values contained within those tables' records. In standard SQL, the table and column names must be fixed when the query is written; they cannot be supplied by the results of another part of the same query.
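What a program has to do instead can be sketched in two steps: first ask the database catalog for table and column names, then build and run a separate query for each column that looks relevant. The following is a rough sketch using Python's standard sqlite3 module, not a recommended implementation. The schema, the patient_id column, the wound_care table, the sample values and the function name find_bp_readings are all hypothetical, and the name-matching rule is my own guess at what "looks like a blood pressure" might mean.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bedside_vitals (patient_id INTEGER, BP TEXT);
    CREATE TABLE icu_monitor (patient_id INTEGER, SystolicBP INTEGER);
    CREATE TABLE er_triage (patient_id INTEGER, blood_pressure TEXT);
    CREATE TABLE wound_care (patient_id INTEGER, pressure_ulcer_stage INTEGER);
    INSERT INTO bedside_vitals VALUES (1345, '128/82');
    INSERT INTO icu_monitor VALUES (1345, 141);
    INSERT INTO er_triage VALUES (1345, '150/95');
    INSERT INTO wound_care VALUES (1345, 2);
""")


def find_bp_readings(conn, patient_id):
    """Return (table, column, value) for columns whose names look like BP."""
    readings = []
    # Step 1: query the metadata (SQLite's catalog) for the table names.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [row[1] for row in
                   conn.execute(f'PRAGMA table_info("{table}")')]
        if "patient_id" not in columns:
            continue
        # Assumed heuristic: a human's guess, encoded as string matching.
        candidates = [c for c in columns
                      if "bp" in c.lower() or "pressure" in c.lower()]
        # Step 2: build and run a separate query for each candidate column.
        for column in candidates:
            sql = f'SELECT "{column}" FROM "{table}" WHERE patient_id = ?'
            for (value,) in conn.execute(sql, (patient_id,)):
                readings.append((table, column, value))
    return readings


found = find_bp_readings(conn, 1345)
for table, column, value in found:
    print(table, column, value)
```

Note that the heuristic also picks up pressure_ulcer_stage from the hypothetical wound_care table, which is not a blood pressure at all. The program can match names, but whether a match is meaningful still depends on context that only a person supplies.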

The questions that interest me the most as a consultant - such as "What has been done to and for this patient in the past year?" - cannot be answered by a single query; they require that a program be written. I can't even be completely confident that the person writing that program understands either my needs or the database well enough to produce a report that has not overlooked something important.

Failing to store context is not the only aspect of today's computers that prevents them from storing data in ways that make it meaningful. More on this next week.