[This is a guest post by Conal Boyce]

The following was drafted as an Appendix to a project whose working title is "The Emperor's New Information" (after Penrose, The Emperor's New Mind). It's still a work-in-progress, so feedback would be welcome. For example: Are the two examples persuasive? Do they need technical clarification or correction? Have others at LL noticed how certain authors "who should know better" use the term information where data is dictated by the context, or employ the two terms at random, as if they were synonyms?

For your amusement, here is 'Appendix P' to a manuscript that I've been working at, on and off, over a five-year period. The MS is motivated by the phenomenon of biologists and physicists who are unable to grasp the difference between data and information. For example, Richard Dawkins (The Blind Watchmaker, 1987) and Antoine Danchin (La barque de Delphes, 1998, tr. to English 2002) are both computer savvy, even proficient as programmers, yet they remain oddly clueless about the data / information distinction, and both sing the praises of DNA as a kind of 'information technology' (when DNA is really just a big blob of mindless data in my opinion — a concept that both Dawkins and Danchin would agree with, yet their use of terminology clashes with it). Even more annoying are the astrophysicists such as Hawking and Susskind, who yammer about 'information lost in a black hole', an absurd idea for anyone who actually knows what information is. What Hawking and Susskind are actually fretting about is the possibility of 'mass lost in a black hole', i.e., that Conservation of Mass might be violated, so they should just say so!

Appendix P (for 'Polish'), which follows below, is one of several examples where I try to make it clear, even for a biologist or astrophysicist set in his ways, that there is a profound gulf separating data from information. And the 'punch line' of the example makes use of Victor Mair's 1990 translation of the Tao Te Ching, so I thought I'd pass along a copy of it for your amusement. (Someday I'll get the MS itself cleaned up and try submitting it to a journal somewhere — not in the realm of biology or physics, where it would be offensive, but perhaps in a computer science or data processing journal.)

Appendix P to: The Emperor’s New Information

Consider a string of zeroes and ones that starts and ends as follows:

01000100 01110010 01101111 01100111 01101001 00101100 00100000 …

01110111 01101001 01100101 01100011 01111010 01101110 01111001 01101101.

Then a series of hex values that looks like this:

44 72 6F 67 69 2C 20 6B 74 A2 72 79 6D 69 20 6D 6F BE 6E 61 20 63 68 6F 64 7A 69 86 2C 20

6E 69 65 20 73 A5 20 64 72 6F 67 A5 20 77 69 65 63 7A 6E A5

49 6D 69 6F 6E 61 2C 20 6B 74 A2 72 65 20 6D 6F BE 6E 61 20 6E 61 7A 77 61 86 2C 20

6E 69 65 20 73 A5 20 69 6D 69 65 6E 69 65 6D 20 77 69 65 63 7A 6E 79 6D.

And then a text that looks like this:

Drogi, którymi można chodzić,

nie są drogą wieczną

Imiona, które można nazwać,

nie są imieniem wiecznym

I trust that most readers will agree that the string 01000100…01101101 is only data, something comparable to the dits and dahs of Morse Code; not YET information. What about the string ‘44 72 6F 67 69 2C 20 … 77 69 65 63 7A 6E 79 6D’: is that information? Only in the very rudimentary sense that it reflects knowledge of how hex and binary relate to one another. (For example, binary 01000100 translates to ASCII code 44 hex, which later becomes the ‘D’ at the beginning of the text; binary 01110010 translates to 72 hex, which later becomes the ‘r’ in ‘Drogi’; and from ASCII 2C we obtain the comma after ‘Drogi’; and so on.) By performing all the substitutions, we arrive at the sixteen-word text shown above. At that point, do we have some information at last?
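The substitutions just described can be sketched in a few lines of Python. This is an illustration only: the helper name `octets_to_bytes` is made up, and the `cp852` codec (DOS Latin-2) is an assumption, chosen because it is consistent with the hex values shown (A2 for ‘ó’, A5 for ‘ą’); the text itself does not name a code page.

```python
# Opening octets of the binary string, copied from the example above.
opening_bits = "01000100 01110010 01101111 01100111 01101001 00101100 00100000"


def octets_to_bytes(bit_string: str) -> bytes:
    """Turn space-separated 8-bit groups into raw bytes (hypothetical helper)."""
    return bytes(int(octet, 2) for octet in bit_string.split())


raw = octets_to_bytes(opening_bits)

# Step 1: binary -> hex, a purely mechanical regrouping of the same data.
hex_view = " ".join(f"{byte:02X}" for byte in raw)

# Step 2: hex -> characters. The codec is an assumption, not given in the text.
opening_text = raw.decode("cp852")

# The first hex line of the example, decoded the same way.
first_line = bytes.fromhex(
    "44 72 6F 67 69 2C 20 6B 74 A2 72 79 6D 69 20 6D 6F BE 6E 61 "
    "20 63 68 6F 64 7A 69 86 2C 20"
).decode("cp852")

print(hex_view)
print(first_line)
```

Note that the program performs every substitution without “knowing” Polish, or anything else, which is rather the point.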

Let’s take this a step at a time. If one reads Polish, one will glean the following from the first half of the text: “The ways that can be walked are not the eternal way.” Fine. But is this practical advice or something philosophical? It sounds philosophical. Does the reader know why it sounds philosophical, and where it actually originates? Some readers will recognize that line as a Polish rendition of the first six characters shown here…

Dào kě dào,

fēi cháng dào.

Míng kě míng,

fēi cháng míng.

道可道, 非常道. 名可名, 非常名.



…from the Dào Dé Jīng 道德經. Some readers might even suspect that the Polish rendition shows the influence of page 59 in Mair (Tao Te Ching, 1990). (And in fact, that is the genesis of the Polish text: I produced it by plugging two lines of Mair’s translation into Google Translate; then I rewrote the Polish in ASCII hex, then translated the hex to binary.)
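The last two steps of that procedure (Polish text to hex, then hex to binary) can be retraced roughly as follows. This is a sketch under an assumption: the text does not say which code page was used, so `cp852` is picked here only because it reproduces the hex values shown earlier. The variable names are illustrative.

```python
# Final line of the Polish text, without the closing full stop.
last_line = "nie są imieniem wiecznym"

# Text -> bytes. The code page is an assumption (see the paragraph above).
encoded = last_line.encode("cp852")

# Bytes -> hex pairs, then bytes -> 8-bit binary groups.
as_hex = " ".join(f"{byte:02X}" for byte in encoded)
as_bits = " ".join(f"{byte:08b}" for byte in encoded)

# The last eight octets should match the tail of the binary string
# quoted at the start of the appendix (the letters of "wiecznym").
closing_bits = " ".join(as_bits.split()[-8:])

print(as_hex)
print(closing_bits)
```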

Let’s step back and appreciate how many steps are involved in getting from the data to the information: First, one must have a suspicion that the string of zeroes and ones is the binary translation of some ASCII codes. Next, one must know how to get from ASCII to the Central European Alphabet (which includes exotica such as ‘lower case a with ogonek’); values such as A5 hex lie above the 7-bit ASCII range, so they mean nothing until a particular 8-bit code page has been chosen. Call these two steps rudimentary if you like, but neither can be avoided. Next, one must either know Polish or recognize the text as ‘something like Polish’ so that one can get it translated to one’s own native tongue. But is the message just that the ways that can be walked are not the eternal way? Without a philosophical interpretation, that is near gibberish; not yet good information, not yet the message intended. For this particular message (01000100…01101101), the information ‘payload’ depends on the reader already knowing what the Dào Dé Jīng 道德經 is. Just “knowing Polish” is not enough to get the message.
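Even the second, ‘rudimentary’ step depends on outside knowledge, as a small Python experiment suggests. The same twenty bytes are decoded under four code pages; the list of candidates is my own choice for illustration, and the claim that `cp852` is the intended one is an assumption based on the hex values, not something the data announces.

```python
# Second hex line of the example: the same bytes throughout.
data = bytes.fromhex(
    "6E 69 65 20 73 A5 20 64 72 6F 67 A5 20 77 69 65 63 7A 6E A5"
)

# Candidate code pages (illustrative); only cp852 is assumed to be intended.
candidates = ["cp852", "cp437", "cp1250", "latin-1"]

# Decode the identical byte sequence once per candidate.
readings = {name: data.decode(name) for name in candidates}

for name, text in readings.items():
    print(f"{name:8} {text}")
```

Only one of the four readings is Polish; the others replace ‘ą’ with some unrelated symbol. Nothing in the bytes says which reading is the right one.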

The little story above may sound contrived and convoluted. Well, it is slightly contrived, but data and information often relate to one another in ways that are nearly this complex. The salient point is that there is no such thing as information just floating in a vacuum. A sentient being needs to ‘observe’ the data (shades of Berkeley and the tree falling down), and this sentient being must also bring with her a context into which to place the data. Then and only then does actual information come into play. The information step always involves the contribution of some such ‘outside’ element which will bring the dead data to life.

To further illustrate the point about the need for context, I will follow up with a seemingly very simple example, call it the minimal or ‘paradigm’ case:

01000111 01101111 00100001b

47 6F 21h

The binary (b) and hex (h) digits above decode to the Roman letters ‘G’ and ‘o’ followed by an exclamation mark, which is to say: ‘Go!’ At the trivial level, we may say that the information conveyed by ‘Go!’ is the imperative form of the verb ‘to go’. Could ‘Go!’ convey something else? Yes. Given a bit of context, the information content of ‘Go!’ could be: “Hurry up, children. It’s time to go! Otherwise, you might miss the school bus.” Or, in another context, the information content of ‘Go!’ could be the following: A military commander is ordering a pilot to take off from Tinian and drop a bomb on Hiroshima (or, to update the example, a commander orders a technician to launch a drone that will, collaterally, kill women and children). Which of these three instances of information (one trivial, one nontrivial, and one rather horrifying) is conveyed by ‘47 6F 21h’? Surely even a physicist (Susskind) who frets about what gets lost in a black hole, or biologists such as Dawkins and Danchin who glibly sing the praises of a supposed ‘information technology’ inherent in DNA, should be able to see that none of our three instances of information is conveyed by ‘47 6F 21’. That’s because ‘47 6F 21’ is just six hex digits of data. This has been a long way of saying: There really is such a thing as a data / information distinction.
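For readers who prefer to see the point in code, here is a toy Python sketch. The `contexts` table and the `interpret` function are hypothetical, invented for this illustration; they paraphrase the three readings above and are not part of any real library.

```python
# The three bytes of the example: data, and nothing more.
payload = bytes.fromhex("47 6F 21")

bits = " ".join(f"{byte:08b}" for byte in payload)
word = payload.decode("ascii")

# Hypothetical contexts, paraphrasing the three readings in the text.
contexts = {
    "grammar lesson": "imperative form of the verb 'to go'",
    "school morning": "hurry up, or you will miss the bus",
    "military order": "take off and carry out the mission",
}


def interpret(data: bytes, context: str) -> str:
    """Pair the decoded bytes with a reading supplied from outside them."""
    return f"{data.decode('ascii')} -> {contexts[context]}"


for name in contexts:
    print(interpret(payload, name))
```

The `payload` argument is identical in all three calls; whatever distinguishes the readings has to be passed in separately.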

