“The essence and raison d’etre of communication is the creation of redundancy, meaning, pattern, predictability, information, and/or the reduction of the random by ‘restraint.’”
This is part 2 in a sequence of posts exploring the ideas in Gregory Bateson’s Steps to an Ecology of Mind (start here).
My work as a software engineer consists almost entirely of communication, in one sense or another. Not only do I translate my ideas into functioning code, a painstaking process of telling the computer exactly what to do; I also communicate daily with designers, product managers, and other engineers. Since I am a systematizer, I take pleasure in deeply understanding the processes that I engage in – in this case person-to-person communication.
I started thinking seriously about the mechanics of interpersonal communication three or four years ago. At the time, my communication style was reasonably effective, if somewhat inconsistent. I’ve since been able to refine my natural style into a repeatable high-bandwidth process. I now do less throw-away work and enjoy higher productivity and more satisfying work relationships. My personal relationships have benefited in a similar, though more subtle way. This kind of improvement is available to anyone willing to take on a few simple models, which I’ll attempt to lay out in this post with the help of some ideas from Bateson.
Some Features of Communication
It’s a familiar trope that a written message may be intelligible even after some of its letters have been scrambled. Yet scrambling all the letters of a sentence renders it unreadable.
Written language is remarkably versatile: an unlimited number of messages might be expressed by stringing together the letters of the alphabet. Most such messages are meaningless, while some make sense to some readers and not to others. (I cannot understand messages written in Finnish, for example, though I might recognize the language and I can certainly guess that it contains some content. A more extreme example is the Voynich Manuscript pictured above, a mysterious 15th century book which has yet to be translated.)
Intuitively we think of communication as a transmission of information and emotional state from one mind to another, and as a rule we speak of communication in terms of spatial metaphors (of a message “getting across” or “coming through”). Bateson, in the quote at the top of the post and elsewhere, regarded the transmission that happens during communication as a kind of mutual restraint: when I speak of pink elephants your mind is restrained toward certain thoughts and away from others.
I will say more about “restraint” later, and still more in a future post. For now it suffices to imagine communication consisting of a sharing of some internal structure (an idea or a thought) that was previously unshared, though as we will see this is a limited view of the phenomenon. Effective communication in particular has a much richer structure than mere one-way transmission.
Before we can solve these puzzles of how communication works we must first assemble some building blocks, those of redundancy and entropy.
Redundancy and Mutual Information
The world and our everyday lives are filled with semi-predictable patterns and order: in written English the letter C is followed by K more than you would expect by chance; the visible part of a tree is accompanied reliably by an invisible root system underground; when I say that it is raining it is more likely than usual that there are in fact raindrops falling from the sky. This extends in a literal way even to explanations of physical phenomena:
“... verbal description is often iconic in its larger structure. A scientist describing an earthworm might start at the head end and work down its length – thus producing a description iconic in its sequence and elongation.”
Bateson talks about redundancy as a correspondence between entities that are separated (in time or space, or in some other characteristic) yet share patterns, such that knowing about one entity allows one to guess – with better than random chance – something about the other. The technical term for this is mutual information.
Bateson denotes this separation typographically with a slash mark. The examples in “Style, Grace, and Information in Primitive Art” would be rendered as something like the following:
(part of an English sentence / the syntactic structure of the remainder of the sentence)
(visible part of a tree / roots below ground)
(arc of a circle / the position of the remainder of the circle)
In each case the left restrains the possibilities that may appear on the right: knowing part of the message restrains the remaining part. “Knowing” itself consists of this same kind of redundancy: one’s models of the world, if they are good models, place restraints on the experiences one should anticipate. Prior experiences restrain future experiences, etc.
I normally think of these restraints rather as constraints, and indeed the words are near synonyms, yet I think the connotations of the former are sufficiently evocative to justify its use.
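To make the idea of restraint concrete, here is a minimal sketch that computes mutual information for the (statement “it is raining” / actual rain) pair. The probabilities are hypothetical numbers chosen for illustration, not anything taken from Bateson, and the function name is my own.

```python
import math

def mutual_information(joint):
    """Mutual information, in bits, of a joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Assumed for illustration: it rains 20% of the time and the speaker is right 90% of the time.
reliable_report = {
    ("says rain", "rain"): 0.18,
    ("says rain", "dry"): 0.08,
    ("says dry", "rain"): 0.02,
    ("says dry", "dry"): 0.72,
}

# Assumed for illustration: a speaker who says "rain" half the time regardless of the weather.
unrelated_report = {
    ("says rain", "rain"): 0.10,
    ("says rain", "dry"): 0.40,
    ("says dry", "rain"): 0.10,
    ("says dry", "dry"): 0.40,
}

print(mutual_information(reliable_report))   # positive: the statement restrains the weather
print(mutual_information(unrelated_report))  # zero: no redundancy between the two
```

The reliable report lets a hearer guess the weather better than chance, which is exactly the redundancy described above; the unrelated report carries no such restraint.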
These regularities lie at the heart of how we make sense of the world and impose order upon it:
“... the essential notion in all sorting is that some difference shall cause some other difference at a later time. If we are sorting black balls from white balls, or large balls from small balls, a difference among the balls is to be followed by a difference in their location.”
Further, internal redundancies allow us to distinguish between message material and background environment, a non-trivial problem that we nevertheless accomplish with little effort in most situations.
A modern example exploiting the difficulty of this problem is the CAPTCHA, in which a computationally difficult challenge is easy to generate. That this kind of asymmetry is ubiquitous is an idea I will return to in the fourth post in this sequence.
We normally think of message material as a sequence or stream of symbols or values, distinguished from the background “noise” in part by its regular structure, that is by some redundancy in the symbols. We can understand our conversational partner’s utterances even in a noisy room because we are attuned to the quality and modulation of their voice, and to an extent because we know what they’re talking about – they predictably follow certain conventions such as Grice’s Maxims. Bateson claims that this distinction between message and background is arbitrary, that “the regularity called signal/noise ratio is really only a special case of redundancy.”
It’s worth making a subtle distinction: a message typically contains both internal redundancy and redundancy with a body of background knowledge. Both kinds of redundancy are necessary to identify and interpret the message. We recognize the letters in the CAPTCHA image above both by virtue of (1) our prior experience of seeing hundreds of thousands of letters and words and (2) the fairly consistent size, coloring, and style of the letters in the image.
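One simple way to see internal redundancy at work is a repetition code: say each bit several times and let the receiver take a majority vote. This is my own toy illustration rather than an example from Bateson, and the channel model (each copy flipped independently with a fixed probability) is an assumption.

```python
import random

def transmit(bits, copies, flip_prob, rng):
    """Send each bit `copies` times over a channel that flips each copy
    independently with probability `flip_prob`."""
    return [
        [bit ^ int(rng.random() < flip_prob) for _ in range(copies)]
        for bit in bits
    ]

def decode_by_majority(received):
    """Recover each bit by majority vote over its copies (use an odd number of copies)."""
    return [1 if 2 * sum(group) > len(group) else 0 for group in received]

def error_rate(copies, flip_prob=0.2, n_bits=2000, seed=0):
    """Fraction of bits decoded incorrectly for a random message."""
    rng = random.Random(seed)
    message = [rng.randint(0, 1) for _ in range(n_bits)]
    decoded = decode_by_majority(transmit(message, copies, flip_prob, rng))
    return sum(a != b for a, b in zip(message, decoded)) / n_bits

print(error_rate(copies=1))  # no redundancy: errors track the noise level
print(error_rate(copies=9))  # heavy redundancy: far fewer errors get through
```

The more redundant message is longer and costlier to send, but the receiver can pick it out of the noise far more reliably.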
Entropy: Count All the Permutations
“Daughter: Daddy, why do things get in a muddle?
Father: What do you mean? Things? Muddle?
Daughter: Well, people spend a lot of time tidying things, but they never seem to spend time muddling them. Things just seem to get in a muddle by themselves.”
For an intuitive understanding of what is special about redundancy and patterns it’s helpful to think about their opposite. What do we mean by unpatternedness, noise, randomness?
Let’s start with a contrived thought experiment. Put a large and equal number of black and white marbles into a bag and shake it up. Then, without looking, pull out 100 marbles from the bag. How surprised would you be if these randomly chosen marbles were all black?
Randomness, muddledness, disorder – this is the default. This is what you should expect to see in the absence of some ordering principle. And this is true simply because of all the equally probable configurations of a system, only a small proportion would be considered “ordered.” Most of them would be characterized as disordered instead.
This point bears repeating: orderedness is a property of a particular person as much as of a configuration of objects. My desk is in order because I’ve declared where every item should be, and if I reorganized your desk according to my sense of order it’s unlikely you’d agree. Ordered states are thus privileged, singled out as special; disorder is everything else. When the number of possible configurations is large, ordered states are necessarily rare.
In the above example, there is only one way to choose 100 black marbles, while there are over 10^29 ways of choosing 50 of each (to say nothing of the almost-as-likely 49-and-51 split, 48-and-52, etc.).
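The counts are easy to check. The sketch below assumes the bag is large enough that each draw is effectively a fair coin flip, so every sequence of 100 colors is equally likely.

```python
import math

draws = 100

all_black = math.comb(draws, draws)   # sequences with 100 black marbles
even_split = math.comb(draws, 50)     # sequences with exactly 50 of each color
total = 2 ** draws                    # all possible color sequences

print(all_black)            # 1
print(even_split > 10**29)  # True
print(even_split / total)   # share of all sequences that split exactly 50/50
print(all_black / total)    # chance of drawing all black
```

Even the single most likely split accounts for only a modest share of all sequences; the rest of the probability sits in the near-even splits around it, and almost none in the extremes.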
This is the principle of maximum entropy in its simplest form (also known as the principle of indifference): when there are N possibilities and we have no reason to privilege any one possibility over the others, the probability of each possibility is 1/N. In practice patterns often do emerge, particularly when autonomous agents such as animals or people are involved. When dealing with autonomous agents it’s rare that an action is truly independent of adjacent similar actions, so the intuition that every possibility is equally likely can easily mislead us.
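As a rough illustration, Shannon entropy measures how far a distribution is from that uniform default. The two distributions below are made-up examples over N = 8 possibilities.

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 8] * 8            # no possibility privileged: each has probability 1/N
skewed = [0.93] + [0.01] * 7     # one outcome strongly favored (hypothetical numbers)

print(entropy_bits(uniform))  # log2(8) = 3 bits, the maximum for 8 possibilities
print(entropy_bits(skewed))   # lower: the pattern makes outcomes more predictable
```

Any pattern or restraint that favors some outcomes over others pulls the entropy below the uniform maximum.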
The ideas of redundancy and entropy as fundamentals of information theory do not originate with Bateson, though the areas of interest to which he applied them are perhaps more broad than those of his contemporaries. For a technical foundation see Claude Shannon’s seminal paper A Mathematical Theory of Communication. For a casual introduction see the videos What is NOT Random? and What is Random?
Meaning, Synecdoche, Digitalization
Now we can talk about what meaning is and how it emerges from patterns and messages. In short, “meaning” consists of a redundancy between message material and environment. The message (“it is raining”) corresponds to the state of the world (it is raining) and enables the hearer to make better-than-random predictions.
(It’s worth noting that the word “meaning” has multiple connotations, and the sense I’m focusing on is that of signification rather than purpose or direction.)
Bateson speculates how this kind of communication could have developed:
- The environment contains patterns in certain features that are important to survival.
- Animals come to exhibit conspicuous behavior that correlates with these environmental features (which becomes a signal about the environment).
- This behavior then becomes a useful source of information for those navigating the environment.
- Initially these behaviors are more frequent or intense when the environmental cause is more pronounced, and this correspondence is roughly linear. However, once animals (typically those of the same species, but not necessarily) learn to recognize these behaviors as signals, this linear, analog relationship may break down and be replaced with a discrete, digital relationship.
This last step deserves some elaboration: signals undergo pressure towards clarity (a distinctness from baseline behavior, i.e. a high signal-to-noise ratio), and this is true particularly when the signaling animal derives some benefit from their signal being recognized. Think of this as co-evolving cognitive modules – not necessarily specific anatomical parts of the brain, but modules on a functional level – of recognition and signaling behavior. Each module evolves to better fit the other.
One particularly effective way to boost the signal-to-noise ratio is to simplify the signal. Complex behavior becomes simpler, more legible, and more iconic. The simple version of the signal manages to trigger the now sophisticated recognition module in observers and in this way suggests the original more complex signal. It becomes a synecdoche, a part that represents the whole. The new simpler signal may also be cheaper to produce: there is a tradeoff between clarity and volume.
Sharing Complex Models
“... knowledge is all sort of knitted together, or woven, like cloth, and each piece of knowledge is only meaningful or useful because of other pieces”
When looking at instances of human communication – written language, say – we see hierarchical structures start to form.
“The word is the context of the phoneme. But the word only exists as such – only has ‘meaning’ – in the larger context of the utterance, which again has meaning only in a relationship.
This hierarchy of contexts within contexts is universal for the communicational (or “emic”) aspect of phenomena...”
We seem to be wired for detecting and generating repetitions, particularly in the realm of sound. Even non-musical speech begins to take on song-like qualities when repeated, an effect known as the Speech-to-Song Illusion (illustrated here). The music that we create has repeating structures, a kind of self-similarity, sometimes quite complex – see Martin Wattenberg’s Shape of Song project.
You might be thinking to yourself: since we have the capacity to encode arbitrary ideas into the hierarchical structure of language (and music, and other media), surely communication consists of taking an idea that exists in my head and encoding it into a message of spoken or written language, which the target listener or reader then decodes into a matching idea in their head. We’re just generating the redundancies of (sender’s idea / message content) and (message content / receiver’s idea). The mechanics of communication are as simple as that!
Not so fast.
There is a grain of truth to this one-way transmission model, and indeed communication often takes this form, but I will argue that three practical problems fundamental to the enterprise of effective communication tend to arise.
Even assuming both interlocutors share a common language, the first problem is that there is often a mismatch in background knowledge. This is particularly common when one speaker is a professional communicating an idea from their field, perhaps a doctor giving a diagnosis to their patient. Indeed, people who spend a lot of time talking about a narrow field of inquiry naturally develop terms of art (or jargon) to quickly refer to commonly occurring ideas. This can be especially problematic when a word that exists in the vernacular takes on a different connotation or denotation when used as a term of art.
The second problem is the challenge of tuning the message to the target. The sender may assume that the target has as much background knowledge as she does, and therefore encodes the message as compactly as possible. If the target doesn’t in fact have the necessary background knowledge to decode the message, the communication will fail, or worse it will succeed but will be corrupted with misimpressions. The opposite problem is also less than ideal. If the sender includes all background knowledge that might possibly be needed, this constitutes a gross inefficiency (and may be taken as condescending or simply boring).
The third problem is a question of verification. How can the sender know that the message has been received and understood? In general the sender can’t even ask, “did you understand?” since the target may not be willing or able to say that they did not. This is of course a problem in teaching, one usually “solved” by administering tests, which Bateson likens to target practice:
“... if you throw stones at two pieces of paper from the same distance and you find that you hit one piece more often than the other, then probably the one that you hit most will be bigger than the other. In the same way, in an examination you throw a lot of questions at the students, and if you find that you hit more pieces of knowledge in one student than in the others, then you think that student must know more.”
In practice even one-way communication can be reasonably effective (as, e.g., I hope this post is) because the sender can make reasonable guesses as to the target’s background knowledge. However, I suggest that there is a considerably more efficient way to communicate, assuming that both interlocutors are willing to cooperate.
By analogy, consider the game of 20 Questions. If I think of practically any object or person, the right set of 20 or fewer yes/no questions will take you to the answer, without my having to directly reveal what I’m thinking of! The crucial limitation of this method, however, is that it does you no good to think of 20 questions right off the bat – you’ll only come up with the wrong questions. Rather you should ask a question whose answer roughly halves the number of possible objects, and only on hearing the answer should you think of the next question.
It’s worth pondering to what extent all communication suffers from this same limitation.
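The halving strategy in the 20 Questions analogy can be sketched as a binary search. This is a toy version that assumes the candidates can be put in a sorted order, so that "is it in the first half?" is always a well-formed question; real games of 20 Questions are messier.

```python
def guess_by_halving(secret, candidates):
    """Identify `secret` in a sorted sequence using yes/no questions,
    each of which halves the remaining candidates."""
    low, high = 0, len(candidates) - 1
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        # The next question depends on the previous answer, so the
        # questions cannot all be written down in advance.
        if secret <= candidates[mid]:
            high = mid        # answer "yes": keep the first half
        else:
            low = mid + 1     # answer "no": keep the second half
    return candidates[low], questions

answer, asked = guess_by_halving(123456, range(2 ** 20))
print(answer, asked)  # 123456 20
```

Twenty well-chosen questions are enough to single out one item among 2^20, or just over a million, possibilities, but only because each question is chosen after hearing the previous answer.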
Communicating effectively is necessarily a two-way process, one that involves building a model of the other person and their understanding, identifying and surfacing mismatched premises, and probing at the boundaries of your models by asking questions with high expected value-of-information. Programmers may recognize a similar process, complete with tight feedback loop, in the way they debug their code.
Even communication processes that look one-way are usually two-way or multi-way at least in some respect: there is a degree of feedback, but it is typically slower than face-to-face conversation. Blogs have comment sections. Newspapers have subscribers and letters to the editor. Websites of all descriptions have tracking and analytics. Political parties in democracies have constituencies that vote and special interests that donate money. Prices in a market economy carry information about the scarcity and desirability of the thing priced and respond to changes in these quantities.
Feedback is of vital importance to the smooth operation of optimization processes, and bidirectional communication is what external feedback looks like. The next two posts will cover internal feedback mechanisms.
 Gregory Bateson, Steps to an Ecology of Mind, “Style, Grace, and Information in Primitive Art,” p. 131
 Steps, “Style, Grace, and Information in Primitive Art,” p. 133
 Bateson uses his framework of redundancy and correspondence to talk about the difficulty of understanding art from a viewpoint outside of the culture that produced it:
“Poetry is not a sort of distorted and decorated prose, but rather prose is poetry which has been stripped down and pinned to a Procrustean bed of logic. The computer men who would program the translation of languages sometimes forget this fact about the primary nature of language. To try to construct a machine to translate the art of one culture into the art of another would be equally silly.”
Art is a particularly complex and important kind of communication, and I’ll return to Bateson’s take on art as a corrective force in the final post of this sequence.
 Steps, Introduction, pp. xxx – xxxi
 Steps, “Redundancy and Coding” p. 420
 Steps, “Why Do Things Get in a Muddle?” p. 3
 Bateson claims that living things, and cybernetic systems in general, tend to render solutions intractable when introduced into a problem. We’ll explore this in some detail in the next post.
 Argument paraphrased from Steps, “Redundancy and Coding,” p. 423.
 Here the evolution may occur on a learning timescale or on a genetic timescale, or both. See Steps, “The Role of Somatic Change in Evolution” for a discussion on how these two effects interplay.
 Steps, “How Much Do You Know?” p. 21
 Steps, “Cybernetic Explanation” p. 408
 Steps, “How Much Do You Know?” p. 24
 I experienced a shift in my thinking on this topic about two years ago after reading this excellent post on entropy, which uses 20 Questions as a thought experiment. Not long thereafter I put together a short talk on confirmation bias and communication in which I presented an early version of this argument (slides here). Naturally I resonated with Bateson’s use of 20 Questions in the metalogue “How Much Do You Know?”
 Typically by noticing confusion. See Eliezer Yudkowsky’s Your Strength as a Rationalist:
“Your strength as a rationalist is your ability to be more confused by fiction than by reality. If you are equally good at explaining any outcome, you have zero knowledge.”