This is the fifth post in the Gregory Bateson sequence (one, two, three, four), in which I explore the possibility of learning more than you ever thought you could faster than you ever thought possible.

A modern parable: in the late ’70s everyone knew that there were fundamental limits to human memory and learning. It was impossible, for example, when memorizing sequences of spoken numbers – given one digit per second – to go beyond 10 digits at a time. When Carnegie Mellon researcher K. Anders Ericsson found an old, neglected study that suggested it was possible to push past this limit, if only marginally and with intense practice, he set out to replicate the result and understand how it could be done.

Steve, the undergraduate Ericsson had recruited to the study, struggled with the challenge for several days and made little progress. At the end of the fourth session Steve felt that he’d hit a wall at nine digits. Yet in the fifth session of steady, focused practice that Ericsson made sure was neither too easy nor too hard, Steve pushed past nine digits and successfully recalled 11. Two years later at the end of the study Steve reached 82 digits, many times more than anyone had thought possible.

Steve’s achievement is remarkable, to be sure. But the lesson I take from the parable is a prosaic one: that we are often mistaken about limits to human performance, both in general and in our own lives.

Think of a skill – writing, public speaking, coding, cooking, communicating with your partner, dancing. You might assume that you’re about as good as you’ll ever be, that at best you might eke out a few percentage points of improvement. Consider instead that with the right kind of practice you could double your skill or even, like Steve, improve 10-fold. Or, for skills that don’t lend themselves to easy numerical measurement, that you could achieve a completely new degree of mastery and fluidity.

Yet the typical experience is that we hit natural ceilings. It’s common to keep practicing a skill and never get better at it, so much so that we come to expect it. And while I’ve personally experienced occasional leaps in performance approaching that of Steve and his digits, this is unusual. If we’re serious about leveling up and improving our condition and our capacities, we need to find a way to learn from the Steves of the world and to consistently apply the things we learn about learning.

This is the beginning of a sketch of a framework to that end, taking Gregory Bateson’s work on learning and communication as a jumping-off point.

Logical Categories of Learning

Learning, when you think about it, is phenomenally hard. All any of us have to go on is a stream of sensory data that most of us eventually conclude emanates from a corresponding external world. And while this data stream is substantial – estimates of retinal bandwidth are on the order of 10 million bits per second – at any given moment most of the stream is redundant or irrelevant to our task of making sense of and navigating the world.

Much has been written about the human brain’s ability to process, compress, codify, and react to this onslaught of sensory data. There are some interesting puzzles here, such as the problem of shoehorning all that data into the estimated 60–100 bits per second of conscious bandwidth. These questions are beyond the scope of this post. Instead we’ll confine ourselves to the part of the puzzle Bateson was concerned with: that of dividing up the sensory stream into coherent events or contexts delineated by learned context markers.

But we get ahead of ourselves.

First let’s go over Bateson’s logical types of learning[1][2], which I see as an attempt to recast behaviorism, instinct (a construct he derided), learning, and conditioning into a single framework. To paraphrase:

  • Zero-learning: an unchanging response to a stimulus or context. Examples: reflexes, sphexish behavior, fixed action patterns.
  • Learning I: a process that changes the responses of Zero-learning. Habituation, Pavlovian conditioning and operant conditioning, rote learning, behavior extinction.
  • Learning II: changes in Learning I, having to do with the way the sensory stream is divided up into contexts or with changes in the set of possible actions in a given context. When Learning II leads to faster uptake at the level of Learning I we might call this “learning to learn.” Learning II tends to be stable and self-reinforcing, and this may be suboptimal. It is the level at which one’s general approach to learning is set.
  • Learning III: changes in Learning II, infrequent in adults. Ontological crises, spiritual exploration. May be triggered by psychedelics or psychotherapy[3].
  • Learning IV: Bateson claims that individual animals and humans do not engage in Learning IV, but that the evolutionary process as a whole engages in learning at this level. The development of oral traditions might qualify as Learning IV, as might the putative singularity.

Terms I like to use for these categories, respectively: reflex (zero), naïve optimization or just “learning” (I), contextualization or analogizing or generalizing (II), self-direction or agency (III), and (perhaps) collective learning (IV).

Bateson points out that this taxonomy hinges on the “conventional assumption that context can be repeated”[4] and that without this assumption all learning collapses to Zero-learning. This is related to the notion of state in reinforcement learning: an agent can make a decision based on the entire history of its experience, or it may over time build up a concise summary of its experience, encapsulated into a state, containing only information relevant to the decisions it has to make. In case you were curious: a system where the future depends only on information contained within the current state, not on details of the past, is said to have the Markov property.
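To make the state-vs-history distinction concrete, here is a toy sketch (all names and the decision rule are hypothetical, chosen only for illustration): an agent whose full observation history can be compressed into a single running tally without changing any decision it makes. The tally is a Markov state for this decision.

```python
# A stream of +1/-1 observations. A history-based agent stores everything;
# a state-based agent keeps only a running sum, which here is a sufficient
# summary (a Markov state) for the decision being made.

def decide_from_history(history):
    """Decision using the entire observation history."""
    return "act" if sum(history) > 0 else "wait"

def decide_from_state(state):
    """The same decision using only a compact state (the running sum)."""
    return "act" if state > 0 else "wait"

observations = [1, -1, 1, 1, -1, 1]
history = []
state = 0
for obs in observations:
    history.append(obs)  # memory grows with every observation
    state += obs         # memory stays constant
    assert decide_from_history(history) == decide_from_state(state)
```

Whether such a compact sufficient summary exists at all is exactly the “conventional assumption” Bateson flags: when it does, learning a policy over states is feasible; over raw histories, it rarely is.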

Making decisions based on entire histories is generally infeasible, and the assumption that state-based decisions are useful at all is central to the question of whether learning ever makes sense. This appears to be a valid assumption.

A Simple Model of First-Order Learning

To explore Bateson’s Learning I, let’s consider by way of example a simple, well-defined task: throwing a ball to hit a target.

The task is made up of a large number of small movements that must be executed in quick succession. Each component movement affects the outcome – where the ball ends up and the path it took to get there. Many possible sequences of movements work; most fail. Many families of movement sequences can be made to work, from an underhand toss to hiking the ball between the legs. In every throw the exact timing, force, and even order of these component movements will vary, intentionally or not, and the ball will land closer to or farther from the target as a result.

The possible sequences form a vast space, the successful sequences forming an impossibly tiny subset: a wispy manifold hiding in the bewildering universe of potentialities. Yet somehow we manage to learn skills like this all the time.
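As a cartoon of this search (idealized projectile physics, and all parameters hypothetical), treat a throw as just two numbers, release angle and speed, and let trial-and-error with feedback nudge them toward the target, keeping only the variations that land closer:

```python
import math
import random

def landing_distance(angle_deg, speed):
    """Range of a projectile launched from the ground (idealized physics)."""
    g = 9.81
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * theta) / g

def practice(target=10.0, throws=2000, seed=0):
    """Stochastic hill climbing: jitter the throw, keep whatever lands closer."""
    rng = random.Random(seed)
    angle, speed = 20.0, 5.0                    # a clumsy first attempt
    error = abs(landing_distance(angle, speed) - target)
    for _ in range(throws):
        a = angle + rng.gauss(0, 2.0)           # throw-to-throw variation
        s = max(0.1, speed + rng.gauss(0, 0.5))
        e = abs(landing_distance(a, s) - target)
        if e < error:                           # feedback: closer? keep it
            angle, speed, error = a, s, e
    return angle, speed, error

angle, speed, error = practice()
print(f"learned angle={angle:.1f} deg, speed={speed:.1f} m/s, miss={error:.3f} m")
```

This two-dimensional caricature omits everything interesting about the real motor-control problem (the space has thousands of dimensions, the feedback is noisy and delayed), but it shows the basic shape of Learning I: random variation plus a keep-what-works rule is enough to find the tiny successful subset.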

From the inside this might feel something like holding an intention (to hit the target, say) and noticing improved performance as you practice repeatedly. You may feel flashes of insight, little ahas, that are pleasurable but quickly fade and become hard to explain as they become automatic and sink to the level of habit. You may notice improvement only in retrospect or not at all.

When throwing a ball overhand, we learn that releasing the ball too late makes it go lower than intended, while releasing too early makes it go too high. This is learned at an intrinsic level, simultaneously with dozens of other “facts” about throwing, yet cannot be learned at all until the variation from throw to throw is stable enough. Nor can it be fully learned until there is some notion of a “right” time to release the ball. Nor can it be learned until a certain sequence of movements is coded as “releasing.”

New information from feedback – via direct interaction with the world – during play, practice, or performance leads to improvements in the compactness and organization of the learner’s representations,[5] which in turn lead to greater skill and versatility, forming a virtuous cycle. Arguably we’re getting ahead of ourselves again, because updating these representations is really more like Learning II.

The improvements come slowly at first, then quickly, then slowly again. An explanation for why exactly this happens will have to wait, but I emphasize this point because this slow-fast-slow pattern informs the learning stages map at the end of this post.
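One common formalization of a slow-fast-slow curve (offered here only as a descriptive shape, not as the deferred explanation) is the logistic function, whose per-session gains peak in the middle:

```python
import math

def logistic(t, k=1.0, t0=0.0):
    """Classic S-curve: slow start, rapid middle, slow plateau."""
    return 1 / (1 + math.exp(-k * (t - t0)))

# Sample skill level at each practice session; the gain per session
# peaks mid-curve, matching the slow-fast-slow pattern.
samples = [logistic(t, t0=5.0) for t in range(11)]
gains = [b - a for a, b in zip(samples, samples[1:])]
print(f"largest per-session gain around session {gains.index(max(gains))}")
```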

Meta-Learning

So you can hit a target with an overhand throw. Can you hit it throwing underhand? with your non-dominant hand? Can you use your ball-throwing experience to design a trebuchet? We humans seem to be able to transfer our learning from one area into related areas (though sometimes our ability to do this is disappointingly limited, and sometimes we overestimate ourselves and get into trouble). And there’s been a flurry of new research (typically with whimsical names like SNAIL, MAML, and Reptile, and more recently Evolved Policy Gradients) attempting to get algorithms to do the same.
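Of the algorithms just mentioned, Reptile is simple enough to sketch in a few lines. Below is a deliberately tiny scalar caricature (all numbers and task definitions are hypothetical): each “task” asks a one-parameter model to match some target, the inner loop adapts to one task by gradient steps, and the outer loop nudges the shared initialization toward the adapted weights. The initialization ends up centered among the tasks, where adaptation to any of them is fast.

```python
import random

def adapt(w, target, inner_steps=5, lr=0.2):
    """Inner loop: plain SGD on the loss (w - target)^2 for one task."""
    for _ in range(inner_steps):
        w -= lr * 2 * (w - target)    # gradient of the squared error
    return w

def reptile(task_targets, meta_steps=500, eps=0.1, seed=0):
    """Outer loop: move the shared init toward each task's adapted weights."""
    rng = random.Random(seed)
    w0 = 0.0
    for _ in range(meta_steps):
        target = rng.choice(task_targets)
        w_adapted = adapt(w0, target)
        w0 += eps * (w_adapted - w0)  # the Reptile update
    return w0

# Tasks cluster around 3.0; a good shared init should land near their mean.
init = reptile([2.0, 3.0, 4.0])
print(f"learned initialization: {init:.2f}")
```

In Bateson’s terms the inner loop is Learning I and the outer loop is a mechanical stand-in for Learning II: it changes not what is learned in any one context, but the starting point from which every context is learned.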

This transfer happens in a few different ways. For a given context we retrieve and execute a sequence of actions (a strategy or policy). This can transfer to a new context if we mistake it for the old, or if we decide to put the contexts in the same bucket. Perhaps we build a mental model of the relevant parts of the world in both contexts and recognize similarities in the models. Or we find an isomorphism between the models. Or our model has parameters that we can tweak until the novel situation feels familiar. This is all Learning II.

Note how sensitive this is to how we categorize or draw boundaries. From the ball-and-target example, we had to build the concept of “releasing” before we had full control over the ball’s trajectory. When we throw underhand we’ll still pay close attention to the timing of the release. Why? After all, there is no thing-in-the-world that corresponds to “releasing” a ball. We do this because dividing and categorizing our sensory perceptions in this way is useful both for immediate performance and for generalization, despite there being no objectively correct way of drawing these boundaries. Bateson explores this in depth, along with the usual themes of control and interdependence, with an example involving “the punctuation of human interaction.”[6]

Fortunately this all happens automatically during the learning process – we don’t have to think about it consciously or in detail. But it does suggest that when we get stuck, sometimes it’s just that we’re drawing the wrong outlines. As I mention above, Learning II tends to be stable and self-reinforcing, and this is not always a good thing. Seeing the world in a specific way changes what you can see.

Flow vs. Deliberate Practice

Mihaly Csikszentmihalyi’s flow has long since entered the vernacular, and it would be hard to find any modern discussion on learning and performance that doesn’t reference the idea or assume it as part of the background. Anders Ericsson’s deliberate practice is less ubiquitous, but is still well known, and is in some ways a reaction to the glorification of the flow state.

In flow, the learner finds the right degree of challenge to match their ability and engages effortlessly, their sense of self melting away; in contrast, the deliberate practitioner continually exerts effort, pushing the edge of their abilities, always striving for the next incremental gain.

Flow is easy and free, and sometimes fleeting and elusive. Deliberate practice is gritty and painful and always available, if you’re willing and have the spoons. Flow can lead to addiction and stagnation. Deliberate practice can lead to burnout.

Thesis, antithesis, synthesis, we’ve all been here before.

So what is the approach to learning that is consistent and effective over the long term, where flow and deliberate practice fall short? I suspect that the answer is different for different people: that some will find Cal Newport’s focus- and willpower-driven Deep Work the best approach, while others will mesh better with something like Alex Boland’s method of attention gardening.

The following is my attempt at answering this question of how to learn the right things efficiently and without burnout. This approach rests on a few assumptions: that learning in productive flow is nearly as fast as deliberate practice when feedback is of high quality, and that chronic application of willpower is unsustainable for most people.

Learning I – the acquisition of definite skills and sub-skills – is almost entirely unconscious[7] and rapid progress is possible only in regimes of rich feedback. If effort is to be expended at all, it should be at the level of choosing which activity to practice and which aspects of the activity to attend to. You might call this flow seeking.

The flow seeking approach alone may be sufficient, but it may also lead to arrested development or diminishing returns at abnormally low levels of performance. In this event it is worth stepping up a logical level to focus on Learning II, or the context markers around the activity.

The standard thing to do here is to break complex skills into simpler components. Absolutely do this, but notice that breaking things down is only a specific instance of a larger strategy. I also find it helpful to juxtapose novel pairs of actions, speed them up and slow them down, exchange figure and ground, or combine simple skills into complex ones. One might also choose a new venue, new people to practice with, or play with relaxing constraints or introducing new ones.

A good instructor might direct the learner in these reorientations. A good learner will begin to experiment with such reorientations themselves, choosing the interventions that work well for them, essentially taking on the project of Learning III.

Tl;dr – A Map for Learning

Here’s the 10,000-foot view.

  • Stage 1. When you’re new, get instruction – a class, tutor, learning partner, or at minimum a video tutorial or text – to bypass initial hurdles and bootstrap the learning loop.
  • Stage 2. Flow seeking. Choose moderately challenging and varied activities. Play. Continue as long as improvement is fast and the activity itself is intrinsically rewarding. Instruction may be less important here, though without it there is the danger of falling into “bad habits” or lazy flow.
  • Stage 3. Once improvement slows, switch at least some practice time to a more deliberate approach. Identify bottlenecks, challenge assumptions, revisit fundamentals by breaking them down and building them up again. Get frustrated. Peer interaction or one-on-one instruction becomes increasingly valuable.

Stage 3 is only necessary for difficult skills that require high performance, and some skills have strong enough initial feedback that stage 1 can be skipped. Stages 2 and 3 may alternate. The whole point of stage 3 is to find productive flow again and return to stage 2 at a higher level of performance.

It’s worth emphasizing that the role of the instructor is very different between stage 1 and stage 3. Stage 3 instructors are most effective when exquisitely attuned to the learner’s state, models, strengths, and weaknesses. Stage 1 instructors merely provide information and correct gross misconceptions. Chess prodigies need a teacher to get them interested and a teacher to refine their intuitions and guide them to a higher level. These are rarely the same person.

Take a moment to think of a skill whose development would impact your life. Perhaps you have one in mind already. Got it? Ok, now consider that it might be possible to get much better at this skill, far beyond what you’ve assumed to this point. Next, ask yourself which stage you find yourself in. Are you learning quickly or slowly? What kind of instruction would be most helpful? Finally, take 5 minutes (actually set a timer!) to think of ways you could get richer feedback in your practice.

And let me know how this goes.




[1] Bateson makes a big deal of Russell’s type theory, which inspired this framework. In my opinion the connection is rather loose.

[2] Gregory Bateson, Steps to an Ecology of Mind, “The Logical Categories of Learning and Communication,” p. 279

[3] Bateson argues that individual shifts in Learning II instigated by the therapist do not qualify as Learning III. Learning III only occurs when the therapist successfully imparts the skill of making such changes onto the subject, to the point where they can take control of the process. This is the level of self-authorship and corresponds to stage 4 of Robert Kegan’s Constructive Developmental Theory.

[4] Ibid., p. 279

[5] This goes beyond policy gradient methods like PPO towards model-based RL and Schmidhuber’s curiosity- and interestingness-driven exploration. See these recent results: https://pathak22.github.io/large-scale-curiosity/ and https://blogs.unity3d.com/2018/06/26/solving-sparse-reward-tasks-with-curiosity/.

[6] Ibid., pp. 298–300

[7] We are often fully conscious of the activity of practice and the intent to improve, but the details of the improvement itself are almost exclusively hidden to consciousness or revealed only as a coarse summary.
