# Breaking down understanding: Pythagorean theorem example

What does it mean to understand? In learning and teaching, often we are worried about whether what’s been learned involves true understanding or is just facts and skills that can’t be “applied” or “transferred”.

In Understanding by Design, Wiggins & McTighe present many examples of lack of understanding. One example was a question about the distance between two points on a grid. Let’s say (2,8) and (6,5). Students were assumed to know the Pythagorean theorem: $a^2 + b^2 = c^2$. They could solve the problem by finding the number of units between the points on the x-axis: $|2-6|=4$, finding the number of units between the points on the y-axis: $|8-5|=3$, and then applying the Pythagorean theorem: $\sqrt{4^2+3^2} = 5$. But most students could not solve the problem! Their takeaway is that students knew but didn’t understand the Pythagorean theorem.

I challenge the claim that these students even know the Pythagorean theorem. Is the Pythagorean theorem “$a^2 + b^2 = c^2$”, as we stated, and as most of the students could probably tell you? No!! The statement “$a^2 + b^2 = c^2$” by itself means nothing at all. Let’s do better: “the Pythagorean theorem states that for any right triangle with legs of length $a$ and $b$, and hypotenuse of length $c$, the relation $a^2 + b^2 = c^2$ is true”. Ok, there is a statement that a mathematician and maybe even a logician would be happy with, but students may need some prodding to get to it, if they knew it at all.

We aren’t done because our audience is students rather than mathematicians. Mathematicians are a slightly crazy type of person who is happy with this statement. If students realized what we’ve just done, they would be appalled–and for good reason. Why? We said “any right triangle…”. That is an infinite class of things. If we made a similar statement like “any New Yorker is rude” or “any Vikings team will not win the Super Bowl”, that would be called ignorant and awful. But the mathematician is comfortable because they have extreme confidence that they can spot any right triangle in any context and say a few definitely true things about it. Now, most of our students are probably able to see a right triangle and say, yes, that is a right triangle, no, that is not a right triangle. But they may not have a fluent perceptual skill of running out in the wild and eagerly seeing right triangles like mathematicians (remember: slightly crazy).

And yet being able to detect right triangles like a boss still isn’t enough! As the student, when sitting down in front of this problem, we enter a very strange place: a grid. What can we do in grid world? We could draw a smiley face with the two points as eyes, color in squares, or make mazes. Creating a right triangle with legs parallel to the axes and a hypotenuse that is the line between points (2,8) and (6,5) is just one of countless possibilities. That is legitimately considered an invention when it’s not a practiced skill, and the vast majority of students who have never even encountered the idea of creating shapes to support their geometric reasoning are not going to invent it on the spot. “Find the distance” might spark me to draw a straight line between them (“‘the shortest distance between two points…’ wait, do I want the shortest distance?”). But the creation of that triangle, then the detection of the right triangle (since we may have drawn the correct lines without necessarily thinking “triangle”), then the application of the Pythagorean theorem are all steps needed to solve the problem.
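To make that chain of steps concrete, here is a minimal Python sketch of the three moves: inventing the triangle, checking that it really is a right triangle, and only then applying the theorem. The function names are made up for illustration, and the corner chosen here (sharing its x with one point and its y with the other) is just one of the two possible constructions.

```python
import math

def right_triangle_from(p, q):
    """Step 1: invent a triangle whose legs are parallel to the axes."""
    corner = (q[0], p[1])  # same x as q, same y as p
    return p, corner, q

def is_right_angle_at(corner, a, b):
    """Step 2: detect the right angle (the two legs have a zero dot product)."""
    v = (a[0] - corner[0], a[1] - corner[1])
    w = (b[0] - corner[0], b[1] - corner[1])
    return v[0] * w[0] + v[1] * w[1] == 0

def hypotenuse(p, corner, q):
    """Step 3: apply a^2 + b^2 = c^2 to the two legs."""
    a = abs(corner[0] - p[0])  # horizontal leg
    b = abs(q[1] - corner[1])  # vertical leg
    return math.sqrt(a**2 + b**2)

p, corner, q = right_triangle_from((2, 8), (6, 5))
print(corner, is_right_angle_at(corner, p, q), hypotenuse(p, corner, q))
```

Only the last function is “the Pythagorean theorem”; the first two are the parts a student has to supply before the formula is of any use.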

So boiling down the issue here to a lack of understanding of the Pythagorean theorem is, if not wrong, totally unhelpful. Nor is it helpful to say that the students are “failing to apply their knowledge”, or the student just needs to “learn more transferable knowledge”. All those sound like the responses of an obnoxious politician.

There are times when recognizing and pointing to a lack of understanding is a useful communication. There is some pattern to the student’s actions where people who know better can agree they don’t have “understanding”, even if we don’t have a perfect description of what that entails. Call this the gestalt perspective on understanding.

The philosophy here is that we can attempt to break down a lack of understanding into knowledge and skills that are missing. Call this the reductionist perspective. In this case I choose to consider the problem like some kind of environment where the student can perceive things and take actions while applying some of their existing beliefs about the world. I’m not sure it’s an accurate look inside the mind of a student, but I think it helps bridge to ideas like perceptual learning and affordance that we otherwise might not recognize.

# Clickers for the mind

I had taken for granted that feedback is a critical part of learning: it’s information that we use to adjust our performance and incrementally get better. However, Dan Meyer gave an excellent example of when feedback goes wrong. When working through an algebraic equation in a computer program, the student writes a step and the equation turns red: the equation is wrong. They flip a sign and then it’s green: the equation is correct! And yet the student doesn’t understand anything.

I finally have a better framework for thinking about feedback: reinforcement learning, the theory behind animal training, particularly via a technique called clicker training.

In clicker training, animal trainers help an animal associate the “click!” of a clicker (just a small object that makes a click when pressed by the trainer) with positive reinforcement like a small treat for a dog or a fish for a killer whale. The purpose of the clicker is that the trainer can time the click exactly to when the animal performs a correct step. The positive reinforcement is a very powerful way to instill the behavior in the animal, and it works from household pets to performance animals.

Back to our algebra software: the program turns green, click!, positively reinforcing the student’s step. The problem is that we’re reinforcing the wrong action: “keep flipping signs until it’s right”. How Children Fail is a whole book of these kinds of training failures in the classroom setting. The author explores how students he’s observed fall into patterns of trying to get to the right answer, whether that is saying “I don’t know” or probing for the right answer like our student in the computer program. Anything but doing the hard work of understanding and working out the real problem!

It’s like reinforcing the dog for dragging every item in the house to your lap because those happened to include her fetch stick. Instead, we can break down the actions into a chain of tiny parts, and reinforce these one at a time. The principles of reinforcement learning, which I’ve been reading about (after getting a dog of my own) in the book Don’t Shoot the Dog, tell us how to do this kind of training. Here’s one example from the book:

We were watching a horse being trained to bow, or kneel on one knee, by a traditional method involving two men and a lot of ropes and whips; the horse under this method is repeatedly forced onto one knee until it learns to go down voluntarily.

I said it didn’t have to be done that way and asserted that I could train a horse to bow without ever touching the animal. (One possibility: Put a red spot on the wall; use food and a marker signal to shape the horse to touch its knee to the spot; then lower the spot gradually to the floor so that to touch it correctly and earn a reinforcer the horse has to kneel.)

This act of shaping is a subtle art. Even training my puppy to sit wasn’t a straightforward procedure — I almost wanted to reach for the ropes and whips after twenty minutes without her ever getting in the right position. In animal training, trainers understand that verbal communication is starting from scratch. The dog has no idea what “sit” means when we start. The math student likewise has no idea what the concept of “equality across the sign” means. (Actually the task is even harder because we don’t know whether the observed behavior of flipping the sign comes from understanding the mistake or from trying all the possibilities. Meanwhile a sit is a sit.)

I believe that successful teaching practices are those that use effective shaping. It applies no matter what perspective you bring to education. Discovery learning shapes using affordances in the learning environment as I’ve talked about before with Portal. Explicit instruction shapes through worked examples that slowly build on previous understanding and have clear points of failure when applying misconceptions like the sign of a unit.

The link to animal training and its behavioral history has been quite surprising to me. Behaviorism has been a relegated branch of psychology, particularly in the cognitive science training that I had. The example of applied behaviorism that I see is in the design of addictive but meaningless games like Cow Clicker, where you are reinforced for clicking an invisible cow. But there’s no question that human minds respond to the same effects and they can be used for good. (Another amusing application: relationships.) There is still more to explain in terms of when concepts are understood versus when we’re just grinding through procedures, but this is where I’m at for now.

# The Silicon Valley vision of education

In a Tim Ferriss Show podcast episode, Peter Diamandis, entrepreneur extraordinaire, answers a listener question: How can we disrupt our education system? I think it’s articulate and representative of the typical “Silicon Valley Vision” for education, so let’s dig into it.

First of all, education’s got a couple different parts. There’s the part of socialization, of getting to know kids, getting to know people, how to be a good citizen, how to interact with people socially. Then there’s the part about learning.

I will stick to the “learning” part, as much as that division is legitimate.

And the challenge with our education system, and you know this, we all know this, is, it is 150 or 200 years old. And it just sucks. I don’t know how else to put it.

I’m not here to talk history either, but I recommend The Invented History of ‘The Factory Model of Education’ to get a richer perspective on the “education is old and broken” talking point.

In any classroom, half the class is bored, the other half of the class is lost, and even the best teachers can only teach to the median. As classroom sizes grow, our ability to provide personalized educations just isn’t happening. So for me, the ability to scale is the use of technology.

I agree with this critique of classroom learning in general. Tutoring, on the other hand, has been something like a gold standard in the research community ever since Benjamin Bloom’s 1984 study found that tutored students performed at the 98th percentile(!) of a control group (Bloom’s 2 Sigma Problem). I don’t believe the 98th percentile figure has quite held up in replication, but I do have a strong belief in the power of personalization.
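The “2 sigma” in the name and the 98th percentile are the same claim, if we assume the control group’s scores are roughly normally distributed (that normality is my simplifying assumption for this sketch). A quick check in Python:

```python
from statistics import NormalDist

# Assumption: control-group scores follow a normal distribution.
# A tutored student two standard deviations above the control mean
# outperforms this share of the control group:
percentile = NormalDist().cdf(2.0) * 100
print(f"{percentile:.1f}")
```

Under that assumption the result rounds to the 98th percentile quoted above.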

For better or worse I’m going to base my position on an analogy to medicine. Like the illnesses we see a doctor to treat, the misconceptions, lack of knowledge, or motivational breakdowns that hinder our academic performance are issues in the realm of teachers and schools. At least both occur mostly within our fleshy membrane.

Just like we wouldn’t want to be treated for an illness in a room of dozens of our peers, we would likely benefit from a masterful teacher that could work individually to diagnose our missteps and provide the right “treatment” (maybe an item of knowledge, but perhaps a motivating example, practice maneuver, or perceptual cue) to advance our learning.

You may or may not agree that this is a more desirable state but I think we can all agree that we (the American public school system, or any system of K-12 education) don’t have the resources for anything like this — enough individual attention for all students to learn all the standard curriculum.

The Silicon Valley Vision is that technology-based education can provide education that is not only better than one-on-one human teachers, but can also scale to accommodate every student, up to and including, yes, the poor African villager.

Big goals.

I always ask the question, how do you dematerialize, demonetize, and democratize different systems. In the case of education what I believe is going to happen is that we’re going to develop artificial intelligence systems, AIs, that are using the very best teaching techniques.

Let’s establish some common ground.

First, it’s not clear to me what it means that the AI is “using teaching techniques”. Is the AI selecting and sequencing some pre-existing content, or is it actually constructing pedagogic material and enacting the delivery on its own (whether through generated text or Siri voice or even a robot)? The former is more realistic in the near term (for example, it’s the role that Knewton plays for the content of publishers it works with) and seems hinted at by later answers, so let’s stick with that.

Next, I don’t know how these “best” teaching techniques are determined. If these techniques are known, what has stopped us from applying them already?

I’ll give the Silicon Valley Vision the benefit of the doubt here: the “best teaching technique” is highly context dependent, and except perhaps for our imagined individualized 2-sigma teacher, the only practical way to map from context to technique at scale is with automated technology. That leaves us with one question: can technology do that?

An AI can understand a child’s language abilities, their experience, their cognitive capabilities, where they’ve grown up, even know what their experiences are through the days, and give that individual an education that is so personalized and so perfect for their needs in that moment that you couldn’t buy it.

Diamandis starts by enumerating these contexts for personalization. In our medical analogy this would be like asking for a piece of software we switch on that tells us everything that could be wrong with us. Instead, we have countless scans, tests, and measurements that give hints at what could be going on. Is there reason to believe that the mind is more scrutable? I haven’t seen one.

Our state of the art in learning “diagnostics” is to hand-code the units of knowledge for a particular domain, ask tons of assessment questions, and infer a small amount of information from each of these about how likely the student is to know each of the units of knowledge. For a typical case, a multiple choice question, the information content is very low (a student who is purely guessing already has a 25% chance of picking the right answer) for maybe a minute of the student’s time. That isn’t nearly the information bandwidth that a good teacher achieves, even working with a large class. (Don’t get me wrong, there is cool work that is building domain and student modeling in environments like games or inquiry learning, but the point is that this progress is incredibly slow: for example, a block stacking game that has been individually designed, programmed, and modeled over several years.)
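To get a feel for how little one multiple choice answer tells us, here is a rough sketch using Bayes’ rule. It is a deliberately simplified model of my own for illustration, not a description of any particular tutoring system: a student either knows the unit or doesn’t, a student who knows it answers correctly except for a `slip` rate, and a student who doesn’t know it guesses among four options (`guess=0.25`). All of these names and parameters are assumptions.

```python
def posterior_known(prior, guess=0.25, slip=0.0):
    """P(student knows the unit | they answered correctly), via Bayes' rule."""
    p_correct_if_known = 1.0 - slip
    # Total probability of a correct answer, from knowing or from guessing
    evidence = prior * p_correct_if_known + (1.0 - prior) * guess
    return prior * p_correct_if_known / evidence

# Starting from a coin-flip belief, one correct answer moves us to:
print(posterior_known(0.5))
```

With a 50/50 prior, a correct answer only raises our belief to 0.8, and that is before allowing for slips or for questions where partial knowledge eliminates some options.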

And the beautiful thing about computers and AI is that it can scale at minimum incremental cost. So you can imagine a world in the future in which the son or daughter of a billionaire, or the son or daughter of a poor African villager, have equal access to the best education. We’re seeing that today in knowledge, right, because Larry Page, founder of Google, has access to the same knowledge and information that the poorest person on Google has. It’s a flattening of this capability.

Let’s ignore the issues of access to technology for now, that is, assume our villager does have internet access (uncensored and not prohibitively slow). Do they choose to access the knowledge? When they access the knowledge, do they have the background to understand it, or the means to put the knowledge into action? Sometimes, yes, and the whole project may be worth it for those cases, but when we’re talking about education being solved and done for everyone, there is no precedent here.

So AI for me is the answer to global dematerialized, demonetized, and democratized education. We have to separate learning things from actually socialization and being inspired and so forth. Humans are going to be part of that, always will be, but AI is going to be the way that I learn something. Or an AI can really deliver the information in a way that’s compelling and meaningful. In fact we’re going to have a situation where an AI may be watching my pupillary dilation or how I tilt my head or asking me questions to really understand, did I understand that concept, or was I just faking it by nodding my head. I mean how many times are you speaking to someone and they’re trying to teach you something and you say, “Yeah yeah yeah”, and really in the back of your mind you’re going, “I have no idea what this person just said.” I think education driven by neuroscience and by artificial intelligence will know that you didn’t get it, will back up to the point where you lost the idea, and then bring you step by step so you really do learn these things.

By now our picture in the medical world is rather comical. Imagine a personalized medicine system that, upon checking your vitals and determining the effects of the medication aren’t taking hold, retracts its robotic arm, refills the syringe, and injects you again, over and over, hoping one of these times will work.

If this AI vision doesn’t just mean repeating the instruction at the point of (detected) failure, then is there a map from the context that technology could infer to something “more meaningful” for the student? That’s a challenge for a fully empathetic human who knows the life story of one of their students. Well beyond Turing test level.

I think we’re really going to transform education very quickly. And it’s a huge and critically important part of our society, so as the father of two four-year-olds, I am personally passionate and excited about solving that challenge.

The language of “solving that challenge” sums up what’s most flawed in the Silicon Valley vision of education. There is no “education solved” checkbox. To the extent such a solution is envisioned, it is well beyond the grasp of the foreseeable future in the science of human learning or existing AI-driven technology in the field.

I do think there are tremendous opportunities for technology in education. If our goal is to provide a better personalized education, that means we need to be better at diagnosing and treating deficiencies in knowledge and skills. Just as there has been no disruption of medicine by the use of technology, there won’t be for education. But we can get better practice by practice, and tool by tool.

# How We Learn: Learning Without Thinking

I’m enjoying How We Learn for tying together quite a bit of what I learned during my year in grad school. The effects of spacing (chapter 4), testing (chapter 5), and interleaving (chapter 8, covered earlier) are powerful for learning, but we know a reasonable way to implement all of them: throw everything you want to learn into a spaced repetition system. What’s been most exciting is chapter 9, Learning Without Thinking, which covers perceptual learning.

School education is skewed to verbal and symbolic learning: tests require you to explain your answer or work out steps of math. Perceptual learning changes the focus to visual information. I’ve covered perceptual learning previously in the rather obscure realms of Stepmania and chick sexing, but it applies to almost anything. To see how powerful perception is as a component of domain expertise, consider chess. Quoting Carey:

On a good day, a chess grand master can defeat the world’s most advanced supercomputer, and this is no small thing. Every second, the computer can consider more than 200 million possible moves, and draw on a vast array of strategies developed by leading scientists and players. By contrast, a human player–even a grand master–considers about four move sequences per turn in any depth, playing out the likely series of parries and countermoves to follow. That’s four per turn, not per second. Depending on the amount of time allotted for each turn, the computer might search one billion more possibilities than its human opponents. And still, the grand master often wins. How?

He quotes a sketch of an answer from Chase and Simon’s 1973 study of perception in chess, “The superior performance of stronger players derives from the ability of those players to encode the position into larger perceptual chunks, each consisting of a familiar configuration of pieces.”

What does that mean? We don’t have a verbal or symbolic understanding of this ability, so it eludes the primary mode of computers, education, and–unfortunately for me–blog posts. We see the visual information of the board, and it activates different sizes of “chunks” in our mind. These chunks perhaps roughly correspond to levels of abstraction. A small chunk is that there is a black pawn on g4. A little larger is seeing the king in check. A big, powerful, supercomputer-beating chunk is some kind of dominant offensive pattern that is observed in white’s combination of positions across the board.

…And how do we learn these chunks–in a way that hasn’t translated to the performance and algorithmic sophistication of computer systems? I think we’re still in the early stages of understanding that, but the next stop on my reading list is papers from the Human Perception Lab.

# Miracles through empathy and persistence

For me the first principle of teaching comes from John Holt’s metaphor in How Children Fail: “To rescue a man lost in the woods, you must get to where he is.”

I’ve been hearing many stories about very nontraditional “students” who seem lost beyond hope. The Radiolab episode “Juicervose” (covering a story I first heard about from NYT) tells how an autistic boy used Disney movies to start communicating with his family. After endless watching of movie after movie, repeated time after time, the boy finds the first phrase to reach out. Once his father figures out what’s going on, he takes the role of a Disney character to start really speaking with his son for the first time in years.

Some other examples (for some reason all podcasts): from the same episode, parents spend 900 hours imitating the self-stimulating behaviors of their autistic child before achieving eye contact. In Radiolab’s “Hello”, a woman lives with a dolphin in order to teach it to talk. In This American Life’s “Magic Words”, a couple use improv to speak to their mother who suffers from dementia. In Invisibilia’s “The Secret History of Thoughts”, a boy in a vegetative state is cared for every day by his father until things start to turn around (this one is a must listen).

In all these cases, the lost man is very deep in the woods indeed. For a while, it looks to the searchers like all of the walking in the woods is getting nowhere. They call out his name for the hundredth or the thousandth time, and this time, finally, there’s a response.

I think the principle applies not just to teaching but to self-learning as well. As learners, we must be mindful of where resources assume we are currently in the process. When we practice skills, we must have enormous patience and allow ourselves to slowly work our way forward from wherever we happen to start (instead of comparing ourselves to others).

# How We Learn: Being mixed up

Chapter 8 of How We Learn describes interleaving as a better means of practice. In math education the Saxon textbook is an example. It uses a mix of practice problems, combining everything learned so far, as opposed to typical textbooks where all problems are about one lesson. Not only does this better improve the skills being learned, but students now need to recognize which strategy to use for each problem, and (perhaps as a result) they tend to better apply the skills in other contexts.

It reminds me of my experience with high school math team.  We’d do tests from past years of competitions: 25 questions on a variety of topics. Since these were graded more by participation than percent correct, each person could grow at their own pace. I was able to get through the bulk of tests quickly and spend some time deeply thinking about questions beyond my capacity. One that I always remember is discovering my own approach to trigonometry identities that involved manipulating triangles.

We’d do these tests in the morning and then have people put them up on the board in the afternoon. (Yes, we had two periods of math team plus the regular math class.) I felt this was another benefit – someone around your level could explain a problem as they may have figured it out for the first time. And I would try and fail to explain my homebrew approach to trig.

The chapter also contrasts the conservative and progressive approaches to math education. The progressive supposedly favors conceptual skills like number sense while the conservative builds up from concrete, procedural skills. (I also recommend commentary from Math With Bad Drawings.)

The method of learning for math team was strongly in the conservative tradition. We even had a “formula book” that contained formulas that could solve probably 80-90% of the test without much further thought. Sometimes deeper thinking did happen – not just when I was bored and unprepared for trigonometry but when inspired by good question writing that demanded using the material in new ways. (One competition that I highly commend is Mandelbrot.) I don’t think these kinds of questions could have been approached without that base of knowledge. Of course the balance is hard to strike: by senior year, some of us were pushing back against being taught with such focus on these more formulaic problems.

# Back and forth on certainty

When first presented with a complex new toy, the typical child explored it intensely, exhibiting a serious face and eyes riveted on the toy. As the child manipulated the toy to discover its properties, the focused concentration continued, punctuated by momentary expressions of surprise, sometimes mixed with joy, as new discoveries were made. Only after exploring the toy for some time did the child begin to play with it, by repetitively acting on it to produce known effects or by incorporating it into a fantasy game.

According to Peter Gray (Free to Learn), this is what learning looks like: a child explores on his own will, acquires knowledge through interaction and observation, then develops skills through imaginative play.

Effective learning research is often at odds with this picture. A recent study finds that we most readily commit things to memory when we have certainty about the causal relationship [1]. Direct instruction–an authority figure telling a student that step 2 comes after step 1–gives her exactly that. Studies have found that testing is better than rereading or highlighting to learn new material [2]. Again, the presence of a test question with a single correct answer provides certainty. And when it comes to skill learning, deliberate practice suggests a state of strain and anxiety that doesn’t correspond with play [3]. We seem to need certainty that the movements or thought patterns we practice are correct.

After all, imagination won’t advance someone through thousands of years of mathematical discovery. And play won’t develop a runner to beat times that have been steadily declining over the years due to evolving techniques and training.

But certainty, perhaps, limits our imagination and motivation. In a study of children’s interaction with an unknown toy, children who were directly taught a particular function of the toy overwhelmingly attended to that aspect, while other children explored the toy to discover its other functions [4]. If we always choose the path of certainty and efficiency, we train ourselves, like a driver relying on GPS, to expect the next direction. Without it we feel impatiently lost. The child would tell us that we’re in exactly the right place to make our next joyful discovery.

1. http://www.caltech.edu/news/switching-one-shot-learning-brain-46629
2. http://psi.sagepub.com/content/14/1/4
3. http://calnewport.com/blog/2012/04/09/the-father-of-deliberate-practice-disowns-flow/
4. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3369499/

# From mechanistic to environmental learning

An attempt to put my views and main references on learning in ~1 page.

My journey into learning technology started with spaced repetition systems (SRS), particularly Anki. I used it for Chinese, but my inclination was to take this approach to learning to the extreme, to every subject. Optimistically, SRS could be the operating system for learning. Every possible input you wished to learn could be converted into cards, then, voilà, every action for retaining your knowledge will be scheduled and presented to you from the system. Indeed my first learning technology project, Learnstream, was a tool for extracting SRS cards from text and video documents. I also just recently reviewed a friend’s book, Learning Medicine, that promotes this strategy for medical students.

There’s a problem though: the complete picture of learning is not at all described by the forgetting curve upon which spaced repetition systems are built. Consider the dependency of knowledge: reviewing a fact about multiplication (23*42=966) necessarily reviews facts about addition (920+46=966). My time at Carnegie Mellon was exactly what I needed to introduce me to more robust theories of learning and attempts at intelligent tutoring beyond SRS. For example, the Knowledge-Learning-Instruction framework characterizes the kind of knowledge that might work best for SRS, paired associates, contrasting that with other kinds of knowledge (however see my Quora answer for Does spaced repetition work equally well for analytical and factual information?).
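The multiplication example can be spelled out as a tiny sketch. The function name and the split into tens and ones are my own illustration of the dependency, not a claim about how any SRS models it: practicing the multiplication fact exercises an addition fact along the way.

```python
def multiply_via_partial_products(a, b):
    """Multiply a by a two-digit b, exposing the addition it depends on."""
    tens, ones = divmod(b, 10)
    partial_tens = a * tens * 10  # for 23 * 42: 23 * 40 = 920
    partial_ones = a * ones       # for 23 * 42: 23 * 2 = 46
    # The final step is an addition fact reviewed as a side effect
    return partial_tens, partial_ones, partial_tens + partial_ones

print(multiply_via_partial_products(23, 42))
```

A forgetting curve tracked per card has no way to notice that reviewing the first fact also refreshed the second.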

I realized that a complete mechanistic model isn’t coming any time soon. And that there are even drawbacks in the attempt to capture that model (see Seeing Like a State, Rendering Learners Legible).

Abandoning the notion that SRS needs a perfect model of memory, I came to think of SRS as creating an environment for learning (blog post: Spaced repetition in natural and artificial learning). Simply put, that environment is a big improvement over Facebook or whatever your default computer addiction is. (Later, however, I wrote about the limitations of SRS as a user interface in Designing learning systems with spaced repetition.)

Alan Kay uses the term environmental learning (“User Interface: A Personal View” (pdf)), invoking some of my favorite authors, Suzuki (Nurtured by Love) and Gallwey (Inner Game of Tennis); tragically I still haven’t read Montessori. When we step back to think of enabling learning through construction of an environment, we can do much better than SRS. In Getting Beyond Massively Lousy Online Courses I wrote about how games like Portal inspire learning with environmental affordances (a term from The Ecological Approach to Visual Perception), requiring minimal or no adaptivity. Computer environments for learning have been explored since the 60s starting with Papert (Mindstorms) and continuing today in Bret Victor’s work, e.g. Learnable Programming.

Unfortunately in this one-page space I don’t think I’m going to get toward describing a full solution, only discrediting a few that some find promising. But let me try to put the challenge in a way that I hope is as stimulating to you as it is to me, while referencing some more of the authors that have guided me:

A math teacher starts her class, day one, with an exciting world of mathematics in her head (for exciting+technical see Dan Meyer, Lockhart’s Lament, Surely You’re Joking Mr. Feynman, Yudkowsky, Gödel Escher Bach): an equation for a parabola describes an infinite wave, she changes the tides with a simple parameter, and she can ride the wave on a tangent surfboard just by taking the derivative. Her students, sitting in rows of desks, see only…her and an empty chalkboard. Slowly she will reveal this environment, using tools like storytelling, visualization, and interactive affordances (along with Papert/Montessori, see Bruner’s modes of learning in Toward a Theory of Instruction, Understanding Comics, Edward Tufte), which promote cognitive mechanisms of perceptual representation and analogy (Vygotsky, Douglas Hofstadter, Dedre Gentner, Michelene Chi, Origin of Concepts). She’ll need to do it in a way that motivates her students to explore that world with her, without it becoming a game of avoiding negative judgment (see Impro and How Children Fail for examples and counterexamples, for me the two most important books about learning; also Tao Te Ching). The students must also engage in repeated practice in order to use the tools of mathematics fluently (Zen in the Art of Archery, Art of Learning, Ericsson/Outliers/Talent Code/Talent is Overrated, Bruce Lee).

Now imagine the teacher is a computer, the students are mathematicians, and the lesson is on a mathematical discovery not by a human but by said computer/teacher. That’s a peek at my endgame (blog post: Knowledge science = data science + learning science).

# 10x Learning

In Zero to One, Peter Thiel claims that in order to establish a new monopoly (his argument that monopolies are a good thing for both the owner and society is beyond the scope of this post), a startup needs to improve upon an important dimension by 10 times. For example, Google’s search engine was probably 10 times better than other options at the time.

If our domain is education, and we want to establish a better means of learning a particular topic, the claim is that we want a 10x improvement in order to scale out to a majority of our target audience.

Learning rate. In the last post, I claimed that the rate of learning is a plateau-filled slow climb that can crush motivation. Attempts to circumvent this fact, like small wins (which zoom in on the learning graph but can’t trick us forever) and gamification (which is inspired by examples that get to hand-pick their challenges), don’t work. Could we, like the brain downloading programs from the Matrix, improve the learning rate by 10 times? Shorten a plateau that would normally take 10 hours of dedicated learning into one? Perhaps in particularly degenerate cases with very high extraneous cognitive load (like beginning French by reading the original Deleuze), but normally there are just too many bits of knowledge to acquire and synthesize. Even a spaced repetition system that optimizes the order and frequency of review would, by my guess, be at most a 2-3x improvement in the rate. Of course that 2-3x can have a huge impact, so let’s consider other forms of measuring learning.

Time of persistence. If you are learning something entirely voluntarily, like taking a class to improve your home cooking, the amount of time you persist will be a good approximation of how motivating that learning method is. And if “10,000 hours” really is the most important part of mastery, then time of persistence (along with hours of learning per day) is the most important metric of learning.

Of course, I don’t actually believe that number of hours alone is important. I think of some of the podcasts I tried for learning Chinese, where they started and ended with friendly chit-chat in English–clearly that time was not improving my Chinese. Don’t forget that it was originally 10,000 hours of deliberate practice that K. Anders Ericsson claimed produced masters.

Quality of results. In my career field, software engineering, there is much discussion and debate about the “10x engineer”. In any field, one must define what qualifies as 10x quality. 10x engineers have been defined pretty well: their presence in a company increases productivity by 10 times over an average engineer. That is probably a combination of their speed in producing a working system, code that is reliable and easy to maintain, and tools and practices that enable their whole team to be more productive. A more difficult question that we must answer for learning is what produces a 10x engineer (or other 10x quality role). You can see a deluge of attempts to answer that on Quora. The fact that 10x quality isn’t average suggests that it consists of skills that may be misunderstood (the benefits of strongly typed languages), counterintuitive (red-green-refactor in test-driven development), unsexy (knowing the ins and outs of the Linux kernel), or have very long plateaus that cause most to drop out (higher order abstractions in Haskell).

Logically we can’t have a scalable system that makes everyone 10x better than average, but if we can train the average person in 10x practices, we will have a much more productive society overall.

Quantity of results. Maybe quality is too hard to measure: think about art. But the one who produces 10 times more paintings is probably going to be better (not to mention have more to hang on the wall). Consider this (already cited too often) anecdote from Art and Fear:

The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pound of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot — albeit a perfect one — to get an “A”. Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work – and learning from their mistakes — the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.

Number of competitive wins. In elite athletics, a major industry goes into shaving fractions of a second (see this in-depth discussion on the minute effects of beet juice). Does the 10x metric still make sense there? It does if you count number of competitions won (or competitive earnings). In high school, I was certainly no athlete, but I was really into math competitions. My high school math team did practice tests on nearly a daily basis and a dozen competitions per year. If you told me I could have the highest score 10 times more often, I would have eagerly given you my (admittedly scant) savings. I was already pretty good, so that would have translated to merely a few extra points per test on average.

I think 10x is a great rallying cry for improving learning experiences, but it’s worth figuring out what is realistic and meaningful for your domain. Once you’ve picked a metric, exploit it as much as possible. If it’s 10x competitive wins, then let your users compete on a daily basis. For 10x quantity of results or time of persistence, encourage a simple repeatable activity and showcase it in a growing gallery (see 180 Websites in 180 Days or Give it 100). If you believe you have the secret to 10x quality, promote the skills that resist learning by finding new ways to practice them or by using expert endorsements to emphasize their importance (see Ramit Sethi’s writing and courses such as Big Wins Manifesto).

I’d love to talk to anyone designing a learning plan–even if it’s just for yourself–to decide which metric to focus on and which strategies to use. Send me an email with your goal!

# Motivation in learning

Suppose you’re designing a learning tool and you want to amp up the motivation. You decide to show a graph of the user’s learning progress. Of course in your awesome learning environment, people will be learning all the time, so it’s going to look like this, right? Users will see that they are getting more and more awesome, they’ll feel awesome, and they’ll come back every day to keep learning.

The problem is, when learning looks like this, the learner is already well aware that they are kicking ass. Your graph is the banner at an election party. Maybe it ties together the scene, but everyone already knows what’s going on.

The reason that motivation is a persistent unsolved problem in education is that learning doesn’t look like that. Learning is filled with plateaus and pits because confusion is the very nature of learning. Learning–in the very best case–looks more like this:

Keep in mind that those plateaus can last on the order of months, long enough that we forget what a jump feels like. And the jump, by the way, happened so quickly and changed our thinking so rapidly that we barely noticed it!

Motivation hackers have countered with the theory of small wins: if we decrease the delay before some kind of reward, we will feel more motivated. But what does that really mean in the big picture–at least when it comes to learning? It means we are zooming into this graph and increasing the number of little upward bumps on the plateau. That is what spaced repetition is good at: keep increasing the frequency of missed items such that the correctness ratio remains around 90%. But our unconscious, in the end, can’t be tricked like that. Once we’re used to spaced repetition, we know that the missed cards are piling up, rather than the new ones we want to get to. We might feel the joy of a small win, but it will be paired with the pain of even more small losses. Moreover, we know that we just aren’t learning that much.
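For readers who haven’t used one, here is a minimal sketch of the kind of scheduling I mean, written as a simple Leitner-style box system. It is an illustration rather than how any particular SRS app works: the `Card` class, the `review` and `due_cards` functions, and the `INTERVALS` values are all hypothetical names and numbers I made up for this example. It only shows the core move, which is that missed cards come back sooner and remembered cards get pushed further out; it does not tune anything toward a target like 90% correct.

```python
from dataclasses import dataclass

# Hypothetical review intervals in days, one per box (illustrative values only).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    prompt: str
    box: int = 0  # index into INTERVALS; box 0 is reviewed most often
    due: int = 0  # day number on which the card is next due


def review(card: Card, correct: bool, today: int) -> None:
    """Update a card after one review on day `today`."""
    if correct:
        # Remembered: move up one box, so the next review is further away.
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        # Missed: back to the first box, so it comes back quickly.
        card.box = 0
    card.due = today + INTERVALS[card.box]


def due_cards(cards: list[Card], today: int) -> list[Card]:
    """Return the cards whose review date has arrived."""
    return [c for c in cards if c.due <= today]
```

Notice what a single miss does in this sketch: the card falls all the way back to the first box and shows up again the next day, which is exactly how the pile of missed cards can crowd out the new ones.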

What about games? Given that games are so fun and addictive, many believe they hold the secret to education’s motivation problems. According to Raph Koster’s Theory of Fun, what makes games fun is…wait for it…learning! While games can sometimes teach educators about pacing, game designers have the luxury of not having to include anything with too long a plateau. They get to choose the domain, but when we discuss learning as a more practical matter, that isn’t possible.

So you want to create instruction in a domain that contains concepts with long plateaus. Your best option has nothing to do with motivation but rather is to improve instruction such that the plateaus are shorter. Beyond that, I’m not too sure. I think it’s part of why “detachment from the illusions of self” is part of shuhari, a Japanese martial arts conception of mastery: one must get over the idea that one needs to be getting better all the time. In addition, learners need a deeply held belief both that what they are striving for is important (“when are we ever going to use this?”) and that the periods of stagnation are essential to growth. Maybe the graph to show, if you can do it convincingly, is the plateau another learner was on before achieving their next jump. And the cool stuff they did after a certain number of those jumps.