The Silicon Valley vision of education

In a Tim Ferriss Show podcast episode, Peter Diamandis, entrepreneur extraordinaire, answers a listener question: How can we disrupt our education system? I think it’s articulate and representative of the typical “Silicon Valley Vision” for education, so let’s dig into it.

First of all, education’s got a couple different parts. There’s the part of socialization, of getting to know kids, getting to know people, how to be a good citizen, how to interact with people socially. Then there’s the part about learning.

I will stick to the “learning” part, to the extent that division is legitimate.

And the challenge with our education system, and you know this, we all know this, is, it is 150 or 200 years old. And it just sucks. I don’t know how else to put it.

I’m not here to talk history either, but I recommend The Invented History of ‘The Factory Model of Education’ to get a richer perspective on the “education is old and broken” talking point.

In any classroom, half the class is bored, the other half of the class is lost, and even the best teachers can only teach to the median. As classroom sizes grow, our ability to provide personalized educations just isn’t happening. So for me, the ability to scale is the use of technology.

I agree with this critique of classroom learning in general. Tutoring, on the other hand, is something like a gold standard in the research community ever since Benjamin Bloom’s 1984 finding that tutored students performed at the 98th percentile(!) of a control group (Bloom’s 2 Sigma Problem). I don’t believe the 98th percentile has quite held up in replication, but I do have a strong belief in the power of personalization.
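To see where the “2 sigma” and “98th percentile” figures connect, here’s a quick sanity check. This is only a sketch using the standard normal distribution; the numbers are textbook values, not Bloom’s raw data:

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Fraction of a standard normal population scoring below z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# A tutored student scoring 2 standard deviations above the control
# mean outscores about 98% of the control group.
print(round(normal_cdf(2.0) * 100, 1))  # → 97.7
```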

For better or worse I’m going to base my position on an analogy to medicine. Like the illnesses we see a doctor to treat, the misconceptions, lack of knowledge, or motivational breakdowns that hinder our academic performance are issues in the realm of teachers and schools. At least both occur mostly within our fleshy membrane.

Just like we wouldn’t want to be treated for an illness in a room of dozens of our peers, we would likely benefit from a masterful teacher that could work individually to diagnose our missteps and provide the right “treatment” (maybe an item of knowledge, but perhaps a motivating example, practice maneuver, or perceptual cue) to advance our learning.

You may or may not agree that this is a more desirable state but I think we can all agree that we (the American public school system, or any system of K-12 education) don’t have the resources for anything like this — enough individual attention for all students to learn all the standard curriculum.

The Silicon Valley Vision is that technology-based education can provide education that is not only better than one-on-one human teachers, but can also scale to accommodate every student, up to and including, yes, the poor African villager.

Big goals.

I always ask the question, how do you dematerialize, demonetize, and democratize different systems. In the case of education what I believe is going to happen is that we’re going to develop artificial intelligence systems, AIs, that are using the very best teaching techniques.

Let’s establish some common ground.

First, it’s not clear to me what it means that the AI is “using teaching techniques.” Is the AI selecting and sequencing some pre-existing content, or is it actually constructing pedagogic material and enacting the delivery on its own (whether through generated text or a Siri voice or even a robot)? The former is more realistic in the near term — for example, it’s the role that Knewton plays for the content of the publishers it works with — and seems hinted at by later answers, so let’s stick with that.

Next, I don’t know how these “best” teaching techniques are determined. If these techniques are known, what has stopped us from applying them already?

I’ll give the Silicon Valley Vision the benefit of the doubt here: the “best teaching technique” is highly context dependent, and except perhaps for our imagined individualized 2-sigma teacher, the only practical way to map from context to technique at scale is with automated technology. That leaves us with one question: can technology do that?

An AI can understand a child’s language abilities, their experience, their cognitive capabilities, where they’ve grown up, even know what their experiences are through the days, and give that individual an education that is so personalized and so perfect for their needs in that moment that you couldn’t buy it.

Diamandis starts by enumerating these contexts for personalization. In our medical analogy, this would be like asking for a piece of software we switch on that tells us everything that could be wrong with us. Instead, we have countless scans, tests, and measurements that give hints at what could be going on. Is there reason to believe that the mind is more scrutable? I haven’t seen one.

Our state of the art in learning “diagnostics” is to hand-code the units of knowledge for a particular domain, ask tons of assessment questions, and infer a small amount of information from each of these about how likely it is that the student knows each unit. For a typical case, a multiple choice question, the information content is very low — there’s already a 25% chance the student just guessed the right answer — for maybe a minute of the student’s time. That isn’t nearly the information bandwidth that a good teacher achieves, even working with a large class. (Don’t get me wrong, there is cool work building domain and student modeling in environments like games or inquiry learning, but the point is that this progress is incredibly slow — for example, a block stacking game that has been individually designed, programmed, and modeled over several years.)
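As a toy illustration of how little one multiple-choice answer tells us, here’s a Bayesian update. The numbers are made up for the sketch: a 50% prior that the student knows the unit, and idealized perfect recall when they do:

```python
# Assumed numbers for illustration only.
p_knows = 0.5              # prior belief the student knows the unit
p_correct_if_knows = 1.0   # idealized: no slips when they know it
p_correct_if_guess = 0.25  # 4-option multiple choice

# Total probability of observing a correct answer.
p_correct = p_knows * p_correct_if_knows + (1 - p_knows) * p_correct_if_guess

# Bayes' rule: belief after observing one correct answer.
posterior = p_knows * p_correct_if_knows / p_correct
print(round(posterior, 2))  # → 0.8
```

A minute of the student’s time moves the estimate only from 0.5 to 0.8, and a lucky guess is indistinguishable from knowledge.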

And the beautiful thing about computers and AI is that it can scale at minimum incremental cost. So you can imagine a world in the future in which the son or daughter of a billionaire, or the son or daughter of a poor African villager, have equal access to the best education. We’re seeing that today in knowledge, right, because Larry Page, founder of Google, has access to the same knowledge and information that the poorest person on Google has. It’s a flattening of this capability.

Let’s ignore the issues of access to technology for now, that is, assume our villager does have internet access (uncensored and not prohibitively slow). Do they choose to access the knowledge? When they access the knowledge, do they have the background to understand it, or the means to put the knowledge into action? Sometimes, yes, and the whole project may be worth it for those cases, but when we’re talking about education being solved and done for everyone, there is no precedent here.

So AI for me is the answer to global dematerialized, demonetized, and democratized education. We have to separate learning things from actually socialization and being inspired and so forth. Humans are going to be part of that — always will be — but AI is going to be the way that I learn something. Or an AI can really deliver the information in a way that’s compelling and meaningful. In fact we’re going to have a situation where an AI may be watching my pupilary dilation or how I tilt my head or asking me questions to really understand, did I understand that concept, or was I just faking it by nodding my head. I mean how many times are you speaking to someone and they’re trying to teach you something and you say, “Yeah yeah yeah”, and really in the back of your mind you’re going, “I have no idea what this person just said.” I think education driven by neuroscience and by artificial intelligence will know that you didn’t get it, will back up to the point where you lost the idea, and then bring you step by step so you really do learn these things.

By now our picture in the medical world is rather comical. Imagine a personalized medicine system that, upon checking your vitals and determining the effects of the medication aren’t taking hold, retracts its robotic arm, refills the syringe, and injects you again, over and over, hoping one of these times will work.

If this AI vision doesn’t just mean repeating the instruction at the point of (detected) failure, then is there a map from the context that technology could infer to something “more meaningful” for the student? That’s a challenge for a fully empathetic human who knows the life story of one of their students. Well beyond Turing test level.

I think we’re really going to transform education very quickly. And it’s a huge and critically important part of our society, so as the father of two four-year-olds, I am personally passionate and excited about solving that challenge.

The language of “solving that challenge” sums up what’s most flawed in the Silicon Valley vision of education. There is no “education solved” checkbox. To the extent such a solution is envisioned, it is well beyond the grasp of the foreseeable future in the science of human learning or existing AI-driven technology in the field.

I do think there are tremendous opportunities for technology in education. If our goal is to provide a better personalized education, that means we need to be better at diagnosing and treating deficiencies in knowledge and skills. Just as medicine has not been disrupted by the use of technology, education won’t be either. But we can get better, practice by practice and tool by tool.

How We Learn: Learning Without Thinking

I’m enjoying How We Learn for tying together quite a bit of what I learned during my year in grad school. The effects of spacing (chapter 4), testing (chapter 5), and interleaving (chapter 8, covered earlier) are powerful for learning, and we know a reasonable way to implement all of them: throw everything you want to learn into a spaced repetition system. What’s been most exciting is chapter 9, Learning Without Thinking, which covers perceptual learning.

School education is skewed toward verbal and symbolic learning: tests require you to explain your answer or work out the steps of a math problem. Perceptual learning shifts the focus to visual information. I’ve covered perceptual learning previously in the rather obscure realms of Stepmania and chick sexing, but it applies to almost anything. To see how powerful perception is as a component of domain expertise, consider chess. Quoting Carey:

On a good day, a chess grand master can defeat the world’s most advanced supercomputer, and this is no small thing. Every second, the computer can consider more than 200 million possible moves, and draw on a vast array of strategies developed by leading scientists and players. By contrast, a human player–even a grand master–considers about four move sequences per turn in any depth, playing out the likely series of parries and countermoves to follow. That’s four per turn, not per second. Depending on the amount of time allotted for each turn, the computer might search one billion more possibilities than its human opponents. And still, the grand master often wins. How?

He quotes a sketch of an answer from Chase and Simon’s 1973 study of perception in chess, “The superior performance of stronger players derives from the ability of those players to encode the position into larger perceptual chunks, each consisting of a familiar configuration of pieces.”

What does that mean? We don’t have a verbal or symbolic understanding of this ability, which eludes the primary mode of computers, education, and–unfortunately for me–blog posts. We see the visual information of the board, and it activates different sizes of “chunks” in our mind. These chunks perhaps roughly correspond to levels of abstraction. A small chunk is that there is a black pawn on g4. A little larger is seeing the king in check. A big, powerful, supercomputer-beating chunk is some kind of dominant offensive pattern formed by White’s combination of positions across the board.

…And how do we learn these chunks–in a way that hasn’t translated to the performance and algorithmic sophistication of computer systems? I think we’re still in the early stages of understanding that, but the next stop on my reading list is papers from the Human Perception Lab.

Miracles through empathy and persistence

For me, the first principle of teaching is John Holt’s metaphor from How Children Fail: “To rescue a man lost in the woods, you must get to where he is.”

I’ve been hearing many stories about very nontraditional “students” who seem lost beyond hope. The Radiolab episode “Juicervose” (covering a story I first heard about from NYT) tells how an autistic boy used Disney movies to start communicating with his family. After endless watching of movie after movie, repeated time after time, the boy finds the first phrase to reach out. Once his father figures out what’s going on, he takes on the role of a Disney character to start really speaking with his son for the first time in years.

Some other examples (for some reason all podcasts): from the same episode, parents spend 900 hours imitating the self-stimulating behaviors of their autistic child before achieving eye contact. In Radiolab’s “Hello”, a woman lives with a dolphin in order to teach it to talk. In This American Life’s “Magic Words”, a couple use improv to speak to their mother, who suffers from dementia. In Invisibilia’s “The Secret History of Thoughts”, a boy in a vegetative state is cared for every day by his father until things start to turn around (this one is a must-listen).

In all these cases, the lost man is very deep in the woods indeed. For a while, it looks to the searchers like all the walking in the woods is getting nowhere. They call out his name for the hundredth or the thousandth time, and this time, finally, there’s a response.

I think the principle applies not just to teaching but to self-learning as well. As learners, we must be mindful of where a resource assumes we currently are in the process. When we practice skills, we must have enormous patience and allow ourselves to slowly work our way forward from wherever we happen to start (instead of comparing ourselves to others).

How We Learn: Being mixed up

Chapter 8 of How We Learn describes interleaving as a better means of practice. In math education, the Saxon textbook is an example: it uses a mix of practice problems drawing on everything learned so far, as opposed to typical textbooks where all problems come from one lesson. Not only does this improve the skills being learned, but students now need to recognize which strategy to use for each problem, and (perhaps as a result) they tend to better apply the skills in other contexts.
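The contrast between the two orderings can be sketched in a few lines. The lesson names are invented, and this only illustrates the difference in sequencing, not Saxon’s actual method:

```python
import random

def blocked(lessons):
    """Typical textbook: every problem from one lesson, then the next."""
    return [p for lesson in lessons for p in lesson]

def interleaved(lessons, seed=1):
    """Saxon-style mix: shuffle problems from all lessons so far, so the
    student must also recognize which strategy each problem calls for."""
    pool = [p for lesson in lessons for p in lesson]
    random.Random(seed).shuffle(pool)  # seeded for reproducibility
    return pool

lessons = [["frac-1", "frac-2"], ["geom-1", "geom-2"], ["alg-1", "alg-2"]]
print(blocked(lessons))
print(interleaved(lessons))
```

Same problems, same total practice time; only the ordering (and thus the retrieval demand) changes.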

It reminds me of my experience with my high school math team. We’d do tests from past years of competitions: 25 questions on a variety of topics. Since these were graded more on participation than percent correct, each person could grow at their own pace. I was able to get through the bulk of a test quickly and spend some time deeply thinking about questions beyond my capacity. One that I always remember is discovering my own approach to trigonometric identities that involved manipulating triangles.

We’d do these tests in the morning and then have people put them up on the board in the afternoon. (Yes, we had two periods of math team plus the regular math class.) I felt this was another benefit – someone around your level could explain a problem as they may have figured it out for the first time. And I would try and fail to explain my homebrew approach to trig.

The chapter also contrasts the conservative and progressive approaches to math education. The progressive supposedly favors conceptual skills like number sense, while the conservative builds up from concrete, procedural skills. (I also recommend the commentary from Math With Bad Drawings.)

The method of learning for math team was strongly in the conservative tradition. We even had a “formula book” containing formulas that could solve probably 80-90% of a test without much further thought. Sometimes deeper thinking did happen – not just when I was bored and unprepared for trigonometry, but when inspired by good question writing that demanded using the material in new ways. (One competition that I highly recommend is Mandelbrot.) I don’t think these kinds of questions could have been approached without that base of knowledge. Of course the balance is hard to strike: by senior year, some of us were pushing back against being taught with such focus on these more formulaic problems.

Back and forth on certainty

When first presented with a complex new toy, the typical child explored it intensely, exhibiting a serious face and eyes riveted on the toy. As the child manipulated the toy to discover its properties, the focused concentration continued, punctuated by momentary expressions of surprise, sometimes mixed with joy, as new discoveries were made. Only after exploring the toy for some time did the child begin to play with it, by repetitively acting on it to produce known effects or by incorporating it into a fantasy game.

According to Peter Gray (Free to Learn), this is what learning looks like: a child explores on his own will, acquires knowledge through interaction and observation, then develops skills through imaginative play.

Effective learning research is often at odds with this picture. A recent study finds that we most readily commit things to memory when we have certainty about the causal relationship1. Direct instruction–an authority figure telling a student that step 2 comes after step 1–gives her exactly that. Studies have found that testing is better than rereading or highlighting to learn new material2. Again, the presence of a test question with a single correct answer provides certainty. And when it comes to skill learning, deliberate practice suggests a state of strain and anxiety that doesn’t correspond with play3. We seem to need certainty that the movements or thought patterns we practice are correct.

After all, imagination won’t advance someone through thousands of years of mathematical discovery. And play won’t develop a runner to beat times that have been steadily declining over the years due to evolving techniques and training.

But certainty, perhaps, limits our imagination and motivation. In a study of children’s interaction with an unknown toy, children who were directly taught a particular function of the toy overwhelmingly attended to that aspect, while other children explored the toy to discover its other functions4. If we always choose the path of certainty and efficiency, we train ourselves, like a driver relying on GPS, to expect the next direction. Without it we feel impatiently lost. The child would tell us that we’re in exactly the right place to make our next joyful discovery.


From mechanistic to environmental learning

An attempt to put my views and main references on learning in ~1 page.

My journey into learning technology started with spaced repetition systems (SRS), particularly Anki. I used it for Chinese, but my inclination was to take this approach to learning to the extreme, to every subject. Optimistically, SRS could be the operating system for learning. Every possible input you wished to learn could be converted into cards; then, voilà, every action for retaining your knowledge would be scheduled and presented to you by the system. Indeed my first learning technology project, Learnstream, was a tool for extracting SRS cards from text and video documents. I also just recently reviewed a friend’s book, Learning Medicine, that promotes this strategy for medical students.

There’s a problem though: the complete picture of learning is not at all described by the forgetting curve upon which spaced repetition systems are built. Consider the dependency of knowledge: reviewing a fact about multiplication (23*42=966) necessarily reviews facts about addition (920+46=966). My time at Carnegie Mellon was exactly what I needed to introduce me to more robust theories of learning and attempts at intelligent tutoring beyond SRS. For example, the Knowledge-Learning-Instruction framework characterizes the kind of knowledge that might work best for SRS, paired associates, contrasting that with other kinds of knowledge (however see my Quora answer for Does spaced repetition work equally well for analytical and factual information?).
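For concreteness, here’s a sketch of the kind of forgetting-curve model that SRS scheduling is built on: exponential decay, with each successful review multiplying the memory’s stability. The 2.5 growth factor and the 90% threshold are illustrative assumptions, not Anki’s actual parameters:

```python
import math

def retention(days_since_review: float, stability: float) -> float:
    """Ebbinghaus-style curve: predicted recall probability decays
    exponentially at a rate set by the memory's stability."""
    return math.exp(-days_since_review / stability)

# Schedule each review for when predicted recall drops to 90%;
# assume stability grows 2.5x per successful review (made-up constant).
stability = 1.0
intervals = []
for _ in range(4):
    intervals.append(-math.log(0.9) * stability)
    stability *= 2.5

print([round(i, 2) for i in intervals])  # → [0.11, 0.26, 0.66, 1.65]
```

The per-card independence here is exactly the simplification that the dependency of knowledge breaks: no card’s curve knows that reviewing the multiplication fact also reviewed the addition fact.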

I realized that a complete mechanistic model isn’t coming any time soon. And that there are even drawbacks in the attempt to capture that model (see Seeing Like a State, Rendering Learners Legible).

Abandoning the notion that SRS needs a perfect model of memory, I came to think of SRS as creating an environment for learning (blog post: Spaced repetition in natural and artificial learning). Simply put, that environment is a big improvement over Facebook or whatever your default computer addiction is. (Later, however, I wrote about the limitations of SRS as a user interface in Designing learning systems with spaced repetition.)

Alan Kay uses the term environmental learning (“User Interface: A Personal View” (pdf)), invoking some of my favorite authors, Suzuki (Nurtured by Love) and Gallwey (Inner Game of Tennis) — tragically I still haven’t read Montessori. When we step back to think of enabling learning through construction of an environment, we can do much better than SRS. In Getting Beyond Massively Lousy Online Courses I wrote about how games like Portal inspire learning with environmental affordances (a term from The Ecological Approach to Visual Perception) — requiring minimal or no adaptivity. Computer environments for learning have been explored since the 60s starting with Papert (Mindstorms) and continuing today in Bret Victor’s work, e.g. Learnable Programming.

Unfortunately in this one-page space I don’t think I’m going to get toward describing a full solution, only discrediting a few that some find promising. But let me try to put the challenge in a way that I hope is as stimulating to you as it is to me, while referencing some more of the authors that have guided me:

A math teacher starts her class, day one, with an exciting world of mathematics in her head (for exciting+technical see Dan Meyer, Lockhart’s Lament, Surely You’re Joking Mr. Feynman, Yudkowsky, Gödel Escher Bach): an equation for a parabola describes an infinite wave, she changes the tides with a simple parameter, and she can ride the wave on a tangent surfboard just by taking the derivative. Her students, sitting in rows of desks, see only…her and an empty chalkboard. Slowly she will reveal this environment, using tools like storytelling, visualization, and interactive affordances (along with Papert/Montessori, see Bruner’s modes of learning in Toward a Theory of Instruction, Understanding Comics, Edward Tufte), which promote cognitive mechanisms of perceptual representation and analogy (Vygotsky, Douglas Hofstadter, Dedre Gentner, Michelene Chi, Origin of Concepts). She’ll need to do it in a way that motivates her students to explore that world with her — without it becoming a game of avoiding negative judgment (see Impro and How Children Fail for examples and counterexamples — for me the two most important books about learning — also Tao Te Ching). The students must also engage in repeated practice in order to use the tools of mathematics fluently (Zen in the Art of Archery, Art of Learning, Ericsson/Outliers/Talent Code/Talent is Overrated, Bruce Lee).

Now imagine the teacher is a computer, the students are mathematicians, and the lesson is on a mathematical discovery not by a human but by said computer/teacher. That’s a peek at my endgame (blog post: Knowledge science = data science + learning science).

10x Learning

In Zero to One, Peter Thiel claims that in order to establish a new monopoly (his argument that monopolies are a good thing for both the owner and society is beyond the scope of this post), a startup needs to improve upon an important dimension by 10 times. For example, Google’s search engine was probably 10 times better than other options at the time.

If our domain is education, and we want to establish a better means of learning a particular topic, the claim is that we want a 10x improvement in order to scale out to a majority of our target audience.

Learning rate. In the last post, I claimed that the rate of learning is a plateau-filled slow climb that can crush motivation. Attempts to circumvent this fact, like small wins (which zoom in on the learning graph but can’t trick us forever) and gamification (which is inspired by examples that get to hand-pick their challenges), don’t work. Could we, like the brain downloading programs from the Matrix, improve the learning rate by 10 times? Shorten a plateau that would normally take 10 hours of dedicated learning into one? Perhaps in particularly degenerate cases with very high extraneous cognitive load (like beginning French by reading the original Deleuze), but normally there are just too many bits of knowledge to acquire and synthesize. Even a spaced repetition system that optimizes the order and frequency of review would, by my guess, be at most a 2-3x improvement in the rate. Of course that 2-3x can have a huge impact, so let’s consider other forms of measuring learning.

Time of persistence. If the learning is entirely voluntary, like a class to improve your home cooking, the amount of time you persist will be a good approximation of how motivating that learning method is. And if “10,000 hours” really is the most important part of mastery, then time of persistence (along with hours of learning per day) is the most important metric of learning.

Of course, I don’t actually believe that number of hours alone is important. I think of some of the podcasts I tried for learning Chinese, where they started and ended with friendly chit-chat in English–clearly that time was not improving my Chinese. Don’t forget that it was originally 10,000 hours of deliberate practice that K. Anders Ericsson claimed produced masters.

Quality of results. In my career field, software engineering, there is much discussion and debate about the “10x engineer”. In any field, one must define what qualifies as 10x quality. 10x engineers have been defined pretty well: their presence in a company increases productivity by 10 times over an average engineer. That is probably a combination of their speed in producing a working system, code that is reliable and easy to maintain, and tools and practices that enable their whole team to be more productive. A more difficult question that we must answer for learning is what produces a 10x engineer (or other 10x quality role). You can see a deluge of attempts to answer that on Quora. The fact that 10x quality isn’t average suggests that it consists of skills that may be misunderstood (the benefits of strongly typed languages), counterintuitive (red-green-refactor in test-driven development), unsexy (knowing the ins and outs of the Linux kernel), or that have very long plateaus that cause most to drop out (higher order abstractions in Haskell).

Logically we can’t have a scalable system that makes everyone 10x better than average, but if we can train the average person in 10x practices, we have an overall much more productive society.

Quantity of results. Maybe quality is too hard to measure: think about art. But the one who produces 10 times more paintings is probably going to be better (not to mention have more to hang on the wall). Consider this (already cited too often) anecdote from Art and Fear:

The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pounds of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot — albeit a perfect one — to get an “A”. Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work – and learning from their mistakes — the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.

Number of competitive wins. In elite athletics, a major industry goes into shaving fractions of a second (see this in-depth discussion on the minute effects of beet juice). Does the 10x metric still make sense there? It does if you count number of competitions won (or competitive earnings). In high school, I was certainly no athlete, but I was really into math competitions. My high school math team did practice tests on nearly a daily basis and a dozen competitions per year. If you told me I could have the highest score 10 times more often, I would have eagerly given you my (admittedly scant) savings. I was already pretty good, so that would have translated to merely a few extra points per test on average.

I think 10x is a great rallying cry for improving learning experiences, but it’s worth figuring out what is realistic and meaningful for your domain. Once you’ve picked one, exploit it as much as possible. If it’s 10x competitive wins, then let your users compete on a daily basis. For 10x quantity of results or time of persistence, encourage a simple repeatable activity and showcase it in a growing gallery (see 180 Websites in 180 Days or Give it 100). If you believe you have the secret to 10x quality, promote the skills that resist learning by finding new ways to practice them or by using expert endorsements to emphasize their importance (see Ramit Sethi’s writing and courses such as Big Wins Manifesto).

I’d love to talk to anyone designing a learning plan–even if it’s just for yourself–to decide which metric to focus on and which strategies to use. Send me an email with your goal!

Motivation in learning

Suppose you’re designing a learning tool and you want to amp up the motivation. You decide to show a graph of the user’s learning progress. Of course on your awesome learning environment, people will be learning all the time, so it’s going to look like this, right? Users will see that they are getting more and more awesome, they’ll feel awesome, and they’ll come back every day to keep learning.

[Graph: a smooth, steadily rising learning curve]

The problem is, when learning looks like this, the learner is already well aware that they are kicking ass. Your graph is the banner at an election party. Maybe it ties together the scene, but everyone already knows what’s going on.

The reason that motivation is a persistent unsolved problem in education is that learning doesn’t look like that. Learning is filled with plateaus and pits because confusion is the very nature of learning. Learning–in the very best case–looks more like this:

[Graph: a learning curve of long plateaus punctuated by sudden jumps]

Keep in mind those plateaus can last on the order of months, long enough that we forget what a jump feels like. The jump itself, by the way, happens so quickly and changes our thinking so thoroughly that we barely notice it!

Motivation hackers have countered with the theory of small wins: if we decrease the delay before some kind of reward, we will feel more motivated. But what does that really mean in the big picture–at least when it comes to learning? It means we are zooming into this graph and increasing the number of little upward bumps on the plateau. That is what spaced repetition is good at: increasing the frequency of missed items so that the correctness ratio stays around 90%. But our unconscious, in the end, can’t be tricked like that. Once we’re used to spaced repetition, we know that the missed cards are piling up, rather than the new ones we want to get to. We might feel the joy of a small win, but it will be paired with the pain of even more small losses. Moreover, we know that we just aren’t learning that much.
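A minimal Leitner-style sketch (illustrative only, not any real app’s algorithm) shows the mechanic: a correct answer promotes a card to a less frequent box, while a miss sends it back to the most frequent box, which is exactly how the missed cards pile up:

```python
def review(boxes, card, correct, max_box=4):
    """Promote a card one box on success; demote to box 0 on a miss.
    Lower-numbered boxes are reviewed more frequently."""
    boxes[card] = min(boxes[card] + 1, max_box) if correct else 0
    return boxes[card]

boxes = {"bonjour": 2, "fenêtre": 2}
review(boxes, "bonjour", True)   # promoted: seen less often now
review(boxes, "fenêtre", False)  # missed: back to the daily pile
print(boxes)  # → {'bonjour': 3, 'fenêtre': 0}
```

Every small win (a promotion) is bought with the visible small losses of cards stuck cycling through box 0.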

What about games? Given that games are so fun and addictive, many believe they hold the secret to education’s motivation problems. According to Raph Koster’s Theory of Fun, what makes games fun is…wait for it…learning! While games can sometimes teach educators about pacing, game designers have the luxury of not having to include anything with too long of a plateau. They get to choose the domain, but when we discuss learning as a more practical matter, that isn’t possible.

So you want to create instruction for a domain that contains concepts with long plateaus. Your best option has nothing to do with motivation: it is to improve instruction so that the plateaus are shorter. Beyond that, I’m not too sure. I think it’s part of why “detachment from the illusions of self” is part of shuhari, a Japanese martial arts conception of mastery: one must get over the idea that they need to be better all the time. In addition, learners need a deeply held belief both that what they are striving for is important (“when are we ever going to use this?”) and that the periods of stagnation are essential to growth. Maybe the graph to show, if you can do it convincingly, is the plateau another learner was on before achieving their next jump. And the cool stuff they did after a certain number of those jumps.

What I’m learning – 8/5/14

Learning How to Learn (MOOC, Coursera) Week 1 contains a good collection of topics. I’m familiar with most of them: spaced repetition, the benefits of sleep and exercise, the pomodoro technique. An interesting framing that they use is focused versus diffuse modes of the brain. I love Coursera’s mobile app for watching videos: they can be downloaded and watched at 2x speed.

Real World Haskell (Online book) I recently did CIS194: Introduction to Haskell, which was excellent for learning Haskell concepts but left me still confused about how to structure programs. This book is already teaching me a lot of practical tips that CIS194 didn’t cover (to be fair, they give RWH readings for each lecture). The embedded comments are a great way to see a variety of solutions for the exercises in the book. It’d be nice to have top quality solutions available too, but sometimes it helps to see the thoughts of another newbie.

Probabilistic Models of Cognition (Online book) I’ve been hugely interested in modeling cognition for many years. I neglected this book because it seemed like too much of a rabbit hole to tackle. However, it has so far turned out to be a great review of probability and functional programming (it uses Church, which derives from Scheme) in addition to the interesting domain. I really enjoy being able to modify and run programs in-line. There’s an element of feedback that is nearly effortless because I usually have an expectation of what a program does right before pressing “Run”. Then I immediately see whether that expectation was correct or I need to think more about it. There are also more traditional exercises that push harder but with the convenience of being in the browser.

Why Do Americans Stink at Math? (Article, NY Times) There is a ringing endorsement among those who are good at math: “don’t just memorize a procedure, understand the concept.” Unfortunately, it rarely goes beyond that platitude, and it starts to break down on closer examination: if you have an understanding, isn’t the concept memorized as well? Most likely, unless you have to reconstruct it very slowly, you’ve memorized the procedure too. So which really came first: your self-proclaimed “understanding” or an explanation that you constructed for the procedure that you memorized? The big reveal comes when you try to get most of them to actually explain a concept they understand to you. “Argh, well, you just do this.”

And yet, when you read an article like this, there is something obviously and dreadfully wrong with something like “Draw a division house, put ‘242’ on the inside and ‘16’ on the outside, etc.” An interesting counterexample is where math was learned by the uneducated in a way that is procedural but also embodied–that is, learned or used in commerce or factory work–though learning math symbolically and abstractly clearly still requires a long path (see also The Real Story Behind Story Problems). Another fascinating possibility is teachers using a Japanese technique called lesson study. A lot to digest in this article. (I have some more writing from my grad school days on concepts.)

Is Practice Dead?

According to a new study, “Deliberate practice is unquestionably important, but not nearly as important as proponents of the view have claimed.” Broken down by domain in a meta-analysis of previous research, deliberate practice explains only 26% (games), 21% (music), 18% (sports), 4% (education), or a minuscule <1% (professions) of differences in performance. The aim of this research isn’t to provide advice, but if you start to believe that practice isn’t that important or effective, you might not pursue it wholeheartedly. I’d like to argue that that’s a big mistake.
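For a sense of scale, “percent of variance explained” is the square of a correlation coefficient, so we can back out the underlying correlations from the reported figures (treating “<1%” as 1% here, purely for illustration):

```python
# "Variance explained" is r squared; take square roots to see the correlations.
variance_explained = {
    "games": 0.26, "music": 0.21, "sports": 0.18,
    "education": 0.04, "professions": 0.01,  # "<1%" rounded up for illustration
}
for domain, r2 in variance_explained.items():
    print(f"{domain:>11}: r^2 = {r2:.2f}  ->  r = {r2 ** 0.5:.2f}")
```

Even the strongest domain, games, corresponds to a correlation of only about 0.5: substantial, but leaving most of the variation to other factors, or to problems with how practice was measured in the first place.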

Let’s start with the “10,000 hour rule” that is always cited in articles about practice and performance. The standard view of this rule seems to conflate two useful ideas. The first idea is that expert-level performance in cognitive domains takes a great deal of cognitive work–we’ll see why. Call this the practice threshold hypothesis. The second idea is that the specific techniques used to practice make a big difference. Call this the practice quality hypothesis. The meta-analysis is conducted on studies that use the original definition of deliberate practice from Ericsson, Krampe, and Tesch-Römer, 1993, “effortful activities designed to optimize improvement.” Their definition captures neither of these key ideas: not the cognitive work threshold, and not practice quality.

The origin of 10,000 hours dates back at least to Simon & Barenfeld, 1969, where they discuss not hours but the size of a “vocabulary of familiar subpatterns” needed by chess masters and Japanese readers: 10,000 to 100,000. Just like reading in a foreign language won’t make sense if you don’t know key words (this is the best example I can find), it isn’t simply that “more practice is better” but that a large minimum threshold of practice is necessary for mastery. Obviously this amount is not exactly 10,000 hours. Chess can cover effectively endless board positions, so the figure is not an upper limit, it’s just that few people reach another major threshold beyond 10 years of practicing 20 hours per week, and those who do may be beyond the comprehension of mere masters. Or as Professor Lambeau says in Good Will Hunting, “It’s just a handful of people in the world who can tell the difference between you and me.”
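To see why a large pattern vocabulary behaves like a threshold, consider a toy model (my own illustration, not from Simon & Barenfeld): if pattern frequencies follow a Zipf distribution, the share of material you recognize grows painfully slowly with vocabulary size, so things only start to “make sense” near the top of that 10,000–100,000 range.

```python
# Toy model: pattern frequencies follow Zipf's law, f(rank) ~ 1/rank.
# What fraction of running material do the top-k most frequent patterns cover?
def harmonic(n):
    return sum(1.0 / r for r in range(1, n + 1))

TOTAL_PATTERNS = 100_000  # assumed size of the full pattern inventory

def coverage(known):
    return harmonic(known) / harmonic(TOTAL_PATTERNS)

for known in (100, 1_000, 10_000, 100_000):
    print(f"know {known:>7} patterns -> recognize {coverage(known):.0%} of material")
```

Under this model, knowing 1,000 patterns still leaves you failing to recognize nearly 40% of what you encounter, which is roughly the experience of reading a foreign language with too small a vocabulary: not “a bit slower”, but incomprehensible.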

To discredit the practice threshold hypothesis the meta-analysis would need to examine total accumulated practice that may be related to the domain. In fact there seems to be an inverse correlation between the variance explained per domain and the difficulty of measuring accumulated practice. Chess masters tend to have studied chess their entire lives, and musicians have played music (of some form) their entire lives. Sport skill can come from a bit wider range of physical training. Education and professions draw on a yet wider range of skills. A mathematician may make a “natural” programmer because of extensive experience with analytical thinking, but his math expertise doesn’t get counted as “practicing programming”.

Now let’s talk about practice quality. There isn’t a dominant theory of exactly what makes practice good (and there never will be as it is domain-specific), so that makes it difficult to examine in even a single study, much less across many studies and domains. As far as I can tell, quality of practice is not considered whatsoever. So there are potentially people showing up half-heartedly to practice, practicing something they’ve already mastered, or practicing something they aren’t ready for all getting counted the same as people who practice “optimally”, whatever that is.

Again we see that in the domains with a low variance explained by practice, practice quality is much harder to measure. In games and music a good way to practice is simply to play the game or play the music (though there are often better ways). Compare that to professional programming. Few people really practice once they learn the language. The quality of continued learning on the job depends on a huge number of factors. Most likely these could not be accounted for in anything but an ethnographic study (unfortunately I couldn’t track down the one study from the meta-analysis targeting professional programming).

In short this study does not tell us about the potential of practice because its measure doesn’t capture when practice is most useful. Unfortunately due to the domain dependencies of what constitutes practice threshold and quality, we’re unlikely to ever see a meta-analysis that captures the full potential of practice across domains. What it may tell us is that the common idea of practice isn’t nearly good enough, especially in something as important as professional work. If it only makes 1% difference, you aren’t doing it right.

There are many sources for ideas for better practice. Popular science works such as Moonwalking With Einstein, Practice Perfect, and The Little Book of Talent are all good places to start. The Cambridge Handbook of Expertise and Expert Performance is a collection of articles across a variety of domains showing the progress that has been made since the 1993 definition of deliberate practice.

Finally a small pitch of my own: I’m reviving my wiki to compile general thoughts on effective learning and practice as well as a glimpse of my personal efforts to practice programming and other skills. I encourage you not only to check mine out but also to start something similar, and maybe we can conduct a study of super-effective learners!