The bottom-up method for recognizing handwritten characters is to give the computer thousands of examples of each one and let it pull out the salient features. Instead, Lake et al. gave the program a general model of how you draw a character: A stroke goes either right or left; after you finish one, you start another; and so on. When the program saw a particular character, it could infer the sequence of strokes that were most likely to have led to it, just as I inferred that the spam process led to my dubious email. Then it could judge whether a new character was likely to result from that sequence or from a different one, and it could produce a similar set of strokes itself. The program worked much better than a deep-learning program applied to exactly the same data, and it closely mirrored the performance of human beings.

These two approaches to machine learning have complementary strengths and weaknesses. In the bottom-up approach, the program doesn’t need much knowledge to begin with, but it needs a great deal of data, and it can generalize only in a limited way. In the top-down approach, the program can learn from just a few examples and make much broader and more varied generalizations, but you need to build much more into it to begin with. A number of investigators are currently trying to combine the two approaches, using deep learning to implement Bayesian inference.

The recent success of AI is partly the result of extensions of those old ideas. But it has more to do with the fact that, thanks to the Internet, we have much more data, and thanks to Moore’s Law, we have much more computational power to apply to that data. Moreover, an unappreciated fact is that the data we do have has already been sorted and processed by human beings. The cat pictures posted to the Web are canonical cat pictures: pictures that humans have already chosen as “good” pictures. Google Translate works because it takes advantage of millions of human translations and generalizes them t