lots of examples of something you want them to learn (pictures of cats labeled “cat,” for example), along with examples of other random data (pictures of other things). This is called “supervised learning,” because the neural network is being taught by example, including the use of “adversarial training” with data that is not correlated to the desired result.

These neural networks, like their biological models, consist of layers of thousands of nodes (“neurons,” in the analogy), each of which is connected to all the nodes in the layers above and below by connections that initially have random strength. The top layer is presented with data, and the bottom layer is given the correct answer. Any series of connections that happened to land on the right answer is made stronger (“rewarded”), and those that were wrong are made weaker (“punished”). Repeat tens of thousands of times and eventually you have a fully trained network for that kind of data.

You can think of all the possible combinations of connections as like the surface of a planet, with hills and valleys. (Ignore for the moment that the surface is just 3D and the actual topology is many-dimensional.) The optimization that the network goes through as it learns is just a process of finding the deepest valley on the planet. This consists of the following steps:

1. Define a “cost function” that determines how well the network solved the problem.
2. Run the network once and see how it did at that cost function.
3. Change the values of the connections and do it again. The difference between those two results is the direction, or “slope,” in which the network moved between the two trials.
4. If the slope is pointed “downhill,” change the connections more in that direction. If it’s “uphill,” change them in the opposite direction.
5. Repeat until there is no improvement in any direction. That means that you’re in a minimum.

Congrats! But it’s probably a local minimum, or a little dip in the mountains, so you’re goi