enough time and unlimited visual aids, a human could express a preference (or indifference) when offered a choice between two future lives laid out before him or her in all their aspects. (This idealization ignores the possibility that our minds are composed of subsystems with incompatible preferences; if true, that would limit a machine’s ability to optimally satisfy our preferences, but it doesn’t seem to prevent us from designing machines that avoid catastrophic outcomes.) The formal problem F to be solved by the machine in this case is to maximize human future-life preferences subject to its initial uncertainty as to what they are. Furthermore, although the future-life preferences are hidden variables, they’re grounded in a voluminous source of evidence—namely, all of the human choices ever made. This formulation sidesteps Wiener’s problem: The machine may learn more about human preferences as it goes along, of course, but it will never achieve complete certainty.

A more precise definition is given by the framework of cooperative inverse-reinforcement learning, or CIRL. A CIRL problem involves two agents, one human and the other a robot. Because there are two agents, the problem is what economists call a game. It is a game of partial information, because while the human knows the reward function, the robot doesn’t—even though the robot’s job is to maximize it.

A simple example: Suppose that Harriet, the human, likes to collect paper clips and staples and her reward function depends on how many of each she has. More precisely, if she has p paper clips and s staples, her degree of happiness is θp + (1−θ)s, where θ is essentially an exchange rate between paper clips and staples. If θ is 1, she likes only paper clips; if θ is 0, she likes only staples; if θ is 0.5, she is indifferent between them; and so on. It’s the job of Robby, the robot, to produce the paper clips and staples. The point of the game is that Robby wants to make Harriet happy, but he doesn’t kno