Abstract
Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share it with others for collective benefit. Shared revenue was returned to players under two different redistribution mechanisms, one designed by the AI and the other by humans. The AI discovered a mechanism that redressed initial wealth imbalance, sanctioned free riders and successfully won the majority vote. By optimizing for human preferences, Democratic AI offers a proof of concept for value-aligned policy innovation.
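The investment game described above can be sketched as a simple public-goods round: players choose how much of their endowment to contribute, the pooled contributions grow, and the pot is redistributed under some mechanism. The sketch below is purely illustrative; the multiplier value, the unequal endowments, and the proportional redistribution rule are stand-in assumptions, not the paper's actual parameters (the paper compares an AI-learned rule against human-designed baselines).

```python
# Hypothetical sketch of the investment game from the abstract.
# The multiplier, endowments, and proportional redistribution rule
# are illustrative assumptions, not the study's actual values.

def play_round(endowments, contributions, multiplier=1.6):
    """Pool contributions, grow the pot, and redistribute it.

    Redistribution here is a simple stand-in: each player's share of
    the grown pot is proportional to their contribution. The learned
    mechanism in the paper replaces this rule.
    """
    total = sum(contributions)
    pot = total * multiplier
    payoffs = []
    for endowment, contribution in zip(endowments, contributions):
        share = pot * (contribution / total) if total > 0 else 0.0
        payoffs.append(endowment - contribution + share)
    return payoffs

# Three players with unequal endowments; the third free-rides.
print(play_round([10, 5, 2], [5, 5, 0]))  # → [13.0, 8.0, 2.0]
```

Under this stand-in rule, the free rider keeps only their endowment; the paper's AI-designed mechanism likewise sanctioned free riding, while also redressing initial wealth imbalance.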
Koster, Balaguer et al. show that an AI system can learn a redistribution policy that humans prefer to alternatives in an incentivized game.
Details
1 DeepMind, London, UK (GRID:grid.498210.6) (ISNI:0000 0004 5999 1726)
2 University of Exeter, Department of Economics and Institute for Data Science and Artificial Intelligence, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024)
3 DeepMind, London, UK (GRID:grid.498210.6) (ISNI:0000 0004 5999 1726); University College London, Gatsby Computational Neuroscience Unit, London, UK (GRID:grid.83440.3b) (ISNI:0000000121901201)
4 DeepMind, London, UK (GRID:grid.498210.6) (ISNI:0000 0004 5999 1726); University of Oxford, Department of Experimental Psychology, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948)