Points of failure: direct specification in AGI alignment

posted on 28.03.2022
Some have critiqued the strategy of explicitly formalising and implementing a value structure in the design of ethical Artificial General Intelligences (AGIs). I build on these critiques by providing a conceptual account of the issues with direct specification, demonstrating its in-principle unviability when compared with implicit and indirect approaches. I begin by considering the factors involved in AGI risk and the need for risk mitigation. Designing AGIs that are motivated towards ethical behaviour is a key element of risk mitigation. A natural approach to this problem is to specify values for the AGI directly, but this approach carries two fatal consequences: an axiological gap between any potential AGIs and humans, and the immutability of this gap. Indirect approaches evade both consequences. I construct an account of the axiological gap and argue that it is inevitable under direct specification.


Table of Contents

I: AGI risk and alignment
II: Points of failure
Conclusion


Theoretical thesis. Bibliography: pages 49-56

Macquarie University

Thesis MRes


MRes, Macquarie University, Faculty of Arts, Department of Philosophy


Paul Formosa

Richard Menary


Copyright Elias Dokos 2019.




mq:72328 http://hdl.handle.net/1959.14/1283726