Risks from learned optimization in advanced machine learning systems E Hubinger, C van Merwijk, V Mikulik, J Skalse, S Garrabrant arXiv preprint arXiv:1906.01820, 2019 | 141 | 2019 |
Categorizing variants of Goodhart's Law D Manheim, S Garrabrant arXiv preprint arXiv:1803.04585, 2018 | 125* | 2018 |
Logical induction S Garrabrant, T Benson-Tilsen, A Critch, N Soares, J Taylor arXiv preprint arXiv:1609.03543, 2016 | 58* | 2016 |
Embedded agency A Demski, S Garrabrant arXiv preprint arXiv:1902.09469, 2019 | 42* | 2019 |
Pattern avoidance is not P-recursive S Garrabrant, I Pak arXiv preprint arXiv:1505.06508, 2015 | 31* | 2015 |
Using TPA to count linear extensions J Banks, SM Garrabrant, ML Huber, A Perizzolo Journal of Discrete Algorithms 51, 1-11, 2018 | 21* | 2018 |
Words in Linear Groups, Random Walks, Automata and P-Recursiveness S Garrabrant, I Pak arXiv preprint arXiv:1502.06565, 2015 | 17 | 2015 |
Asymptotic convergence in online learning with unbounded delays S Garrabrant, N Soares, J Taylor arXiv preprint arXiv:1604.05280, 2016 | 12 | 2016 |
Counting with irrational tiles S Garrabrant, I Pak arXiv preprint arXiv:1407.8222, 2014 | 12 | 2014 |
Asymptotic logical uncertainty and the Benford test S Garrabrant, T Benson-Tilsen, S Bhaskar, A Demski, J Garrabrant, ... Artificial General Intelligence: 9th International Conference, AGI 2016, New …, 2016 | 7 | 2016 |
Upper bounds in the Ohtsuki–Riley–Sakuma partial order on 2-bridge knots SM Garrabrant, J Hoste, PD Shanahan Journal of Knot Theory and Its Ramifications 21 (09), 1250084, 2012 | 7 | 2012 |
Categorizing variants of Goodhart's Law D Manheim, S Garrabrant arXiv preprint arXiv:1803.04585, 2018 | 6 | 2018 |
Two major obstacles for logical inductor decision theory S Garrabrant Intelligent Agents Foundation Forum, 2017 | 5 | 2017 |
Geometric analysis of a generalized Wythoff game E Friedman, SM Garrabrant, IK Phipps-Morgan, AS Landsberg, ... Games of No Chance 5 5, 343, 2019 | 4 | 2019 |
Inductive Coherence S Garrabrant, B Fallenstein, A Demski, N Soares arXiv preprint arXiv:1604.05288, 2016 | 4* | 2016 |
Cofinite Induced Subgraphs of Impartial Combinatorial Games: An Analysis of CIS-Nim SM Garrabrant, EJ Friedman, AS Landsberg INTEGERS 13, 2, 2013 | 3 | 2013 |
Temporal Inference with Finite Factored Sets S Garrabrant arXiv preprint arXiv:2109.11513, 2021 | 2 | 2021 |
Cartesian frames S Garrabrant, DA Herrmann, J Lopez-Wild arXiv preprint arXiv:2109.10996, 2021 | 1 | 2021 |
P-recursive integer sequences and automata theory SM Garrabrant University of California, Los Angeles, 2015 | 1 | 2015 |
Factored space models: Towards causality between levels of abstraction S Garrabrant, MG Mayer, M Wache, L Lang, S Eisenstat, H Dell arXiv preprint arXiv:2412.02579, 2024 | | 2024 |