Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5 -

Goodhart’s Law, Partially Observed Goals, and Wireheading: some more reasons for AI systems to find ways to ‘cheat’ and get …

source

Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5