Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5



Goodhart’s Law, Partially Observed Goals, and Wireheading: some more reasons for AI systems to find ways to ‘cheat’ and get …

source

Leave a Reply

Your email address will not be published. Required fields are marked *

Amazon Affiliate Disclaimer

Amazon Affiliate Disclaimer

“As an Amazon Associate I earn from qualifying purchases.”

Learn more about the Amazon Affiliate Program