Only Darren
Apr 7, 2024

--

Way too much room for ambiguity in those rules.

The alignment problem means that no matter how we define the behaviour we want from the AI, we can't predict what it will actually do.

If we ask it, 'do you understand the rules?' it may say 'yes'.

If we ask whether it will obey them, it may say 'yes'.

Does that mean it actually will? No. Not even slightly.

And AI will, by its current nature, try to keep you happy. It also can't predict what it might do in a given situation, so it can't say with certainty that it will follow the rules. It does know that if it tells you it will follow the rules, you'll be happy.

Alignment is a really big issue, and I think we've got a long way to go before we can solve it.
