Excellent article as usual.
That a system needs to learn to see, and to be trained from the start with video data, is not a surprise. Consider how big the pipe is from our eyes to our brain, and how much of our brain is devoted to processing visual data.
We don't talk our way through our world, so I can't see how a machine that talks its way through vision processing would work.
I do think Elon's ego is such that you can't trust anything he says about the capability of a system he is involved with, especially when it will directly impact the share price of his company.
Interesting to see what happens next.