Even if it turns out there's a much more efficient way to run a bottled intelligence (which will almost certainly appear at some point), there's way too much money invested in the current way for it all to change overnight.
There are petrol cars and diesel cars and electric cars. They all have pros and cons, but a big thing is the lock-in. The infrastructure that supports internal combustion engines (ICE) means they hang around even if they leave the polar bears nowhere to live.
The investment in LLMs and other bottled intelligence to date has created a lock-in that will keep the invisible layers around in some models. They may not be as efficient, but they already exist and are in use. So they'll hang around for a while, until the better tech has a strong enough business case to justify investing in the new hardware layer needed to support the alternative way of running inference.
Over time we'll find ways to run complex models on things as small as a toaster, and we'll end up with Rick's butter robot asking us the meaning of existence.
(I just wish they'd stop making the new robots so damn strong. We don't have alignment hacked, so don't embody the intelligence in something that can break my arm.)