Natural language is not the best AI interface


Everyone is excited about what Large Language Models like OpenAI’s ChatGPT can do. Every week seems to bring a new AI breakthrough. The limitations are falling by the wayside one by one. Which is why I’m a bit shocked by how little thought seems to be going into interfaces for AI.

When Amazon released Alexa, everyone thought that conversational UI was going to take over the world, despite clear (and annoying) limitations. These days everyone seems excited about “natural language” as the universal interface.

“AI copilots will be everywhere. The main way of using a computer is natural language, not pointing and clicking.”

Lorenzo Green paraphrasing Bill Gates

I think they are missing a point. Interfaces are designed to empower the user. Natural language is a messy soup of trial and error trying to convey or understand something. Natural language is great for discovery and learning, less so for expert use.

You have probably heard someone say “text is too slow, it’s quicker if we just talk to each other”. Maybe you’ve felt the same? Choosing to call someone, or meet someone instead of sending them an email. There’s a really important lesson to draw from this common behaviour, and I believe it shows us how we will interact with AIs in the future.

Speech is slower than text, but has an advantage

Language is messy, and so are ideas. If we were to compare the transmission rate of spoken words to that of text, I’m very sure we would see that text is far more efficient. Writing forces us to structure our thoughts, which takes effort and time. That’s something we rarely do while talking.

That’s why talking feels quicker, and easier. You are not conveying information faster, you are simply putting in less effort. Letting the other side do some of the work.

When you’re speaking to someone you can ask a question when you don’t understand. This takes effort and time, but it is much faster than reading more material and hoping you’ll get it eventually.

Natural language will be used to learn new things, to understand topics, and to discover what AI can help you with. But then what? You’re not gonna ask an AI chatbot to fill in an Excel formula. You’ll more likely learn how and do it yourself, because it will feel faster, or maybe even be faster, than watching a chat print out characters.

Expert interfaces work differently

Most tech power users will tell you that mechanical keyboards are faster for typing, Vim is the best text editor ever, and everything should be available through the command line. I have no idea if the first two are true, but the last one is at least partially true.

Experts often use complex UIs that a layman would never be able to understand. Because the expert has internalised so much information, they can skip past the fancy buttons and sliders of a user-friendly interface and go straight to chaining commands or making exact adjustments.

Text is likely a universal interface, a UI so powerful that it allows us to do anything. Much like speech, it can convey any idea, but unlike speech it can be viciously compacted and exact. Conveying more information, or more commands, in less time.

I have no doubt the AIs of the future will handle CLI-chained-commands-style input, both in audio and text. But what is more interesting to me is that there’s nothing stopping AI from generating all sorts of specialised UI.

Imagine that instead of setting up a spreadsheet and asking AI for a formula, you ask AI to generate a dashboard for your current cash flow. It comes back with a windowed dashboard, you ask it to add additional controls or features, and it does. Highly specific, contextually generated, personalised UIs. Perfectly tailored to your use case, and skill level.
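To make that a little more concrete, here is a rough sketch of how a generated dashboard could work under the hood. It assumes the AI returns a small declarative description of the UI, which a renderer then turns into something you can look at. Everything in it is made up for illustration: the Widget type, the SPEC list, the sample transactions, and the plain-text renderer. No real model or UI framework is involved.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    kind: str   # "total" or "table" (hypothetical widget kinds)
    title: str  # heading shown above the widget
    field: str  # which transaction field the widget reads


# Hypothetical spec. In the scenario above, an AI would generate this
# in response to "show me a dashboard for my current cash flow".
SPEC = [
    Widget(kind="total", title="Net cash flow", field="amount"),
    Widget(kind="table", title="Transactions", field="amount"),
]

# Made-up sample data standing in for the user's real numbers.
TRANSACTIONS = [
    {"label": "Invoice 1", "amount": 1200},
    {"label": "Rent", "amount": -800},
    {"label": "Software", "amount": -150},
]


def render(spec, rows):
    """Turn a list of widgets into a plain-text dashboard."""
    lines = []
    for widget in spec:
        lines.append(f"== {widget.title} ==")
        if widget.kind == "total":
            # Sum the chosen field over all rows.
            lines.append(str(sum(row[widget.field] for row in rows)))
        elif widget.kind == "table":
            # One line per row: label and the chosen field.
            for row in rows:
                lines.append(f"{row['label']}: {row[widget.field]}")
        else:
            raise ValueError(f"unknown widget kind: {widget.kind}")
    return "\n".join(lines)


print(render(SPEC, TRANSACTIONS))
```

Asking for "additional controls or features" would then amount to the AI appending another widget to the spec, while the renderer stays the same.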

Conclusion

I believe that in the near future we will use AI to generate UI for our personal skill level for the task we’re doing. When our skill increases, we will ask the AI for a new UI.

We will learn and explore using natural language. We will ask AI to generate UIs for more specific, and maybe recurring, tasks. And we will simply trigger chains of commands for truly expert use.

LLMs can generate all these things for us. In real time. But natural language is not the universal UI. AI is itself the universal UI.

Update Oct 13 2023

Someone went ahead and built an app that generates UI on the fly to help you solve your problem:

Update Dec 8th 2023

Google’s Gemini model is generating inline UIs.

