Placeona: The Interface Question the AI Gold Rush Forgot to Ask
The instinct of 2026 is to make everything talk. ChatGPT and Gemini ship voice. Cars, fridges, and code editors grow agents. The pitch for nearly every new product is some version of "just ask it." The implicit claim underneath the gold rush is that conversation is the universal interface now, and the job is to bolt it onto everything.
You have already felt where that breaks. Someone tries to voice-control a phone in a quiet train car. A kitchen assistant mishears over the exhaust fan. An agent reads you a 200-word answer when one glance at a screen would have done. The modality was available. It was just wrong for the place. The technology question, can it talk, is answered. The question the rush skipped is whether, here, it should.
That question has a name, and it has been sitting in the human-computer interaction literature for over a decade.
What is a placeona?
A placeona is the context a product is used in, described by what that context leaves available. A persona tells you who the user is. A placeona tells you where they are and which human channels the place leaves free. The term comes from Bill Buxton, a pioneer of human-computer interaction at Microsoft Research.
There are four channels, and each one is in some state when the moment of use arrives:
- Hands (free, busy, or dirty)
- Eyes (free or busy)
- Voice (free or restricted)
- Ears (free or busy)
Read the channels and the right interface usually falls out on its own. Driving ties up hands and eyes but leaves voice and ears free, so voice wins. Cooking leaves your hands dirty and everything else open, so voice wins again. A library or an open-plan office leaves your hands and eyes free but makes voice socially restricted, so a screen wins and talking fails, not technically but socially. A movie theater restricts everything, so the honest answer is that no interface belongs there at all.
Laid out as a matrix, with places down the side and channels across the top, this is the whole discipline in one picture. It also shows why "add voice" is not a strategy. The same feature that helps in one row is a failure one row down.
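For readers who think in code, the four channels and the four places above can be written down as a small data structure. This is only a sketch: the names `Placeona` and `recommend` and the two-line rule inside are illustrative assumptions, not anything Buxton specifies, and "restricts everything" for the movie theater is encoded here as every channel busy or restricted.

```python
from dataclasses import dataclass
from enum import Enum


class Hands(Enum):
    FREE = "free"
    BUSY = "busy"
    DIRTY = "dirty"


class Eyes(Enum):
    FREE = "free"
    BUSY = "busy"


class Voice(Enum):
    FREE = "free"
    RESTRICTED = "restricted"


class Ears(Enum):
    FREE = "free"
    BUSY = "busy"


@dataclass(frozen=True)
class Placeona:
    """One row of the matrix: a place and the state of each channel there."""
    name: str
    hands: Hands
    eyes: Eyes
    voice: Voice
    ears: Ears


def recommend(p: Placeona) -> str:
    """Hypothetical rule of thumb: pick the interface whose channels are free."""
    # Talking to a product needs a free voice and free ears.
    voice_ok = p.voice is Voice.FREE and p.ears is Ears.FREE
    # A screen needs clean, free hands and free eyes.
    screen_ok = p.hands is Hands.FREE and p.eyes is Eyes.FREE
    if voice_ok and screen_ok:
        return "either"
    if voice_ok:
        return "voice"
    if screen_ok:
        return "screen"
    return "none"


driving = Placeona("driving", Hands.BUSY, Eyes.BUSY, Voice.FREE, Ears.FREE)
cooking = Placeona("cooking", Hands.DIRTY, Eyes.FREE, Voice.FREE, Ears.FREE)
library = Placeona("library", Hands.FREE, Eyes.FREE, Voice.RESTRICTED, Ears.FREE)
theater = Placeona("movie theater", Hands.BUSY, Eyes.BUSY, Voice.RESTRICTED, Ears.BUSY)

for place in (driving, cooking, library, theater):
    print(f"{place.name}: {recommend(place)}")
# driving: voice
# cooking: voice
# library: screen
# movie theater: none
```

The rule is deliberately crude. The useful part is that the place becomes an explicit input to the decision, so "add voice" cannot be answered without naming a row.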
Why a "natural" interface was never the same thing as voice
Here is the deeper cut, and it's the line that separates this from a UX checklist. The industry sells voice, gesture, and touch as inherently "natural." Buxton's point is that they are not.
Voice, gesture, touch does not necessarily Natural User Interface make.
A natural user interface, in his framing, exploits skills we have built over a lifetime of living in the world, and it has to be designed with the use context in mind. Natural is not a property of the modality. It's a property of the fit between the modality and the moment. A "natural language interface" is only natural where the place makes language the path of least resistance. Drop the same interface into a context that punishes talking and it stops being natural, no matter how good the model is. The agentic era keeps shipping conversation into rooms that can't use it, and calling the result natural because the words go in and out.
Where does voice actually belong?
I have a particular reason to care about this one. A decade ago I co-founded Orbita, the first voice-AI company focused on healthcare, and bet that voice and natural-language interfaces would become normal. We raised $25M on that bet, in a round led by Philips Health Ventures. The bet was right. We are living in it now. But the work that actually mattered was never "can we do voice." It was where voice belonged, and far more often, where it did not.
Healthcare is a placeona machine. A surgeon scrubbed in has both hands occupied and both eyes on the field, with voice and ears wide open: the textbook case for voice, and it genuinely helps. A patient managing medications alone at home benefits from a spoken reminder. But a patient in a shared hospital room, or a nurse at a station within earshot of a dozen others, is voice-restricted, and the exact feature that helps in the private room becomes a privacy and dignity failure one door down. Working with clients like Pfizer and Mayo Clinic, the job was mapping which clinical contexts voice fit, and, more often, which ones it quietly broke. We learned the placeona the hard way, in rooms where getting it wrong had a cost.
That experience is why the framework reads as obvious to me and the gold rush reads as a repeat of an old mistake. Every product decision in the agentic era has a placeona hiding inside it. Should this be a proactive ambient agent or a thing you open when you want it? A voice reply or a card you can glance at? A command line or a chat? The answer is not taste, and it is not which modality demos best. It is which channels the user's context leaves free, and what a wrong-modality failure costs in that specific place.
Doesn't multimodal solve this?
Multimodal is the obvious objection: if the product can do voice and screen and touch, why choose? But "multimodal" is not "every modality, everywhere, at once." Done well, it means having the right modality ready for the place and degrading gracefully when a channel is unavailable: voice when your hands are full, a silent card when you walk into the meeting. That is placeona-aware design. Done lazily, multimodal is just more things shouting in the theater, every channel firing regardless of whether the room can take it. The capability to use any channel does not remove the judgment about which one belongs. It raises the stakes on having that judgment.
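Graceful degradation of this kind can be sketched as a fallback chain. The sketch below is an assumption-laden illustration, not a reference design: `REQUIRED_CHANNELS`, `pick_modality`, and the default preference order are hypothetical names and choices, and the only two modalities modeled are the spoken reply and the silent card mentioned above.

```python
from typing import Iterable, Optional, Set

# Assumed mapping: which of the user's channels each output modality needs.
REQUIRED_CHANNELS = {
    "voice": frozenset({"voice", "ears"}),  # speaking aloud must be acceptable and audible
    "card": frozenset({"eyes"}),            # a silent card only needs a glance
}


def pick_modality(
    free_channels: Set[str],
    preference: Iterable[str] = ("voice", "card"),
) -> Optional[str]:
    """Return the first preferred modality the place can support, else None."""
    for modality in preference:
        # Subset test: every channel the modality needs must be free here.
        if REQUIRED_CHANNELS[modality] <= free_channels:
            return modality
    # No channel fits, so the honest answer is to stay quiet.
    return None


print(pick_modality({"eyes", "voice", "ears"}))  # hands full -> voice
print(pick_modality({"hands", "eyes"}))          # walked into a meeting -> card
print(pick_modality(set()))                      # the theater -> None
```

The capability list stays the same in all three calls. Only the place changes, and the return value of `None` is a legitimate outcome rather than an error.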
Write the placeona before you ship
Here is the practice, and it costs nothing. Before you add voice, or a chat, or an agent, write the placeona for the moment of use. Which of the user's hands, eyes, voice, and ears are actually free right then? What does a wrong-modality failure cost there: a missed step, a privacy breach, a social penalty, an interruption that loses you the user? Design for that answer, not for the demo where everything is quiet and the user is paying full attention.
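If it helps to make the habit concrete, the written placeona can live next to the feature spec as a tiny record. Everything in this sketch is a hypothetical convention: the `PlaceonaNote` fields, the `blocked_channels` helper, and the channel states chosen for the shared hospital room (only the voice restriction comes from the example earlier in this piece; the rest are assumed for illustration).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PlaceonaNote:
    """A one-paragraph placeona, written down before the feature ships."""
    moment: str                    # the moment of use being designed for
    free_channels: FrozenSet[str]  # subset of {"hands", "eyes", "voice", "ears"}
    failure_cost: str              # what a wrong-modality failure costs here


def blocked_channels(note: PlaceonaNote, needed: Iterable[str]) -> List[str]:
    """Channels the planned feature needs that this place does not leave free."""
    return sorted(set(needed) - note.free_channels)


note = PlaceonaNote(
    moment="patient in a shared hospital room",
    free_channels=frozenset({"hands", "eyes", "ears"}),  # assumed states
    failure_cost="privacy breach",
)

# A spoken assistant needs the user's voice and ears.
conflicts = blocked_channels(note, {"voice", "ears"})
if conflicts:
    print(f"{note.moment}: blocked on {conflicts}, cost: {note.failure_cost}")
# patient in a shared hospital room: blocked on ['voice'], cost: privacy breach
```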
This is the old problem of building the right thing, made specific: applied to how and where a product is experienced. AI removed the constraint of whether we can build a given interface. What's left is the judgment about where it belongs, which is exactly the layer that doesn't get automated away. It's the same reason ideas, not execution, are the bottleneck now: when any modality is cheap to ship, choosing the right one becomes the work. And AI amplifies your direction, right or wrong, so an interface aimed at the wrong place just fails faster and at greater scale.
The technology question is settled. Everything can talk now. The question Buxton asked decades ago is the one our clients always actually needed answered, and the one the gold rush keeps skipping: not whether it can talk, but whether, here, it should. That judgment is the work. It always was.
Frequently asked
What is a placeona?
A placeona (place plus persona) is a way to describe the context a product is used in by what it leaves available.
When should an interface use voice, and when shouldn't it?
Use voice when the place leaves voice and ears free but ties up hands or eyes: driving, cooking, a surgeon scrubbed in.
What is a natural user interface?
Per Bill Buxton, a natural user interface exploits skills we've built over a lifetime of living in the world, and it has to be designed with the use context in mind.
Does multimodal AI mean every interface should talk?
No. Multimodal done well means having the right modality ready for the place and degrading gracefully when a channel isn't available.