Usability challenges in voice search — and how to overcome them

If you’ve kept an eye on the latest eCommerce tech trends, then you’ll no doubt have heard of the growing relevance of voice search. It’s a sector that’s becoming more sophisticated every year — and more popular, too.

But voice search isn’t without its challenges.

In many ways, voice search is an inherently unintuitive medium for user interaction. There are few or no visual cues, and that in itself can undermine the years of UX theory we’ve all worked so hard to build.

So what does usability look like in the voice search era?

Voice search usability: lessons from the front line

As we said before, voice search is an emerging and exciting digital trend. So you can bet the Cheesecake Labs team has an opinion on it — and the experience to back that opinion up.

One of our recent development projects allowed us to get up close and personal with the challenges that businesses face when embracing voice search. And here are the key lessons we think you’ll find useful…

You need to understand the tech available on the market

First, we learned that you have to understand the current state of voice search tech. There is far more to voice search than the surface-level “Command > Wait > Reaction” cycle most of us are familiar with.

Succeeding in this sector will require dedicated research to understand the tech, the associated best practices, and the latest advancements being made.

Here are the three components that make up every voice search as we understand it today.


The first component of a voice search experience is intent. This is the goal a user has when they initiate the voice search in any form. It’s independent of how they phrase their intent or how the voice application responds.

It refers specifically to what the user wants, and that’s it.


The second component is the utterance. Although this is the first interaction a user has with the voice application, it’s the second component of the voice search process — and that’s an important distinction.

The utterance is the phrasing the user employs to spark the action that will satisfy their intent.


Third, we have the slot. The slot (or slots, depending on the request) is the variable component of a person’s request. For example, if a user says, “Show me cute dog pictures”, the slot would be “cute dog”.

The action aspects of the phrase stay the same with each request to see pictures. However, the slot “cute dog” could be interchanged with “angry cat” or “scary rollercoaster” or “beautiful mountains”.

Every voice search (and nearly every voice command) will contain at least one slot.
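To make the three components concrete, here is a minimal sketch of how they might be modeled in code. It’s an illustration under our own assumptions, not how any particular voice platform works: the ParsedRequest class, the “ShowPictures” intent name, and the phrase templates are all hypothetical, and real platforms rely on trained language models rather than hand-written patterns.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ParsedRequest:
    intent: str            # what the user wants, independent of phrasing
    utterance: str         # the phrase the user actually spoke
    slots: Dict[str, str]  # the variable parts of the request


# Hypothetical phrase templates for a single intent.
# The named group "subject" marks where the slot sits in each phrasing.
SHOW_PICTURES_PATTERNS = [
    re.compile(r"^show me (?P<subject>.+) pictures$", re.IGNORECASE),
    re.compile(r"^find (?:some )?pictures of (?P<subject>.+)$", re.IGNORECASE),
]


def parse(utterance: str) -> Optional[ParsedRequest]:
    """Map an utterance to the ShowPictures intent and pull out its slot."""
    text = utterance.strip()
    for pattern in SHOW_PICTURES_PATTERNS:
        match = pattern.match(text)
        if match:
            slots = {"subject": match.group("subject")}
            return ParsedRequest("ShowPictures", text, slots)
    return None  # no known phrasing matched


if __name__ == "__main__":
    print(parse("Show me cute dog pictures"))
    print(parse("Find some pictures of beautiful mountains"))
```

Both phrasings resolve to the same intent, and only the slot value changes. That is the distinction between intent, utterance, and slot in practice.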

Next, ask how you can create a great UX

Once you understand the basic components of a voice search — and have researched the latest tech in this sector — it’s time to start ideating a great voice search experience.

Building a voice search UX is unique in that the experience runs entirely on voice: the user’s and the application’s. A VUI (voice user interface) has no other components to lean on, and this should radically change your approach to design. The responding voice’s persona will be key to getting this right.

Privacy is another key consideration when building a great voice UX. Recent history has shown how important privacy is to users, especially given the vulnerability that comes with vocal interactions.

When users are being listened to, they want total confidence that the listening party is being as responsible and respectful with their privacy and data as possible.

Then you need to build the user’s journey

The user’s journey through a voice search interaction is probably the most complex facet of creating a voice search experience. That’s because it needs to appear as (and in many ways is) a completely fluid and natural interaction.

Simultaneously, the interaction needs to be very precisely orchestrated and predicted. You should consider all of the options and possibilities that could occur during a voice interaction. Unlike a graphical user interface (GUI), where you create the environment that your interactions take place in, VUIs present a far more amorphous and haphazard interactive plane.

Sometimes, you’ll be crafting this interaction with a concrete design process. You’ll be drawing on research, experience, and strategies. And other times, your team will be sitting around a table, spitballing all of the different questions, phrases, and requests that could be thrown at your voice application.

Don’t skip this stage. An ill-thought-out user journey for a voice search application will fall flat on its face.

After that, you’ll create the “copy” for your voice search experience

The “copy” of your voice search experience is the flow that your VUI conversations will follow. This includes the initial request and reply, plus every request and reply that follows.

Again, this is a complex process — and each project will have copy unique to itself — but we do have some pointers:

  • Remember that a VUI is based on communication, which is one of the most fundamental components of the human experience. Before creating artificial conversations, you should start by studying natural ones.
  • Speech is filled with idiosyncrasies. Users will combine words in ways that don’t fit grammar conventions, they will expect non-verbal implications to be understood, and they’ll quickly get frustrated when they are not understood. A frustrated user isn’t a returning one, so make sure you explore this thoroughly.
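One way to think about this copy is as a small map of turns, each with a prompt, a reprompt for when the user isn’t understood, and the replies that lead elsewhere. The sketch below is a simplified illustration. The Turn class, the FLOW table, and the keyword matching are assumptions of ours, and a real VUI would rely on a platform’s language understanding rather than substring checks.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Turn:
    prompt: str    # what the voice application says on this turn
    reprompt: str  # what it says when the reply isn't understood
    # Maps a keyword in the user's reply to the id of the next turn.
    next_turns: Dict[str, str] = field(default_factory=dict)


# Hypothetical conversation copy for a tiny picture-search flow.
FLOW = {
    "start": Turn(
        prompt="Do you want dog pictures or cat pictures?",
        reprompt="Sorry, I didn't catch that. Dogs or cats?",
        next_turns={"dog": "dogs", "cat": "cats"},
    ),
    "dogs": Turn(prompt="Here are some dog pictures.", reprompt=""),
    "cats": Turn(prompt="Here are some cat pictures.", reprompt=""),
}


def respond(turn_id: str, reply: str) -> Tuple[str, str]:
    """Return (next turn id, what the application says next)."""
    turn = FLOW[turn_id]
    heard = reply.lower()
    for keyword, next_id in turn.next_turns.items():
        if keyword in heard:
            return next_id, FLOW[next_id].prompt
    # Nothing matched: stay on the same turn and reprompt the user.
    return turn_id, turn.reprompt


if __name__ == "__main__":
    print(respond("start", "Dogs, please"))
    print(respond("start", "umm, what?"))
```

Writing the reprompts alongside the prompts keeps the “not understood” path in view from the start, which is where frustrated users tend to end up.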

Don’t forget to perform user testing on your voice search

It should come as no surprise that user testing is critical to building a voice search interface. This is not like testing a GUI, where you’re reviewing layouts and clarity and checking for bugs.

You will be doing those things (or similar), but you will also be running into countless issues that you cannot anticipate. Again, this is a result of the fluid, uncontrolled, and improvisational nature of communication.

Lastly, iterate on your design and prepare for launch

As you go through testing and gradual deployment, be prepared to iterate. The “finished” version of your VUI might be very different from the version that your testers see. And, like any application, you should continue iterating on your VUI after its launch.

Metrics are key here! Measure the performance of your VUI using KPIs before and after launch.

Voice search might be new, but data and analytics haven’t been replaced yet.
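As a rough illustration, two KPIs you could track are how often the VUI recognizes an intent and how often users complete what they set out to do. The sketch below assumes a hypothetical interaction log. The Interaction fields and the two metric names are our own choices for the example, not an industry standard.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Interaction:
    utterance: str       # what the user said
    matched_intent: str  # empty string when no intent was recognized
    completed: bool      # whether the user reached their goal


def summarize(log: List[Interaction]) -> Dict[str, float]:
    """Compute simple rates over a list of logged interactions."""
    total = len(log)
    if total == 0:
        return {"recognition_rate": 0.0, "completion_rate": 0.0}
    recognized = sum(1 for item in log if item.matched_intent)
    completed = sum(1 for item in log if item.completed)
    return {
        "recognition_rate": recognized / total,
        "completion_rate": completed / total,
    }


if __name__ == "__main__":
    sample_log = [
        Interaction("show me cute dog pictures", "ShowPictures", True),
        Interaction("pictures of mountains", "ShowPictures", True),
        Interaction("uh, dogs I guess", "ShowPictures", False),
        Interaction("never mind", "", False),
    ]
    print(summarize(sample_log))
```

Comparing these rates before and after each iteration gives you a simple way to check whether a change to the VUI actually helped.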

Voice experience design is a totally new challenge — but we’re ready for it

Voice search is unlike anything else we’ve built in the digital space. And although it has been mainstream for nearly ten years, we, as a design and development community, still know very little about working with this tech.

This newness means that the challenges of voice search design will be difficult to anticipate.

There are inherent difficulties built into voice search. After all, we’re visual creatures, which makes VUIs challenging to design and use.

There are external difficulties as well. You won’t know how your users are going to use your VUI until you see them use it. And you don’t know how they’re going to converse with it until you monitor these interactions.

Even so, these challenges shouldn’t dampen your excitement about the future that voice search will bring, or your eagerness to invest in it. We’re already getting a glimpse of this future in the growing popularity and efficacy of smart home speakers and automation.

As new devices like AR glasses and smartwatches become more powerful and practical, the role of voice search is likely to grow significantly. We may be on the verge of a future where our voices will become the new touchscreen.

By tackling these challenges and embracing the opportunities, you can not only push voice search forward but also stay ahead of your competition.

Start building your VUI with the digital experience experts

The team at Cheesecake Labs has hands-on experience with the challenges and opportunities that come with voice search. We’re ready to step into the future. Are you?

About the author.

Guilherme Hayashi

Guilherme Hayashi has more than a decade of experience in building digital products, always putting people at the center of the product and the process.