Woman with a headset with microphone.

Day 62: Identifying A11y Issues for Voice Input Users

Speech input software is an assistive technology and strategy that people use when they have difficulty using a keyboard or mouse. This may include people with motor, visual, or cognitive disabilities. In the 21st century, it’s an excellent alternative for people in all walks of life.

Things I accomplished

Watched:

Read:

What I learned today

Windows 10 has built-in speech recognition?? It sounds like a combination of Cortana and Speech Recognition could be a cheap alternative to Dragon, but I’d need to experiment a bit with both to compare.

Apple has a Dictation feature. So, somewhat like Windows, a combination of Siri and Dictation could be used. I’ve avoided setting up dictation just because of the privacy flag that pops up when it asks permission to connect to an Apple server and learn from your voice over the Internet. Maybe I’m just paranoid and they all actually work that way?

Dragon offers some ARIA support, but it appears to be limited, and should be tested if relying on aria-label, specific roles, etc.

Love this catchphrase from the Web accessibility perspectives video:

“Web accessibility: essential for some, useful for all.”

Challenges that people who use speech recognition software face on the web:

  • carousels that move without a pause button
  • invisible focus indicators
  • mismatched visual order and tab order
  • mismatched linked image with text and alternative text
  • duplicate link text (e.g. Read More) that leads to different places
  • form controls without labels
  • hover only menus (MouseGrid can struggle accessing these)
  • small click targets
  • clickable items that don’t look clickable
  • too many links

Designers and developers should focus on WCAG’s Operable principle. In particular, Navigable guideline’s success criteria would apply here. If many of those success criteria are met with other users in mind, it will definitely be beneficial to speech recognition users, too.

In the past, I haven’t personally been interested in software, like Dragon, yet looking from an accessibility point of view, I’m ready to start testing with speech input technology to better understand how it works and affects people who rely on it when interacting with the web.

4 thoughts on “Day 62: Identifying A11y Issues for Voice Input Users”

  1. The good ones do all work that way, AFAIK. The most accurate speech recognition is cloud-based, sending the speech up to the cloud to be transcribed against the speech database and rules. Those get into really nitty gritty things like regional accents and words that might be commonly confused for each other — at McDonald’s it was fries and Sprite, and you know no one will ever order “diet fries” for example.

    Like

    1. Thanks for sharing! I would imagine cloud-based services do excel in speech recognition. It’s always the privacy issues that concern me, but really no more or less than many other services that we use and rely on.

      Like

      1. Where it gets really ugly is healthcare. I don’t particularly care if someone hacks into my secret favourite McDonald’s order. I do care if someone hacks the transcription service for my orthopedic surgeon

        Like

Leave a Reply to digilou Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s