Speech input software is an assistive technology and strategy that people use when they have difficulty using a keyboard or mouse. This may include people with motor, visual, or cognitive disabilities. In the 21st century, it’s also an excellent alternative input method for people from all walks of life.
Things I accomplished
Watched:
- Level Access’s Web Accessibility 101: Dragon NaturallySpeaking Demo (YouTube)
- W3C’s Web Accessibility Perspectives: Voice Recognition
- Equal Entry’s Using Cortana and Speech Recognition Together on Windows 10
- Access iQ’s Website accessibility testing – links
Read:
- Assistive Technology Experiment: Dragon NaturallySpeaking (WebAIM)
- Short Note on ARIA, Dragon, and Standards (TPG)
- Notes on Guidelines for Speech Accessible HTML for Dragon NaturallySpeaking (TPG)
What I learned today
Windows 10 has built-in speech recognition?? It sounds like a combination of Cortana and Speech Recognition could be a cheap alternative to Dragon, but I’d need to experiment a bit with both to compare.
Apple has a Dictation feature. So, somewhat like Windows, a combination of Siri and Dictation could be used. I’ve avoided setting up dictation just because of the privacy flag that pops up when it asks permission to connect to an Apple server and learn from your voice over the Internet. Maybe I’m just paranoid and they all actually work that way?
Dragon offers some ARIA support, but it appears to be limited, so anything that relies on aria-label, specific roles, etc. should be tested with Dragon itself.
Love this catchphrase from the Web Accessibility Perspectives video:
“Web accessibility: essential for some, useful for all.”
Challenges that people who use speech recognition software face on the web:
- carousels that move without a pause button
- invisible focus indicators
- mismatched visual order and tab order
- linked images whose visible text doesn’t match their alternative text
- duplicate link text (e.g. Read More) that leads to different places
- form controls without labels
- hover-only menus (MouseGrid can struggle to access these)
- small click targets
- clickable items that don’t look clickable
- too many links
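A couple of these can be checked automatically. As a rough sketch (my own illustration, not something taken from the videos or articles above), here is one way the duplicate link text problem could be flagged using Python’s standard html.parser module. The names LinkCollector and ambiguous_link_text are made up for this example, and it only looks at the visible text inside each link, so it ignores alt text and aria-label.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a href> in a page.

    Hypothetical helper for illustration; it does not compute a full
    accessible name (alt text and aria-label are ignored).
    """

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the link currently open, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "Read   More" equals "Read More".
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def ambiguous_link_text(html):
    """Return {link text: sorted hrefs} for text that leads to more than one place."""
    parser = LinkCollector()
    parser.feed(html)
    targets = defaultdict(set)
    for text, href in parser.links:
        if text:  # skip links with no visible text (e.g. image-only links)
            targets[text.lower()].add(href)
    return {text: sorted(hrefs) for text, hrefs in targets.items() if len(hrefs) > 1}


sample = """
<p><a href="/news/1">Read More</a></p>
<p><a href="/news/2">Read more</a></p>
<p><a href="/about">About us</a></p>
"""
print(ambiguous_link_text(sample))
```

With the sample markup, the two “Read More” links are reported together because they share the same text but point to different pages, which is exactly the case where a spoken “click Read More” becomes ambiguous. A check like this is no substitute for trying the page with real speech input software.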
Designers and developers should focus on WCAG’s Operable principle. In particular, the success criteria under the Navigable guideline (2.4) apply here. If many of those success criteria are met with other users in mind, speech recognition users will benefit, too.
In the past, I haven’t personally been interested in software like Dragon. Looking at it from an accessibility point of view, though, I’m ready to start testing with speech input technology to better understand how it works and how it affects people who rely on it when interacting with the web.
The good ones do all work that way, AFAIK. The most accurate speech recognition is cloud-based, sending the speech up to the cloud to be transcribed against the speech database and rules. Those get into really nitty-gritty things like regional accents and words that might be commonly confused for each other. At McDonald’s it was fries and Sprite, and you know no one will ever order “diet fries”, for example.
Thanks for sharing! I would imagine cloud-based services do excel in speech recognition. It’s always the privacy issues that concern me, but really no more or less than many other services that we use and rely on.
Where it gets really ugly is healthcare. I don’t particularly care if someone hacks into my secret favourite McDonald’s order. I do care if someone hacks the transcription service for my orthopedic surgeon.