Apple patents a system that detects mouth movements and reads lips

Apple has devised a system that detects the user's mouth movements and reads their lips during voice commands in noisy environments. It would apply to devices that support the company's digital assistant.

The company's smart assistant, Siri, handles requests such as writing and sending a message, setting reminders, calling a contact or sharing an arrival at a location with another user.

However, as Apple Insider points out, Siri has trouble understanding users' requests in certain scenarios, for example when there is noise in the place where it is being used. Distortion is another problem Siri faces.

The technology company has devised a voice recognition system that detects motion data generated by vibrations during speech. It is described in a patent credited to developers Eddy Zexing Liang and Madhu Chinthakunta, which Apple filed in the United States in January and which was published this Thursday.

“When a user speaks, the mouth, face, head and neck move and vibrate. Motion sensors, such as accelerometers or gyroscopes, can detect these movements and consume relatively little power, compared to audio sensors, such as microphones.”

This recognition system would compare the detected motion with previously learned mouth movements and check whether it matches words or phrases from earlier voice commands. That is, it would read the user's lips to understand their request.
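
As a rough illustration of that comparison step, here is a minimal Python sketch that matches a motion trace against stored templates using normalized correlation. This is not Apple's method: the patent coverage here does not say how the matching is done, and the function names, the 0.8 threshold and the toy traces are all assumptions made up for the example.

```python
import math


def normalize(signal):
    """Return a zero-mean, unit-length copy of a motion trace."""
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0:
        # A flat trace carries no movement information.
        return [0.0] * len(signal)
    return [x / norm for x in centered]


def similarity(a, b):
    """Normalized correlation of two equal-length traces, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def match_command(trace, templates, threshold=0.8):
    """Return the best-matching command name, or None if nothing is close.

    `templates` maps a command name to a previously learned trace
    (hypothetical data; a real system would learn these per user).
    """
    best_name, best_score = None, threshold
    for name, template in templates.items():
        score = similarity(trace, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Toy accelerometer-like traces, invented for illustration only.
    learned = {
        "hey siri": [0, 1, 0, -1, 0, 1, 0, -1],
        "next song": [0, 0, 1, 1, 0, 0, -1, -1],
    }
    noisy = [0.1, 0.9, 0.0, -1.1, 0.1, 1.0, -0.1, -0.9]
    print(match_command(noisy, learned))
```

A fixed-length comparison like this is the simplest possible choice. Real speech varies in speed, so a production system would more plausibly use a learned model or a time-warping method.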

With this system, devices would be able to recognize commands such as ‘Hey Siri’ and other simple or common commands, such as ‘next song’. These actions would then be carried out on the iPhone once it is paired with the device.

To meet its goals, Apple would need to analyze a large data set of the movements users make when pronouncing each word and create voice profiles, so that the system can distinguish both each user's pronunciation and the language in which the requests are made.


