Endpoint MCU Implementation of Voice User Interface
Article 5 of Bringing Intelligence to the Edge Series: Integrating voice user interface technology into microcontroller units for offline, edge-based voice recognition is set to redefine the landscape of home automation and smart industrial applications.
This is the fifth article in a 6-part series featuring articles on "Bringing Intelligence to the Edge". The series looks at the transformative power of AI in embedded systems, with special emphasis on how advancements in AI, embedded vision, and microcontroller units are shaping the way we interact with technology in a myriad of applications. This series is sponsored by Mouser Electronics. Through their sponsorship, Mouser Electronics shares its passion and support for engineering advancements that enable a smarter, cleaner, safer manufacturing future.
Rapid consumer adoption of connected smart speakers and voice assistants suggests that many people prefer to control products with speech rather than counterintuitive button pushes. Consumers also seem to appreciate devices and home appliances that are smart without being connected to the cloud via the internet.
This article primarily discusses the implementation of the Renesas Electronics voice user interface (VUI) solution, a third-generation reference platform built on the scalable Arm®-based RA2, RA4, and RA6 series of MCUs and aimed especially at applications in home automation, white goods, and small appliances.
Throughout the discussion, we will explore how this solution offers developers a wide range of tools and resources, leading to enhanced product design. This includes details about the integrated machine learning model's design and training methods, and a comprehensive evaluation approach using both online and offline methods. The focus will be on how these elements come together to create a potent voice-controlled system, bringing intelligence to the edge.
Overview of Voice User Interface Implementation
VUI is a speech recognition technology that enables users to interact with a computer, smartphone, or other everyday device through voice commands. Its distinguishing feature is the use of voice as the primary mode of interaction, in contrast with a traditional keyboard, mouse, display, or touch screen.
The new, easy-to-use Renesas hardware platform for VUI solutions is based on the high-performance Renesas Advanced (RA) family of 32-bit microcontroller units (MCUs).
The RA family delivers key advantages compared to competing Arm Cortex®-M MCUs by providing stronger embedded security, superior CoreMark® performance, and ultra-low-power operation. Platform Security Architecture (PSA) certification gives customers the confidence to quickly deploy secure IoT endpoint and edge devices, as well as smart factory equipment for Industry 4.0.
The RA family currently includes three product series: RA6, RA4, and RA2. Each series has its own feature set, which suits it to particular applications and market needs.
The RA6 Series offers the widest integration of communication interfaces, with integrated Ethernet and TFT display drivers. Flash memory densities range from 256KB to 2MB. The series delivers up to 240MHz performance on the Cortex-M4 or Cortex-M33 core with TrustZone®, and it supports full security integration, which makes these devices a common choice for security applications.
The RA4 Series balances the requirements for low power consumption with the demand for connectivity. It uses the Cortex-M4 or Cortex-M33 core with TrustZone and additional security IP integration, at a CPU frequency of up to 100MHz. Flash memory densities range from 256KB to 1MB, and the series offers a wide range of communication interfaces.
The RA2 Series is ideal for designs where low power consumption matters most. Special power-down modes make these devices well-suited for battery-powered applications. The RA2 Series provides up to 256KB of embedded flash and a wide single-voltage supply range of 1.6 to 5.5V. These devices use the Cortex-M23 core running at up to 48MHz.
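As a rough illustration of how these headline figures can guide a first-pass choice, the sketch below picks the smallest series whose quoted flash and clock limits cover a design's needs. The table holds only the upper limits stated above; the function name `pick_series` and the selection rule are assumptions made for this example, and a real selection would also weigh peripherals, security features, and power modes.

```python
# Upper limits quoted in the text above: (series, max flash in KB, max CPU MHz).
# Ordered from the smallest series to the largest.
RA_SERIES = [
    ("RA2", 256, 48),
    ("RA4", 1024, 100),
    ("RA6", 2048, 240),
]


def pick_series(flash_kb, cpu_mhz):
    """Return the first series whose quoted limits cover both needs, else None."""
    for name, max_flash_kb, max_mhz in RA_SERIES:
        if flash_kb <= max_flash_kb and cpu_mhz <= max_mhz:
            return name
    return None
```

For instance, a design that needs 512KB of flash at 80MHz falls outside the RA2 limits and lands on the RA4 Series under this rule.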
The Renesas Flexible Software Package (FSP) is an enhanced software package designed to provide easy-to-use, scalable, high-quality software for embedded system designs using Renesas RA Family microcontrollers (Figure 2).
It uses an open software ecosystem and gives developers the flexibility to use bare-metal programming, Azure RTOS, FreeRTOS, another preferred RTOS, legacy code, or third-party ecosystem solutions. The combination of the FSP's flexible open architecture and the wide choice of third-party solutions in the Arm ecosystem broadens the options for application development. Developers can choose the software model that best suits their needs while using Renesas's Arm-based silicon, and can shorten implementation time in complex areas such as connectivity and security.
Voice Recognition Engine
Based on the Renesas ecosystem, Cyberon DSpotter (Figure 3) is a local voice trigger and command recognition solution with robust noise reduction that uses very few resources and delivers high accuracy.
It supports multiple languages as well as many connectivity functions and security capabilities depending on the selected MCU. The major features are listed below:
Local voice recognition algorithm, no network connectivity needed
Phoneme-based modeling
Quick command customization—removes the need to collect speech data in advance
Optimization by model adaptation with just a small amount of speech data
Global language support: 44+ languages worldwide
Small footprint and cost-effective (single DMIC + RA6E1 or RA4E1)
DSMT Tool: wake-word and command customization, performance tuning, and testing, with no prior neural network knowledge needed
Separation of recognition engine and command model, switching commands dynamically
Low-power, high-efficiency RA MCU with strong security function
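The separation of recognition engine and command model listed above can be pictured with a small sketch. This is not the DSpotter API: the class names, the phrases, and the exact-text lookup that stands in for acoustic matching are all hypothetical. The sketch only shows the idea that the engine code stays fixed while the active command set is swapped at run time, here in a two-stage flow of wake word followed by a command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandModel:
    """A named set of phrases mapped to command IDs (hypothetical structure)."""
    name: str
    phrases: dict  # phrase text -> integer command ID


@dataclass
class Recognizer:
    """Toy engine: the matching logic stays fixed while the model is swapped."""
    model: CommandModel

    def load(self, model):
        # Switching commands only replaces data; the engine code is untouched.
        self.model = model

    def recognize(self, utterance):
        # Stand-in for acoustic matching: exact lookup on normalized text.
        return self.model.phrases.get(utterance.strip().lower())


# Hypothetical models: one for the wake word, one for the command set.
WAKE = CommandModel("wake", {"hello device": 0})
LIGHTS = CommandModel("lights", {"lights on": 1, "lights off": 2})


def run_session(utterances):
    """Two-stage flow: wait for the wake word, then accept one command."""
    engine = Recognizer(WAKE)
    results = []
    for text in utterances:
        command_id = engine.recognize(text)
        if command_id is None:
            continue  # not in the active model, so it is ignored
        if engine.model is WAKE:
            engine.load(LIGHTS)  # wake word heard: switch to the command set
        else:
            results.append(command_id)
            engine.load(WAKE)  # command handled: listen for the wake word again
    return results
```

In this toy flow a command spoken before the wake word is ignored, because only the wake model is active at that point.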
Results
The hit rate was measured with voice commands mixed with different types of background noise, at levels chosen to produce distinct signal-to-noise ratios (SNR). The test bench shown in Figure 4 was used, and the results are summarized in Table 1.
SNR | Background noise | Distance | Hit-Rate | Alexa Requirements |
(Clean) | none | 1.5m | 100.00% | 90% |
(Clean) | none | 3m | 100.00% | 90% |
10dB | Babble | 1.5m | 98.55% | 80% |
10dB | Babble | 3m | 98.84% | 80% |
10dB | Music | 1.5m | 98.26% | 80% |
10dB | Music | 3m | 98.55% | 80% |
10dB | TV | 1.5m | 98.84% | 80% |
10dB | TV | 3m | 98.55% | 80% |
5dB | Babble | 1.5m | 98.84% | 80% |
5dB | Babble | 3m | 96.24% | 80% |
5dB | Music | 1.5m | 98.84% | 80% |
5dB | Music | 3m | 97.08% | 80% |
5dB | TV | 1.5m | 93.37% | 80% |
5dB | TV | 3m | 90.72% | 80% |
Table 1: Hit-rate results
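The two quantities in Table 1 can be made concrete with a short sketch. It uses the standard power-ratio definition of SNR, 10·log10(P_speech / P_noise), to compute the gain that brings a noise recording to a target SNR against a speech recording, and it computes hit rate as recognized commands divided by commands spoken. The article does not describe how the test bench set its noise levels or how many trials were run, so the function names and sample values here are assumptions for illustration only.

```python
import math


def mean_power(samples):
    """Average power of a sequence of audio samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from the two average powers."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


def noise_gain_for_snr(speech, noise, target_snr_db):
    """Gain to apply to `noise` so that the speech-to-noise ratio hits the target."""
    ratio = 10 ** (target_snr_db / 10)
    return math.sqrt(mean_power(speech) / (mean_power(noise) * ratio))


def hit_rate_percent(hits, trials):
    """Share of spoken commands that were recognized, in percent."""
    return 100.0 * hits / trials


def meets_requirement(measured_percent, required_percent):
    """True when the measured hit rate reaches the required threshold."""
    return measured_percent >= required_percent
```

Applied to the lowest entry in Table 1, `meets_requirement(90.72, 80.0)` is true: the 5dB TV noise case at 3m still clears the 80% requirement.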
Conclusions
This article has presented a complete implementation of an endpoint voice command recognition system that can run on a simple MCU. The reference design enables local voice recognition without a network connection and lets developers start building, within minutes, an enhanced VUI that recognizes voice commands and triggers the corresponding operations.
This article is based on an e-magazine: Bringing Intelligence to the Edge by Mouser Electronics and Renesas Electronics Corporation. It has been substantially edited by the Wevolver team and Electrical Engineer Ravi Y Rao. It is the fifth article in the Bringing Intelligence to the Edge series. Future articles will introduce readers to more trends and technologies shaping the future of Edge AI.
This introductory article unveils the "Bringing Intelligence to the Edge" series, exploring the transformative potential of AI at the Edge.
The first article examines the challenges and trade-offs of integrating AI into IoT devices, emphasizing the importance of balancing performance, ROI, feasibility, and data considerations for successful implementation.
This second article delves into the transformative role of Endpoint AI and embedded vision in tech applications, discussing its potential, challenges, and the advancements in processing data at the source.
The third article delves into the intricacies of TinyML, emphasizing its potential in edge computing and highlighting the four crucial metrics - accuracy, power consumption, latency, and memory requirements - that influence its development and optimization.
The fourth article delves into the realm of data science and AI-driven real-time analytics, showcasing how AI's precision and efficiency in processing big data in real-time are transforming industries by recognizing patterns and inconsistencies.
The fifth article delves into the integration of voice user interface technology into microcontroller units, emphasizing its transformative potential.
The sixth article delves into the profound impact of edge AI on system optimization, maintenance, and anomaly detection across diverse industries.
About the sponsor: Mouser Electronics
Mouser Electronics is a worldwide leading authorized distributor of semiconductors and electronic components for over 1,200 manufacturer brands. They specialize in the rapid introduction of new products and technologies for design engineers and buyers. Their extensive product offering includes semiconductors, interconnects, passives, and electromechanical components.