The Technology behind Human-Machine Interfaces

A short guide to the current trends and challenges of human-machine interfaces.

27 Mar, 2023. 8 min read

A Human-Machine Interface (HMI), commonly known as a user interface (UI), is the point at which humans interact with machines. It is the platform through which a user and a computer communicate, generally a display plus various controls for sending data to the computer and receiving it back. An HMI can be as basic as a push-button mounted on a traffic light or as sophisticated as a voice-activated home assistant. This article explores the foundational technologies behind human-machine interfaces.

Technological Evolution of HMIs

Human-machine interfaces have evolved through various stages, each distinguished by the dominant interface technology of the time. 

In early computing, humans interacted with computers by feeding in punch cards that encoded instructions. This communication was not real-time: it usually took hours to process a job and return the output.

Developers started treating HMI as a discipline in the 1980s. They began to envision computers in people’s homes and offices, which meant the machines had to be operable with minimal technical knowledge.

With the launch of the Apple Macintosh[1] in 1984, human-computer interaction made a massive leap. This era saw the interface between computer and operator built around external devices such as the keyboard, mouse, and monitor. The icon-based user interface of Microsoft Windows brought further advances in the 1990s. The advent of the World Wide Web (WWW) then extended the purpose of HMIs beyond completing tasks to interacting and communicating with other people.

This started the trend known as Social Computing[2]. Touchpads emerged not only on personal computers but also on public-facing machines such as ATMs, ticket terminals, and vending machines. These interfaces evolved into touch screens as the guiding principles of HMI design moved towards making interactions more intuitive.

Recent trends in HMI design include making technology more accessible and further reducing the technical competency needed to use it.

Voice User Interfaces (VUIs)

VUIs allow you to interact with a device using voice commands, often without the need for physical touch. As speaking is one of the most intuitive modes of communication, this technology has transformed the way we interact with machines, opening up access for people who have limited physical capability or are occupied with another task. Virtual assistants such as Amazon Alexa and Apple Siri are prime examples of VUIs.

When a user gives a command, the device captures the audio and converts it to text using a speech recognition engine. The text is then processed with Natural Language Processing (NLP) techniques such as Question Answering[4], in which a trained model interprets a question posed in natural language and provides an answer. Depending on the operation, the output is displayed on a screen or played back to the user using text-to-speech technology. Considerable research is under way to improve the performance of voice-operated devices, especially in NLP, a field concerned with the interactions between computers and human language and, in particular, with how to program computers to process and analyze large amounts of natural language data[3]. These technologies have also increased the computational demands on edge devices[5], which must deliver results at low latency.
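The three stages described above (speech to text, language understanding, text to speech) can be sketched as a simple pipeline. This is a toy illustration only: every function name is hypothetical, the "audio" is just UTF-8 text, and a keyword lookup stands in for a trained question-answering model.

```python
# Toy VUI pipeline. All names are hypothetical stand-ins,
# not the API of any real assistant or speech library.

FALLBACK = "I couldn't understand it, can you please repeat?"


def recognize_speech(audio: bytes) -> str:
    """Stand-in for a speech recognition engine.

    Assumption: the 'audio' payload is simply UTF-8 text.
    """
    return audio.decode("utf-8").strip().lower()


def answer_question(text: str) -> str:
    """Stand-in for an NLP question-answering model (keyword lookup)."""
    knowledge = {
        "time": "It is 9 o'clock.",
        "weather": "It is sunny today.",
    }
    for keyword, answer in knowledge.items():
        if keyword in text:
            return answer
    return FALLBACK


def synthesize_speech(text: str) -> bytes:
    """Stand-in for text-to-speech: returns the reply as bytes."""
    return text.encode("utf-8")


def handle_command(audio: bytes) -> bytes:
    """Run the full loop: audio in, spoken reply out."""
    text = recognize_speech(audio)
    reply = answer_question(text)
    return synthesize_speech(reply)


print(handle_command(b"What is the weather like?").decode("utf-8"))
```

In a real device each stand-in would be replaced by a dedicated engine, and the latency of all three stages adds up, which is one reason the computational load on the edge device matters.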

Augmented Reality (AR) and Virtual Reality (VR)

This next wave of interface technology falls under the banner of Extended Reality (XR). XR is an umbrella category that covers a spectrum of newer, immersive technologies, including Virtual Reality (VR), Augmented Reality (AR), and their combination, Mixed Reality (MR)[6]. These technologies enhance human senses and deliver extra information, either in the real world or through virtual worlds for users to experience.

Virtual Reality (VR)

VR is a completely immersive user experience in a simulated environment. It is experienced through a headset that generates realistic images, sounds, and other sensations that simulate a user's physical presence in a virtual environment[7]. VR initially gained traction in the entertainment industry, especially gaming, and later extended to retail (virtual closets), education (training and simulations), and business (virtual meetups).

Augmented Reality

As the name suggests, reality is “augmented” by adding digital content to the existing world around the user. AR is typically created by an app or webpage that uses the camera in a smartphone or tablet and superimposes additional content, such as images or audio, on the camera view.

“Pokémon Go” is a popular example of AR. The mobile game places virtual Pokémon in real-life locations, where players can collect them for points. Another popular AR app is Google Lens, which uses the camera feed to detect objects, text, and other material, then processes it through multiple deep learning models to add or translate information.

Thus, AR and VR are changing how we interact with digital environments. As HMIs, VR and AR can enable users to operate and interact closely with machines that are not in the same physical space.
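One common way to superimpose an image on a camera frame is alpha blending, where each overlay pixel is mixed with the camera pixel beneath it. The sketch below is an assumption about how such an overlay could be done, not a description of any particular AR app. Frames are plain nested lists of RGB tuples, and the names `blend_pixel` and `superimpose` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels


def blend_pixel(camera: Pixel, overlay: Pixel, alpha: float) -> Pixel:
    """Mix one overlay pixel onto one camera pixel.

    alpha = 0.0 keeps the camera pixel, alpha = 1.0 shows only the overlay.
    """
    r, g, b = (round(alpha * o + (1.0 - alpha) * c)
               for c, o in zip(camera, overlay))
    return (r, g, b)


def superimpose(frame: Image, sprite: Image, alpha_mask: List[List[float]],
                top: int, left: int) -> Image:
    """Return a copy of `frame` with `sprite` blended in at (top, left).

    Parts of the sprite that fall outside the frame are skipped.
    """
    out = [row[:] for row in frame]  # leave the input frame untouched
    for r, sprite_row in enumerate(sprite):
        for c, pixel in enumerate(sprite_row):
            y, x = top + r, left + c
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = blend_pixel(out[y][x], pixel, alpha_mask[r][c])
    return out


# A 2x2 black "camera frame" with a half-transparent 1x1 sprite at row 0, col 1.
camera_frame = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
augmented = superimpose(camera_frame, [[(200, 100, 50)]], [[0.5]], top=0, left=1)
```

A production AR pipeline would also need to track where in the scene the overlay belongs; this sketch covers only the final compositing step.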

Computer Vision and Gesture Interfaces

Computer vision is a subfield of Artificial Intelligence that focuses on understanding images and videos. It is concerned with the automatic extraction, analysis, and understanding of useful information from a single image or a sequence of images. Major applications of computer vision include Object Detection, Facial Recognition, and Optical Character Recognition (OCR). Applying computer vision to HMIs can further narrow the gap between humans and machines by enabling people to interact with machines using gestures and facial expressions.

A machine that understands hand gestures and acts on them can make communication more seamless and natural. Vision-based gesture recognition technology employs a camera and a motion sensor to track the user's movements and translate them in real time. Motion sensors in devices can track and interpret gestures as their primary input source. Gesture recognition has matured from intuitive, ad hoc approaches into a more formal discipline, driven by experiments with and improvements to the sensors used for this purpose[8].
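Once a camera or motion sensor has produced a track of hand positions, turning that track into a command can be as simple as looking at the overall direction of movement. The following is a minimal sketch under stated assumptions: positions are normalized (x, y) image coordinates with y growing downward, and the function name `classify_swipe` and its threshold are hypothetical. Real systems typically use trained models rather than a fixed rule like this.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in normalized image coordinates, y grows downward


def classify_swipe(track: List[Point], min_distance: float = 0.2) -> str:
    """Classify a tracked hand path as a swipe direction.

    Compares the first and last tracked positions; movements shorter than
    `min_distance` along both axes are treated as no gesture.
    """
    if len(track) < 2:
        return "none"
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if max(abs(dx), abs(dy)) < min_distance:
        return "none"
    if abs(dx) >= abs(dy):
        return "swipe_right" if dx > 0 else "swipe_left"
    return "swipe_down" if dy > 0 else "swipe_up"


print(classify_swipe([(0.1, 0.5), (0.3, 0.5), (0.6, 0.52)]))
```

The `min_distance` threshold is the kind of parameter that has to be tuned per sensor, since it trades off missed gestures against accidental triggers.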

Challenges in designing HMIs

As the evolution of HMIs so far shows, HMI technology is consistently trying to reduce the gap between humans and machines. The human brain is one of the most complex systems in the world, so mimicking and comprehending human communication involves enormous complexity.

With advanced technologies such as facial recognition, the models have to be at least as accurate as humans to prevent fraud. They also perform best on clean, controlled input. This is why, when a request to Amazon Alexa or Apple Siri is not clear enough, the assistant responds with something like “I couldn’t understand it, can you please repeat?” and may suggest how to phrase the request or offer sample commands. In performance evaluation terms[11], HMIs that involve security have to maximize precision, whereas entertainment-oriented HMIs can aim to maximize recall.
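The precision and recall trade-off can be made concrete with a small calculation. Precision is the share of accepted attempts that were genuine, and recall is the share of genuine attempts that were accepted. The scores, labels, and thresholds below are made-up illustrative values, and `precision_recall` is a hypothetical helper rather than a library function.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Compute precision and recall when accepting scores >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)       # genuine, accepted
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)   # impostor, accepted
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)   # genuine, rejected
    # Convention chosen here: report 1.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up match scores; True marks a genuine user, False an impostor.
scores = [0.95, 0.90, 0.70, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False, False]

strict = precision_recall(scores, labels, threshold=0.8)   # security-style setting
lenient = precision_recall(scores, labels, threshold=0.5)  # convenience-style setting
print(strict, lenient)
```

With the strict threshold no impostor is accepted, but one genuine attempt is rejected. With the lenient threshold every genuine attempt is accepted, at the cost of letting one impostor through.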

Now, let's focus on a few issues that hinder the efficient design and implementation of new HMI technologies. As HMIs have become more interactive, the complexity of their operations has also increased. An effective HMI has to handle tasks ranging from simple, easy-to-perform functions to complex control operations, which calls for real-time processing and robustness. This level of parallel operation makes HMIs computationally demanding. Advanced MPUs that combine low power consumption with high computational capability are needed to make HMI operations more intuitive.

As discussed for security-related HMIs such as phone unlocking through facial recognition, the more convenient a security method is, the more vulnerable the device tends to be to breaches. For example, an identical twin can unlock their sibling's phone with Face ID[12], and a sleeping person's phone can be opened by pressing their finger to the fingerprint sensor.

The fields of Artificial Intelligence and IoT are together working towards smarter solutions that improve both convenience and security. In industrial HMIs, critical endpoint devices are exposed. These devices often lack privacy protections, have hard-coded keys, interact with a larger system of devices, and are patched on an irregular schedule. This puts them at risk of malicious access and tampering, potentially disrupting crucial operations.

Thus, the evolution of HMIs depends on the collaboration of IoT, Artificial Intelligence, embedded systems, connectivity, and many other fields. HMIs need to act as smart edge devices that can handle certain tasks themselves while communicating effectively with a larger connected system that offers multiple functionalities and group actions.

Role of Microprocessors (MPUs) in HMIs

HMI applications are, at their core, reliant on microprocessors (MPUs). The functionality and complexity of an HMI are determined by the processing power, specifications, and features of the MPU. A low-performance processor (less than 300 MHz) can support a fundamental HMI, such as a touch screen display. In contrast, high-end HMIs using technologies such as AR/VR require a high-performance processor (more than 1 GHz) and advanced capabilities, such as a 2D and 3D graphics accelerator and a DSP for audio and video processing.

The future of HMI

The future of the human-machine interface is increasingly complex and dynamic. As machines of all kinds become a more natural part of everyday life and take on larger roles in industry, the interface between them and their users must advance to new levels. These new frontiers will include increasingly intuitive interfaces such as Brain-Machine Interfaces (BMIs).

Behind such new ways of operating technology is a constant demand for low-power, efficient MPUs that can handle the requirements.

The Renesas RZ/V MPUs for High-End HMIs

Semiconductor manufacturer Renesas has developed MPUs specifically for HMI applications that demand high efficiency. The RZ/V series is among them.

The RZ/V series consists of two products:

  • The RZ/V2L, with a simple image signal processor (ISP), a 3D graphics engine, and highly versatile peripheral functions.
  • The RZ/V2M, with a high-performance ISP that supports 4K/30fps.

The Renesas RZ/V series of MPUs for vision AI incorporates DRP-AI (Dynamically Reconfigurable Processor for AI), Renesas' exclusive dedicated AI accelerator, which delivers excellent AI inference performance at low power consumption and targets the vision AI market.

The DRP-AI uses dynamically reconfigurable technology exclusive to Renesas to deliver design flexibility, fast AI processing, and high power efficiency. Combining hardware (DRP-AI) and software (the DRP-AI translator tool) also enhances power efficiency. The DRP-AI translator tool makes it possible to move to more complex AI models without changing the hardware.

RZ/V is optimized to enable IoT endpoint devices to collect, process, and send data to the cloud efficiently and securely.

The Renesas RZ MPU family aims to lower the barriers to entry for embedded microprocessor design. For more details about the product, please visit their website.

References

[1] https://en.wikipedia.org/wiki/Macintosh_128K

[2] https://en.wikipedia.org/wiki/Social_computing

[3] https://en.wikipedia.org/wiki/Natural_language_processing

[4] https://en.wikipedia.org/wiki/Question_answering

[5] https://en.wikipedia.org/wiki/Edge_device

[6] https://blogs.nvidia.com/blog/2022/05/20/what-is-extended-reality

[7] https://en.wikipedia.org/wiki/Virtual_reality

[8] https://www.renesas.com/eu/en/blogs/enhance-touch-and-gesture-ai

[9] https://aws.amazon.com/what-is/iot/

[10] https://en.wikipedia.org/wiki/Edge_computing

[11] https://en.wikipedia.org/wiki/Precision_and_recall

[12] https://www.youtube.com/watch?v=uVc0gNeCu50

[13] https://en.wikipedia.org/wiki/Groot