Fireside chat with Head of Machine Learning at Edge Impulse Daniel Situnayake
#10 of our Voice of Innovation fireside chat series: Robotics and AI reporter Rachel Gordon speaks to Daniel Situnayake, a founder, engineer, and teacher, on what it means to run sophisticated machine learning algorithms on small devices at the edge of a network.
Wevolver and Syntiant are creating a series that explores innovators' work and the future of pervasive AI. Syntiant is developing ultra-low-power AI processors. Because they believe in the importance of innovation, Syntiant engages in fireside chats with engineers and designers on the cutting edge of their field.
In previous conversations we spoke with Senior Director at Qualcomm Research Evgeni Gousev, CEO and founder of Women in Voice Dr. Joan Palmiter Bajorek, The Things Industries CEO Wienke Giezeman, Rev.com's Daniel Kokotov, Star Wars Animatronic Designer Gustav Hoegen, and Fashiontech Designer Anouk Wipprecht.
Why tinyML isn't huge (yet)
For all of tinyML's outsized influence, it isn't yet ubiquitous. Processing data at the edge is neither brand new nor well established: it has only recently become possible through the convergence of several technologies. And with constraints on the environment and on memory, all in service of keeping everything small, the technical challenges are large. Situnayake aims to make things a little easier at Edge Impulse, where he is the Founding TinyML Engineer.
"We've got these new possibilities on the algorithm side that have matured along the same timeline as the tooling that allows regular people to train and deploy models - and this has been pretty inaccessible to regular developers until quite recently. At the same time, we've had the hardware maturing and getting to the point where we've got 32-bit microcontrollers that can run these types of algorithms, are fast and have enough memory to be useful, and are at a price point that makes them accessible for a vast array of projects," says Situnayake. "Then we have new hardware emerging, like accelerators designed specifically to run deep learning models on devices without too much energy. We're just arriving at the moment where all of these things have started to come together, where you can get accessible tooling and find effective algorithms for solving the right types of problems. Finally, we've got the hardware to run these things in the field. It's just the beginning of the journey because there are still critical problems to solve."
Where tinyML can shine
Decision-making algorithms are powerful little forces made up of sequencing, selection, and iteration, made strong by a steady diet of data. Real-time insights from such data are a crucial currency for immediate action. One high-stakes area Situnayake is particularly drawn to is conservation. "Some of the first people to adopt it were conservation community researchers who needed ways to understand what's happening in potentially remote locations where there isn't good connectivity."
Suppose, for example, you wanted to monitor the population of an animal in a specific location. You might put out a bunch of motion-activated cameras that snap a picture anytime something moves. You can then collect the photos, count the animals, and get an understanding of the population. "The problem is, imagine you've got this camera trap somewhere in the middle of the jungle. It's got a memory card in it, and anytime any motion happens, it will take a picture and start filling up the card. Maybe you care about tigers wandering around in the jungle, but a squirrel often comes past, sets off the imagined sensor, and takes a picture. You end up with a memory card full of squirrels. You have to spend maybe two thousand dollars flying out to some remote location, hike through the jungle for four days to get the camera, and then all it has is a bunch of selfies from squirrels. It's exciting to see this brand-new technology come into play, and immediately people have found a way to use it for something truly beneficial."
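One way to picture how on-device machine learning could help with the squirrel problem is a camera trap that classifies each photo before deciding whether to keep it. The sketch below is purely illustrative and is not taken from the interview: the `CameraTrap` class, the `fake_classifier` stand-in, the label names, and the 0.8 confidence threshold are all assumptions, and a real deployment would run a trained vision model in place of the stub.

```python
from dataclasses import dataclass, field


@dataclass
class CameraTrap:
    """Hypothetical camera trap that only stores photos of target species."""
    target_labels: set
    min_confidence: float = 0.8  # assumed threshold, would be tuned in practice
    saved: list = field(default_factory=list)

    def on_motion(self, photo, classify):
        """Classify a motion-triggered photo and keep it only if it matches a target."""
        label, confidence = classify(photo)
        if label in self.target_labels and confidence >= self.min_confidence:
            self.saved.append(photo)
            return True
        return False


def fake_classifier(photo):
    """Stand-in for an on-device model: reads a precomputed label from the photo."""
    return photo["label"], photo["confidence"]


trap = CameraTrap(target_labels={"tiger"})
events = [
    {"label": "squirrel", "confidence": 0.95},
    {"label": "tiger", "confidence": 0.91},
    {"label": "tiger", "confidence": 0.40},
]
kept = [trap.on_motion(event, fake_classifier) for event in events]
```

With a filter like this, the memory card fills up only with confident sightings of the species the researchers care about, at the cost of occasionally discarding a low-confidence true sighting.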
At Edge Impulse, Situnayake's efforts to make tinyML as seamless as possible for these types of use cases and beyond involved taking the open-source code from TensorFlow Lite Micro and its contributors to "wrap that in some code generation magic, where we take a model and instead of interpreting it and calling out to these different functions, we generate some code that directly calls the functions in the right order, passing in the right data to execute the model." By doing that, he says, they've been able to reduce the overhead involved with running the model. "It uses less RAM and is a little bit faster, so this is just one approach to making deep learning run quickly on embedded devices."
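The difference between interpreting a model and generating code for it can be sketched with a toy example. This is a minimal illustration in Python and not Edge Impulse's actual implementation, which targets C++ on microcontrollers: the `MODEL` description, the `dense` and `relu` kernels, and the `generate_source` helper are all hypothetical. The interpreter looks up each operator at run time, while the generator emits a function that calls the kernels directly, in order, with the parameters baked in.

```python
# Toy kernels standing in for real operator implementations.
def dense(x, weights, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


KERNELS = {"dense": dense, "relu": relu}

# Hypothetical model description: an ordered list of (op name, parameters).
MODEL = [
    ("dense", {"weights": [[1.0, -2.0], [0.5, 0.5]], "bias": [0.0, -1.0]}),
    ("relu", {}),
    ("dense", {"weights": [[2.0, 1.0]], "bias": [0.5]}),
]


def interpret(model, x):
    """Interpreter style: walk the model and dispatch each op through a lookup table."""
    for name, params in model:
        x = KERNELS[name](x, **params)
    return x


def generate_source(model):
    """Code-generation style: emit source that calls each kernel directly, in order."""
    lines = ["def run_model(x):"]
    for name, params in model:
        args = "".join(f", {key}={value!r}" for key, value in params.items())
        lines.append(f"    x = {name}(x{args})")
    lines.append("    return x")
    return "\n".join(lines)


def compile_model(model):
    """Turn the generated source into a callable; only the kernels are in scope."""
    namespace = dict(KERNELS)
    exec(generate_source(model), namespace)
    return namespace["run_model"]


run_model = compile_model(MODEL)
```

Both `interpret(MODEL, [3.0, 1.0])` and `run_model([3.0, 1.0])` compute the same result, but the generated function carries no model description and no dispatch table at run time. On an embedded target, dropping that interpreter machinery is where the RAM and speed savings described in the quote would come from.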
Does the future of tinyML involve large language models?
Many of the most exciting machine learning systems that have recently caught the mainstream's attention have been enormous: large models such as GPT-3 and DALL-E. These, of course, don't fit on embedded systems, but are there applications for large models running in an embedded context? Situnayake believes we're just at the beginning of the journey of optimizing hardware to run deep learning workloads. While it's still early, he expects a steep curve of improvement in the amount of computing you can do within a given power budget or on a cheap device, and it's happening fast.
"In the beginning, no one was running vision models on microcontrollers, and now, everyone's running vision models in real-time on microcontrollers. So a couple of years from now, I can imagine where we're doing real-time transcription with outstanding audio accuracy. We assume you've got something like the Google Assistant or Siri that gets woken up by some on-device keyword-spotting algorithm. Still, after that, it captures everything you're saying and sends the audio to a server to be processed by a big model."
But this, according to Situnayake, will likely go away in favor of doing everything on the device. And we're already seeing it: phones with built-in on-device transcription models, for example. "They're still too big to run performantly on a microcontroller, but with the types of accelerators and types of architectures that people are working on right now and even on the algorithm side of things, like structured sparsity and deeper levels of optimization, we're going to be able to improve. Getting to something like a GPT-3 model running on a constrained device will take a while. There are, of course, going to be trade-offs along the way, but I think this is coming. I'm so excited to see the types of applications people can build once you can embed speech transcription, intent matching, and text generation into, say, a greetings card microcontroller."