
Podcast: Super Fast Computer Vision: Final Piece of The Autonomous Vehicle Puzzle

In this episode, we discuss how a PhD student from Eindhoven University of Technology might’ve cracked the code for high-performance computer vision without compromising efficiency.


29 May, 2024. 15 min read



This podcast is sponsored by Murata


EPISODE NOTES

(3:40) - The Secret to Super-Fast Robot Vision
 

This episode was brought to you by Murata, the market leader in leading-edge electronics components. Click HERE to learn more about Murata’s contributions to the future of autonomous vehicles.


Transcript

If you've been watching the news or scrolling on Twitter recently, you've probably seen some headlines that shake your trust a little bit in self-driving cars. And honestly, I'm wondering too, if we've got all these smart engineers working on it, how come self-driving cars aren't perfect yet? Well, today we're talking about some research that gets at the secret of why self-driving cars don't work quite right yet, and points toward a future where we can process images a lot faster and cars can drive a lot better. So, let's buckle up and jump into this one, folks.

I'm Daniel, and I'm Farbod. And this is the NextByte Podcast. Every week, we explore interesting and impactful tech and engineering content from Wevolver.com and deliver it to you in bite-sized episodes that are easy to understand, regardless of your background.

Daniel: What's up peeps? Like we said, today we're talking all about the secret to giving cars super-fast robot vision. But before we talk about self-driving cars today, I wanna give a quick shout out to today's sponsor, Murata. They're the market leader in creating leading-edge electronics components like multilayer ceramic capacitors, inductors, or connectivity modules. But what I think that truly boils down to is they make technology that's interesting and impactful. And that aligns a lot with what we're talking about on the podcast, right? We're always talking about interesting and impactful tech. These folks at Murata, they design, produce, and deliver innovative solutions that change the world. And one of the ways that they're doing this is by making autonomous vehicles come to reality. So, we've all got this hope that autonomous vehicles can one day drive us around everywhere and we'll never need to labor behind the wheel anymore. And Murata is helping make that happen.

Farbod: Yeah, you know, you hear about all these incredible companies like Cruise and Tesla that are doing a lot of the heavy lifting on the hardware and software side to make this happen, but you don't really hear about the unsung heroes that are developing some of the components that are making that vision come true, right? And here it is, we get to talk about one of those unsung heroes, and that's Murata.

Daniel: Well, and you know, it's kind of like that saying that you want to be the one selling pickaxes during a gold rush, and that's exactly what Murata's doing. So, some of the solutions they offer for next-generation AVs honestly sound like super cool projects to work on and components to work with. Ultrasonic cleaning devices to help prevent navigation tools like sensors from becoming completely useless when there's inclement weather. They've created radar for both in-cabin use, think about driver monitoring or child presence detection, and out-of-cabin use, think about short-range radars for things like parallel parking, where the car can parallel park itself and stuff like that. They also create IMUs, which are inertial measurement units. Those are really interesting sensors that help the car understand where it's going, how it's turning, and what forces the road is putting back on the car. Apparently, their IMUs already support 90% of the autonomous miles driven so far in California, which is just crazy.

Farbod: Wow. Again, it's these folks that you never hear about, but at the end of the day, they're the ones that are making this dream of autonomous vehicles come true.

Daniel: And one thing we'll mention here: Murata is obviously at the forefront of all this autonomous vehicle technology. We're linking in the show notes something you should check out. It's part of Wevolver's autonomous vehicle report, and it's actually an interview with one of the engineers from Murata talking about how they're building trust in autonomous driving, how we can achieve future reliability and milestone achievements to basically allow us to one day completely trust an autonomous vehicle when we get behind the wheel. But I guess in this case, there may not even be a steering wheel. When we get inside the car and the car drives itself, how can we get to completely trusting it? These experts from Murata did an awesome interview that we're gonna link in the show notes where they talk about a lot of these components and technologies that they're working on.

Farbod: Yeah, I'm looking forward to it.

Daniel: Well, and I think it's a perfect segue into the main meat of the episode, which is how and why cars aren't great at autonomous driving so far today, and how this team from Eindhoven University of Technology is doing research to help close that gap. So, some of the background information here: robots, self-driving cars, drones, they all need to effectively see their surroundings to be able to navigate and operate safely. And some of the ways that they do that is with cameras and LIDAR, but on the backend of all that hardware, you need to do image processing and run image recognition models.

Farbod: Right.

Daniel: To completely understand, what's all the context? What are all the things around me? And then identify what these different obstacles and objects are, identify where there's open space, identify where there's not space. The problem with these systems that exist so far today is that the ones that are really, really accurate are almost always really, really slow. And the ones that are really, really quick aren't always 100% accurate. And it sounds like a normal trade-off, but when you're talking about something as high stakes as driving on the highway, you need to be able to identify things accurately in a very quick manner. Otherwise, if you take a whole second to process the image of what's going on around you, this autonomous vehicle may not be able to make a decision and react in time to prevent a crash.

Farbod: Or on the other side, if you take half a second to process an event but you're processing it incorrectly, that could also be disastrous. So, you've gotta have the best of both worlds. You can't give anything up. You need to have your cake and eat it too. That's, I guess, been the tough position that researchers have been in, and it's the question of how do you overcome it? Do you just keep building beefier and beefier hardware that can support all these computations that you need? And that is a trend that we're seeing, right? Manufacturers like Nvidia are developing more complex and more advanced chips, but the strategy taken here by this PhD student from Eindhoven University is actually just, how do we make these models simpler?

Daniel: Yeah, exactly. And I liked it when you said, have our cake and eat it too. How do we, instead of trading off between accuracy and speed in image recognition models, break outside the box of how we're doing image recognition and improve these models? Instead of saying, you know, I want to optimize for just accuracy, or I want to optimize for just speed, can we change the way these models work to make sure that we get accuracy and speed without sacrificing things, like you're saying, and without needing a ton of computational power to achieve that? I want to give a quick shout out to this PhD researcher, and, you know, he was a researcher at the time, he's since earned his PhD, so I'll call him doctor. Dr. Daan de Geus was the one that developed these new algorithms. And one of the things that I want to mention is it's not just for autonomous vehicles, that's an awesome example, but they also mentioned that they can use this for robots, they can use this for drones, they can use this even for medical devices. The possibilities are endless, basically wherever you're using computer vision. The secret sauce here has two main ingredients in it. It's an awesome emulsion of two ingredients. The first ingredient is finding a way to get rid of inefficiencies and conflicts. And then the second one is finding a way to do abstraction to understand images at different levels. So, which of these two do you wanna dive into first?

Farbod: Let's see. Let's talk about the first one, the first bullet.

Daniel: Trying to get rid of inaccuracies and inefficiencies. So, there are two parts to this, so I guess it's really three ingredients, but anyway. Two main techniques are part of this first ingredient in the secret sauce. The first one, I just love challenging assumptions here. I love challenging.

Farbod: First principles thinking.

Daniel: Yeah, first principles thinking, saying why does this convention exist? Can I challenge it? Is there something we can do to do this better? In a lot of these computer vision setups, especially for autonomous driving, there was typically a foreground model that focuses on processing the items that are in the front of the frame, and then a background model that focuses on processing the parts of the image that are in the background. One of the things that Dr. Daan de Geus figured out is that you could process almost all, like 99% of the images through the background model without…

Farbod: But just passing some more information basically about the background.

Daniel: By giving it more information about the background. And essentially you didn't need a specialized foreground model. Maybe you did at the time when this convention was first developed, but he has since shown that you can just use the background model only. So instead of using two models, which have to process in parallel and often give conflicting results on the things that sit in the overlap between background and foreground, you can get rid of the foreground model and use the background model only. This greatly reduces the number of inefficiencies. This greatly reduces the number of conflicts. And honestly, it just increases processing speed because you're only running one model at a time.
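
To make that architectural shift a little more concrete, here's a minimal sketch in Python. This is not Dr. de Geus's actual code, and the model functions are just hypothetical stand-ins that return random label maps, but it shows the difference between running two specialized models plus a merge step and running a single unified model with nothing to reconcile.

```python
import numpy as np

# Hypothetical stand-ins for the two specialized models in the traditional
# pipeline. Real systems would use neural networks here; these just assign
# a class label to every pixel of an H x W image.
def foreground_model(image):
    # pretends to label foreground "things" (cars, pedestrians, ...);
    # -1 means "no foreground object at this pixel"
    return np.random.randint(-1, 5, size=image.shape[:2])

def background_model(image):
    # pretends to label background "stuff" (road, sky, buildings, ...)
    return np.random.randint(5, 10, size=image.shape[:2])

def merge_with_conflict_resolution(fg, bg):
    # the two label maps cover the same pixels, so every pixel needs a
    # tie-break rule -- this merge step is where conflicts and extra
    # work come from
    return np.where(fg >= 0, fg, bg)

def unified_model(image):
    # a single model that labels both foreground and background classes,
    # so there is nothing to merge and no conflicts to resolve
    return np.random.randint(0, 10, size=image.shape[:2])

image = np.zeros((480, 640, 3), dtype=np.uint8)

# Traditional pipeline: two forward passes plus a merge step.
labels_old = merge_with_conflict_resolution(foreground_model(image),
                                            background_model(image))

# Unified pipeline: one forward pass, one label map.
labels_new = unified_model(image)
```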

Farbod: Anyone that's ever touched parallel processing at any level will tell you it's just a recipe for disaster in a lot of cases. It's way more complex, and unless it's absolutely necessary, it's best avoided. And the result of that is, not only do you get efficiency gains, you get very comparable accuracy. So, you're not really losing performance on that front either. And your system just becomes more lightweight all around, so what's not to love?

Daniel: And then you obviously reduce the propensity for there to be conflicts, because you don't have two models processing the same bit of the image, trying to understand exactly what it is. You've just got one model doing it. One of the other things that was a key part of, again, getting rid of inefficiencies, was doing image clustering. So basically, trying to group similar parts of an image together to reduce the amount of work that this model needs to do to process it. So, think about when you're driving and your forward view through the windshield. A vast majority of that, maybe 30, 40, 50% of that, could just be sky. And the way most traditional models were designed, they process every single pixel individually. What this model does first is pre-process the image and say, can I group this entire cluster of pixels in the sky? All these light blue pixels, can I group them all together and just treat them as sky? And it got really good at grouping these pixels together into things that they call clusters. So, grouping the similar parts of an image together means that it only needs to process the sky once. And all these pixels, you know, the millions and millions of pixels that made up the sky before, it can treat as one unit, which is the sky. It processes it: I know this is sky, I know this isn't relevant to me processing the objects in the road around me. And again, for that snapshot, for that frame, it can treat the entire sky as one unit and reduce the amount of complexity and the amount of computation required to process that image, allowing it to do it a lot faster and a lot more accurately.
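
Here's a toy illustration of why that clustering step shrinks the workload. This is not the learned clustering used in the actual research, just a crude color-quantization sketch in Python, but it shows how a frame that's half sky collapses from hundreds of thousands of pixels into a handful of clusters that can each be handled once.

```python
import numpy as np

def cluster_pixels(image, levels=8):
    """Group pixels into clusters by coarsely quantizing their color.

    A large uniform region like the sky collapses into a single cluster id,
    so downstream processing can treat it as one unit instead of hundreds
    of thousands of individual pixels.
    """
    # Quantize each color channel into `levels` bins, then combine the
    # three bin indices into one cluster id per pixel.
    q = (image // (256 // levels)).astype(np.int32)
    return q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]

# A toy 480x640 frame: top half light-blue "sky", bottom half gray "road".
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[:240] = (135, 206, 235)   # sky
frame[240:] = (90, 90, 90)      # road

clusters = cluster_pixels(frame)

# Instead of 480 * 640 = 307,200 pixels, the model only has to reason
# about a couple of clusters for this frame.
print(f"pixels: {frame.size // 3}, clusters: {len(np.unique(clusters))}")
```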

Farbod: You know, out of this entire article, this is probably the thing that I was most excited about, because as a human driver, that is pretty much how I drive, right? When I'm on the road, pretty much all of my attention is on the road and what's in front of me. I know the sky is there. It's in my field of view, but there's not a lot of brain power going towards processing what's happening there. Unless there's like a helicopter flying by or something, but that's beside the point. And essentially what you just explained is, you have this snapshot of whatever's in front of the robot or the car or whatever, it goes through a filter that bundles stuff together where it makes sense, and then the processing happens.

Daniel: And like you're saying, it's very similar to the way that humans deal with it. I just wanted to give the inverse scenario there to imagine how computationally intensive it must be as a computer doing the traditional method. It's like me imagining driving, but every single second I have to scan everything in my field of vision to understand, oh, the important thing is the car in front of me. And then, the next second, oh, I've got to scan everything in my field of vision again to understand, oh, yeah, there's still a car in front of me. Just imagine the wear and tear that would take on your brain. I would hate driving if that was the way I had to drive.

Farbod: Good thing robots don't have feelings yet.

Daniel: This is the way that most autonomous vehicles are processing images right now. So again, using this unified model, the car doesn't have to think about two things at once to process an image. And then also doing clustering to make sure that it looks at one bit of the sky, knows that it's the sky, processes it, and is done with it, as opposed to processing the rest of it. Those are the two main parts that helped it become a lot more efficient and reduce the inefficiencies, let's say, that currently exist in the way that most image processing models work today.

Farbod: Yeah. And then this leads us nicely to the last bit of the sauce, which is the abstraction.

Daniel: And basically, it was understanding that there are different levels of detail in the image in front of you. And when I say in front of you, I mean in front of the car, in front of the robot, in front of the computer. So, it's trying to use different levels of detail to understand the context of the whole image as well as its parts, and also understanding that tiny parts are constituents of a larger item.

Farbod: Correct.

Daniel: Think about it: instead of having to jump to the conclusion that, oh, there's a car in front of me, by understanding, oh, there are four tires and a license plate and a bunch of handles and a bunch of windows and some seals and some trim and some paint and some metal and some chrome, oh, yep, that's a car. Instead of processing all that information at once, this abstraction allows the model to look at this blob in front of it and go, oh, that's a car.

Farbod: And this car has a license plate.

Daniel: This car has handles. This car has a license plate. It's very much in line with the way that you and I think, right? We think about abstract objects and items in front of us before we go and mentally process all the little components that make them up. And again, this is a big step, I think, towards the way that humans learn and the way that humans think, but allowing computers to do it that way reduces the amount of computation required and ultimately lets them make the right decision faster.
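
To picture what that hierarchical, part-aware output might look like, here's a minimal sketch of a data structure for it. The class names and fields are made up for illustration and aren't taken from the thesis; the point is just that parts hang off their parent object, so the model commits to "that's a car" first and fills in details afterwards.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    label: str    # e.g. "license plate", "wheel"
    bbox: tuple   # (x, y, width, height) in pixels

@dataclass
class SceneObject:
    label: str    # e.g. "car", "pedestrian"
    bbox: tuple
    parts: list = field(default_factory=list)

# High-level pass: one cheap decision per object in the scene.
car = SceneObject(label="car", bbox=(220, 300, 180, 120))

# Detail pass, only where it matters: parts are attached to their parent,
# so a license plate can never end up assigned to the sky or to a
# different object, which is one source of conflicts in flat detection.
car.parts.append(Part(label="license plate", bbox=(290, 390, 40, 15)))
car.parts.append(Part(label="wheel", bbox=(230, 380, 30, 30)))

for part in car.parts:
    print(f"{car.label} has a {part.label} at {part.bbox}")
```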

Farbod: Yeah. And another thing that was apparently an issue with the approach where you detect every object independently, instead of as this abstract blob, was conflicts in terms of positioning, where one detection would say, oh, the license plate doesn't actually go on this part of the car, but another detection said that it does go on this part of the car. So again, you have multiple processes running in parallel, conflicting with each other. So that's another gain you get on top of it. But you know, all that rounds out to say that what is proposed here goes against the norm that we're seeing, or I would say the general norm that we're seeing on the AI and machine learning front, which is that to advance, to become more efficient, to develop higher-performance systems, we need better hardware, right? What is so amazing about this article is that it takes a completely different point of view and says, how can we actually look at all the waste and all the inefficiency we have in our current models, and how do we clean that up without giving up literally anything?

Daniel: As a hardware engineer, that makes me happy.

Farbod: I bet it does.

Daniel: Right, it's not always the hardware that's the constraint there. There's a way that we can make this software processing a lot more lightweight and work a lot better. And this kind of takes us to the so-what here. This new image processing method can process images twice as fast without losing any accuracy versus the current state of the art. So just imagine every single autonomous vehicle on the road being able to make the right decision or come to the right conclusion twice as fast. I can't imagine the number of autonomous vehicle crashes that happen as a result of the car basically having analysis paralysis and not knowing what to do, not knowing whether there's a car in front of it or, in some awful cases, a pedestrian in front of it, not being able to process that signal fast enough and make the right decision. If you're able to short-circuit that, cut it in half, make the car make the right decision twice as fast just by doing some tweaks in the image processing software, that's a really, really meaningful achievement. And obviously this feels really relevant to things like autonomous vehicles, allowing robots, autonomous vehicles, drones, et cetera, to operate a lot more safely and a lot more efficiently. I think he also mentioned that it works in things like medical imaging and other fields, right? So, it opens up a ton of possibilities for us to do lightweight real-time image processing in a ton of different industries.

Farbod: It's super exciting, man. To think that this was all a PhD thesis that is now gonna have a cascading effect on how we do computer vision algorithms across the board, that's pretty exciting. And honestly, kudos to Dr., I don't wanna butcher your last name.

Daniel: de Geus.

Farbod: de Geus, there we go. I hope you celebrate it properly, my friend. Yeah.

Daniel: Yeah. Well, I will just say there's some future work that he mentioned as part of this. Obviously, there's the caveat that image recognition is just a tiny part of the puzzle. So, you can have this awesome image recognition model, and it can understand what's going on around the vehicle twice as fast, but if you don't have a good control system in place to make the right decision or to react using that extra time that you get, maybe it's not going to make as much of a difference. So, he mentioned this is just the first domino that needs to fall in a series of dominoes for us to realize this improvement in things like the autonomous vehicles around us. And one of the things that he mentioned that he's going to continue working on is developing models that can recognize more objects at once, and then obviously do that part of the operation faster and more efficiently as well. Which, you know, takes us closer and closer to a future, and I don't think we're there yet, but it takes us closer and closer to a future where I can truly trust that my car is going to drive better than I can on the road and safer than I can on the road and get to….

Farbod: Oh, you're not a good driver yourself?

Daniel: Oh, I think I'm a pretty good driver. I was just going to get cocky and say, maybe I am a better driver than computers, but I know that computers are already better drivers than Maryland drivers.

Farbod: Wow, wow. That's, our Maryland fans are not going to like that one.

Daniel: I don't know. I'm sorry to Noah Zippen, because he's a friend of ours and I know he's from Maryland.

Farbod: Please don't hate us, Noah. But I think it's a good point to wrap up the episode. What do you think? You want to give us a little rundown, a little summary, a little TLDR?

Daniel: Yeah, I'll give us a quick rundown. Do you ever wonder why your self-driving car isn't perfect yet? The secret actually lies in how cars see the world around us. A future where cars can drive us around and we can trust them is an awesome future, but current technology struggles a lot with slow, inaccurate image processing, making it hard for cars to navigate safely through the world around us. So, some researchers have worked on new algorithms that speed up image recognition without sacrificing any accuracy. By grouping similar parts of the image together, and by using different levels of detail to understand the image, this software can now process images two times as fast without losing any accuracy. This can help robots, drones, medical devices, and even autonomous vehicles see the world better and work faster. I think it's gonna be super interesting, so I'm definitely gonna stay posted to see how this breakthrough plays out, and if it's the first domino that falls in a series of dominoes to improve the way that self-driving cars work in the world around us.

Farbod: Money. Love it. All right, folks, thank you so much for listening. And as always, we'll catch you in the next one.

Daniel: Peace.


As always, you can find these and other interesting & impactful engineering articles on Wevolver.com.

To learn more about this show, please visit our shows page. By following the page, you will get automatic updates by email when a new show is published. Be sure to give us a follow and review on Apple Podcasts, Spotify, or your favorite podcast platform!

--

The Next Byte: We're two engineers on a mission to simplify complex science & technology, making it easy to understand. In each episode of our show, we dive into world-changing tech (such as AI, robotics, 3D printing, IoT, & much more), all while keeping it entertaining & engaging along the way.


The Next Byte Newsletter

Fuel your tech-savvy curiosity with “byte” sized digests of tech breakthroughs.