
Podcast: Blind Spot Begone! Cars and Robots Can Now See Around Corners

In this episode, we discuss a novel approach for increasing the field of view of autonomous systems by leveraging reflections off shiny objects to cover blind spots.


26 Jul, 2023. 13 min read



(0:47) - Using reflections to see the world from new points of view 


Transcript

What's up, folks? In today's episode, we're talking about how blind spots may soon be a thing of the past. This team from MIT and Rice University has found a really awesome way to help cars, robots, and really anything around you that uses computer vision to see around corners, see through walls, and better navigate the world. I think it's a really interesting one, so we're going to jump right in.

I'm Daniel, and I'm Farbod. And this is the NextByte Podcast. Every week, we explore interesting and impactful tech and engineering content from Wevolver.com and deliver it to you in bite-sized episodes that are easy to understand, regardless of your background.

Daniel: What's up, folks? Like we said, today we're talking about a way that MIT and Rice University are collaborating to help robots and cars and drones see around corners and kind of achieve the impossible, almost like X-ray vision, so to speak. We're talking about superpowers. We're giving computer vision systems a superpower here. And it all stems from this problem, which is the fact that they often struggle with reflections. Reflections are distorted, partial views of the world, so they're usually a weakness of computer vision systems. These researchers have turned that weakness into a super strength; we'll dive into the secret sauce in a minute, but they've enabled computers to rely on reflections to actually get more visual context from the world around us, instead of reflections being a weakness of their vision systems.

Farbod: And I mean, taking computer vision out of the equation for a second, right? Even as a normal driver, just think about the number of times reflections come in handy for you. There's a very narrow street next to where I work, and as you're coming out of the garage, you can't really see the incoming traffic on your right. So they put up this domed mirror that allows you to see what's coming around the corner so that you can make the decision as safely as possible. So, if you have objects that can reflect areas that your direct sight, your direct perspective, can't pick up, it's incredibly handy as a driver. Now take the human out of the equation and think about autonomous systems, which obviously have a lot of gadgets on board. Some of them have LIDAR sensors picking up their surroundings. Others have cameras that are literally trying to digest the data coming in from the surrounding world and make decisions about what to do. If you have that added information of a reflection that tells you what's happening on your blind side, that's incredibly helpful.

Daniel: Yeah, I'll just say, if you completely remove the advantage that reflections give us in navigating through the world, which is actually something a lot of computer vision systems do right now because they aren't complex enough to understand what's going on with a reflection, it severely limits the system's ability to fully understand and interact with the environment. If you don't have the context of this big bubble mirror across the street showing you when a car might be coming around the corner, and you're trying to autonomously control a car that's waiting to feel 100% safe before it pulls out, you're either going to be over-indexing on safety, and the car is going to get stuck there forever and need human intervention, or the other way around, you're not going to index highly enough on safety and you're going to cause a collision. Both of those are subpar outcomes. These problems are obviously really crucial in technologies like autonomous vehicles, but they also plague stuff like, I'll say normal household robots, robots that are trying to navigate around your home or around the neighborhood, food delivery robots, package delivery robots, also drones, right? Anything trying to navigate through this world with a partial field of view, or even a normal or superhuman field of view that excludes reflections, is going to have a big problem trying to understand and navigate the world around us. When you can leverage a reflection from shiny objects, you're actually able to gain additional perspectives on the environment. You're able to see even more than an eye that ignores reflections could see. It goes back to the way we do it with our human eyes, the way you described it, Farbod, with the bubble mirror across the street when you're trying to pull out of the parking garage. You've got a superhuman field of view when you're able to consider what's going on in that bubble mirror. You can see around the corner to check if a car is coming, a field of view you may not have from your perspective in the driver's seat. We're trying to leverage that same phenomenon with robots and computers and machines, and this team from MIT and Rice has done that through a computer vision technique called ORCA.

Farbod: Now we've set up the problem and the advantage this potential solution could have, so let's dig a little bit into ORCA, which, by the way, stands for Objects as Radiance Field Cameras. Pretty self-explanatory. But what it comes down to is that when you see a reflection of something, at least to us, we're perceiving a 2D image, right? And that is not super useful, especially for an autonomous system that needs to understand things like how far away you are from the object that's doing the reflecting. Thankfully, as we perceive something, we can understand, like, oh, that's an object; it might look a little vague in the reflection, but our brains can do the processing to figure out that it's actually a car and not something random we're seeing. That level of insight does not exist when you're purely looking at the image, so some level of processing needs to happen. So, what this team does is not take just a single image of the object that's reflecting a scene or other objects; they take multiple images at once. And then they look at what they call a 5D composition. You have your X, Y, and Z, that's just space. The two extra dimensions are the intensity and the direction of light. So, you know, at what angle is light coming into the reflection, and how bright is it?

Daniel: Well, that's the R in ORCA, right? Radiance field. That's a technical term for using a neural network to take a 2D image, take into consideration the light intensity and direction, and turn it into a 3D map. So, what they're doing is combining the three dimensions they're able to view with the camera with the light intensity and direction off of these 2D reflections to add a couple of extra dimensions. That's why they call it a five-dimensional radiance field. They're able to glean a lot of extra context about the world around us based on the reflection. One key part of that, though, and in my mind it's the secret ingredient in the secret sauce here, is the fact that they have to understand the geometry of the shiny object that the reflection is bouncing off of. That's how they're able to gain this extra understanding of the world: by already knowing the shape of the object that's generating the reflection.
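
For listeners who want a more concrete picture, here is a minimal sketch of the general radiance-field idea the R in ORCA refers to: a small neural network that maps a 5D query, a 3D position plus a 2D viewing direction, to a color and a brightness/density value. This is an illustrative toy in PyTorch, not the authors' ORCA code; the TinyRadianceField name and its layer sizes are our own assumptions.

    # Toy sketch of a radiance field: a 5D query (3D position + 2D viewing
    # direction) is mapped to an RGB color and a volume density.
    import torch
    import torch.nn as nn

    class TinyRadianceField(nn.Module):
        def __init__(self, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(5, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),  # 3 color channels + 1 density
            )

        def forward(self, position, direction):
            # position: (N, 3) points in space; direction: (N, 2) viewing angles
            query = torch.cat([position, direction], dim=-1)  # the "5D" input
            out = self.net(query)
            rgb = torch.sigmoid(out[..., :3])   # color in [0, 1]
            density = torch.relu(out[..., 3:])  # how much light lives at that point
            return rgb, density

    # Query four random points from four viewing directions.
    model = TinyRadianceField()
    rgb, density = model(torch.rand(4, 3), torch.rand(4, 2))

As Farbod described, the team builds their field from multiple photos of the glossy object rather than from direct views of the scene.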

Farbod: Which totally makes sense, right? Because if we go back to the example I used earlier about the domed mirror: the radius of the dome tells you how much that image is being distorted. Again, the human mind can probably process it, and there's usually a note telling you, you know, how far away the object really is. But when you're trying to process it to the point of making a decision, like, this vehicle is coming at me at 50 miles an hour and I'm trying to make this exit, am I going to have enough time? The measurement you're making needs to be incredibly precise.

Daniel: Well, and I would say, like you said, the human brain is pretty good at it, but there are still limitations there as well. You know, we've got a legal warning printed on most of the side view mirrors on cars that says objects in mirror are closer than they appear. Because of the curvature of that mirror, you've got to have a written warning there saying, hey, the geometry of this mirror is a little funky. It's not going to look the same as the flat mirror you're used to; the objects in the mirror are actually a little bit closer than they appear. That caveat right there on the side view mirror of your car is the same level of context that this team from MIT and Rice is trying to bring to this computer vision system: if you can understand the geometry of that mirror and how it affects the way light reflects off of it, you can gain a much more comprehensive understanding of the geometry you're trying to infer from that 2D image.
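
To make the geometry point concrete: once you know the surface normal at the spot where a camera ray hits a shiny object, the standard reflection law tells you which direction in the real scene that pixel is actually showing. The snippet below is our own minimal sketch of that law, not code from the paper.

    # Reflection law: r = d - 2 (d . n) n, where d is the incoming ray
    # direction and n is the mirror's surface normal at the hit point.
    import numpy as np

    def reflect(incoming_dir, surface_normal):
        d = incoming_dir / np.linalg.norm(incoming_dir)
        n = surface_normal / np.linalg.norm(surface_normal)
        return d - 2.0 * np.dot(d, n) * n

    # A camera ray looking straight ahead hits a curved mirror whose local
    # normal is tilted 45 degrees; the reflected ray looks "around the corner".
    camera_ray = np.array([0.0, 0.0, -1.0])            # looking down -z
    normal = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)  # tilted surface patch
    print(reflect(camera_ray, normal))                 # -> [1. 0. 0.]

A curved mirror is just many of these tilted patches at once, which is why knowing its exact shape matters so much for undistorting what it shows.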

Farbod: Absolutely. No, that's a really good point. And now that we have, I guess, the fundamental understanding of how this approach works, I think it's good to discuss how it compares to other methods that have tried to tackle the same problem. Apparently, this is not the first time someone has tried to pull meaningful information out of reflections for autonomous navigation systems. But as far as I can tell from reading this article, where most have struggled has been determining the geometry and texture of the objects they're detecting within a scene, and that's where this algorithm really shines. And I guess the other parameter they compared is color, since one distinguishing factor within a scene is the color of the different objects you're perceiving, and there this method performs on par with the baseline, that is, the other methods researchers have come up with in the past.

Daniel: It's pretty sweet, dude. And I think my favorite part of this entire thing, and I kind of alluded to it earlier, is that we've long viewed reflections and distorted reflections as a shortcoming of computer vision systems. If you're dealing with a reflection, you're working with a distorted, partial view of the world, and it severely limits the system's ability to actually understand what's going on, to place itself in a scene if it's trying to do live mapping of the geometry around it. By kind of cracking the code on 3D mapping of these glossy objects, understanding the 3D geometry of the glossy object and then using that as context through which to interpret the reflection, this team from MIT and Rice has flipped that weakness around. We'll give them a quick shout out here: Kushagra Tiwary, a graduate student in the Camera Culture group at MIT; Akshat Dave, a graduate student at Rice; Nikhil Behari, MIT research associate; Tzofi Klinghoffer, MIT graduate student; Ashok Veeraraghavan, professor at Rice University; and Ramesh Raskar, professor at MIT. The reason I'm reading their names is that I think this is really impressive. So, if those folks are listening, or you know those folks and they're listening, let them know that we think this is cool. I think this is a good spot for us to segue into the so what, right? The significance of what they've achieved so far and what this could potentially mean for other research groups, or for this same research group if they develop it further in the future.

Farbod: For sure. There's the, I guess, immediate value add that we discussed earlier in the episode: let's say you're an autonomous vehicle moving through a narrow street. There's a truck, there's something behind it, and you want to make sure you're safe. There's a reflection, and you can now leverage the information within that reflection to understand what's happening behind that truck. And that's great. Then there's this hidden part of the secret sauce: not only can you get that direct information, but once that 5D image has been constructed, you get insight from any perspective within the environment you've now reconstructed. So, where this really comes into play is, you know, the so what, the future plans this group has. You alluded to it earlier: this would be super helpful in drone technology. They've talked about how they want to put this on a drone so that, as it flies over and picks up objects on the ground that are reflecting their surroundings, it can recreate the entire scene those objects have been capturing. And where my brain goes immediately is Google Maps. We've all come to love it, and there's Street View, where you've probably seen a Prius driving around your neighborhood, picking up all the images and whatnot. But now imagine you're able to accomplish the same level of coverage of entire neighborhoods by just having a swarm of drones flying around at a very high altitude, scanning things that are reflecting other surfaces, and straight up being able to map out the environment just as precisely as those Priuses do.

Daniel: Yeah, it wasn't immediately apparent to me how this could impact more than just autonomous vehicles, right? But that's a great example of it. I think another thing I want to hit on here is the significance of being able to, based on a 2D image, backwards-solve the geometry of the shiny object that's been reflecting the image. So, they don't necessarily need to know and have studied the 3D geometry of every single reflective surface ahead of time; by diving into the light intensity and direction in that reflection, they're able to deduce the rough geometry of the glossy object that's reflecting the light, too. They talk about in the paper how they reconstructed something, I think it was like a mug with a shiny ball on top, and these were glossy objects. But by studying the surroundings around them, and then the light intensity and radiance reflecting off of those objects, they're able to accurately plot the 3D geometry of those objects as metallic, mirror-like objects without having known and studied that geometry ahead of time. I think there's probably some interesting application for that in surveillance or in some other sort of technology, being able to deduce the geometry of objects based off the reflections coming off of them. I don't know, I can't think of something right now other than surveillance, but I'm sure there's going to be some awesome technology that crops up because of this discovery.
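
As a toy illustration of that backwards-solve idea (ours, not the authors' method), you can also run the reflection law in reverse: given a reflected direction recovered from the photo, optimize an unknown mirror tilt until the reflection law reproduces what was observed.

    # Toy inverse problem: recover a mirror patch's tilt from an observed reflection.
    import torch

    def reflect(d, n):
        n = n / n.norm()
        return d - 2.0 * torch.dot(d, n) * n

    camera_ray = torch.tensor([0.0, 0.0, -1.0])
    observed_reflection = torch.tensor([1.0, 0.0, 0.0])  # "seen" in the photo

    tilt = torch.tensor(0.1, requires_grad=True)         # unknown tilt (radians)
    optimizer = torch.optim.Adam([tilt], lr=0.05)

    for _ in range(500):
        normal = torch.stack([torch.sin(tilt), torch.tensor(0.0), torch.cos(tilt)])
        loss = (reflect(camera_ray, normal) - observed_reflection).pow(2).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(tilt.item())  # approaches ~0.785 rad, i.e. a 45-degree mirror patch

The real system solves a much richer version of this, estimating the object's surface along with the surrounding light, but the spirit is the same: the reflection itself constrains the geometry that produced it.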

Farbod: No, I agree. You're absolutely right. And if that wasn't enough, they've also talked about how they want to expand the system and the algorithm they've developed. Basically, they want to apply the same approach they took with reflections, taking something that was distracting and kind of a nuisance, to shadows, because shadows are also a byproduct of light interacting with another object. So, you should be able to deduce what that object is by understanding the angle the shadow is being cast from, and so on. They want to incorporate shadow deduction as well, in addition to, as far as I could understand, taking multiple reflections at once, from different reflecting objects, and bringing them together to compose a single 5D scene for the algorithm to digest.

Daniel: I think it's great, man, that they've nailed it, and I was excited to talk about this one. Obviously, it's cool that we're giving cars and robots this ability to see around corners when right now that's a huge shortfall in the way they navigate through the world. On top of that, all the extra tidbits we talked about at the end here, around how this might apply to augmented reality, how it can help study shadows, how it might help with surveillance, are all really, really interesting parts of the paper that didn't pop out to me at the surface level the first time I skimmed through it.

Farbod: Agreed, agreed. Now before we wrap up, do you want to do a quick rundown of everything we talked about in like an ELI5 format?

Daniel: Yeah, I'll do that right now. So, big problem: robots, cameras, autonomous cars can't see around corners. And actually a big crutch, a big limitation of theirs right now, is that reflections make it really, really challenging for them to navigate through the world around us. These smart scientists from MIT and Rice found a way to use the reflections in shiny things to help vehicles see around corners and to help robots see through walls. And they're using your normal, regular household shiny objects, like coffee mugs or paperweights. They took a bunch of pictures of these shiny objects from different directions and mapped all the reflections. And by using their special method, they can turn these reflections into a five-dimensional map of what's around, as well as the light that's hitting these objects. So this can help self-driving cars see around obstacles, helping them navigate through the world better. It can also help drones see the ground better, and robots navigating through your neighborhood won't crash into things as often because they'll be able to use reflections to get context about the world around them, just like our human eyes do.

Farbod: Perfect. As always, you killed it. No surprise there. Folks, thank you so much for listening. Before we wrap this up, I quickly want to thank our friends in Azerbaijan. You guys are still making us trend in, I think, the top 100 or 150 tech podcasts. Thank you so much for the love. We hope to see you on those charts again next week. And for everyone else, as always, we will catch you on the next one.

Daniel: Peace. 

-------

That's all for today. The NextByte Podcast is produced by Wevolver. To learn more about the topics we discussed today, visit Wevolver.com.

If you enjoyed this episode, please review and subscribe via Apple Podcasts, Spotify, or one of your other favorite platforms. I'm Farbod and I'm Daniel. Thank you for listening, and we'll see you in the next episode.


As always, you can find these and other interesting & impactful engineering articles on Wevolver.com.

To learn more about this show, please visit our shows page. By following the page, you will get automatic updates by email when a new show is published. Be sure to give us a follow and review on Apple Podcasts, Spotify, or your other favorite podcast platforms!

--

The Next Byte: We're two engineers on a mission to simplify complex science & technology, making it easy to understand. In each episode of our show, we dive into world-changing tech (such as AI, robotics, 3D printing, IoT, & much more), all while keeping it entertaining & engaging along the way.


The Next Byte Newsletter

Fuel your tech-savvy curiosity with “byte”-sized digests of tech breakthroughs.