
Podcast: Did A Shrimp Fry This Rice? No, A Robot Did

In this episode, we discuss how a talented team of researchers from Stanford University has created a robotic platform that uses imitation learning to perform 825 common tasks, including searing shrimp!


05 Jun, 2024. 17 min read



This podcast is sponsored by Mouser Electronics


EPISODE NOTES

(4:33) - Meet The Robot That Learned To Sauté Shrimp

This episode was brought to you by Mouser, our favorite place to get electronics parts for any project, whether it be a hobby at home or a prototype for work. Click HERE to learn more about how the Toyota Research Institute developed a platform for robotic imitation learning and why it might be the best approach for training robots in the future! 

Become a founding reader of our newsletter: http://read.thenextbyte.com/


Transcript

Welcome back, folks, and let me just quickly ask you a question. Have you ever been to, like, a hibachi grill and really pondered: did a shrimp really fry this rice? Well, the answer might just shock you, because in the not-so-distant future it could be a robot frying the rice. And if you don't know what that means, or you're confused, well, that's okay, because we're going to explain it in 3, 2, 1.

I'm Daniel, and I'm Farbod. And this is the NextByte Podcast. Every week, we explore interesting and impactful tech and engineering content from Wevolver.com and deliver it to you in bite sized episodes that are easy to understand, regardless of your background. 

Farbod: Alright friends, as you heard, today we're gonna talk about teaching robots. But before we get into today's article, let's talk about today's sponsor, and that would be Mouser Electronics. Now, to the long-term fans of the Next Byte podcast, you know we love Mouser on the show because they're one of the world's greatest electronics suppliers. Now, what does that actually mean? Well, as the name implies, they're one of the greatest, they're one of the biggest, so they have a lot of items that you can shop for, but they're also very well connected to industry partners and to academia, and they have access to a lot of information that they occasionally like to share about the cutting-edge tech they're encountering. Well, they made this, like, shockingly well-fitting article that relates to today's topic, talking about the Toyota Research Institute, which I actually didn't know about before this article. An entire division just dedicated, sorry, I think it's Robotics Institute actually, TRI.

Daniel: Yeah.

Farbod: Yeah?

Daniel: Research.

Farbod: Research, I was right the first time. I'm mixing up today's article already. But the Toyota Research Institute is just dedicated to R&D and exploring new ideas. Specifically, what this article goes into is robotics, which is why I confused the Rs. They talk about how robotics have been used in the electronics world. You have your pick-and-place machines that help you pick up orders. Typically, they're hard-coded to do very specific tasks, and teaching robots to do new things is actually a very laborious process that requires a lot of data sets. Then they segue into this new approach that they've been using with a lot of success, called imitation learning. Now, imitation learning is where you have an expert, like a human being, take charge of the robot and show it how to do its task using the mechanisms, the appendages, available to the robot. And through this process, a diffusion policy forms. In its essence, it's how the robot connects actions to what the various sensor groups are picking up, you know, contact, vibration, whatever, to understand what it should be doing. And over the course of a few hours after it's been taught how to do something, the robot starts to actually learn how to do that thing. The reason this is impressive, the long and short of it, is that you're able to teach robots something in a much more effective, much more efficient manner, and it makes the same robotic platform usable for many different functions instead of being hard-coded to a specific use case.
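For anyone curious what that looks like in code, here is a minimal sketch of the supervised core of imitation learning, often called behavior cloning: a network maps sensor observations to actions, and training simply regresses toward whatever the human expert demonstrated. This is an illustrative Python/PyTorch sketch, not the team's actual diffusion-policy code; the small MLP stands in for their real vision-based network.

import torch
import torch.nn as nn

class Policy(nn.Module):
    """Maps a flat observation vector (joint states, sensor readings) to an action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        # A small MLP stands in for the real vision-based policy network.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def train_step(policy, optimizer, obs_batch, expert_actions):
    """One imitation-learning update: nudge predictions toward the expert's actions."""
    pred = policy(obs_batch)
    loss = nn.functional.mse_loss(pred, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

After a few hours of demonstrations, those (observation, action) pairs are all the supervision the robot gets; the policy generalizes from them rather than from hand-written rules.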

Daniel: Well, and one thing that I really liked from this article is the specific quote where they basically said they were trying to understand what the Achilles' heel of new-age robotics was. And in this case, they said it's an overreliance on simulation. So instead of us teaching robots things in the real world, oftentimes we're doing millions and millions of simulations in a virtual realm to try and teach the robot to understand exactly how to do a task, how not to do a task, et cetera. It works, like you're saying, for very surgical applications where it's gonna pick and place the same exact part over and over and over again on a manufacturing line. But if you're trying to teach it to do something dexterous that requires human-like ability to control grippers, adapt to different situations, and do a variety of different tasks, this type of simulation becomes really, really expensive, both computationally and from a time perspective.

Farbod: Absolutely.

Daniel: So, in this case, I kind of like the hidden meaning here, which is like, they said they're not going to do any simulation, but they ended up doing physical simulation, physically simulating and teaching the robot how to do something. And I think that's a perfect pathway into what we're talking about today as well, because they did a very similar method of doing physical simulation to teach a robot how to do things around the house.

Farbod: Absolutely. And now let's segue into the article. This is coming out of Stanford University, and we already kind of hinted at what they're doing, but they wanted to develop a general robotic platform that you could teach stuff to. Now, I kind of mentioned this earlier. You can think of common robotics, the ones we usually think of, as having a very specific purpose, and the example I like to use is the Roomba. I feel like pretty much everyone has heard of a Roomba. It's this little robotic vacuum cleaner in your house. It costs like $600-$700. It does a very specific function: it has a vacuum, it goes up and down your house sucking stuff up, and then it goes back to its charger to charge itself. That's all it does. That's all it's hard-coded to do. Now, as a user, as someone who's investing money into these ideas, think back to the Jetsons. Did you ever watch the Jetsons?

Daniel: Yes, dude, Rosie the Robot.

Farbod: Yeah. Rosie the Robot was the GOAT and did everything. That's the future that we were promised.

Daniel: And way more than a robot vacuum, right? Rosie was able to do pretty much everything around the house.

Farbod: Exactly.

Daniel: Essentially a robotic butler.

Farbod: Literally, yeah.

Daniel: And was able to do everything around the house and didn't falter. It was actually probably better than humans at doing these tasks. Whereas in the world we live in right now, if you were to say you have robots doing all the cooking and cleaning in your house, I'm gonna expect that the food tastes bad, that it's not cooked right, and that there's mess everywhere, because honestly, the current state of robotics isn't great at doing anything other than surgical tasks, like going around vacuuming the floor every single day.

Farbod: Yeah, yeah. But, I mean, we were promised that, and we're still not getting it. Could imitation learning be the thing that gets us close to that? I mean, we already kind of covered it on the Mouser side, but just to recap, imitation learning is about a human demonstrating, being the expert for the robot on how to do something. Now, imitation learning, according to the two articles we're covering today, is popular, but it also has its own limitations. Specifically, what this team notes is that it usually suffers from a lengthy training process. So, if you wanna do a specific task, you really have to spend time as the expert teaching the robot how to do it, to make it as robust as you possibly can. But even the policies, the diffusion policies that result from your imitation-teaching process, can still falter, mostly, and I think I'm gonna butcher this word, due to perceptual perturbations. Is that? Perturbations?

Daniel: Perturbations.

Farbod: That's, see? I can't roll it off my tongue. Which, in layman's terms, terms that I would be comfortable with, means distractors and lighting changes in the environment, especially if you're using visual sensors, can completely disrupt the cues that your policy has been relying on so far. So that's been a critical challenge to overcome in making imitation learning as popular as we would like it to be. Now, we've hinted at the problem long enough. What has this team at Stanford been doing? Well, they've developed this platform called Mobile ALOHA. Now, Mobile ALOHA, you can think of it like, should we reference Figure One? Maybe not everyone has heard of Figure One. Okay, just think of it this way. You have this thing that looks like a skateboard of some sort that you can, like a Segway, there we go, like a Segway that you can step on as the human user, with controls that allow you to operate robotic arms that are on the robot itself. And as the name implies, it's mobile. So, it gets to go around just like a Roomba would, but it has manipulators, specifically two grippers, that can do tasks like picking something up, closing doors, et cetera.

Daniel: And I think the main secret sauce here, right? You mentioned the mobility, but at least in terms of the robotic capabilities, they call it a bimanual robot, which means it has two hands, and these two hands have multiple degrees of freedom and grippers on them. Imagine you're trying to replicate all the functionality that a human arm has, but with robotics: you've got two really highly capable robotic arms attached to this moving platform with a bunch of sensors that can move itself around. It's not completely human, but it's pretty close to what Rosie the Robot had in terms of tools.

Farbod: It's like gen one, you know?

Daniel: And yeah, like you mentioned, it's able to move itself around, and with a person behind it using some manual controls, at least during the training portion of this, the human is able to completely control all aspects of the robotic arms using a replica set of joysticks behind it, essentially, that allows the human to teach the robot how to move.
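To make that teleoperation idea concrete, here is a hedged sketch of a leader-follower loop in Python: the human moves replica "leader" controls, the robot's "follower" arms mirror those joint positions, and every observation-action pair gets logged as a demonstration. The callables here are hypothetical placeholders standing in for real robot drivers, not the team's actual API.

import time

def teleop_and_record(read_leader_joints, command_follower_joints,
                      read_observations, duration_s=30.0, hz=50.0):
    """Record one demonstration while the human puppeteers the robot.

    The follower arms copy the leader controls' joint positions each tick,
    and the leader pose doubles as the action label for imitation learning.
    All four arguments are hypothetical stand-ins for real hardware drivers.
    """
    demo = []
    period = 1.0 / hz
    end_time = time.time() + duration_s
    while time.time() < end_time:
        tick_start = time.time()
        target = read_leader_joints()      # where the human has put the leader controls
        obs = read_observations()          # camera frames, joint states, etc.
        command_follower_joints(target)    # follower mirrors the leader
        demo.append((obs, target))         # the human's pose IS the training label
        time.sleep(max(0.0, period - (time.time() - tick_start)))
    return demo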

Farbod: Yeah, yeah. And I think something that, as humans, we take for granted is the amount of dexterity and knowledge required to use two hands to accomplish a given task.

Daniel: Yeah.

Farbod: Which is apparently quite tricky for robots, you know, learning that from the ground up. So, with Mobile ALOHA, you obviously have the ability for the human operator, through what they call the teleoperation setup, to take control of the robot and do a specific task. So you have that ability to teach your robot what to do. But where I think it really shines is that, out of the box, it comes with a static data set. They have 825 operations that it can do out of the box, which range from, I apologize, I'm sick, by the way, that's why my voice keeps going up and down, which range from Ziploc sealing to picking up a fork, slotting a battery into place, fastening a Velcro clapper. All these little things that you would expect a household robot or a kitchen robot to do. And the way they gathered this data was with a black background. Remember, optical disturbances, I guess distractions, are the main Achilles' heel of this technology. So, they had a very nicely controlled setting to teach it the baseline on how to do all these operations. Which means that, as the user, you don't have to teach it this stuff from the ground up. You build on existing knowledge. And you operate it, for example, in your kitchen, in whatever ambient lighting you have. Maybe you operate it once during the day, once during the night, and now it has enough data to keep going and get better at whatever it needs to do. Now, what's interesting is, you know, usually when we talk about these articles on Wevolver, the academic journal article is much more difficult to get. Fortunately, that's not the case here. The team has actually provided a link in the Wevolver article that has all of their tests and nice little GIFs laid out, where the robot is doing a whole load of operations like opening up a cupboard or pushing in chairs and things like that. If you're interested in this and want to see some of those demos, I highly encourage you to check it out.
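The co-training idea is easy to sketch in code: each training batch mixes samples from the big static dataset (the 825 demonstrations collected against that controlled black background) with the handful of new demos recorded in your own environment. This Python sketch is illustrative only; the 50/50 mixing ratio is an assumption for demonstration rather than the paper's exact recipe, and both dataset objects are stand-ins.

import random

def cotrain_batches(static_demos, new_task_demos, batch_size=64, static_fraction=0.5):
    """Yield batches that blend prior static data with fresh in-home demos.

    static_demos stands in for the 825-demo static dataset, and new_task_demos
    for the demos recorded in the target environment; mixing the two is what
    lets the new policy inherit the baseline skills instead of starting over.
    """
    while True:
        n_static = int(batch_size * static_fraction)
        batch = (random.sample(static_demos, n_static) +
                 random.choices(new_task_demos, k=batch_size - n_static))
        random.shuffle(batch)
        yield batch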

Daniel: Yeah, I agree. And it's super cool. They're an open book about it. They didn't include just their paper. They included all these videos of training the robot, helping the robot do the tasks, and then, spoiler alert, the robot's able to learn, understand, and do the tasks on its own. They also have a tutorial in there. They have all their data in there. They have all the hardware code. They have all the machine learning code. If this is something you're interested in, I highly recommend checking out the linked Wevolver article and then going and checking out, I think it's their GitHub page.

Farbod: Yeah, they have a GitHub link.

Daniel: That's got all this information on there.

Farbod: And it's great that you pointed that out, because this team didn't just want to create Mobile ALOHA and call it a day. What they really emphasize is that they want this to truly be an open-source platform, where any researcher, or anyone, can just buy parts that are fairly inexpensive and off the shelf, put one together, and keep expanding on this. Because that's where the true value of that desirable Jetsons future comes from, right? Everyone collaborating on and expanding one platform.

Daniel: Exactly. And one thing I wanted to mention, because you said cost. Most of these bimanual, right, two-armed robots with a teleoperation system, which, like you're saying, allows the operator to basically puppeteer the robot and walk it through actions in a set environment so it can learn different types of operations and tasks, most of these cost upwards of a quarter of a million dollars, and it was really expensive for them to be able to get one. So, they decided to design and build one from the ground up. I think all their materials and fabrication costs in total came out to around $30,000. So, again, still pretty expensive, but definitely a significant figure lower than it has been before. And these laboratory setups always end up costing way more in the lab, because you haven't been able to take advantage of the economies of scale that come with industrialization. So that gives me some hope that, with series manufacturing, something similar to this might be able to make it into your house for a couple thousand dollars, as opposed to the quarter of a million that we started with before this team even began the research.

Farbod: Yeah, no, that's a great point to make. Now, in terms of what the impact of this work has been. Obviously, you know, at the beginning of this episode we talked about the Mouser article: imitation learning, teaching some robot how to do a very specific task. Then we talked about Mobile ALOHA, which comes with a lot of demos, a lot of policies, out of the box, which then get better as a human uses it and teaches it in different settings. Well, they've given us a great data point here by saying that, in their experience, co-training with these static data sets, plus showing roughly 50 demos to the robot of how to do something, increases the success rate of doing that task by up to 90%. So, in comparison to where most are starting off, that is a pretty significant gain. When I look at this, I'm like, okay, 50 demos, that's still kind of a lot at this stage. But remember, this is the very early stage of what this platform could become as others work on it in the future.

Daniel: And just imagine again the juxtaposition of doing 50 training runs with a robot as opposed to doing millions of simulations. Being able to teach a robot in one sitting to do something really effectively, again, with a 90% increase in success rate around the home. I would much rather teach a robot how to fold my laundry 50 times, if I know that it's going to do it correctly moving forward, than try to sit and simulate these outcomes millions of times on my computer, spending a bunch of time, a bunch of energy, a bunch of electricity, a bunch of computation power to do it. And honestly, the outcomes are better with this imitation learning than in a simulation scenario in terms of trying to get it to do a wide variety of practical tasks in the household.

Farbod: I totally agree. And, like, I don't know, it makes me more… I keep talking about the future. It makes me more hopeful about a future where the same piece of hardware won't just be hard-coded to one very specific task. It's gonna be flexible and expandable, and it can grow as your needs grow, for whatever that piece of hardware is. Again, on the Wevolver article where they've linked their own research with the GIFs, they're using one robotic platform to sear shrimp. I don't think we pointed that out specifically, but that is one of the highlights of what it can do. Pretty impressive. Not everybody can… I know some people that can't even sear a shrimp properly.

Daniel: Yeah, I've been to restaurants where they didn't cook the shrimp properly.

Farbod: It's either undercooked and chewy or overcooked and rubbery, you know, like you can't win. But it sears shrimp, it opens cupboards and puts cups in, it cleans up after you, pushes in chairs. That's a pretty nifty little robot, you know, and it'll only get better.

Daniel: It can call for an elevator.

Farbod: Yeah.

Daniel: It can give high fives when passing someone in the hallway.

Farbod: For the days that you really need a high five. Yeah.

Daniel: I mean, there's a bunch of, I forget how many different physical demonstrations they...

Farbod: 825.

Daniel: 825 different demonstrations they've taught this robot.

Farbod: Can't play ping pong.

Daniel: A bunch of fun tasks around the house, cool knickknacks, cool bells and whistles, but at the same time, a ton of really, really impactful ones. Like, again, if the robot can do this with really, really high success, it absolutely makes sense to have something like this in your house. And again, if they're able to get the cost down to a couple thousand dollars, this absolutely makes sense, having something like this, I don't know, doing the crappy tasks, helping clean up in a restaurant. I know in the restaurant industry, for example, Nelly's cousin works at a restaurant that just had to shut down because they weren't able to hire anyone. You know what I mean? For a couple thousand dollars, if you had something that went and pushed in all the chairs and cleaned up all the spills and mopped the floors and all this stuff, it's better than a Roomba; it would pay for itself. And this is a job where it's not taking humans' jobs; in this case, the owner couldn't even hire someone to come work in the restaurant. So, I definitely think that this could make a big impact. Personally speaking, I would love it if something could fold my laundry. I'm all right with the washing and drying. I just freaking hate folding laundry. I don't know why.

Farbod: You know, I'll be that guy. I have a mountain of fresh laundry in my bed that I'm probably not gonna fold tonight. You know, I just don't want to. I'm just gonna pretend like it's not there while I go to bed. I'm not ashamed, I'll say it. I'm gonna keep holding out until the robot's ready for me.

Daniel: Man, what's this lump in my mattress? Nope, it's just my clean laundry.

Farbod: It's just my clean laundry.

Daniel: I don't know, I just think from an achievements perspective, it's pretty interesting to watch this. Again, you've just gotta go watch the video that's linked in the show notes, to watch this robot do things really, really well after just about 50 rounds of training. One of the demos I thought was really interesting in specific was pushing in chairs, right? Reorganizing chairs, lining them up, pushing them back into the table. They only trained the robot on three different chairs. And then, when they lined it up with five chairs and told the robot, go push in all the chairs, it was able to push in all five: the three that it was trained on, plus the other two that were in a different location and looked different from the ones that it was trained on. So, it kind of shows that…

Farbod: That the diffusion policy is really working. It's understanding the task.

Daniel: It's starting to interpret and understand the task, and then go do things that are in the training set or outside of the training set it learned from. It shows a lot of promise. Personally speaking, I know that there's still a lot of development that needs to go on. They need to do a lot of things to make the robot smaller, give it better freedom of movement, and make it easier and faster to train. But in terms of mundane tasks around the house or in a restaurant or in a factory, this absolutely seems like a really huge step forward, in terms of the puppeteering and imitation learning that they used as the secret ingredient in their secret sauce here. It shows a lot of promise.

Farbod: Yeah, I totally agree with you. So, to kind of wrap up the gist of this episode: the average consumer electronic, I mean, the average consumer robot, or even a commercial robot, is typically hard-coded to do a very specific task. We all know that it would be nice if we could have a robot platform that was more flexible, but the question has always been, we usually train these things by giving them very explicit step-by-step instructions or a lot of simulation data, so how do we get around that? Well, imitation learning has kind of risen up as this champion, allowing human experts to show a robot how to do something a couple of times, and then the robot kind of extrapolates information from the onboard sensors and makes sense of what it's been doing to learn a specific task. This team at Stanford has really been pushing the boundary on that school of thought by saying, okay, in the past, imitation learning has suffered from distractions that can come up in the lighting or everything else in the environment. So, here's how we're gonna beat it. We're gonna take these 825 common tasks in the house and show the robot how to do them off the bat. We're gonna create that data set ourselves. And then you, as the users, will show the robot how to do this in the environment you want it to work in, with the lighting, the distractions, or whatever. And then, in roughly 50 demos, it will learn how to do it with up to 90% higher success. Now, I know that doesn't seem like a lot, but what we're really getting closer to is a future where you can have one robot in your house that can do what your Roomba does, and do what you do, like folding laundry or cooking for you, because this one sears shrimp, if you like shrimp, that is. And that's why this technology can literally change our lives in the not-so-distant future.

Daniel: I love it. I mean, I just feel like we haven't given enough credit to the awesome title we're going to put on this.

Farbod: Oh yeah. You want me to say it?

Daniel: Yeah.

Farbod: Did a shrimp fry this rice? No, a robot did. You like that? We actually came up with the name of this episode before we fully read the article.

Daniel: We were texting each other, and Farbod is like, wait, we could title this episode, did a shrimp fry this rice? No, a robot did. And I'm like, I'm in, based on the title alone.

Farbod: Sold.

Daniel: So, you might've come for the technology, but you stayed for the dad jokes.

Farbod: Oh yeah, that's what we're all about on this podcast. Who cares about the tech? We're all about the dad jokes.

Daniel: Exactly, but before we wrap up, I do wanna mention, we're in the fledgling stages of launching our newsletter. Farbod's been an absolute king in writing these newsletters.

Farbod: Team effort, baby. Teamwork makes the dream work.

Daniel: But if you haven't already signed up, we'd appreciate it if you check out the link in our show notes. We're essentially taking all the amazing skills and talents that we have, they're not many, but we've got a couple of skills and talents that we've developed over the last three and a half years of publishing a podcast episode every single Tuesday about interesting and impactful technology. We're trying to take those chops and go conquer the newsletter world, just like we've conquered the podcast world with your help. But we need your help. So, if you could go check out the link in our show notes, sign up for the newsletter, and let us know what you think. You can be a founding reader of the newsletter by being one of the first couple of people to sign up. Let us know what you think. Let us know what we can fix, let us know what we're doing well. We're really excited to, again, go conquer the newsletter world, but we want to bring you along for the journey.

Farbod: Get an extra dose of mediocre dad jokes in your life every single…

Daniel: There are more dad jokes in the newsletter, probably more in the newsletter than in the podcast.

Farbod: I know. That's the selling point.

Daniel: Yeah, if you want more dad jokes, you know where to go.

Farbod: Alright folks. Thank you so much for listening and as always, we'll catch you in the next one.

Daniel: Peace.


As always, you can find these and other interesting & impactful engineering articles on Wevolver.com.

To learn more about this show, please visit our shows page. By following the page, you will get automatic updates by email when a new show is published. Be sure to give us a follow and a review on Apple Podcasts, Spotify, and most of your favorite podcast platforms!

--

The Next Byte: We're two engineers on a mission to simplify complex science & technology, making it easy to understand. In each episode of our show, we dive into world-changing tech (such as AI, robotics, 3D printing, IoT, & much more), all while keeping it entertaining & engaging along the way.


The Next Byte Newsletter

Fuel your tech-savvy curiosity with “byte” sized digests of tech breakthroughs.