BC-Z aims to offer "zero-shot" learning for generalized autonomous robotics

15 Feb, 2022

The BC-Z system, initially trained with a mixture of VR demonstrations and shared autonomy, demonstrates the potential for generalized learning.

Using a dataset that has now been made publicly available, a research team has created a robotics control system that can generalize to unseen but related tasks, allowing robots to interpret natural-language commands and video demonstrations.

There is considerable interest in reducing how much training it takes for a machine learning system to perform a given task, from few-shot learning approaches like HyperTransformer to one-shot learning — where a network can be trained using just one sample per class, approximating the human ability to intuit and interpret.

What if, however, you could have zero-shot learning, where a machine learning system can generalize its existing knowledge and succeed at entirely novel tasks — without ever having been directly trained? That’s exactly what a team of researchers from Google, X, UC Berkeley, and Stanford University have created in BC-Z: A zero-shot task generalization approach for robotics.

Do as I do

Having a robot perform a task it has not been trained to do, based either on copying a human or by following verbal instructions, is an extremely hard problem — particularly in vision-based tasks involving a range of possible skills.

“This problem is remarkably difficult since it requires robots to both decipher the novel instructions and identify how to complete the task without any training data for that task,” explain Chelsea Finn and Eric Jang, co-authors of the paper and authors of a post on the Google AI Blog discussing their work. “This goal becomes even more difficult when a robot needs to simultaneously handle other axes of generalization, such as variability in the scene and positions of objects.”

The proposed system, BC-Z, builds on previous work, with the researchers concentrating on whether established approaches to generalization through imitation learning can be scaled to a wide breadth of tasks: in one test, demonstrating 100 different manipulation tasks and then instructing the robot to carry out 29 related tasks on which it had not been directly trained.

The BC-Z workflow: shared-autonomy data collection; a diverse multi-task dataset; continual policy training conditioned on video and language descriptions; and, finally, generalization to previously unseen tasks, giving robots a broader range of skills.

The team’s setup for training and testing BC-Z relied on shared-autonomy teleoperation: an Oculus virtual reality headset was wired into the robot, with the wearer using two handheld controllers to perform tasks from a line-of-sight third-person view. As well as directly carrying out tasks as demonstrations, the operator also monitored the robot’s own attempts — intervening to correct mistakes as they happened, as a way to improve the training data.

“This mixture of demonstrations and interventions has been shown to significantly improve performance by mitigating compounding errors,” claim Finn and Jang. “In our experiments, we see a 2x improvement in performance when using this data collection strategy compared to only using human demonstrations.”
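To make the idea concrete, here is a minimal sketch of what such a shared-autonomy collection loop might look like, written in Python and assuming hypothetical env, policy, and human interfaces; the team's actual tooling is not described at this level of detail.

```python
# A minimal sketch of shared-autonomy data collection; `env`, `policy`,
# and `human` are hypothetical interfaces standing in for the team's
# real teleoperation stack.

def collect_episode(env, policy, human, max_steps=500):
    """Roll out the current policy while a human overseer watches,
    taking control to correct mistakes as they happen."""
    data = []
    obs = env.reset()
    for _ in range(max_steps):
        if human.wants_control(obs):     # operator grabs the VR controllers
            action, source = human.act(obs), "human"
        else:
            action, source = policy.act(obs), "robot"
        data.append({"obs": obs, "action": action, "source": source})
        obs, done = env.step(action)
        if done:
            break
    # Only human-provided actions (full demonstrations plus corrective
    # interventions) are kept for behavioral cloning; the interventions
    # are what mitigate the policy's compounding errors.
    return [step for step in data if step["source"] == "human"]
```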

Try it yourself

The finished dataset comprises 25,877 demonstrations, collected by seven operators across 12 robots over 125 hours of robot time. Each demonstration was then paired with a description of the task being carried out — either one of an additional 18,726 videos of humans performing the same tasks, or a simple written natural-language command.
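Conceptually, each training example pairs a robot episode with one of those two description modalities. A hypothetical record layout is sketched below for illustration; the actual schema of the released dataset may differ.

```python
# A hypothetical record layout for one training episode -- illustrative
# only; the schema of the released Kaggle dataset may differ.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Episode:
    frames: List[bytes]                  # RGB camera observations
    actions: List[List[float]]           # teleoperated end-effector commands
    instruction: Optional[str] = None    # e.g. "put the grapes in the bowl"
    human_video: Optional[str] = None    # path to a human demo of the same task
```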

Initially, the team set about proving that BC-Z was able to learn individual vision-based tasks: a single bin-emptying task and a single door-opening task. Results were encouraging: the model emptied the bin at around half the speed of the human teleoperator, and achieved success rates of 87 to 94 per cent on door-opening across trained and held-out scenes.


The project's focus, though, is on few- and zero-shot learning for previously unseen tasks: 29 held-out tasks, 25 of which use objects mixed from two different trained task families. Some tasks were language-conditioned, provided as a simple written instruction like “put the grapes in the bowl”; others were video-conditioned, communicated as footage of the task being carried out.

In these tests, BC-Z delivered a non-zero success rate on 24 of the 29 tasks, though below the level that would be considered acceptable for deployment. Its overall success rate averaged 32 per cent, rising to 44 per cent on the language-conditioned tasks.

“Qualitatively,” the researchers note of the failures, “we observe that the language-conditioned policy usually moves towards the correct objects, clearly indicating that the task embedding is reflective of the correct task. The most common source of failures are ‘last-centimeter’ errors: Failing to close the gripper, failing to let go of objects, or a near miss of the target object when letting go of an object in the gripper.”

A cautious success

While the success rate on held-out tasks was low in absolute terms, any success rate above zero per cent is an undeniably promising start. The team has already been able to draw interesting conclusions: imitation learning can be scaled to zero-shot settings; task-level generalization is possible with just 100 training tasks; and frozen pre-trained language embeddings work well as task conditioners without additional training.
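The last point is worth unpacking: the policy takes a task embedding alongside its camera images, and that embedding can come from a frozen pretrained sentence encoder (for written commands) or from a video encoder trained to produce matching embeddings. Below is a loose PyTorch illustration of one way to apply such an embedding, using FiLM-style feature modulation; it is a minimal sketch, not the paper's exact architecture.

```python
# A minimal sketch of a task-conditioned policy; the real BC-Z network
# (ResNet backbone, specific action parameterization) is larger and its
# details are in the paper.
import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    def __init__(self, embed_dim=512, action_dim=7):
        super().__init__()
        # Small vision trunk standing in for a full ResNet encoder.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # FiLM: the task embedding predicts a per-channel scale and shift
        # for the visual features, so one network can express many tasks.
        self.film = nn.Linear(embed_dim, 64 * 2)
        self.head = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                                  nn.Linear(256, action_dim))

    def forward(self, image, task_embedding):
        feats = self.vision(image)                       # (B, 64)
        gamma, beta = self.film(task_embedding).chunk(2, dim=-1)
        return self.head(gamma * feats + beta)           # e.g. arm command

# The task embedding may come from a frozen pretrained sentence encoder
# (language commands) or a video encoder trained to match its outputs.
policy = ConditionedPolicy()
action = policy(torch.randn(1, 3, 96, 96), torch.randn(1, 512))
```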

The team admits to a series of limitations, however: performance on novel tasks is highly variable, the natural-language commands follow a restrictive verb-noun structure, and performance on video-conditioned tasks was notably lower than on the natural-language command tasks.

Examples of the tasks BC-Z was given, starting with no variation in configuration and followed by increasingly complex variations. The team increased the complexity of each task and saw impressive success, with failures largely limited to "last-centimeter" issues.

For future work, the researchers suggest using the BC-Z system as a general-purpose initialization for fine-tuning on downstream tasks alongside additional training with autonomous reinforcement learning, improving the generalization of video-based task representations, and solving the low-level control errors that bottleneck the performance of imitation learning algorithms in general.

The team’s work was published at the 5th Conference on Robot Learning (CoRL 2021) and is available under open-access terms on the project website. The team has also released the training and validation dataset on Kaggle under the permissive Creative Commons Attribution 4.0 International license. More information can be found on the Google AI Blog.

Reference

Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, Chelsea Finn: BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning, CoRL 2021 Poster. openreview.net/forum?id=8kbp23tSGYv