Robots that learn tasks by watching humans perform them are the way of the future, according to Stanford's Animesh Garg and Marynel Vázquez. The two researchers shared their findings in a talk at the recent NVIDIA GPU Technology Conference.
Generalisable autonomy is the idea that a robot can observe human behaviour, and learn to imitate it in a way that is applicable to a variety of tasks and situations.
For example, a robot could learn to cook by watching YouTube videos. Garg, a postdoctoral researcher at the Stanford Vision and Learning Lab (CVGL), noted that many robots today excel at single tasks, but the ideal is to have a general-purpose robot.
The means to this may lie in neural task programming (NTP), a new approach to meta-learning that aims to teach a robot how to do any task within a domain instead of one specific task.
NTP learns to program with a robot API to perform completely new tasks after viewing a single test example. For instance, a robot chef would take a video about cooking spaghetti as a test example, and deconstruct the video data into separate steps using what Garg calls a "structured representation of the task based on visual cues as well as temporal sequence".
With NTP, the robot could then apply cooking skills it has learned, such as boiling water, frying meat and simmering sauce, to other situations.
NTP-trained robots have already shown promising results on unseen (new) tasks, while performing as well as robots trained without NTP on familiar ones.
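To make the idea concrete, here is a minimal Python sketch of the hierarchical decomposition NTP performs, in which a single learned policy is applied recursively until it bottoms out in primitive robot API calls. The `policy` and `robot` objects and all method names are hypothetical placeholders, not the actual Stanford implementation.

```python
# Sketch only: the policy, robot API and method names below are
# hypothetical stand-ins for NTP's real components.

def run_ntp(task_spec, policy, robot):
    """Recursively interpret a task specification (e.g. features
    extracted from a demonstration video) into robot actions."""
    action = policy.predict(task_spec)       # network chooses the next step
    if action.is_primitive:
        robot.execute(action.api_call)       # e.g. grasp, pour, stir
    else:
        # Non-primitive step: split the spec into sub-tasks and recurse,
        # reusing the same policy at every level of the hierarchy.
        for sub_spec in action.decompose(task_spec):
            run_ntp(sub_spec, policy, robot)
```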
Vázquez, on the other hand, is teaching robots to navigate crowds and avoid the awkward standoff in which two people try to pass each other but keep blocking one another because they both step the same way at the same time.
JackRabbot, CVGL's social navigation robot, first hit the sidewalks in 2015, making small deliveries and travelling at pedestrian speeds below five miles per hour. As Vázquez explained, teaching JackRabbot to move through unstructured spaces (that is, the real world) is a multifaceted problem.
“Safety is the first priority,” Vázquez said. From there, the challenge moves into predicting and responding to the movements of lots of people at once.
To tackle safety, Vázquez's team turned to deep learning, developing a generative adversarial network (GAN) that compares real-time data from JackRabbot's camera with images the GAN generates on the fly.
These images represent what the robot should be seeing if an area is safe to pass through, like a hallway with no closed doors, stray furniture or people standing in the way. If reality matches the ideal, JackRabbot keeps moving. Otherwise, it hits the brakes.
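Here is a hedged sketch of that safety check in Python: compare the live camera frame with the GAN's rendering of a clear path, and stop if the discrepancy is too large. The `path_is_safe` function, the pixel-space comparison and the 0.1 threshold are illustrative assumptions, not JackRabbot's actual code.

```python
import numpy as np

def path_is_safe(camera_frame: np.ndarray,
                 expected_frame: np.ndarray,
                 threshold: float = 0.1) -> bool:
    """Return True when the live frame is close enough to the GAN's
    'clear path' rendering for the robot to keep moving."""
    discrepancy = float(np.mean(np.abs(camera_frame - expected_frame)))
    return discrepancy < threshold

# Toy usage with random stand-ins for the camera frame and the GAN
# output (both normalised to [0, 1]):
rng = np.random.default_rng(0)
live = rng.random((64, 64, 3))
ideal = np.clip(live + rng.normal(0.0, 0.01, live.shape), 0.0, 1.0)
print(path_is_safe(live, ideal))   # True: reality matches the ideal
```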
From there, the team turned to multi-target tracking, aka “tracking the untrackable”. A robot has to go beyond the immediate assessment of “is my path clear?” to tracking multiple people moving in different directions and predicting where they will move next. Here the team built a recurrent neural network to capture what humans treat as common sense about moving through crowds.
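As a rough illustration of that prediction step, the Python sketch below uses an LSTM that reads each pedestrian's recent (x, y) positions and predicts the next one. The architecture, layer sizes and names are assumptions for illustration, not the CVGL model.

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """Toy recurrent predictor: recent (x, y) positions in,
    one predicted next position per pedestrian out."""
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, tracks: torch.Tensor) -> torch.Tensor:
        # tracks: (num_pedestrians, num_timesteps, 2)
        hidden, _ = self.lstm(tracks)
        return self.head(hidden[:, -1, :])   # prediction from last step

# Toy usage: eight observed positions for each of three pedestrians.
model = TrajectoryPredictor()
observed = torch.randn(3, 8, 2)
next_positions = model(observed)             # shape: (3, 2)
```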
The newly announced JackRabbot 2.0 (with dual NVIDIA GPUs onboard) incorporates this new expertise.
Source: Stanford Vision and Learning Lab (the JackRabbot project), via the NVIDIA blog.