Facebook today open-sourced Droidlet, a platform for building robots that leverage natural language processing and computer vision to understand the world around them. Droidlet simplifies the integration of machine learning algorithms in robots, according to Facebook, facilitating rapid software prototyping.
Robots today can be choreographed to vacuum the floor or perform a dance, but they struggle to accomplish much more than that. This is because they fail to process information at a deep level. Robots can’t recognize what a chair is or know that bumping into a spilled soda can will make a bigger mess, for example.
Droidlet isn’t a be-all and end-all solution to the problem, but rather a way to test out different computer vision and natural language processing models. It allows researchers to build systems that can accomplish tasks in the real world or in simulated environments like Minecraft or Facebook’s Habitat, and it supports running the same system on different robots by swapping out components as needed. The platform provides a dashboard to which researchers can add debugging and visualization widgets and tools, as well as an interface for correcting errors and annotating data. Droidlet also ships with wrappers for connecting machine learning models to robots, in addition to environments for testing vision models fine-tuned for the robot setting.
Modular design
Droidlet is made up of a collection of components — some heuristic, some learned — that can be trained with static data when convenient or dynamic data where appropriate. The design consists of several module-to-module interfaces:
- A memory system that acts as a store for information across the various modules
- A set of perceptual modules that process information from the outside world and store it in memory
- A set of lower-level tasks, such as “Move three feet forward” and “Place item in hand at given coordinates,” that can effect changes in a robot’s environment
- A controller that decides which tasks to execute based on the state of the memory system
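To make the division of labor among these four pieces concrete, here is a minimal Python sketch. It is not Droidlet's actual API: every name in it (`Memory`, `Robot`, `perceive_obstacle`, `choose_task`, `step`) is hypothetical, and the perception and controller logic are simple heuristics chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Memory:
    """Shared store that every module reads from and writes to (hypothetical)."""
    facts: Dict[str, object] = field(default_factory=dict)


@dataclass
class Robot:
    """Toy stand-in for a robot or simulator: a position on a 2D plane, in feet."""
    x: float = 0.0
    y: float = 0.0


def perceive_obstacle(observation: Dict[str, float], memory: Memory) -> None:
    """Heuristic perceptual module: record whether something is close ahead."""
    memory.facts["obstacle_ahead"] = observation["distance_ahead_ft"] < 3.0


def move_forward(robot: Robot, feet: float) -> None:
    """Lower-level task: move along the y axis."""
    robot.y += feet


def sidestep(robot: Robot, feet: float) -> None:
    """Lower-level task: move along the x axis."""
    robot.x += feet


# Registry of lower-level tasks the controller can pick from.
TASKS: Dict[str, Callable[[Robot, float], None]] = {
    "move_forward": move_forward,
    "sidestep": sidestep,
}


def choose_task(memory: Memory) -> Tuple[str, float]:
    """Controller: decide which task to run based only on the state of memory."""
    if memory.facts.get("obstacle_ahead"):
        return ("sidestep", 1.0)
    return ("move_forward", 3.0)


def step(robot: Robot, memory: Memory, observation: Dict[str, float]) -> str:
    """One loop iteration: perceive, store in memory, choose a task, execute it."""
    perceive_obstacle(observation, memory)
    name, amount = choose_task(memory)
    TASKS[name](robot, amount)
    return name
```

Because the controller reads only from memory, a heuristic module like `perceive_obstacle` could in principle be swapped for a learned one without touching the tasks or the controller, which is the kind of component swapping the article describes.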
Each of these modules can be further broken down into trainable or heuristic components, Facebook says, and the modules and dashboards can be used outside of the Droidlet ecosystem. For researchers and hobbyists, Droidlet also offers “battery-included” systems that can perceive their environment via pretrained object detection and pose estimation models and store their observations in the robot’s memory. Using this representation, the systems can respond to language commands like “Go to the red chair,” tapping a pretrained neural semantic parser that converts natural language into programs.
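As a rough illustration of what converting natural language into a program can look like, the Python sketch below turns “Go to the red chair” into a small structured program and resolves it against stored observations. It is a toy under stated assumptions: Droidlet uses a pretrained neural semantic parser, whereas this stand-in uses a single regular expression, and the names `OBJECTS`, `parse_command`, and `resolve_target`, along with the program format, are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical memory contents, as an object detection model might have stored them.
OBJECTS: List[Dict[str, object]] = [
    {"label": "chair", "color": "red", "position": (2.0, 5.0)},
    {"label": "chair", "color": "blue", "position": (4.0, 1.0)},
]


def parse_command(text: str) -> Optional[Dict[str, str]]:
    """Regex stand-in for a semantic parser: map a command to a tiny program."""
    normalized = text.strip().lower().rstrip(".")
    match = re.fullmatch(r"go to the (\w+) (\w+)", normalized)
    if match is None:
        return None  # Command is outside the one pattern this toy understands.
    color, label = match.groups()
    return {"action": "move", "color": color, "label": label}


def resolve_target(
    program: Dict[str, str], objects: List[Dict[str, object]]
) -> Optional[Tuple[float, float]]:
    """Look up the object the program refers to and return its stored position."""
    for obj in objects:
        if obj["label"] == program["label"] and obj["color"] == program["color"]:
            return obj["position"]  # type: ignore[return-value]
    return None


program = parse_command("Go to the red chair")
if program is not None:
    print(program, resolve_target(program, OBJECTS))
```

A learned parser would handle far more varied phrasing, but the flow is the same: language becomes a program, and the program is grounded in what perception has already written to memory.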
“The Droidlet platform supports researchers building embodied agents more generally by reducing friction in integrating machine learning models and new capabilities, whether scripted or learned, into their systems, and by providing user experiences for human-agent interaction and data annotation,” Facebook wrote in a blog post. “As more researchers build with Droidlet, they will improve its existing components and add new ones, which others in turn can then add to their own robotics projects … With Droidlet, robotics researchers can now take advantage of the significant recent progress across the field of AI and build machines that can effectively respond to complex spoken commands like ‘Pick up the blue tube next to the fuzzy chair that Bob is sitting in.’”