Researchers have developed a new technique, called MonoCon, that improves the ability of artificial intelligence (AI) programs to identify three-dimensional (3D) objects, and how those objects relate to each other in space, using two-dimensional (2D) images. For example, the work could help the AI used in self-driving vehicles navigate around other vehicles using the 2D images it receives from an on-board camera.
“We live in a 3D world, but when you take a picture, it records that world in a 2D image,” says Tianfu Wu, corresponding author of a paper on the work and assistant professor of electrical and computer engineering at North Carolina State University.
“AI programs receive visual information from cameras. So if we want AI to interact with the world, we need to make sure that it is able to interpret what 2D images can tell it about the world. In this research, we focus on part of that challenge: how to get AI to accurately recognize 3D objects, such as people or cars, in 2D images, and place those objects in space.”
While the work could be important for autonomous vehicles, it also has applications for manufacturing and robotics.
In the context of autonomous vehicles, most existing systems rely on lidar, which uses lasers to measure distance, to navigate 3D space. However, lidar technology is expensive, and because of that expense, autonomous systems don’t include much redundancy. For example, it would be prohibitively expensive to put dozens of lidar sensors on a mass-produced driverless car.
“But if an autonomous vehicle could use visual inputs to navigate through space, you could build in that redundancy,” Wu said. “Because cameras are significantly cheaper than lidar, it would be economically feasible to include additional cameras, which would create redundancy in the system and make it both safer and more robust.
“It’s a practical application. However, we are also excited about the fundamental breakthrough of this work: that it is possible to obtain 3D data from 2D images.”
Specifically, MonoCon is able to identify 3D objects in 2D images and place them in a “bounding box”, which effectively tells the AI the outermost edges of the object in question.
MonoCon builds on a substantial body of existing work aimed at helping AI programs extract 3D data from 2D images. Many of these efforts train the AI by “showing” it 2D images and placing 3D bounding boxes around objects in the image. These boxes are cuboids, which have eight corners – think of the corners of a shoebox. During training, the AI receives the 3D coordinates of each of the eight corners of the box, so the AI “understands” the height, width and length of the bounding box, as well as the distance between each of those corners and the camera. The training technique uses this to teach the AI how to estimate the dimensions of each bounding box, and asks the AI to predict the distance between the camera and the object. After each prediction, the trainers “correct” the AI by giving it the right answers. Over time, this allows the AI to get better and better at identifying objects, placing them in a bounding box, and estimating the dimensions of those objects.
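To make the cuboid idea concrete, here is a minimal sketch (in Python, with NumPy) of how the eight corners of a 3D bounding box can be computed from a box’s center, dimensions and orientation. The parameterization follows the common KITTI-style convention; the function and variable names are illustrative assumptions, not taken from MonoCon’s code.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the 8 corners of a 3D bounding box (cuboid).

    center: (x, y, z) of the box bottom-center in camera coordinates
    dims:   (h, w, l) height, width, length of the box
    yaw:    rotation around the vertical (y) axis, in radians

    KITTI-style convention; MonoCon's exact convention may differ.
    """
    h, w, l = dims
    # Corner offsets relative to the bottom-center, before rotation.
    x_off = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y_off = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
    z_off = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    corners = np.stack([x_off, y_off, z_off])          # shape (3, 8)
    # Rotate around the vertical axis, then translate to the center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[ c, 0, s],
                    [ 0, 1, 0],
                    [-s, 0, c]])
    return (rot @ corners).T + np.asarray(center)      # shape (8, 3)
```

During training, each of these eight 3D coordinates, along with its distance to the camera, serves as a supervision target.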
“What sets our work apart is how we train the AI, which builds on previous training techniques,” Wu said. “Like those previous efforts, we place objects in 3D bounding boxes while training the AI. However, in addition to asking the AI to predict the camera-to-object distance and the dimensions of the bounding boxes, we also ask the AI to predict the locations of each of the box’s eight points and their distance from the center of the bounding box in two dimensions. We call this ‘auxiliary context,’ and we found that it helps the AI more accurately identify and predict 3D objects based on 2D images.
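A simplified sketch of the “auxiliary context” idea described above: given the eight 3D corners of a box and a camera intrinsic matrix, project the corners into the image as 2D keypoints and measure each keypoint’s offset from the 2D center. The function name, the use of the keypoint mean as the 2D center, and the intrinsics are illustrative assumptions; MonoCon’s actual auxiliary supervision is richer than this.

```python
import numpy as np

def auxiliary_context_targets(corners_3d, K):
    """Project 8 box corners into the image and compute their
    offsets from the projected 2D box center.

    corners_3d: (8, 3) corner coordinates in the camera frame
    K:          (3, 3) camera intrinsic matrix
    """
    # Perspective projection: u = fx * X/Z + cx, v = fy * Y/Z + cy.
    proj = (K @ corners_3d.T).T                  # (8, 3)
    keypoints = proj[:, :2] / proj[:, 2:3]       # (8, 2) pixel coords
    center_2d = keypoints.mean(axis=0)           # approximate 2D center
    offsets = keypoints - center_2d              # per-corner offsets
    return keypoints, offsets
```

The extra per-corner targets give the network more supervision signal per object than the distance and dimensions alone.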
“The proposed method is motivated by a well-known theorem in measure theory, the Cramér-Wold theorem. It is also potentially applicable to other structured output prediction tasks in computer vision.”
The researchers tested MonoCon using a widely used benchmark data set called KITTI.
“At the time we submitted this paper, MonoCon performed better than any of the dozens of other AI programs aimed at extracting 3D data on automobiles from 2D images,” Wu said. MonoCon performed well at identifying pedestrians and bicycles, but was not the best AI program for those identification tasks.
“Moving forward, we are expanding this and working with larger datasets to evaluate and refine MonoCon for use in autonomous driving,” Wu said. “We also want to explore applications in manufacturing, to see if we can improve the performance of tasks such as using robotic arms.”
The paper, “Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection,” will be presented at the Association for the Advancement of Artificial Intelligence’s Conference on Artificial Intelligence, being held virtually from February 22 to March 1. The first author of the paper is Xianpeng Liu, a Ph.D. student at NC State. The paper was co-authored by Nan Xue of Wuhan University.
Xianpeng Liu, Nan Xue, Tianfu Wu, Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection. arXiv:2112.04628v1 [cs.CV], arxiv.org/pdf/2112.04628.pdf
Provided by North Carolina State University
Citation: Technique improves AI’s ability to understand 3D space using 2D images (2022, January 26). Retrieved May 17, 2022 from https://techxplore.com/news/2022-01-technical-ai-ability-3d-space.html
This document is subject to copyright. Except for fair use for purposes of private study or research, no part may be reproduced without written permission. The content is provided for information only.