Depth Perception: How Cameras and Human Eyes Can Detect Distant Objects
Depth perception is the ability to gauge the distance to objects, and the depth between them, by perceiving the world in three dimensions. Our world is three-dimensional, yet the images formed in our eyes are two-dimensional. And still, everything we look at appears three-dimensional. Ever wondered how this happens?
The images received by our retinas are 2D, but we still see everything from a three-dimensional point of view because of depth perception, an ability honed over the course of evolution! It lets us judge the relative distances and depth differences between objects and our eyes, helping us carry out a wide range of activities without bumping into things, tripping, or falling.
If human eyes work this way, it is reasonable to expect that artificial intelligence, which is modeled on human intelligence, would perceive depth and distance similarly. In practice, however, AI systems rely on cameras and sensors such as infrared or LiDAR to gather information about the relative distances and depths between the sensor and the objects around it.
So, to understand how depth perception works in cameras and AI, we first need to understand how our own eyes estimate distance and depth!
Human Depth Perception
Humans perceive depth and distance accurately thanks to binocular vision: using both eyes together to pick up visual cues that help gauge depth (the resulting 3D percept is known as stereopsis). These details in an environment that support the perception of depth are called depth cues.
Cues that require both eyes are called binocular cues, while those that can be picked up in 2D with a single eye are called monocular cues. Binocular cues include:
- Stereopsis: Because our eyes face forward and are separated horizontally, each retina receives a slightly different view of the same object. The larger the disparity between the two retinal images, the closer the object appears. This is exactly how 3D movies work: they are shot from two slightly different angles, and the disparity between the two views creates the impression of depth. (A short sketch after this list turns disparity into an actual distance.)
- Convergence: In binocular vision, the eye muscles contract and relax to rotate both eyes inward so their two images merge into one. You can observe this by holding a finger in front of your eyes and moving it closer: the closer it gets, the harder and more straining it becomes to keep the two images fused, until the finger eventually blurs or doubles.
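To see how disparity translates into distance, here is a minimal sketch in Python of the standard stereo triangulation relation Z = f · B / d, where f is the focal length in pixels, B is the baseline between the two viewpoints, and d is the disparity. All the numbers below are illustrative assumptions, not measured values.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Triangulated depth: Z = f * B / d.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- horizontal separation of the two viewpoints, in metres
    disparity_px -- horizontal shift of the object between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible object")
    return focal_px * baseline_m / disparity_px

# Illustrative values: a 700 px focal length and a 6.5 cm baseline,
# roughly the spacing of human eyes. The larger disparity yields the
# smaller depth -- exactly the stereopsis effect described above.
print(depth_from_disparity(700, 0.065, 40))  # ~1.14 m (large disparity, close)
print(depth_from_disparity(700, 0.065, 10))  # ~4.55 m (small disparity, far)
```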
Some of the monocular cues include:
1. Relative Size: The visual angle an object subtends at the retina tells us about its distance; the closer an object is, the larger the angle it subtends. (The sketch after this list works through the numbers.)
2. Perspective: In linear perspective, parallel lines (such as the edges of a road) appear to converge toward a vanishing point, letting us reconstruct the relative distance and depth of a scene. The closer together the lines appear, the farther away that part of the scene is.
3. Lighting and Shading: The way light reflects off an object, and the shadow the object casts depending on the light's position, help the viewer discern the object's shape and position in space.
4. Motion Parallax: Objects at different distances appear to move at different speeds; when we move, objects close to us sweep past more quickly than those farther away.
5. Texture Gradient: When an object is close to our eyes, we can make out its fine details; the farther away it is, the hazier and blurrier its texture becomes.
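To make the relative-size cue concrete, the snippet below (a sketch with made-up object sizes and distances) computes the visual angle θ = 2 · arctan(s / 2D) subtended by an object of size s at distance D; the same object subtends a far larger angle up close.

```python
import math

def visual_angle_deg(object_size_m: float, distance_m: float) -> float:
    """Visual angle subtended at the eye: theta = 2 * atan(s / (2 * D))."""
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))

# The same 1.7 m tall person subtends a much larger angle when near,
# which is why they "look bigger" -- the relative-size cue at work.
for d in (2, 10, 50):  # distances in metres (illustrative)
    print(f"{d:>3} m -> {visual_angle_deg(1.7, d):5.2f} degrees")
# 2 m -> ~46 degrees, 10 m -> ~9.7 degrees, 50 m -> ~1.9 degrees
```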
Together, these cues let humans perceive 2D retinal imagery with the correct depth and distance. Someone lacking binocular vision must rely on the monocular cues alone, and the resulting depth estimates may not be as accurate as those from binocular vision.
Depth Perception in Cameras and AI
Cameras capture 2D images: projections of the 3D world onto a medium, be it film or a screen, in which depth information is lost. For many AI applications, 2D imaging is sufficient; however, autonomous vehicles and industrial robots need to be aware of their surroundings and therefore require 3D detection and depth perception.
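The loss of depth is easy to demonstrate with the pinhole camera model, under which a 3D point (X, Y, Z) projects to the pixel (f·X/Z, f·Y/Z). The sketch below (the focal length is an assumed value) shows two points at very different depths landing on the same pixel:

```python
def project(point_3d, focal_px=700.0):
    """Pinhole projection: (X, Y, Z) -> (f*X/Z, f*Y/Z). Z itself is discarded."""
    X, Y, Z = point_3d
    return (focal_px * X / Z, focal_px * Y / Z)

# Two points, one five times farther than the other, hit the same pixel --
# a single 2D image therefore cannot recover depth on its own.
print(project((1.0, 0.5, 2.0)))   # (350.0, 175.0)
print(project((5.0, 2.5, 10.0)))  # (350.0, 175.0), yet 5x farther away
```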
Currently, depth perception in cameras is often monocular: a single camera captures images from which depth and distance are estimated using geometrical cues.
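One such geometrical cue is a known object size: by similar triangles, an object of true height H that appears h pixels tall in a camera with focal length f (in pixels) sits at distance D = f·H/h. A minimal sketch, with assumed numbers:

```python
def distance_from_known_height(focal_px: float, real_height_m: float,
                               pixel_height: float) -> float:
    """Similar triangles: D = f * H / h. Requires knowing the object's true
    size in advance, much like the relative-size cue in human vision."""
    return focal_px * real_height_m / pixel_height

# An object of known ~1.5 m height imaged 150 px tall by a camera with a
# 700 px focal length (all values illustrative):
print(distance_from_known_height(700, 1.5, 150))  # 7.0 m
```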
Machine learning and deep learning play a growing role in computer vision thanks to their ability to mimic human pattern-recognition skills. At present, however, analyzing images by segmenting them with these methods is computationally expensive and relatively inefficient, whereas humans perform this segmentation and the subsequent merged visualization effortlessly with two eyes.
Devices such as 3D cameras and depth sensors can be key enablers of depth sensing for machines, allowing accurate object detection and depth perception within reasonable budgets.
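As one concrete example of machine depth sensing, a calibrated stereo camera pair can recover a dense disparity map with classical block matching. The sketch below uses OpenCV's StereoBM; the image file names and rig parameters are placeholder assumptions, and a real pipeline would first rectify the pair using the rig's calibration.

```python
import cv2
import numpy as np

# Placeholder file names for an already-rectified stereo pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classical block matching: for each pixel, find the horizontal shift
# (disparity) that best aligns the two views.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Reuse the triangulation relation from earlier: Z = f * B / d.
focal_px, baseline_m = 700.0, 0.12  # assumed rig parameters
valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]
print("median depth over valid pixels:", float(np.median(depth_m[valid])), "m")
```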
Depth perception has plenty of applications in modern technology, including augmented and virtual reality, robotics, cameras, and even facial recognition. With continued advancement in this area, technology can reach capabilities that were previously out of reach!