Stereopsis is the binocular sense of depth. Binocular as an adjective means “with two eyes, related to two eyes.” Depth is perceived by the human visual system (HVS) by way of cues. Binocular cues are depth cues that depend on perception with two eyes. Monocular cues are depth cues that can be perceived with a single eye alone, such as relative size, linear perspective, or motion parallax. Stereoscopic fusion is the ability of the human brain to fuse the two different perspective views into a single 3D image. Accommodation is the focusing of the eyes. Convergence is the horizontal rotation of the eyes (or cameras) that makes their optical axes intersect in a single point in 3D space.
Disparity is the distance between corresponding points in the left- and right-eye images. Retinal disparity is the disparity perceived at the retinas of the human eyes. The horopter is the 3D curve defined as the set of points in space whose images form at corresponding points in the two retinas (i.e., the imaged points have zero disparity). Panum’s fusional area is a small region around the horopter where retinal disparities can be fused by the HVS into a single 3D image. The point of convergence is the point in 3D space where the optical axes of the eyes (or convergent cameras) intersect. The plane of convergence is the depth plane where the optical rays through the sensor centers intersect in the case of a parallel camera setup. Crossed disparity represents retinal (or camera) disparities where the corresponding optical rays intersect in front of the horopter or the convergence plane. Uncrossed disparity represents retinal (or camera) disparities where the optical rays intersect behind the horopter or the convergence plane. Interocular distance (also called interpupillary distance) is the distance between an observer’s eyes, about 64 mm for adults (although there is a distribution of distances of about ±12 mm).
See Fig. 2.1 for illustrations on concepts of disparity and Fig. 2.2 for the concept of fusion.
Stereoscopy is the method used for creating a pair of planar stereo images. Plano-stereoscopic is the exact term for describing 3D displays that achieve a binocular depth effect by providing the viewer with images of slightly different perspective on one common planar screen. Depth range is the extent of depth that is perceived when a plano-stereoscopic image is reproduced by means of a stereoscopic viewing device.
Corresponding points are the points in the left and right images that are pictures of the same point in 3D space. Parallax is the distance between corresponding points in the left- and right-eye images of a plano-stereoscopic image. Parallax angle is the angle under which the optical rays of the two eyes intersect at a particular point in 3D space. More generally, (binocular) parallax is the apparent change in the position of an object when viewed from different points (e.g., from two eyes or from two different positions); in other words, an apparent displacement of an object viewed along two different lines of sight. Negative parallax stereoscopic presentation occurs where the optical rays intersect in front of the screen in the viewer space (this corresponds to crossed disparity). Positive parallax stereoscopic presentation occurs where the optical rays intersect behind the screen in the screen space (this corresponds to uncrossed disparity). Screen space is the region behind the display screen surface. Objects will be perceived in this region if they have positive parallax.
Viewer space is the region between the viewer and the display screen surface. Objects will be perceived in this region if they have negative parallax (Fig. 2.3).
Accommodation/convergence conflict is the deviation from the learned and habitual correlation between accommodation and convergence when viewing plano-stereoscopic images. Binocular rivalry represents perception conflicts that appear in the case of colorimetric, geometric, photometric, or other asymmetries between the two (recreated) stereo images. Crosstalk is the imperfect separation of the left- and right-eye images when viewing plano-stereoscopic 3D content. Crosstalk is a physical entity, whereas ghosting is a psychophysical entity (Fig. 2.4).
That the double images just described usually do not disturb visual perception is the result of another habitual behavior that is tightly coupled with the described convergence process. In concert with the rotation of the optical axes, the eyes also focus (accommodate by changing the shape of the eye’s lenses) on the object of interest. This is important for two reasons. First, focusing on the point of convergence allows the observer to see the object of interest clearly and sharply. Second, the perception of disturbing double images, which in principle result from all scene parts outside Panum’s fusional area, is efficiently suppressed by the increasing optical blur.
Although particular realizations differ widely in the techniques they use, almost all stereoscopic displays and projections are based on the same basic principle of providing the viewer with two different perspective images for the left and the right eye. Usually, these slightly different views are presented on the same planar screen. These displays are therefore called plano-stereoscopic devices. In this case, the perception of binocular depth cues results from the spatial distances between corresponding points in both planar views, that is, from the so-called parallax P that, in turn, induces the retinal disparities in the viewer’s eyes. Thus, the perceived 3D impression depends, among other parameters such as the viewing distance, on both the amount and the type of parallax.
As shown in Fig. 2.3, three different cases have to be taken into account here:
- Positive Parallax: Corresponding image points are said to have positive or uncrossed parallax P when the point in the right-eye view lies more to the right than the corresponding point in the left-eye view. Thus, the related viewing rays converge in a 3D point behind the screen, so that the reproduced 3D scene is perceived in the so-called screen space. Furthermore, if the parallax P exactly equals the viewer’s interocular distance te, the viewing rays are parallel and the 3D point is reproduced at infinity. This also means that the allowed maximum of the positive parallax is limited to te, because any larger value would force the eyes to diverge.
- Zero Parallax: With zero parallax, corresponding image points lie at the same position in the left- and the right-eye views. The resulting 3D point is therefore observed directly at the screen, a situation that is often referred to as the Zero Parallax Setting (ZPS).
- Negative Parallax: Conjugate image points with negative or crossed parallax P are located such that the point in the right-eye view lies more to the left than the corresponding point in the left-eye view. The viewing rays therefore converge in a 3D point in front of the screen in the so-called viewer space.
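The three cases can be made concrete with a little geometry. The following is a minimal sketch, assuming a viewer centered in front of the screen at a known viewing distance and a simple similar-triangles model of the two viewing rays; the function names and the relation Z = D · te / (te − P) are illustrative assumptions derived from that model, not formulas given in the text.

```python
import math


def classify_parallax(parallax_mm: float) -> str:
    """Name the parallax case for a signed screen parallax P (right minus left)."""
    if parallax_mm > 0:
        return "positive (uncrossed): perceived in screen space"
    if parallax_mm < 0:
        return "negative (crossed): perceived in viewer space"
    return "zero: perceived on the screen (ZPS)"


def perceived_distance_mm(parallax_mm: float,
                          viewing_distance_mm: float,
                          interocular_mm: float = 64.0) -> float:
    """Distance from the viewer to the fused 3D point.

    Assumed model: eyes separated by `interocular_mm`, screen at
    `viewing_distance_mm`; intersecting the two viewing rays gives
    Z = D * te / (te - P).
    """
    if parallax_mm > interocular_mm:
        # Beyond te the viewing rays diverge and never intersect.
        raise ValueError("parallax exceeds the interocular distance")
    if parallax_mm == interocular_mm:
        # Parallel viewing rays: the point is reproduced at infinity.
        return math.inf
    return viewing_distance_mm * interocular_mm / (interocular_mm - parallax_mm)


if __name__ == "__main__":
    for p in (-64.0, 0.0, 32.0, 64.0):
        z = perceived_distance_mm(p, viewing_distance_mm=2000.0)
        print(f"P = {p:6.1f} mm -> {classify_parallax(p)}, distance {z} mm")
```

Under this model, zero parallax places the point at the viewing distance itself, negative parallax pulls it toward the viewer, and positive parallax pushes it behind the screen until it reaches infinity at P = te.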
The parallax angle is unlimited when looking at a real-world 3D scene. In this case, the eyes simultaneously converge and accommodate on the object of interest. As explained, these jointly performed activities allow the viewer to stereoscopically fuse the object of interest and, at the same time, to suppress diplopia (double image) effects for scene parts that are outside Panum’s fusional area around the focused object. However, the situation is different in stereoreproduction. When looking at a stereoscopic 3D display, the eyes always accommodate on the screen surface, but they converge according to parallax (Fig. 2.4). This deviation from the learned and habitual correlation between accommodation and convergence is known as accommodation–convergence conflict. It represents one of the major reasons for eyestrain, confusion, and loss of stereopsis in 3D stereoreproduction [5–7]. It is therefore important to make sure that the maximal parallax angle αmax is kept within acceptable limits or, in other words, to guarantee that the 3D world is reproduced rather close to the screen surface of the 3D display.
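One way to make this constraint tangible is to compare the angle at which the viewing rays meet at the fused point with the angle at which they would meet on the screen, where the eyes accommodate. The sketch below assumes a point straight ahead of the viewer and a simple two-ray model; the function names and the `max_angle_deg` parameter are hypothetical, and the text does not prescribe a specific formula or numerical limit for αmax.

```python
import math


def screen_angle_deg(viewing_distance_mm: float,
                     interocular_mm: float = 64.0) -> float:
    """Angle between the two viewing rays when they meet on the screen."""
    return math.degrees(
        2.0 * math.atan(interocular_mm / (2.0 * viewing_distance_mm)))


def fused_point_angle_deg(parallax_mm: float,
                          viewing_distance_mm: float,
                          interocular_mm: float = 64.0) -> float:
    """Angle between the viewing rays at the point fused from parallax P.

    Assumed model: the rays cross the screen plane P apart, so the angle is
    2 * atan((te - P) / (2 * D)); it falls to zero when P equals te.
    """
    if parallax_mm > interocular_mm:
        raise ValueError("parallax exceeds the interocular distance")
    return math.degrees(
        2.0 * math.atan((interocular_mm - parallax_mm)
                        / (2.0 * viewing_distance_mm)))


def angular_mismatch_deg(parallax_mm: float,
                         viewing_distance_mm: float,
                         interocular_mm: float = 64.0) -> float:
    """Convergence angle at the fused point minus the angle at the screen."""
    return (fused_point_angle_deg(parallax_mm, viewing_distance_mm, interocular_mm)
            - screen_angle_deg(viewing_distance_mm, interocular_mm))


def within_limit(parallax_mm: float,
                 viewing_distance_mm: float,
                 max_angle_deg: float,
                 interocular_mm: float = 64.0) -> bool:
    """Check the mismatch against a caller-supplied limit (hypothetical parameter)."""
    mismatch = angular_mismatch_deg(parallax_mm, viewing_distance_mm, interocular_mm)
    return abs(mismatch) <= max_angle_deg


if __name__ == "__main__":
    for p in (-30.0, 0.0, 30.0):
        print(p, round(angular_mismatch_deg(p, 2000.0), 3))
```

The mismatch is zero at the ZPS, positive for negative parallax (the eyes converge more strongly than the accommodation distance suggests), and negative for positive parallax. The limit is deliberately left as an argument without a default, since acceptable values depend on the display and the viewing conditions.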
The related generation of planar stereoscopic views requires capture with a synchronized stereocamera. Because such two-camera systems are intended to mediate the natural binocular depth cue, it is not surprising that their design shows a striking similarity with the HVS. For example, the interaxial distance tc between the focal points of the left- and the right-eye camera lenses is usually chosen in relation to the interocular distance te. Furthermore, similar to the convergence capability of the HVS, it must be possible to adapt a stereocamera to a desired convergence condition or ZPS; that is, to choose the part of the 3D scene that is going to be reproduced exactly on the display screen. As shown in Fig. 2.5, this can be achieved by two different camera configurations [8, 9].
- “Toed-In” Setup: With the toed-in approach, depicted in Fig. 2.5(a), a point of convergence is chosen by a joint inward rotation of the left- and the right-eye cameras.
- “Parallel” Setup: With the parallel method, shown in Fig. 2.5(b), a plane of convergence is established by a small shift h of the sensor targets.
At first glance, the toed-in approach intuitively seems to be the more suitable solution because it directly matches the convergence behavior of the HVS. However, it has been shown that the parallel approach is nonetheless preferable, because it provides a higher stereoscopic image quality [8, 9].
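The two configurations can be compared numerically under an idealized pinhole camera model. The sketch below is illustrative only: it assumes that h denotes the shift applied to each sensor (the text does not say whether h is per sensor or total), that f is the focal length, and that the convergence distance Zc is measured along the central axis between the cameras; the function names are hypothetical.

```python
import math


def toed_in_rotation_deg(interaxial_mm: float,
                         convergence_distance_mm: float) -> float:
    """Inward rotation of each camera so that both optical axes meet at Zc."""
    return math.degrees(
        math.atan(interaxial_mm / (2.0 * convergence_distance_mm)))


def parallel_sensor_shift_mm(interaxial_mm: float,
                             focal_length_mm: float,
                             convergence_distance_mm: float) -> float:
    """Per-sensor shift h that puts the plane at Zc at zero disparity.

    Assumed pinhole relation: h = f * tc / (2 * Zc).
    """
    return focal_length_mm * interaxial_mm / (2.0 * convergence_distance_mm)


def sensor_disparity_mm(depth_mm: float,
                        interaxial_mm: float,
                        focal_length_mm: float,
                        convergence_distance_mm: float) -> float:
    """Signed disparity (right minus left) on the shifted sensors.

    Positive values correspond to points behind the plane of convergence,
    negative values to points in front of it. On a display this value is
    additionally scaled by the sensor-to-screen magnification.
    """
    return focal_length_mm * interaxial_mm * (
        1.0 / convergence_distance_mm - 1.0 / depth_mm)


if __name__ == "__main__":
    tc, f, zc = 64.0, 32.0, 2048.0
    print("toed-in rotation per camera (deg):", toed_in_rotation_deg(tc, zc))
    print("parallel sensor shift h (mm):", parallel_sensor_shift_mm(tc, f, zc))
    for z in (1024.0, 2048.0, 4096.0):
        print(z, sensor_disparity_mm(z, tc, f, zc))
```

In the parallel setup the sign of the resulting disparity follows the parallax convention used above: scene points behind the plane of convergence yield positive values and points in front of it yield negative ones.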