3DV/3DTV Stereoscopic Principles

We start this section with a few additional definitions. Stereo means “having depth, or being three-dimensional”; it describes an environment in which two inputs combine to create one unified perception of three-dimensional space. Stereoscopic vision is the process by which two eye views combine in the brain to create the visual perception of a single 3D image; it is a by-product of good binocular vision. Stereoscopy can be defined as any technique that creates the illusion of depth or three-dimensionality in an image. Stereoscopic (literally: “solid looking”) is the term used to describe a visual experience having visible depth as well as height and width. The term may refer to any experience or device associated with binocular depth perception. Stereoscopic 3D refers to two photographs taken from slightly different angles that appear three-dimensional when viewed together. Autostereoscopic describes 3D displays that do not require glasses to see the stereoscopic image. Stereogram is a general term for any arrangement of left-eye and right-eye views that produces a three-dimensional result; it may consist of (i) a side-by-side or over-and-under pair of images; (ii) superimposed images projected onto a screen; (iii) a color-coded composite (anaglyph); (iv) lenticular images; or (v) alternately projected left-eye and right-eye images that fuse by means of the persistence of vision [10]. Stereoplexing (stereoscopic multiplexing) is a mechanism for incorporating information for the left and right perspective views into a single information channel without expanding the bandwidth.

Basic stereoscopic camera configurations: (a) “toed-in” approach, and (b) “parallel” setup.

On the basis of the principles discussed above, a number of techniques have been developed for re-creating depth for the viewer of photographic or video content. A considerable amount of research has taken place over the past 30 or more years on 3D graphics and imaging; most of it has focused on photographic techniques, computer graphics, 3D movies, and holography. (The field of imaging, including 3D imaging, deals more with the static or quasi-static capture, representation, encoding, compression, transmission, display, and storage of content such as photographs, medical images, and CAD/CAM drawings, especially for high-resolution applications; that topic is not covered here.)

Fundamentally, the technique known as “stereoscopy” works as follows: two pictures or scenes are shot, one for each eye, and each eye is then presented with its proper picture or scene in one fashion or another (Fig. 2.6). Stereoscopic 3D video is based on the binocular nature of human perception; to generate quality 3D content, the creator needs to control the depth and parallax of the scene, among other parameters. Depth perception is the ability to see in 3D, allowing the viewer to judge the relative distances of objects; depth range is a term that applies to stereoscopic images created with cameras. As noted above, parallax is the apparent change in the position of an object when viewed from different points, that is, the visual differences in a scene seen from different vantage points.

Stereoscopic capture of a scene to achieve 3D when the scene is viewed with an appropriate display system. In this figure the separation between the two images is exaggerated for pedagogical reasons (in actual stereo photos the differences are very minute).

Generation of horizontal parallax for stereoscopic displays.

A 3D display (screen) needs to generate some sort of parallax, which, in turn, creates a stereoscopic sense (Fig. 2.7). Nearby objects have a larger parallax than more distant objects when observed from
different positions; because of this feature, parallax can be used to determine distances. Because a person’s eyes are in different positions on the head, they present different views simultaneously. This is the basis of stereopsis, the process by which the brain exploits the parallax between the views from the two eyes to gain depth perception and estimate distances to objects. 3D depth perception can be supported by 3D display systems that deliver a specific, different view to each eye; such a stereo pair of views must correspond to the human eye positions, enabling the brain to compute 3D depth. Over the years, the main means of stereoscopic display has moved from anaglyph to polarization and shutter glasses.
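The inverse relationship between distance and parallax can be made concrete with the standard pinhole relation d = f · B / Z. The following is a minimal sketch; the function name and the parameter values (a roughly human-like 65 mm baseline and 17 mm effective focal length) are illustrative assumptions, not figures from the text:

```python
# Sketch of the pinhole relation d = f * B / Z: disparity d grows as the
# object distance Z shrinks. All parameter values are assumed for
# illustration (roughly human-like: ~65 mm interocular baseline,
# ~17 mm effective focal length of the eye).

def disparity(focal_mm, baseline_mm, distance_mm):
    """Horizontal disparity (mm on the image plane) of a point at distance_mm."""
    return focal_mm * baseline_mm / distance_mm

near = disparity(17, 65, 1_000)    # object 1 m away
far = disparity(17, 65, 10_000)    # object 10 m away

# A nearby object produces a larger disparity than a distant one,
# which is why parallax can be used to estimate distance.
assert near > far
```

Inverting the same relation (Z = f · B / d) is exactly how stereopsis, and stereo computer vision, recovers distance from measured disparity.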

Some basic terms and concepts related to camera management for stereoscopic filming are as follows: interaxial distance is the distance between the left- and right-eye lenses in a stereoscopic camera. Camera convergence denotes the process of adjusting the zero parallax setting (ZPS) in a stereoscopic camera. The ZPS defines the point(s) in 3D space that have zero parallax in the plano-stereoscopic image created, for example, with a stereoscopic camera; these points are stereoscopically reproduced on the surface of the display screen.
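For a parallel camera rig, the convergence (ZPS) plane is commonly set by shifting the two images horizontally. A minimal sketch under an assumed pinhole model; the function name, sign convention, and all numeric values are illustrative assumptions:

```python
# Sketch: on-screen parallax for a parallel stereo rig whose zero-parallax
# plane z_conv is set by horizontal image translation. Assumed pinhole
# model; all parameter values are illustrative.

def screen_parallax(focal, baseline, z_conv, z):
    """Image-plane parallax of a point at depth z when the rig is
    converged on the plane z_conv. Zero at z == z_conv; negative
    (crossed parallax, appears in front of the screen) for z < z_conv;
    positive (uncrossed, appears behind the screen) for z > z_conv."""
    return focal * baseline * (1.0 / z_conv - 1.0 / z)

f, b, z_conv = 35.0, 65.0, 3_000.0            # mm, mm, mm (assumed)
assert screen_parallax(f, b, z_conv, z_conv) == 0.0   # on the ZPS plane
assert screen_parallax(f, b, z_conv, 1_500.0) < 0     # in front of the screen
assert screen_parallax(f, b, z_conv, 9_000.0) > 0     # behind the screen
```

Points at the convergence distance thus land on the screen surface, as the ZPS definition above states, while nearer and farther points receive crossed and uncrossed parallax respectively.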

Two simultaneous conventional 2D video streams are produced by a pair of cameras mimicking the two human eyes, which see the environment from two slightly different angles. Simple planar 3D films are made by recording separate images for the left eye and the right eye from two cameras spaced a certain distance apart. The spacing chosen affects the disparity between the left-eye and right-eye pictures, and thereby the viewer’s sense of depth. While this technique achieves depth perception, it often results in eye fatigue after a certain amount of viewing time: within minutes of the onset of viewing, stereoscopy frequently causes eye fatigue and, in some viewers, feelings similar to those experienced during motion sickness [11]. Nevertheless, the technique is widely used for (stereoscopic) photography and moviemaking, and it has been tested many times for television [12].

At the display level, one of these streams is shown to the left eye, and the other one to the right eye. Common means of separating the right-eye and left-eye views include glasses with colored transparencies, polarization filters, and shutter
glasses. Polarization of light is the arrangement of beams of light into separate planes or vectors by means of polarizing filters; when two vectors are crossed at right angles, vision or light rays are obscured. In the filter-based approach, complementary filters are placed jointly over two overlapping projectors (when projectors are used—refer back to Table 1.3) and over the two corresponding eyes (i.e., anaglyph, linear or circular polarization, or the narrow-pass filtering of Infitec) [13]. Although the technology is relatively simple, the necessity of wearing glasses while viewing has often been considered a major obstacle to the wide acceptance of 3DTV. Also, there are some limitations to the approach, such as the need to retain a head orientation that works properly with the polarized light (e.g., do not bend the head 45 degrees side to side), and the need to be within a certain viewing angle. There are a number of other mechanisms to deliver binocular stereo, including barrier filters over LCDs (vertical bars act as a fence, channeling data in specific directions for the eyes).
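The color-filter (anaglyph) approach mentioned above is simple enough to illustrate directly: a red-cyan composite takes the red channel from the left view and the green and blue channels from the right view, so that matching colored glasses route each view to the intended eye. A sketch using NumPy; the function name and the toy pixel values are assumptions for illustration:

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Compose a red-cyan anaglyph: red channel from the left view,
    green and blue channels from the right view.
    Inputs are HxWx3 uint8 RGB arrays of the same shape."""
    out = right_rgb.copy()
    out[..., 0] = left_rgb[..., 0]   # red comes from the left eye's image
    return out

# Tiny illustrative 2x2 views with constant colors.
left = np.full((2, 2, 3), [200, 10, 10], dtype=np.uint8)
right = np.full((2, 2, 3), [10, 150, 150], dtype=np.uint8)
ana = red_cyan_anaglyph(left, right)
assert ana[0, 0].tolist() == [200, 150, 150]
```

A red filter over the left eye passes only the red channel (the left view), while a cyan filter over the right eye passes the green/blue channels (the right view), separating the stereo pair within a single image.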

It should be noted, as we wrap up this brief overview of the HVS, that individuals vary along a continuum in their ability to process stereoscopic depth information. Studies have shown that a relatively large percentage of the population experience stereo deficiencies in depth discrimination/perception if the display duration is very short, and that a certain percentage of the adult population (about 6%) has persistent deficiencies. Figure 2.8 depicts the results of a study that quantifies these observations [14]. These results indicate that certain fast-cut methods in scenes may not work for everyone in 3D. Object motion can also create visual problems in stereoscopic 3DTV. Figure 2.9 depicts visual discomfort that has been observed in studies [14]. At the practical level, in the context of cinematography, while new digital 3D technology has made the experience more comfortable for many, for some people with eye problems a prolonged 3D session may result in an aching head, according to ophthalmologists. Some people have very minor eye problems (e.g., a slight muscle imbalance), which the brain deals with naturally under normal circumstances; but in a 3D movie, these people are confronted with an entirely new sensory experience that demands greater mental effort, making it easier to get a headache. Some people who do not have normal depth perception cannot see in 3D at all. People with eye muscle problems, in which the eyes are not pointed at the same object, have trouble processing 3D images.

Stereo deficiencies in some populations [14].

Visual discomfort caused by motion in a scene [14].

Headaches and nausea are cited as the main reasons 3D technology never took off. However, newer digital technology addresses many of the problems that typically caused 3D moviegoers discomfort. Some of the problems were related to
the fact that the projectors were not properly aligned; systems that use a single digital projector help overcome some of the old problems [15]. However, deeper-rooted issues with stereoscopic display may continue to affect a number of viewers (issues that future autostereoscopic systems aim to address).

The two video views required for 3DTV can be compressed using standard video compression techniques. MPEG-2 encoding is widely used in digital TV applications today and H.264/MPEG-4 AVC is expected to be the leading video technology standard for digital video in the near future. Extensions have been developed recently to H.264/MPEG-4 AVC and other related standards to support 3DTV; other standardization work is underway. The compression gains and
quality of 3DTV will vary depending on the video coding standard used. While inter-view prediction will likely improve compression efficiency compared with simulcasting (transmitting the two views separately, and so requiring a doubling of the channel bandwidth), new approaches, such as asymmetric view coding, video-plus-depth, and layered video, are needed to reduce bandwidth requirements for 3DTV [16].

There are a number of ways to create 3D content, including: (i) Computer-Generated Imagery (CGI); (ii) stereo cameras; and (iii) 2D-to-3D conversion. CGI techniques are currently the most technically advanced, with well-developed methodologies (and tools) for creating movies, games, and other graphical applications; the majority of cinematic 3D content consists of animated movies created with CGI. Camera-based 3D is more challenging. A two-camera approach is typical at this time; another approach is to use a 2D camera in conjunction with a depth-mapping system. With the two-camera approach, the two cameras are assembled with a spatial separation chosen to mimic how the eyes perceive a scene. The technical issues relate to focus and focal length: these have to be matched precisely to avoid differences in vertical and horizontal alignment and/or rotational differences (lens calibration and motion control must be added to the camera lenses). 2D-to-3D conversion techniques include the following:

  • object segmentation and horizontal shifting;
  • depth mapping (bandwidth-efficient multiple images and viewpoints);
  • creation of depth maps using information from 2D source images;
  • making use of human visual perception for 2D to 3D conversion;
  • creation of surrogate depth map (e.g., gray-level intensities of a color component).
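The first two techniques in the list above can be sketched together: given a 2D image and a per-pixel depth map, a second view is synthesized by shifting each pixel horizontally by a depth-derived disparity. The NumPy sketch below is deliberately simplified (nearest-pixel shifts, no occlusion handling or hole inpainting, which production converters require); the function name and toy data are assumptions:

```python
import numpy as np

def synthesize_right_view(image, depth, max_shift=3):
    """Naive depth-based view synthesis: shift each pixel left by a
    disparity proportional to its depth value (0..255 -> 0..max_shift).
    Positions not overwritten keep the original pixel (no inpainting)."""
    h, w = depth.shape
    out = image.copy()
    shifts = (depth.astype(np.int32) * max_shift) // 255
    for y in range(h):
        for x in range(w):
            nx = x - shifts[y, x]          # nearer (larger depth) shifts more
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    return out

# 1x6 grayscale row: one "near" bright pixel on a "far" background.
img = np.array([[10, 10, 10, 200, 10, 10]], dtype=np.uint8)
dep = np.array([[0, 0, 0, 255, 0, 0]], dtype=np.uint8)
right = synthesize_right_view(img, dep)
assert right[0, 0] == 200   # the near pixel moved 3 positions to the left
```

Pairing the original frame (left eye) with such a synthesized frame (right eye) yields the disparity, and hence the depth impression, that a two-camera shoot would have captured.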

Conversion of 2D material is the least desirable approach, but it is perhaps the one that could generate the largest amount of content in the short term. Some note that it is “easy to create 3D content, but it is hard to create good 3D content” [17].

A practical problem relates to “insertion”. At least early on, 2D content will be inserted into a 3D channel, much the way standard-definition commercials still show up in HD content. A set-top could be programmed to automatically detect
an incoming format and handle various frame-packing arrangements to support 2D/3D switching for advertisements [18].
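Frame packing itself is straightforward to illustrate. In side-by-side packing, each view is horizontally decimated by two and placed in one half of an ordinary 2D-sized frame; the set-top reverses the process before display. A NumPy sketch with assumed function names (nearest-neighbor decimation and pixel-repetition upscaling stand in for the filtered resampling real devices use):

```python
import numpy as np

def pack_side_by_side(left, right):
    """Pack two HxW views into one HxW frame: each view is horizontally
    decimated by 2 (nearest neighbor) and placed in one half."""
    return np.hstack([left[:, ::2], right[:, ::2]])

def unpack_side_by_side(frame):
    """Split a side-by-side frame back into two views and stretch them
    to full width by pixel repetition."""
    h, w = frame.shape[:2]
    left, right = frame[:, : w // 2], frame[:, w // 2 :]
    return np.repeat(left, 2, axis=1), np.repeat(right, 2, axis=1)

left = np.arange(16, dtype=np.uint8).reshape(2, 8)
right = left + 100
frame = pack_side_by_side(left, right)
assert frame.shape == (2, 8)        # same raster size as a single 2D frame
l2, r2 = unpack_side_by_side(frame)
assert l2.shape == left.shape and r2.shape == right.shape
```

Because the packed frame has the raster size of an ordinary 2D frame, it traverses 2D distribution chains unchanged, which is why format signaling (rather than the transport itself) is the practical problem for 2D/3D switching.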

In summary, and as we transition the discussion to autostereoscopic approaches (and in preparation for that discussion), we list below the highlights of the various approaches, as provided in Ref. [19] (refer back to Table 1.1 for definition of terms).

Stereoscopy is the Simplest and Oldest Technique:

  • does not create physical duplicates of 3D light;
  • quality of resultant 3D effect is inferior;
  • lacks parallax;
  • focus and convergence mismatch;
  • misalignment is visible;
  • a “motion sickness” type of feeling (eye fatigue) is produced;
  • has been the main reason for the commercial failure of 3D techniques.

Multi-view video provides some horizontal parallax:

  • still limited to a small angle (∼20–45 degrees);
  • jumping effect observed;
  • viewing discomfort similar to stereoscopy;
  • requires high-resolution display device;
  • leakage of neighboring images occurs.

Integral Imaging adds vertical parallax:

  • gets closer to an ideal light-field renderer as the number of lenses (elemental images) increases: true 3D;
  • alignment is a problem;
  • requires very high resolution devices;
  • leakage of neighboring images occurs.

Holography is superior in terms of replicating physical light distribution:

  • recording holograms is difficult;
  • very high resolution recordings are needed;
  • display techniques are quite different;
  • network transmission is anticipated to be extremely taxing.