Moving Picture Experts Group (MPEG)


Overview

MPEG is a working group of ISO/IEC in charge of developing standards for the coded representation of digital audio and video and related data. Established in 1988, the group produces standards that help the industry offer end users an ever more enjoyable digital media experience. In its 21 years of activity, MPEG has developed a substantial portfolio of technologies that have created an industry worth several hundred billion USD. MPEG is currently interested in 3DV in general and 3DTV in particular. Any broad success of 3DTV/3DV will likely depend on the development and industrial acceptance of MPEG standards; MPEG is the premier organization worldwide for video encoding. The standards it has produced in recent years are as follows:

MPEG-1 The standard on which such products as video CD and MP3 are based

MPEG-2 The standard on which such products as digital television set-top boxes and DVDs are based

MPEG-4 The standard for multimedia for the fixed and mobile web

MPEG-7 The standard for description and search of audio and visual content

MPEG-21 The multimedia framework

MPEG-A The standard providing application-specific formats by integrating multiple MPEG technologies

MPEG-B A collection of systems-specific standards

MPEG-C A collection of video-specific standards

MPEG-D A collection of audio-specific standards

MPEG-E A standard (M3W) providing support to download and execute multimedia applications

MPEG-M A standard (MXM) for packaging and reusability of MPEG technologies

MPEG-U A standard for rich media user interface

MPEG-V A standard for interchange with virtual worlds

The table "Activities of MPEG Groups in the Area of Video," later in this section, provides a more detailed listing of these activities.

Completed Work

As we have seen in other parts of this text, currently there are a number of different 3DV formats (either already available and/or under investigation), typically related to specific types of displays (e.g., classical two-view stereo video, multiview video with more than two views, V+D, MV+D, and layered depth video). Efficient compression is crucial for 3DV applications and a plethora of compression and coding algorithms are either already available and/or under investigation for the different 3DV formats (some of these are standardized e.g., by MPEG, others are proprietary). A generic, flexible, and efficient 3DV format that can serve a range of different 3DV systems (including mobile phones) is currently being investigated by MPEG.

As we noted earlier in this text, MPEG standards already support 3DV based on V+D. In 2007, MPEG specified a container format, “ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information” (also known as MPEG-C Part 3), that can be utilized for V+D data. Transport of this data is defined in a separate MPEG systems specification, “ISO/IEC 13818-1:2003 Carriage of Auxiliary Data.”

In 2008, ISO approved a new 3DV project under ISO/IEC JTC1/SC29/WG11 (ISO/IEC JTC1/SC29/WG11, MPEG2008/N9784). The JVT of ITU-T and MPEG has since devoted its efforts to extending the widely deployed H.264/AVC standard to MVC, in order to support MV+D (and also V+D). MVC allows the construction of bitstreams that represent multiple views. The MPEG standard that emerged, MVC, provides good robustness and compression performance for delivering 3DV by taking into account the inter-view dependencies of the different visual channels. In addition, its backward compatibility with H.264/AVC codecs makes it widely interoperable in environments having both 2D- and 3D-capable devices. MVC supports an MV+D (and also V+D) encoded representation inside the MPEG-2 transport stream. The MVC standard was developed by the JVT of ISO/IEC MPEG and the ITU-T Video Coding Experts Group (VCEG; ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). MVC was originally an addition to the H.264/MPEG-4 AVC video compression standard that enables efficient encoding of sequences captured simultaneously from multiple cameras in a single video stream.

Activities of MPEG Groups in the Area of Video

At press time, MVC was the most efficient approach for stereo and multi-view video coding; for two views, the performance achieved by the H.264/AVC stereo SEI message and by MVC is similar. MVC is also expected to become a new MPEG video coding standard for the realization of future video applications such as 3DTV and FTV. The MVC group in the JVT chose the H.264/AVC-based MVC method as the MVC reference model, since this method showed better coding efficiency than H.264/AVC simulcast coding and the other methods submitted in response to the call for proposals made by MPEG.

New Initiatives

ISO MPEG has already developed a suite of international standards to support 3D services and devices, and in 2009 it initiated a new phase of standardization, to be completed by 2011:

  • One objective is to enable stereo devices to cope with varying display types and sizes and with different viewing preferences. This includes the ability to vary the baseline distance for stereo video to adjust the depth perception, which could help avoid fatigue and other viewing discomforts.
  • MPEG also envisions that high-quality autostereoscopic displays will enter the consumer market in the next few years. Since it is difficult to directly provide all the necessary views due to production and transmission constraints, a new format is needed to enable the generation of many high-quality views from a limited amount of input data such as stereo and depth.
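The baseline-adjustment idea in the first bullet can be sketched in a few lines: if a per-pixel disparity (or depth-derived shift) map is available, scaling it before warping mimics a shorter or longer camera baseline and thus weaker or stronger depth perception. The function below is a hypothetical illustration, not part of any MPEG specification; it forward-warps a grayscale left view and leaves disocclusion holes at zero.

```python
import numpy as np

def rescale_baseline(left, disparity, scale):
    """Synthesize a new right view from a left view and its disparity map.

    Scaling per-pixel disparity by `scale` < 1 mimics a shorter camera
    baseline, reducing depth perception (a common fatigue-reduction knob).
    Simple forward warp; holes are left at zero for a later inpainting step.
    """
    h, w = disparity.shape
    out = np.zeros_like(left)
    xs = np.arange(w)
    for y in range(h):
        # shift each pixel left by its (scaled) disparity, clamped to the frame
        tx = np.clip((xs - scale * disparity[y]).round().astype(int), 0, w - 1)
        out[y, tx] = left[y, xs]
    return out
```

With scale = 0 the output is identical to the input view (zero baseline); intermediate values yield a continuum of depth effects from the same transmitted data.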

ISO’s vision is now a new 3DV format that goes beyond the capabilities of existing standards, enabling both advanced stereoscopic display processing and improved support for autostereoscopic N-view displays while allowing interoperable 3D services. The new 3DV standard aims to improve the rendering capability of the 2D+depth format while reducing bitrate requirements relative to existing standards, as noted earlier in this section.

3DV supports new types of audiovisual systems that allow users to view videos of the real 3D space from different user viewpoints. In an advanced application of 3DV, denoted FTV, a user can set the viewpoint to an almost arbitrary location and direction, which can be static, change abruptly, or vary continuously, within the limits given by the available camera setup. The audio listening point is changed accordingly. The first phase of 3DV development is expected to support advanced 3D displays, where M dense views must be generated from a sparse set of K transmitted views (typically K ≤ 3) with associated depth data. The allowable range of view synthesis will be relatively narrow (a 20° view angle from the leftmost to the rightmost view).

Example of an FTV system and data format.

The MPEG initiative notes that 3DV is a standard that targets serving a variety of 3D displays. It is the first phase of FTV, a new framework that includes a coded representation for multi-view video and depth information to support the generation of high-quality intermediate views at the receiver. This enables free-viewpoint functionality and view generation for automultiscopic displays [7]. Figure 6.1 shows an example of an FTV system that transmits multi-view video with depth information. The content may be produced in a number of ways, for example, with a multicamera setup, depth cameras, or 2D/3D conversion processes. At the receiver, DIBR could be performed to project the signal to various types of displays.

The first focus (phase) of ISO/MPEG standardization for FTV is 3DV [8], that is, video for 3D displays. Such displays present N views (e.g., N = 9) simultaneously to the user (Fig. 6.2). For efficiency reasons, only a lower number K of views (K = 1, 2, 3) shall be transmitted. For those K views, additional depth data shall be provided. At the receiver side, the N views to be displayed are generated from the K transmitted views with depth by DIBR. This is illustrated in Fig. 6.2.
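The N-from-K decoding step described above can be sketched as follows. This is a deliberately simplified, hypothetical DIBR renderer for the K = 1 case: it assumes rectified cameras, a linear depth-to-disparity mapping, n_out ≥ 2, and does no hole filling or blending between reference views, all of which a real system would need.

```python
import numpy as np

def synthesize_views(view, depth, n_out, max_disparity=8.0):
    """Render n_out virtual views from ONE transmitted view plus depth (DIBR).

    Each output corresponds to a virtual camera offset alpha in [-1, 1]
    around the input view; pixels are forward-warped horizontally in
    proportion to their depth-derived disparity. Illustrative only:
    no hole filling, rectified cameras, linear depth-to-disparity mapping.
    """
    h, w = depth.shape
    xs = np.arange(w)
    views = []
    for i in range(n_out):
        alpha = -1.0 + 2.0 * i / (n_out - 1)  # virtual camera position
        disp = alpha * max_disparity * depth  # depth in [0, 1] -> pixel shift
        out = np.zeros_like(view)
        for y in range(h):
            tx = np.clip((xs + disp[y]).round().astype(int), 0, w - 1)
            out[y, tx] = view[y, xs]
        views.append(out)
    return views
```

For N = 9 this yields nine views from a single view-plus-depth pair; with K = 2 or 3 transmitted views, a real renderer would warp from the nearest references and blend.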

This application scenario imposes specific constraints, such as narrow-angle acquisition (<20°). Also, there should be no need (for cost reasons) for geometric rectification at the receiver side; if any rectification is needed at all, it should be performed on the input views at the encoder side.

Example of generating nine output views (N = 9) out of three input views with depth (K = 3).

Some multi-view displays are based, for example, on LCD screens with a sheet of transparent lenses in front. This sheet directs different views to the viewer's two eyes, so that a person sees two different views and thus has a stereoscopic viewing experience. The stereoscopic capabilities of these multi-view displays are limited by the resolution of the LCD screen (currently 1920 × 1080). For example, for a nine-view system where the cone of nine views spans 10° (cone angle, CA), objects are limited to ±10% of the screen width (object range, OR) to appear in front of or behind the screen. Both OR and CA will improve with time (determined by economics) as the number of pixels of the LCD screen goes up.
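The numbers quoted above can be turned into a back-of-the-envelope calculation. The helper below is purely illustrative (the simple resolution-sharing and percentage model is an assumption, not a display specification): it splits the panel's horizontal resolution across the views and converts the object range into a pixel excursion.

```python
def multiview_budget(screen_px=1920, n_views=9, object_range=0.10):
    """Back-of-the-envelope figures for a lenticular multi-view display.

    Illustrative model only: each of the n_views shares the panel's
    horizontal resolution, and object_range is the fraction of screen
    width over which objects may appear in front of or behind the screen.
    """
    px_per_view = screen_px // n_views          # horizontal pixels per view
    depth_budget_px = object_range * screen_px  # +/- excursion, in pixels
    return px_per_view, depth_budget_px
```

For the nine-view, 1920-wide example in the text this gives roughly 213 horizontal pixels per view and a ±192-pixel object excursion; doubling the panel resolution doubles both figures, which is why OR and CA improve as pixel counts rise.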

In addition, other types of stereo displays are now appearing on the market in large numbers. The ability to generate output views at arbitrary positions at the receiver is attractive even in the case of N = 2 (i.e., a simple stereo display). If, for example, the material has been produced for a large cinema theater, direct use of that stereo signal (two fixed views) on relatively small home-sized 3D displays will yield a very different stereoscopic viewing experience (e.g., a strongly reduced depth effect). With a 3DV signal, as illustrated in Fig. 6.3, a new stereo pair can be generated that is optimized for the given 3D display.

Example of a lenticular autostereoscopic display requiring nine views (N = 9).

In a different initiative, ISO previously looked at auxiliary video data representations. The purpose of ISO/IEC 23002-3 Auxiliary Video Data Representations is to support all those applications where additional data needs to be efficiently attached to the individual pixels of a regular video. ISO/IEC 23002-3 describes how this can be achieved in a generic way by making use of existing (and even future) video codecs available within MPEG. A good example of an application that requires additional information associated with the individual pixels of a regular (2D) video stream is stereoscopic video presented on an autostereoscopic single- or multiple-user display. At the MPEG meeting in Nice, France (October 2005), the arrival of such displays on the market was stressed, and several of them were even shown and demonstrated. Because different display realizations vary largely in (i) the number of views that are represented and (ii) the maximum parallax that can be supported, an input format is required that is flexible enough to drive all possible variants. This can be achieved by supplying depth or parallax values with each pixel of a regular video stream and by generating the required stereoscopic views at the receiver side. The standardization of a common depth/parallax format within ISO/IEC 23002-3 Auxiliary Video Data Representations will thus enable interoperability between content providers, broadcasters, and display manufacturers. ISO/IEC 23002-3 is flexible enough to easily add other types of auxiliary video data in the future. One example could be annotating regular video coming from a regular camera with temperature maps coming from an infrared camera.

The Auxiliary Video Data format defined in ISO/IEC 23002-3 consists of an array of N-bit values that are associated with the individual pixels of a regular video stream. These data can be compressed like conventional luminance signals using already existing (and even future) MPEG video codecs. The format allows for optional subsampling of the auxiliary data in both the spatial and temporal domains; this can be beneficial, depending on the particular application and its requirements, and allows for very low bitrates for the auxiliary data. The specification is very flexible in the sense that it defines a new 8-bit code word, aux_video_type, that specifies the type of the associated data; for example, currently a value of 0x10 signals a depth map and a value of 0x11 signals a parallax map. New values for additional data representations can easily be added to fulfill future demands.
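A receiver's handling of the aux_video_type code word can be sketched as a simple lookup. The two values (0x10 for a depth map, 0x11 for a parallax map) come from the text above; the function name and table structure are illustrative, not the normative bitstream syntax.

```python
# Known aux_video_type code words from ISO/IEC 23002-3; other values are
# reserved for future data representations. (Table structure is illustrative.)
AUX_VIDEO_TYPES = {
    0x10: "depth map",
    0x11: "parallax map",
}

def describe_aux_video(aux_video_type: int) -> str:
    """Map an 8-bit aux_video_type value to a human-readable kind."""
    if not 0 <= aux_video_type <= 0xFF:
        raise ValueError("aux_video_type is an 8-bit code word")
    return AUX_VIDEO_TYPES.get(aux_video_type, "reserved/future use")
```

Because unknown values map to "reserved/future use" rather than an error, a decoder built this way can skip data types standardized after it shipped, which is exactly the extensibility the specification aims for.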

The transport of auxiliary video data within an MPEG-2 transport or program stream is defined in an amendment to the MPEG-2 systems standard. It specifies new stream_id_extension and stream_type values that are used to signal an auxiliary video data stream. An additional auxiliary_video_data_descriptor is utilized in order to convey in more detail how the data should be interpreted by the application that uses them. Metadata associated with the auxiliary data is carried at the system level, allowing the use of unmodified video codecs (no need to modify silicon).
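On the receiver side, demultiplexing then amounts to scanning the program map table (PMT) for the stream_type that signals auxiliary video. The sketch below assumes a pre-parsed PMT given as (stream_type, PID) pairs; the value 0x1E used here is the MPEG-2 systems stream_type registered for ISO/IEC 23002-3 auxiliary video streams, but the surrounding structure is a simplified stand-in, not a real transport stream parser.

```python
# MPEG-2 systems stream_type registered for ISO/IEC 23002-3 auxiliary video.
AUX_VIDEO_STREAM_TYPE = 0x1E

def find_aux_video_pids(pmt_entries):
    """Return PIDs of auxiliary video streams from (stream_type, PID) pairs.

    pmt_entries is a simplified stand-in for a parsed PMT elementary-stream
    loop; a real demux would also read the auxiliary_video_data_descriptor
    for each matching stream to learn how to interpret the data.
    """
    return [pid for stream_type, pid in pmt_entries
            if stream_type == AUX_VIDEO_STREAM_TYPE]
```

Keeping this selection at the systems layer is what allows the auxiliary data itself to be decoded by an unmodified video codec, as the text notes.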

In conclusion, ISO/IEC 23002-3 Auxiliary Video Data Representations provides a reasonably efficient approach for attaching additional information, such as depth or parallax values, to the individual pixels of a regular video stream and for signaling how these associated data should be interpreted by the application that uses them.