Other Advocacy Entities

This section provides a short survey of industry advocacy and activities in support of 3DTV.

3D@Home Consortium

The 3D@Home Consortium was formed in 2008 with the mission to speed the commercialization of 3D into homes worldwide and to provide the best possible viewing experience by facilitating the development of standards, roadmaps, and education for the entire 3D industry—from content, hardware, and software providers to consumers.

3D Consortium (3DC)

The 3D Consortium (3DC) aims at developing 3D stereoscopic display devices and increasing their take-up, promoting the expansion of 3D content, improving distribution, and contributing to the expansion and development of the 3D market. It was established in Japan in 2003 by five founding companies and 65 other companies, including hardware manufacturers, software vendors, content vendors, content providers, systems integrators, image producers, broadcasting agencies, and academic organizations.

European Information Society Technologies (IST) Project “Advanced Three-Dimensional Television System Technologies” (ATTEST)

The European Information Society Technologies (IST) project ATTEST is an effort in which industries, research centers, and universities have joined forces to design a novel, backwards-compatible, flexible, and modular broadcast 3DTV system. In contrast to former proposals that often relied on the basic concept of “stereoscopic” video, that is, the capturing, transmission, and display of two separate video streams (one for the left eye and one for the right eye), this activity focuses on a data-in-conjunction-with-metadata approach. At the very heart of the new concept is the generation and distribution of a novel data representation format that consists of monoscopic color video and associated per-pixel depth information. From these data, one or more “virtual” views of a real-world scene can be synthesized in real time at the receiver side (i.e., in a 3DTV STB) by means of DIBR techniques. The modular architecture of the proposed system provides important features, such as backwards-compatibility to today’s 2D DTV, scalability in terms of receiver complexity, and adaptability to a wide range of different 2D and 3D displays.

3D Content Creation. For the generation of future 3D content, novel three-dimensional material is created by simultaneously capturing video and associated per-pixel depth information with an active range camera such as the so-called ZCamTM developed by 3DV Systems. Such devices usually integrate a high-speed pulsed infrared light source into a conventional broadcast TV camera and relate the time of flight of the emitted and reflected light wave to direct measurements of the depth of the scene. However, it seems clear that the need for sufficient high-quality, three-dimensional content can only partially be satisfied with new recordings. It will therefore be necessary (especially in the introductory phase of the new broadcast technology) to also convert already existing 2D video material into 3D using so-called “structure from motion” algorithms. In principle, such (offline or online) methods process one or more monoscopic color video sequences to (i) establish a dense set of image point correspondences from which information about the recording camera, as well as the 3D structure of the scene, can be derived or (ii) infer approximate depth information from the relative movements of automatically tracked image segments. Whatever 3D content generation approach is used in the end, the outcome in all cases consists of regular 2D color video in European DTV format (720 × 576 luminance pels, 25 Hz, interlaced) and an accompanying depth-image sequence with the same spatiotemporal resolution. Each of these depth-images stores depth information as 8-bit gray values, with the gray level 0 specifying the furthest value and the gray level 255 defining the closest value. To translate this data representation format into real, metric depth values (which are required for the “virtual” view generation) and to be flexible with respect to 3D scenes with different depth characteristics, the gray values are normalized to two main depth clipping planes.
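
As a worked illustration of this normalization, assuming the common linear-in-depth convention (the exact mapping, for example linear in depth versus linear in disparity, depends on the metadata signaled with the stream), the metric depth recovered from a gray value $v$ is

$$Z(v) = Z_{\text{far}} + \frac{v}{255}\,\bigl(Z_{\text{near}} - Z_{\text{far}}\bigr), \qquad v \in \{0, 1, \ldots, 255\},$$

so that $v = 0$ decodes to the far clipping plane $Z_{\text{far}}$ and $v = 255$ to the near clipping plane $Z_{\text{near}}$.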

3DV Coding. To provide future 3DTV viewers with three-dimensional content, the monoscopic color video and the associated per-pixel depth information have to be compressed and transmitted over the conventional 2D DTV broadcast infrastructure. To ensure the required backwards-compatibility with existing 2D-TV STBs, the basic 2D color video has to be encoded using the standard MPEG-2, MPEG-4 Visual, or AVC tools currently required by the DVB Project in Europe.

Transmission. The DVB Project, a consortium of industries and academia responsible for the definition of today’s 2D DTV broadcast infrastructure in Europe, requires the use of the MPEG-2 systems layer specifications for the distribution of audiovisual data via cable (DVB-C), satellite (DVB-S), or terrestrial (DVB-T) transmitters.

“Virtual” View Generation and 3D Display. At the receiver side of the proposed ATTEST system, the transmitted data are decoded in a 3DTV STB to retrieve the decompressed color video and depth-image sequences (as well as the additional metadata). From this data representation format, a DIBR algorithm generates “virtual” left- and right-eye views for the three-dimensional reproduction of a real-world scene on a stereoscopic or autostereoscopic, single- or multiple-user 3DTV display. The backwards-compatible design of the system ensures that viewers who do not want to invest in a full 3DTV set are still able to watch the two-dimensional color video without any degradation in quality using their existing digital 2DTV STBs and displays.
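
As a brief sketch of the warping step inside such a DIBR algorithm (the exact equations and sensor-shift conventions used by ATTEST are not reproduced here), for a rectified camera setup each pixel of the decoded color image is shifted horizontally by a disparity derived from its metric depth:

$$d(u, v) = \frac{f\, t_x}{Z(u, v)}, \qquad u_{\text{left}} = u + \frac{d(u, v)}{2}, \qquad u_{\text{right}} = u - \frac{d(u, v)}{2},$$

where $f$ is the focal length expressed in pixels, $t_x$ is the chosen virtual interaxial baseline, and $Z(u, v)$ is the depth recovered from the transmitted gray value; in practice an additional offset is applied so that the zero-parallax (screen) plane sits at a comfortable depth.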

3D4YOU

3D4YOU is funded under the ICT Work Programme 2007–2008, a thematic priority for research and development under the specific program “Cooperation” of the Seventh Framework Programme (2007–2013). The objectives of the project are:

  1. to deliver an end-to-end system for 3D high-quality media;
  2. to develop practical multi-view and depth capture techniques;
  3. to convert captured 3D content into a 3D broadcasting format;
  4. to demonstrate the viability of the format in production and over broadcast chains;
  5. to show reception of 3D content on 3D displays via the delivery chains;
  6. to assess the project results in terms of human factors via perception tests;
  7. to produce guidelines for 3D capturing to aid in the generation of 3D media production rules;
  8. to propose exploitation plans for different 3D applications.

The 3D4YOU project aims at developing the key elements of a practical 3D television system, particularly, the definition of a 3D delivery format and guidelines for a 3D content creation process.

The 3D4YOU project will develop 3D capture techniques, convert captured content for broadcasting, and develop 3D coding suitable for delivery over broadcast chains. 3D broadcasting is seen as the next major step in home entertainment. The cinema and computer games industries have already shown that there is considerable public demand for 3D content, but the special glasses that are needed limit their appeal. 3D4YOU will address the consumer market that coexists with digital cinema and computer games. The 3D4YOU project aims to pave the way for the introduction of a 3D TV system. The project will build on previous European research on 3D, such as the FP5 project ATTEST, which has enabled European organizations to become leaders in this field.

3D4YOU endeavors to establish practical 3DTV. The key success factor is 3D content. The project seeks to define a 3D delivery format and a content creation process. Establishing practical 3DTV will then be demonstrated by embedding this content creation process into a 3DTV production and delivery chain, including capture, image processing, delivery, and then display in the home. The project will adapt and improve on these elements of the chain so that every part integrates into a coherent, interoperable delivery system. A key objective of the project is to provide a 3D content format that is independent of display technology and backward compatible with 2D broadcasting. 3D images will be commonplace
in mass communication in the near future. Also, several major consumer electronics companies have made demonstrations of 3DTV displays that could be in the market within two years. The public’s potential interest in 3DTV can be seen by the success of 3D movies in recent years. 3D imaging is already present in many graphics applications (architecture, mechanical design, games, cartoons, and special effects for TV and movie production).

In recent years, multi-view display technologies have appeared that improve the immersive experience of 3D imaging, leading to the vision that 3DTV or similar services might become a reality in the near future. In the United States, the number of 3D-enabled digital cinemas is growing rapidly. By 2010, about 4300 theaters are expected to be equipped with 3D digital projectors, with the number increasing every month. In Europe, too, the number of 3D theaters is growing. Several digital 3D films will surface in the months and years to come, and several prominent filmmakers have committed to making their next productions in stereo 3D. The movie industry creates a platform for 3D movies, but there is no established solution to bring these movies to the domestic market. Therefore, the next challenge is to bring these 3D productions to the living room. 2D-to-3D conversion and a flexible 3D format are an important strategic area. It has been recognized that multi-view video is a key technology that serves a wide variety of applications, including free viewpoint and 3DV applications for the home entertainment and surveillance business fields. Multi-view video coding and transmission systems are most likely to form the basis for next-generation TV broadcasting applications and facilities. Multi-view video coding will greatly improve efficiency over current video coding solutions that simulcast independent views. This project builds on the wealth of experience of the major players in European 3DTV and intends to bring the start of 3D broadcasting a step closer by combining their expertise to define a 3D delivery format and a content creation process.

The key technical problems that currently hamper the introduction of 3DTV to the mass market are as follows:

  1. It is difficult to capture 3DV directly using the current camera technology. At least two cameras need to operate simultaneously with an adjustable but known geometry. The offset of stereo cameras needs to be adjustable to
    capture depth, both close by and far away.
  2. Stereo video (acquired with two cameras) is currently not sufficient input for glasses-free, multi-view autostereoscopic displays. The required processing, such as disparity estimation, is noise-sensitive, resulting in low 3D picture quality.
  3. 3D postproduction methods and 3DV standards are largely absent or immature.

The 3D4YOU project will tackle these three problems. For instance, a creative combination of two or three high-resolution video cameras with one or two low-resolution depth range sensors may make it possible to create 3DV of good quality without the need for an excessive investment in equipment. This is in contrast to installing, say, 100 cameras for acquisition, where the expense may hamper the introduction of such a system.

Developing tools for conversion of 3D formats will stimulate content creation companies to produce 3DV content at acceptable cost. The cost at which 3DV should be produced for commercial operation is not yet known. However, currently, 3DV production requires almost per-frame user interaction in the video, which is certainly unacceptable. This immediately indicates the issue that needs to be solved: currently, fully automated generation of high-quality 3DV is difficult; in the future it needs to be fully automatic, or at least semi-automatic with an acceptable minimum of manual supervision during postproduction. 3D4YOU will research how to convert 3D content into a 3D broadcasting format and prove the viability of the format in production and over broadcast chains.

Once 3DV production becomes commercially attractive because acquisition techniques and standards mature, this will impact the activities of content producers, broadcasters, and telecom companies. As a result, these companies may adopt new techniques for video production simply because the output needs to be in 3D. Also, new companies could be founded that focus on acquiring 3DV and preparing it for postproduction. Here, there is room for differentiation since, for instance, the acquisition of a sports event will require large baselines between cameras and real-time transmission, whereas the shooting of narrative stories will require both small and large baselines and allows some manual postproduction for achieving optimal quality. These activities will require new equipment (or a creative combination of existing equipment) and new expertise.

3D4YOU will develop practical multi-view and depth capture techniques. Currently, the stereo video format is the de facto 3D standard used by cinemas. Stereo acquisition may, for this reason, become widespread as an acquisition technique. Cinemas operate with glasses-based systems and can therefore use a theater-specific stereo format. This is not the case for the glasses-free autostereoscopic 3DTV that 3D4YOU foresees for the home. To allow glasses-free viewing with multiple people at home, a wide baseline is needed to cover the total range of viewing angles. The current stereo video that is intended for the cinema will need considerable postproduction to be suitable for viewing on a multi-view autostereoscopic display. Producing visual content will therefore become more complex and may provide new opportunities for companies currently active in (3D) movie postproduction. According to the Networked and Electronic Media (NEM) Strategic Research Agenda, multi-view coding will form the basis for next-generation TV broadcast applications. Multi-view video has the advantage that it can serve different purposes. On the one hand, the multi-view input can be used for 3DTV. On the other hand, it can be shown on a normal TV where the viewer can select his or her preferred viewpoint of the action. Of course, a combination is possible where the viewer selects his or her preferred viewpoint on a 3DTV. However, multi-view acquisition with 30 views, for example, will require 30 cameras to operate simultaneously. This initially requires a large investment. 3D4YOU therefore sees a gradual transition from stereo capture to systems with many views. 3D4YOU will investigate a mixture of 3DV acquisition techniques to produce an extended center view plus depth format (possibly with one or two extra views) that is, in principle, easier to produce, edit, and distribute. The success of such a simpler format relies on the ease (read: cost) with which it can be produced. One can conclude that the introduction of 3DTV to the mass market is hampered by (i) the lack of high-quality 3DV content; (ii) the lack of suitable 3D formats; and (iii) the lack of appropriate format conversion techniques. The variety of new distribution media further complicates this.

Hence, one can identify the following major challenges that are expected to be overcome by the project:

  1. Video Acquisition for 3D Content: Here, the practicalities of multi-view and depth capture techniques are of primary importance; the challenge is to find the right trade-offs, such as the number of views to be recorded and how to optimally integrate depth capture with multi-view capture. A further challenge is to define which shooting styles are most appropriate.
  2. Conversion of Captured Multi-View Video to a 3D Broadcasting Format: The captured format needs new postproduction tools (like enhancement and regularization of depth maps or editing, mixing, fading, and compositing of V+D representations from different sources) and a conversion step generating a suitable transmission format that is compatible with used postproduction formats before the 3D content can be broadcast and displayed.
  3. Coding Schemes for Compression and Transmission: A final challenge is to provide suitable coding schemes for compression and transmission that are based on the 3D broadcasting format under study and to demonstrate their feasibility in field trials under real distribution conditions.

By addressing these three challenges from an end-to-end systems point of view, the 3D4YOU project aims to pave the way to the definition of a 3D TV system suitable for a series of applications. Different requirements could be set depending on the application, but the basic underlying technologies (capture, format, and encoding) should maintain as much commonality as possible so as to favor the emergence of an industry based on those technologies.

3DPHONE

The 3DPHONE project aims to develop technologies and core applications enabling a new level of user experience by developing an end-to-end, all-3D imaging mobile phone. Its aim is to have all fundamental functions of the phone—media display, User Interface (UI), and personal information management (PIM) applications—realized in 3D. The project will develop techniques for an all-3D phone experience: mobile stereoscopic video, 3D UIs, 3D capture/content creation, compression, rendering, and 3D display. It will also research and develop algorithms for 3D audiovisual applications, including personal communication, 3D visualization, and content management.

The 3DPhone Project started on February 11, 2008. The duration of the project is 3 years and there are six participants from Turkey, Germany, Hungary, Spain, and Finland. The partners are Bilkent University, Fraunhofer, Holografika, TAT, Telefonica, and University of Helsinki. 3DPhone is funded by the European Community’s ICT programme in Framework Programme Seven.

The goal is to enable users to

  • capture memories in 3D and communicate with others in 3D virtual spaces;
  • interact with their device and applications in 3D;
  • manage their personal media content in 3D.

The expected outcome will be simpler use and a more personalized look and feel. The project will bring state-of-the-art advances in mobile 3D technologies with the following activities:

  • A mobile hardware and software platform will be implemented with both 3D image capture and 3D display capability, featuring both 3D displays and multiple cameras. The project will evaluate different 3D display
    and capture solutions and will implement the most suitable solution for hardware–software integration.
  • UIs and applications that will capitalize on the 3D autostereoscopic illusion in the mobile handheld environment will be developed. The project will design and implement 3D and zoomable UI metaphors suitable for autostereoscopic displays.
  • End-to-end 3DV algorithms and 3D data representation formats, targeted for 3D recording, 3D playback, and real-time 3DV communication, will be investigated and implemented.
  • Ergonomics and experience testing to measure any possible negative symptoms, such as eye strain created by stereoscopic content, will be performed. The project will research ergonomic conditions specific to the mobile handheld usage: in particular, the small screen, one hand holding the device, absence of complete keyboard, and limited input modalities.

In summary, the general requirements on 3DV algorithms on mobile phones are as follows:

  • low power consumption,
  • low complexity of algorithms,
  • limited memory/storage for both RAM and mass storage,
  • low memory bandwidth,
  • low video resolution,
  • limited data transmission rates and limited bitrates for 3DV signal.

These strong restrictions derived from terminal capabilities and from transmission bandwidth limitations usually result in relatively simple video processing algorithms being run on mobile phone devices. Typically, video coding standards take care of this by means of specific profiles and levels that only use a restricted and simple set of video coding algorithms and low-resolution video. The H.264/AVC Baseline Profile, for instance, uses only a simple subset of the rich video coding algorithms that the standard provides in general. For 3DV, the equivalent of such a low-complexity baseline profile for mobile phone devices still needs to be defined and developed. Obvious requirements of video processing and coding apply for 3DV on mobile phones as well, such as

  • high coding efficiency (taking bitrate and quality into account);
  • requirements specific for 3DV that apply for 3DV algorithms on mobile phones including
    • flexibility with regard to different 3D display types,
    • flexibility for individual adjustment of 3D impression.

TM-3D-SM Group of Digital Video Broadcast (DVB)

The DVB Project is an industry-led consortium of over 250 broadcasters, manufacturers, network operators, software developers, regulatory bodies, and others in over 35 countries committed to designing open technical standards for the
global delivery of DTV and data services. The DVB Project, which is responsible for the definition of today’s 2D DTV broadcast infrastructure in Europe, requires the use of the MPEG-2 Systems Layer specification for the distribution of audiovisual data via cable (DVB-C, i.e., Digital Video Broadcast-Cable), satellite (DVB-S, i.e., Digital Video Broadcast-Satellite), or terrestrial (DVB-T, i.e., Digital Video Broadcast-Terrestrial) transmitters. Owing to its almost universal acceptance and worldwide use, it is of major importance for any future 3DTV system to build its distribution services on this transport technology [16] (services using DVB standards are available on every continent, with more than 500 million DVB receivers deployed).

During 2009, DVB closely studied the various aspects of (potential) 3DTV solutions. A Technical Module Study Mission report was finalized, leading to the formal creation of the TM-3DTV group. A 3DTV Commercial Module has also now been created to go back to the first step of the DVB process: what kind of 3DTV solution does the market want and need, and how can DVB play an active part in the creation of that solution? To start answering some of these questions, the CM-3DTV group planned to host a DVB 3DTV Kick-Off Workshop in early 2010.

There have already been broadcasts of a conventional display-compatible system, and the first HDTV channel-compatible broadcasts were scheduled to start in Europe in spring 2010. As the DVB process is business- and market-driven, the CM-3DTV group hosted the DVB 3DTV Kick-Off Workshop in Geneva in early 2010, followed immediately by the first CM-3DTV meeting.

Rapporteur Group On 3DTV of ITU-R Study Group 6

Arranging a television system so that viewers can see 3D pictures is both simple and complex. The ITU-R has agreed on a new study topic on 3D television, and in 2010 it expects to be building up knowledge of the options. Proponents had made the proposal to the ITU-R in 2008 that the time was ripe for worldwide agreements on 3DTV, and the ITU-R Study Group 6 has agreed on a “new Study Question” on 3D television, which will be submitted for approval by the ITU-R Membership.

Though there are different views about whether current technology can provide a system which is entirely free of eyestrain, for those who wish to start such services, there could be advantages in having a worldwide common solution, or at least interoperable solutions, and the ITU-R Study Group 6 specialists have been gathering information, which might lead to such a result.

Therefore, the Question from ITU-R calls for contributions on systems that include, but also go beyond, stereoscopy, and include technology that may record what physicists call the “object wave.” Clearly, this is a more futuristic version of 3DTV. Holograms record, in a limited way, the “object wave.” Will there be a way of broadcasting to record an “object wave”? This remains to be seen. No approaches are excluded at this stage. The “Question” is essentially a call for proposals for 3DTV. Journals and individuals are asked to “spread the word” about this, and to invite contributions. Such contributions are normally channeled via national administrations, or via the other Members of the ITU—the so-called Sector Members. Which proposals will be made and which may be the subject of agreement remains to be seen, but the ITU-R sector has launched, in its own words, “an exciting new issue, which may have a profound impact on television
in the years ahead.”

The Question is included below to give the readers perspective on the ITU-R work.

QUESTION ITU-R 128/6
Digital three-dimensional (3D) TV broadcasting

The ITU Radiocommunication Assembly

considering

a) that existing TV broadcasting systems do not provide complete perception of reproduced pictures as natural three-dimensional scenes;

b) that viewers’ experience of presence in reproduced pictures may be enhanced by 3D TV, which is anticipated to be an important future application of digital TV broadcasting;

c) that the cinema industry is moving quickly towards production and display in 3D;

d) that research into various applications of new technologies (for example, holographic imaging) that could be used in 3D TV broadcasting is taking place in many countries;

e) that progress in new methods of digital TV signal compression and processing is opening the door to the practical realization of multifunctional 3D TV broadcasting systems;

f) that the development of uniform world standards for 3D TV systems, covering various aspects of digital TV broadcasting, would encourage adoption across the digital divide and prevent a multiplicity of standards;

g) that the harmonization of broadcast and non-broadcast applications of 3D TV is desirable,

decides that the following Questions should be studied

  1. What are the user requirements for digital 3D TV broadcasting systems?
  2. What are the requirements for image viewing and sound listening conditions for 3D TV?
  3. What 3D TV broadcasting systems currently exist or are being developed for the purposes of TV program production, post-production, television recording, archiving, distribution and transmission for realization of 3D TV broadcasting?
  4. What new methods of image capture and recording would be suitable for the effective representation of three-dimensional scenes?
  5. What are the possible solutions (and their limitations) for the broadcasting of 3D TV digital signals via the existing terrestrial 6, 7 and 8 MHz bandwidth channels or broadcast satellite services, for fixed and mobile reception?
  6. What methods for providing 3D TV broadcasts would be compatible with existing television systems?
  7. What are the digital signal compression and modulation methods that may be recommended for 3D TV broadcasting?
  8. What are the requirements for the 3D TV studio digital interfaces?
  9. What are appropriate picture and sound quality levels for various broadcast applications of 3D TV?
  10. What methodologies of subjective and objective assessment of picture and sound quality may be used in 3D TV broadcasting?

also decides

  1. that results of the above-mentioned studies should be analyzed for the purpose of the preparation of new Reports and Recommendation(s);
  2. that the above-mentioned studies should be completed by 2012.

It should be noted that the ITU-R has already published some standards and reports on 3DTV in the past, including the following:

  • Rec. ITU-R BT.1198 (1995) Stereoscopic television based on R- and L-eye two-channel signals
  • Rec. ITU-R BT.1438 (2000) Subjective assessment of stereoscopic television pictures
  • Report ITU-R BT.312-5 (1990) Constitution of stereoscopic television
  • Report ITU-R BT.2017 (1998) Stereoscopic television MPEG-2 multi-view profile
  • Report ITU-R BT.2088 (2006) Stereoscopic Television.

ITU-R BT.1198, Stereoscopic television based on R- and L-eye two-channel signals, suggests some general principles to be followed in development of stereoscopic television systems to maximize their compatibility with existing monoscopic systems. It contains

  • requirements for compatibility with monoscopic signal;
  • requirement for a discrete two-channel digital video coding scheme;
  • requirement for a discrete channel plus difference channel digital video coding scheme.

Obviously these are “old” standards, but they point to the fact that transmission of 3DTV signals is not completely a new concept.

Society of Motion Picture and Television Engineers (SMPTE) 3D Home Entertainment Task Force

There is a need for a single mastering standard for viewing stereo 3D content on TVs, PCs, and mobile phones, where the content could originate from optical disks, broadcast networks, or the Internet. To that end, SMPTE formed a 3D Home Entertainment Task Force in 2008 to work on the issue, and a standards effort was launched in 2009 via an SMPTE 3D Standards Working Group to define a content format for stereo 3D. The SMPTE 3D Standards Working Group had about 200 participants at press time; the Home Master standard was expected to become available in mid-2010. The group is in favor of a mastering standard for the Home Master specification based on 1920 × 1080 pixel resolution at 60 fps/eye. The specification is expected to support an option for falling back to a 2D image. The standard is also expected to support hybrid products, such as BDs that can support either 2D or stereo 3D displays.

SMPTE’s 3D Home Master defines high-level image formatting requirements that impact 3DTV designs, but the bulk of the 3DTV hardware standards are expected to come from other organizations, such as CEA. Studios or game publishers would deliver the master as source material for uses ranging from DVD and BD players to terrestrial and satellite broadcasts and Internet downloadable or streaming files.

As we have seen throughout this text, 3DTV systems must support multiple delivery channels, multiple coding techniques, and multiple display technologies. Digital cinema, for example, is addressed with a relatively simple left–right sequence approach; residential TV displays involve a greater variety of technologies necessitating more complex encoding. Content transmission and delivery is also supported by a variety of physical media such as BDs as well as broadcasting, satellite, and cable delivery. The SMPTE 3D Group has been considering what kind of compression should be supported. One of the key goals of the standardization process is defining and/or identifying schemes that minimize the total bandwidth required to support the service; the MVC extension to MPEG-4/H.264 discussed earlier is being considered by the group. Preliminary studies have shown, however, that relatively little bandwidth may be saved when compared to simulcast because high-quality images require 75–100% overhead and images of medium quality require 65–98% overhead. In addition to defining the representation and encoding standards (which clearly drive the amount of channel bandwidth for the additional image stream), 3DTV service entails other requirements; for example, there is the issue of graphics overlay, captions and subtitles, and metadata. 3D programming guides have to be rethought, according to industry observers; the goal is to avoid floating the guide in front of the action and instead to push the guide behind the screen and let the action play over it, because practical research shows that people found it jarring when the programming guide is brought to the forefront of 3DV images [13]. The SMPTE Group is also looking at format wrappers, such as Material eXchange Format (MXF; a container format for professional digital video and audio media defined by a set of SMPTE standards), whether an electrical interface should be specified, and if depth representation is needed for an early version of the 3DTV service, among other factors [14]. As we have noted earlier in the text, 3DTV has the added consideration of physiological effects because disjoint stereoscopic images can adversely impact the viewer.

Capturing Video

The native video camera can be used to capture video within AIR.

Video and the CameraUI Class

You can use the native camera within AIR to capture video. Your application needs to have permission. In Flash Professional, select File→AIR Android settings→Permissions→Camera. In Flash Builder, add the following permission:

[code]<uses-permission android:name="android.permission.CAMERA"/>[/code]

The flash.media.CameraUI class is an addition to the ActionScript language to support the device’s native camera application. It is a subclass of the EventDispatcher class and is only supported on AIR for mobile.

This object allows you to launch the native camera application to shoot a video while your AIR application moves to the background.

When you use the native camera, it comes to the foreground, your AIR application moves to the background, and the NativeApplication Event.DEACTIVATE event is fired. Make sure you don’t have any logic that could interfere with the proper running of your application, such as exiting. Likewise, when the native camera application quits and your AIR application comes back to the foreground, Event.ACTIVATE is fired.
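
A minimal sketch of how an application might guard against this, using the standard NativeApplication activation events (the handler names are illustrative):

[code]

import flash.desktop.NativeApplication;
import flash.events.Event;

NativeApplication.nativeApplication.addEventListener(Event.DEACTIVATE, onDeactivate);
NativeApplication.nativeApplication.addEventListener(Event.ACTIVATE, onActivate);

function onDeactivate(event:Event):void {
    // The native camera (or another application) is in the foreground;
    // pause timers here and avoid any logic that could exit the application.
    trace("AIR application deactivated");
}

function onActivate(event:Event):void {
    // The native camera has closed and the AIR application is back in front.
    trace("AIR application activated");
}

[/code]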

The first step is to verify that your device supports access to the camera by checking the CameraUI.isSupported property. Note that, as of this writing, Android does not support the front camera natively, and therefore neither does AIR:

[code]

import flash.media.CameraUI;

if (CameraUI.isSupported == false) {
    trace("no camera accessible");
    return;
}

[/code]

If it is supported, create an instance of the CameraUI class.

Register your application to receive camera events. A MediaEvent.COMPLETE is dispatched after a picture is taken, an Event.CANCEL if no media is selected, and an ErrorEvent if there is an error in the process:

[code]

import flash.events.Event;
import flash.events.MediaEvent;
import flash.events.ErrorEvent;
import flash.media.CameraUI;

var cameraUI:CameraUI = new CameraUI();
cameraUI.addEventListener(MediaEvent.COMPLETE, onComplete);
cameraUI.addEventListener(Event.CANCEL, onCancel);
cameraUI.addEventListener(ErrorEvent.ERROR, onError);

[/code]

Call the launch function and pass the type MediaType.VIDEO as a parameter. This will launch the camera in video mode automatically:

[code]

import flash.media.MediaType;

var cameraUI:CameraUI = new CameraUI();
cameraUI.launch(MediaType.VIDEO);

function onError(event:ErrorEvent):void {
    trace(event.text);
}

[/code]

The camera application is now active and in the foreground. The AIR application moves to the background and waits.

Once the event is received, the camera application automatically closes and the AIR application moves back to the foreground.

Video capture on Android requires a lot of memory. To avoid having the Activity Manager terminate the application, the capture setting is restricted to low resolution by default, which requires a smaller memory buffer.

MPEG-4 Visual, the codec used for Android low-resolution video, is not supported by AIR. Therefore, captured videos cannot be played back in AIR. The native application can be used to play back the recorded videos.

Currently, this functionality should only be used for capturing and not viewing unless you use the native application in the Gallery. The video is saved in a 3GP format that AIR does not support. Trying to play it back will just display a white screen.

In the following example, I provide the code for playback in AIR in case this is resolved in the future.

On select, a MediaEvent object is returned:

[code]

import flash.media.Video;
import flash.net.NetConnection;
import flash.net.NetStream;
import flash.events.NetStatusEvent;
import flash.events.AsyncErrorEvent;
import flash.events.MediaEvent;

var videoURL:String;
var connection:NetConnection;
var stream:NetStream;

function onComplete(event:MediaEvent):void {
    videoURL = event.data.file.url;
    connection = new NetConnection();
    connection.addEventListener(NetStatusEvent.NET_STATUS, onStatus);
    connection.connect(null);
}

function onStatus(event:NetStatusEvent):void {
    switch (event.info.code) {
        case "NetConnection.Connect.Success":
            connectStream();
            break;
        case "NetStream.Play.StreamNotFound":
            trace("video not found " + videoURL);
            break;
    }
}

function connectStream():void {
    stream = new NetStream(connection);
    stream.addEventListener(NetStatusEvent.NET_STATUS, onStatus);
    stream.addEventListener(AsyncErrorEvent.ASYNC_ERROR, onAsyncError);
    var video:Video = new Video();
    video.attachNetStream(stream);
    stream.play(videoURL);
    addChild(video);
}

function onAsyncError(event:AsyncErrorEvent):void {
    // ignore metadata-related asynchronous errors
    trace("ignore errors");
}

[/code]

The Camera Class

The device’s camera, using the flash.media.Camera class, can be attached to a Video object in the AIR application. You can use this approach to simulate a web cam or for an Augmented Reality project.

The hardware orientation of the camera is landscape, so try to make your application’s orientation landscape too by changing the aspectRatio tag in your application descriptor:

[code]<aspectRatio>landscape</aspectRatio>[/code]

The setMode function is used to determine the video’s resolution:

[code]

import flash.media.Camera;
import flash.media.Video;

var camera:Camera = Camera.getCamera();
if (camera != null) {
    camera.setMode(stage.stageWidth, stage.stageHeight, 15, true);
    var video:Video = new Video(camera.width, camera.height);
    video.x = 100;
    video.y = 100;
    video.attachCamera(camera);
    addChild(video);
}

[/code]

Note that frames are only captured when the application is in the foreground. If the application moves to the background, capturing is paused but will resume automatically when the application moves to the foreground again.

You can query for the camera properties. Here are a few queries which may be helpful in your development:

[code]

camera.height;
camera.width;
camera.bandwidth;
camera.fps;
camera.muted;
camera.name;

[/code]

Documentation and Tutorials

Development around video is constantly evolving. The following two resources are among those that will help you to stay informed:

  • The Open Source Media Framework (http://www.opensourcemediaframework.com/resources.html) helps developers with video-related products. It is a good place to find code samples, tutorials, and other materials.
  • Lisa Larson-Kelly specializes in web video publishing and, more recently, mobile publishing. She offers free tutorials and a newsletter on the latest technology (http://learnfromlisa.com/).

Preparing Video

A codec is software used to encode and decode a digital video signal. Engineers try various solutions to maintain video quality while reducing the amount of data, using state-of-the-art compression algorithm design.

A large portion of your work will comprise preparing and testing various configurations.

Codecs

At the time of this writing, AIR for Android supports codecs for On2 VP6, H.263 (Sorenson Spark), and H.264.

H.264, also called MPEG-4 Part 10 or AVC for Advanced Video Coding, delivers high-quality video at lower bit rates than H.263 and On2. It is more complicated to decode, however, and requires native GPU playback or a fast processor to ensure smooth playback.

H.264 supports the following profiles: Baseline, Extended, Main, and various flavors of High. Test the profiles, as not all of them work with hardware-accelerated media decoding. It appears that only Baseline uses hardware acceleration at the time of this writing.

AAC (Advanced Audio Coding) is the audio codec generally paired with H.264. Nellymoser and Speex are supported, but do not utilize hardware decoding.

MPEG-4 (Moving Picture Experts Group) H.264 is an industry-standard video compression format. It refers to the container format, which can contain several tracks. The file synchronizes and interleaves the data. In addition to video and audio, the container includes metadata that can store information such as subtitles. It is possible to contain more than one video track, but AIR only recognizes one.

Encoding

You can use Adobe Media Encoder CS5 or a third-party tool such as Sorenson Squeeze or On2 Flix to encode your video.

It is difficult to encode video for every device capacity and display size. Adobe recommends grouping devices into low-end, medium-end, and high-end groups.

If your video is embedded or attached to your application, prepare and provide only one file and use a medium-quality solution to serve all your users. If your video is served over a network, prepare multiple streams.

Gather as much information as possible from the user before selecting the video to play. The criteria are the speed of the network connection and the performance of the device.

Decoding

Containers are wrappers around video and audio tracks holding metadata. MP4 is a common wrapper for the MPEG-4 format and is widely compatible. F4V is Adobe’s own format, which builds on the open MPEG-4 standard media file format and supports H.264/AAC-based content. FLV, Adobe’s original video container file format, supports codecs such as Sorenson Spark and On2 VP6, and can include an alpha channel and additional metadata such as cue points.

Video decoding is a multithreaded operation. H.264 and AAC are decoded using hardware acceleration on mobile devices to improve frame rate and reduce battery consumption. Rendering is still done in the CPU.

Bit Rate

Bit rate is the number of bits dedicated to the video in one second (measured in kilobits per second or kbps). During the encoding process, the encoder varies the number of bits given in various portions of the video based on how complicated they are, while keeping the average as close to the bit rate you set as possible.

Because the average is calculated on the fly and is not always accurate, it is best to select the two-pass mode even though it takes longer. The first pass analyzes the video and records a statistics log; the second pass encodes the video using the log to stay as close to the desired average bit rate as possible.

Use the network connection speed as a guide for your encoding. The recommendation is to use 80% to 90% of the available bandwidth for video/audio combined, and keep the rest for network fluctuations. Try the following H.264/AAC rates as a starting point:

  • WiFi: 500 to 1,000 kbps, audio up to 160 kbps
  • 3G: 350 to 450 kbps, audio up to 128 kbps
  • 2.5G: 100 kbps, audio up to 32 kbps
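
As a rough illustration of the bandwidth budgeting described above, the hypothetical helper below splits a measured connection speed into video and audio targets; the function name, the 85% budget, and the audio split are illustrative assumptions, not values mandated by AIR or Adobe:

[code]

// Hypothetical helper: derive encoding targets from a measured
// connection speed (in kbps), keeping ~15% headroom for network fluctuations.
function suggestBitrates(connectionKbps:Number):Object {
    var budget:Number = connectionKbps * 0.85;           // use 85% of the available bandwidth
    var audioKbps:Number = Math.min(160, budget * 0.2);  // cap audio at 160 kbps
    var videoKbps:Number = budget - audioKbps;
    return { video: videoKbps, audio: audioKbps };
}

var wifi:Object = suggestBitrates(1200);                 // e.g., video: 860, audio: 160
trace("video " + wifi.video + " kbps, audio " + wifi.audio + " kbps");

[/code]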

Frame Rate

Reduce high frame rates whenever possible. Downsampling by an even factor guarantees a better result. For instance, a film at 30 fps can be downsampled to 15 fps; a film at 24 fps can be downsampled to 12 or 18 fps.

Do not use content encoded at a high frame rate and assume that a lower frame rate in AIR will adjust it. It will not.

If your footage was captured at a frame rate greater than 24 fps and you want to keep the existing frame rate, look at reducing other settings such as the bit rate.

If your video is the only moving content in your application, you can use a frame rate as low as 12 fps because the video plays at its native frame rate regardless of the application’s frame rate. A low frame rate
reduces drain on the battery.

Resolution

The pixel resolution is simply the width and height of your video. Never use a video that is larger than the intended display size. Prepare the video at the dimension you need.

High resolution has a greater impact on mobile video playback performance than bit rate. A conservative resolution of 480×360 plays very well; 640×480 is still good. A higher resolution will be challenging on most devices and will result in a poor viewing experience on devices that are not using the GPU for decoding or on devices with a 500 MHz CPU. Resolution recommendations are:

  • WiFi or 3G: 480×320
  • 2.5G: 320×240

In fact, you can often encode smaller and scale up without a noticeable decrease in picture quality. The high PPI on most devices will still display a high-quality video.

Keep your video dimensions at even multiples of 16. MPEG video encoders work by dividing the video frames into blocks of 16 by 16 pixels, called macroblocks. If a dimension does not divide by 16, or close to it, the encoder must do extra work, and this may impact the overall encoding target. As an alternative, resort to multiples of eight, not four. This is an important practice for achieving maximum compression efficiency.
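
The following hypothetical helper illustrates the rule; the function name is an assumption for illustration and is not part of the AIR API:

[code]

// Hypothetical helper: snap a dimension to the nearest multiple of 16
// (or 8 as a fallback) so the encoder works on whole macroblocks.
function snapToMacroblock(size:int, block:int = 16):int {
    return Math.max(block, Math.round(size / block) * block);
}

trace(snapToMacroblock(350));    // 352 (22 macroblocks wide)
trace(snapToMacroblock(350, 8)); // 352

[/code]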

As for all mobile content, get rid of superfluous content. If necessary, crop the video to a smaller dimension or edit its content, such as trimming a long introduction.

For more information on mobile encoding guidelines, read Adobe’s white paper at http://download.macromedia.com/flashmediaserver/mobile-encoding-android-v2_7.pdf.

Performance

Hardware is improving quickly, but each device’s architecture is a little different. If you want to target the high end of the market, you can add such comments when submitting your applications to the Android Market.

In addition to your encoding settings, there are some best practices to obey for optimal video playback. They are all simple to apply:

  • Do not put anything on top of or behind the video, even if it is transparent. This would need to be calculated in the rendering process and would negatively affect video playback.
  • Make sure your video window is placed on a full pixel (no half-pixel boundaries); see the snippet after this list.
  • Do not use bitmap caching on the video or any other objects on the stage. Do not use filters such as drop shadows or pixel benders. Do not skew or rotate the video. Do not use color transformation or objects with alpha.
  • Do not show more than one video at the same time.
  • Stop all other processes unless they are absolutely necessary. If you use a progress bar, only call for progress update using a timer every second, not on the enter frame event.
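
A minimal sketch of the full-pixel placement rule, assuming a Video instance named video as in the earlier examples:

[code]

// Snap the video to whole-pixel coordinates to avoid sub-pixel rendering overhead
video.x = Math.round(video.x);
video.y = Math.round(video.y);

[/code]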

MPEG Industry Forum (MPEGIF)

The Moving Picture Experts Group Industry Forum (MPEGIF) is an advocacy group for standards-based DTV technologies. The group is an independent and platform-neutral not-for-profit organization representing more than 20 international companies and organizations, with the goal of facilitating and furthering the widespread adoption and deployment of MPEG and related standards in next-generation digital media services. MPEGIF is among the consortiums focused on standardizing technology and methods for delivering 3DV/3DTV.

MPEGIF announced in December 2009 the formation of the 3DTV Working Group and launch of the “3D over MPEG” campaign. The new working group and campaign continue MPEGIF’s work in furthering the widespread adoption and deployment of MPEG-related standards including MPEG-4 AVC/H.264. The chair of the newly formed 3DTV Working Group stated that “3DTV is of keen interest to everyone in the video creation and delivery industries. The challenge we all face is that of sorting through the myriad technical options. Our common goal is to create a 3DTV ecosystem that delivers great new experiences to consumers. The 3DTV Working Group and the ‘3D over MPEG’ campaign are designed to provide focus and clear information to decision makers. 3DTV can be distributed today using MPEG-related standards. Existing broadband and broadcast services and infrastructures are 3D-ready, and ongoing works by standards bodies provide a compelling path for the future evolution of 3DTV . . . 3D video is showing distinct commercial promise in theatrical releases and could thus transition to the advanced living room to follow High-Definition and Surround Sound. As a result there is a growing array of competing technologies and work from various standards bodies. It has therefore become a major theme of the next MPEG Industry Forum Master Class being held at CES 2010 in Las Vegas in January 2010.” About 30 industry participants joined the 3D Working Group at the launch.

The 3DTV Working Group aims to provide a forum for the free exchange of information related to this emerging technology, an industry voice advocating the adoption of standards, and a vehicle for consolidating the overall direction of the 3DTV industry. Its focus and constituency will be derived from video service providers, consumer electronics manufacturers, content owners, equipment manufacturers, system integrators, software providers, as well as industry advocacy groups, industry analysts, financial institutions, and academic institutes.

MPEG-4 MVC is being given consideration. As we have seen, MPEG-4 MVC can be used, among other more sophisticated applications, to handle simple transmission of independent left-eye/right-eye views, which is considered to be the viable early commercial approach, at least in the United States. An arrangement known as the “frame-packing arrangement SEI message” enables the encoder to signal to the decoder how to extract two distinct views from a single decoded frame; this could be in the form of side-by-side or over–under images.

Moving Picture Experts Group (MPEG)

Overview

MPEG is a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video and related data. Established in 1988, the group produces standards that help the industry offer end users an ever more enjoyable digital media experience. In its 21 years of activity, MPEG has developed a substantive portfolio of technologies that have created an industry worth several hundred billion USD. MPEG is currently interested in 3DV in general and 3DTV in particular. Any broad success of 3DTV/3DV will likely depend on the development and industrial acceptance of MPEG standards; MPEG is the premier organization worldwide for video encoding, and the list of standards that it has produced in recent years is as follows:

MPEG-1 The standard on which such products as video CD and MP3 are based

MPEG-2 The standard on which such products as digital television set-top boxes and DVDs are based

MPEG-4 The standard for multimedia for the fixed and mobile web

MPEG-7 The standard for description and search of audio and visual content

MPEG-21 The multimedia framework

MPEG-A The standard providing application-specific formats by integrating multiple MPEG technologies

MPEG-B A collection of systems-specific standards

MPEG-C A collection of video-specific standards

MPEG-D A collection of audio-specific standards

MPEG-E A standard (M3W) providing support to download and execute multimedia applications

MPEG-M A standard (MXM) for packaging and reusability of MPEG technologies

MPEG-U A standard for rich media user interface

MPEG-V A standard for interchange with virtual worlds

A companion table, “Activities of MPEG Groups in the Area of Video,” provides a more detailed listing of the activities of MPEG groups in the area of video.

Completed Work

As we have seen in other parts of this text, currently there are a number of different 3DV formats (either already available and/or under investigation), typically related to specific types of displays (e.g., classical two-view stereo video, multi-view video with more than two views, V+D, MV+D, and layered depth video). Efficient compression is crucial for 3DV applications, and a plethora of compression and coding algorithms are either already available and/or under investigation for the different 3DV formats (some of these are standardized, e.g., by MPEG; others are proprietary). A generic, flexible, and efficient 3DV format that can serve a range of different 3DV systems (including mobile phones) is currently being investigated by MPEG.

As we noted earlier in this text, MPEG standards already support 3DV based on V+D. In 2007, MPEG specified a container format, “ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information” (also known as MPEG-C Part 3), that can be utilized for V+D data. Transport of this data is defined in a separate MPEG systems specification, “ISO/IEC 13818-1:2003 Carriage of Auxiliary Data.”

In 2008, ISO approved a new 3DV project under ISO/IEC JTC1/SC29/WG11 (ISO/IEC JTC1/SC29/WG11, MPEG2008/N9784). The JVT of ITU-T and MPEG has devoted its recent efforts to extending the widely deployed H.264/AVC standard with MVC to support MV+D (and also V+D). MVC allows the construction of bitstreams that represent multiple views. The MPEG standard that emerged, MVC, provides good robustness and compression performance for delivering 3DV by taking into account the inter-view dependencies of the different visual channels. In addition, its backwards-compatibility with H.264/AVC codecs makes it widely interoperable in environments having both 2D- and 3D-capable devices. MVC supports an MV+D (and also V+D) encoded representation inside the MPEG-2 transport stream. The MVC standard was developed by the JVT of ISO/IEC MPEG
and ITU-T Video Coding Experts Group (VCEG; ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). MVC was originally an addition to the H.264/MPEG-4 AVC video compression standard that enables efficient encoding of sequences captured simultaneously from multiple cameras using a single video stream.

At press time, MVC was the most efficient approach for stereo and multi-view video coding; for two views, the performance achieved by the H.264/AVC Stereo SEI message and MVC is similar. MVC is also expected to become a new MPEG video coding standard for the realization of future video applications such as 3DTV and FTV. The MVC group in the JVT has chosen the H.264/AVC-based MVC method as the MVC reference model, since this method showed better coding efficiency than H.264/AVC simulcast coding and the other methods that were submitted in response to the call for proposals made by MPEG.

New Initiatives

ISO MPEG has already developed a suite of international standards to support 3D services and devices, and in 2009 it initiated a new phase of standardization to be completed by 2011:

  • One objective is to enable stereo devices to cope with varying display types and sizes, and different viewing preferences. This includes the ability to vary the baseline distance for stereo video to adjust the depth perception that could help to avoid fatigue and other viewing discomforts.
  • MPEG also envisions that high-quality autostereoscopic displays will enter the consumer market in the next few years. Since it is difficult to directly provide all the necessary views due to production and transmission constraints, a new format is needed to enable the generation of many high-quality views from a limited amount of input data such as stereo and depth.

ISO’s vision is now a new 3DV format that goes beyond the capabilities of existing standards to enable both advanced stereoscopic display processing and improved support for autostereoscopic N-view displays, while enabling interoperable 3D services. The new 3DV standard aims to improve the rendering capability of the 2D+Depth format while reducing bitrate requirements relative to existing standards, as noted earlier in this section.

3DV supports new types of audiovisual systems that allow users to view videos of the real 3D space from different user viewpoints. In an advanced application of 3DV, denoted as FTV, a user can set the viewpoint to an almost arbitrary location and direction that can be static, change abruptly, or vary continuously, within the limits given by the available camera setup. Similarly, the audio listening point is changed accordingly. The first phase of 3DV development is expected to support advanced 3D displays, where M dense views must be generated from a sparse set of K transmitted views (typically K ≤ 3) with associated depth data. The allowable range of view synthesis will be relatively narrow (20° view angle from leftmost to rightmost view).

Figure 6.1 Example of an FTV system and data format.

The MPEG initiative notes that 3DV is a standard that targets serving a variety of 3D displays. It is the first phase of FTV, that is a new framework that includes a coded representation for multi-view video and depth information to support the generation of high-quality intermediate views at the receiver. This enables free viewpoint functionality and view generation for automultiscopic displays [7].
Figure 6.1 shows an example of an FTV system that transmits multi-view video with depth information. The content may be produced in a number of ways; for example, with a multicamera setup, depth cameras, or 2D/3D conversion processes. At the receiver, DIBR could be performed to project the signal to various types of displays.

The first focus (phase) of ISO/MPEG standardization for FTV is 3DV [8]. This means video for 3D displays. Such displays present N views (e.g., N = 9) simultaneously to the user (Fig. 6.2). For efficiency reasons, only a lower number K of views (K = 1, 2, 3) shall be transmitted. For those K views, additional depth data shall be provided. At the receiver side, the N views to be displayed are generated from the K transmitted views with depth by DIBR. This is illustrated in Fig. 6.2.
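
To make the DIBR step concrete, the following is a minimal sketch, not taken from any MPEG specification: the function, parameter names, and values are illustrative assumptions. It forward-warps one transmitted view into a nearby virtual view by shifting each pixel horizontally by its disparity, f·b/Z:

```python
# Minimal DIBR (depth-image-based rendering) sketch using forward warping.
# All names and values are illustrative assumptions, not part of the 3DV standard.
import numpy as np

def synthesize_view(color, depth_m, focal_px, baseline_m):
    """Warp a reference view to a virtual camera shifted horizontally by
    `baseline_m` meters; `depth_m` holds per-pixel metric depth."""
    h, w, _ = color.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)                 # keep the nearest pixel per target column
    disparity = focal_px * baseline_m / depth_m    # pixel shift, larger for closer objects
    for y in range(h):
        for x in range(w):
            # Sign of the shift depends on whether the virtual camera is left or right.
            xt = int(round(x - disparity[y, x]))
            if 0 <= xt < w and depth_m[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth_m[y, x]
                out[y, xt] = color[y, x]
    return out                                     # zero-valued positions are disocclusion holes

# Toy usage: a flat scene 3 m away, virtual camera 6 cm to the side.
color = np.random.randint(0, 256, (576, 720, 3), dtype=np.uint8)
depth = np.full((576, 720), 3.0)
virtual = synthesize_view(color, depth, focal_px=1000.0, baseline_m=0.06)
```

A real renderer would blend warps from the two or three transmitted views and fill the disocclusion holes, but the per-pixel horizontal shift, proportional to focal length and baseline and inversely proportional to depth, is the core operation.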

This application scenario imposes specific constraints such as narrow-angle acquisition (<20°). Also, for cost reasons, there should be no need for geometric rectification at the receiver side; if any rectification is needed at all, it should already have been performed on the input views at the encoder side.

Example of generating nine output views (N = 9) out of three input views with depth (K = 3).

Some multi-view displays are, for example, based on LCD screens with a sheet of transparent lenses in front. This sheet directs a different view to each eye, so a person sees two different views and obtains a stereoscopic viewing experience. The stereoscopic capabilities of these multi-view displays are limited by the resolution of the LCD screen (currently 1920 × 1080). For example, for a nine-view system where the cone of nine views is 10° (Cone Angle—CA), objects are limited to ±10% (Object Range—OR) of the screen width to appear in front of or behind the screen. Both OR and CA will improve with time (determined by economics) as the number of pixels of the LCD screen goes up.

In addition, other types of stereo displays are now appearing in the market in large numbers. The ability to generate output views at arbitrary positions at the receiver is attractive even in the case of N = 2 (i.e., a simple stereo display). If, for example, the material has been produced for a large cinema theater, direct usage of that stereo signal (two fixed views) with relatively small home-sized 3D displays will yield a very different stereoscopic viewing experience (e.g., strongly reduced depth effect). With a 3DV signal, as illustrated in Fig. 6.3, a new stereo pair can be generated that is optimized for the given 3D display.

Example of lenticular autostereoscopic display requiring nine views (N = 9).

With a different initiative, ISO previously looked at auxiliary video data representations. The purpose of ISO/IEC 23002-3 Auxiliary Video Data Representations is to support all those applications where additional data needs to be efficiently attached to the individual pixels of a regular video. ISO/IEC 23002-3 describes how this can be achieved in a generic way by making use of existing (and even future) video codecs available within MPEG. A good example of an application that requires additional information associated with the individual pixels of a regular (2D) video stream is stereoscopic video presented on an autostereoscopic single- or multiple-user display. At the MPEG meeting in Nice, France (October 2005), the arrival of such displays on the market had been stressed, and several of them were even shown and demonstrated. Because different display realizations vary largely in (i) the number of views that are represented; and (ii) the maximum parallax that can be supported, an input format is required that is flexible enough to drive all possible variants. This can be achieved by supplying a depth or parallax value with each pixel of a regular video stream and by generating the required stereoscopic views at the receiver side. The standardization of a common depth/parallax format within ISO/IEC 23002-3 Auxiliary Video Data Representations will thus enable interoperability between content providers, broadcasters, and display manufacturers. ISO/IEC 23002-3 is flexible enough to easily add other types of auxiliary video data in the future. One example could be the annotation of temperature maps coming from an infrared camera to regular video coming from a regular camera.

The Auxiliary Video Data format defined in ISO/IEC 23002-3 consists of an array of N-bit values that are associated with the individual pixels of a regular video stream. These data can be compressed like conventional luminance signals using already existing (and even future) MPEG video codecs. The format allows for optional subsampling of the auxiliary data in both the spatial and temporal domains. This can be beneficial depending on the particular application and its requirements, allowing for very low bitrates for the auxiliary data. The specification is very flexible in the sense that it defines a new 8-bit code word aux_video_type that specifies the type of the associated data; for example, currently a value of 0x10 signals a depth map and a value of 0x11 signals a parallax map. New values for additional data representations can be easily added to fulfill future demands.
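
As an illustration of how a receiver could branch on the aux_video_type code word, the short sketch below maps the two values cited above to their meanings. The enum and helper function are hypothetical; only the 0x10 and 0x11 values come from the specification as described in the text.

```python
# Hypothetical helper for interpreting the 8-bit aux_video_type code word of
# ISO/IEC 23002-3; only the two values cited in the text are handled.
from enum import IntEnum

class AuxVideoType(IntEnum):
    DEPTH_MAP = 0x10     # per-pixel depth values
    PARALLAX_MAP = 0x11  # per-pixel parallax values

def describe_aux_video(aux_video_type: int) -> str:
    try:
        kind = AuxVideoType(aux_video_type)
    except ValueError:
        # Other values are reserved for future data representations.
        return f"reserved/unknown auxiliary data (0x{aux_video_type:02x})"
    return kind.name.replace("_", " ").lower()

print(describe_aux_video(0x10))  # -> "depth map"
print(describe_aux_video(0x11))  # -> "parallax map"
```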

The transport of auxiliary video data within an MPEG-2 transport or program stream is defined in an amendment to the MPEG-2 systems standard. It specifies new stream_id_extension and stream_type values that are used to signal an auxiliary video data stream. An additional auxiliary_video_data_descriptor is utilized in order to convey in more detail how the data should be interpreted by the application that uses them. Metadata associated with the auxiliary data is carried at the system level, allowing the use of unmodified video codecs (no need to modify silicon).

In conclusion, ISO/IEC 23002-3 Auxiliary Video Data Representations provides a reasonably efficient approach for attaching additional information, such as depth or parallax values, to the individual pixels of a regular video stream and for signaling how these associated data should be interpreted by the application that uses them.

3DTV Standardization and Related Activities

Standardization efforts have to be understood in the context of where stakeholders and proponents see the technology going. We already defined what we believe to be five generations of 3DTV commercialization in Chapter 1, which the reader will certainly recall. These generations fit in well with the following menu of research activity being sponsored by various European and global research initiatives, as described in Ref. [1]:

Short-term 3DV R&D (immediate commercialization, 2010–2013)

  • Digital stereoscopic projection
    • better/perfect alignment to minimize “eye-fatigue.”
  • End-to-end digital production-line for stereoscopic 3D cinema
    • digital stereo cameras;
    • digital baseline correction for realistic perspective;
    • digital postprocessing.

Medium-term 3DV R&D (commercialization during the next few years, 2013–2016)

  • End-to-end multi-view 3DV with autostereoscopic displays
    • cameras and automated camera calibration;
    • compression/coding for efficient delivery;
    • standardization;
    • view interpolation for free-view video;
    • better autostereoscopic displays, based on current and near future technology (lenticular, barrier-based);
    • natural immersive environments.

Long-term 3DV R&D (10+ years, 2016–2020+)

  • realistic/ultrarealistic displays;
  • “natural” interaction with 3D displays;
  • holographic 3D displays, including “integral imaging” variants;
  • natural immersive environments;
  • total decoupling of “capture” and “display”;
  • novel capture, representation, and display techniques.

One of the goals of the current standardization effort is to decouple the capture function from the display function. This is a very typical requirement for service providers, going back to voice and Internet services: there will be a large pool of end users, each opting to choose a distinct Customer Premises Equipment (CPE) device (e.g., phone, PC, fax machine, cell phone, router, 3DTV display); therefore, the service provider needs to utilize a network-intrinsic protocol (encoding, framing, addressing, etc.) that can then be utilized by the end device to create its own internal representation, as needed. The same applies to 3DTV.

As noted in Chapter 1, there is a lot of interest shown in this topic by industry and standards bodies. The MPEG of ISO/IEC is working on a coding format for 3DV. Standards are the key to cost-effective deployment of a technology; examples of video-related format controversies include Betamax versus VHS (Video Home System) and HD DVD versus Blu-ray. SMPTE is working on some of the key standards needed to deliver 3D to the home. As far back as 2003, a 3D Consortium with 70 partner organizations had been founded in Japan and, more recently, four new activities have been started: the 3D@Home Consortium, the SMPTE 3D Home Entertainment Task Force, the Rapporteur Group on 3DTV of ITU-R Study Group 6, and the TM-3D-SM group of DVB. It will probably be somewhere around 2012 by the time there is an interoperable standard available in consumer systems to handle all the delivery mechanisms for 3DTV.

At a broad level and in the context of 3DTV, the following major initiatives had been undertaken at press time:

  • MPEG: standardizing multi-view and 3DV coding;
  • DVB: standardizing digital video transmission to TVs and mobile devices;
  • SMPTE: standardizing 3D delivery to the home;
  • ITU-T: standardizing user experience of multimedia content;
  • VQEG (Video Quality Experts Group): standardizing objective video quality assessment.

There is a pragmatic possibility that in the short term, equipment providers may have to support a number of formats for stereo 3D content. The ideal approach for stereoscopic 3DTV is to provide sequential left and right frames at twice the chosen viewing rate. However, because broadcasters and some devices may lack transport/interface bandwidth for that approach, a number of alternatives may also be used (at least in the short term). Broadcasters appear to be focusing on top/bottom interleaving; however, trials are still ongoing to examine other approaches that involve some form of compression, including checkerboard, side-by-side, or interleaved rows or columns.
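
The geometry of two of these packings can be illustrated with a short sketch. This is assumed helper code; real broadcast chains apply proper anti-alias filtering rather than the simple decimation used here.

```python
# Illustrative frame-packing sketch: squeeze a left/right pair into one frame.
# Plain decimation is used only to show the geometry of the two packings.
import numpy as np

def pack_side_by_side(left, right):
    # Keep every second column of each eye, then place the halves side by side.
    return np.hstack((left[:, ::2], right[:, ::2]))

def pack_top_bottom(left, right):
    # Keep every second row of each eye, then stack the halves vertically.
    return np.vstack((left[::2, :], right[::2, :]))

left = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
right = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

# Either packing fits the stereo pair into a single conventional HD frame,
# at the cost of half the resolution in one dimension per eye.
assert pack_side_by_side(left, right).shape == (1080, 1920, 3)
assert pack_top_bottom(left, right).shape == (1080, 1920, 3)
```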

Multicast Operation

As noted above, the backbone may consist of (i) a pure IP network or (ii) a mixed satellite transmission link to a metropolitan headend that, in turn, uses a metropolitan (or regional) telco IP network. Applications such as video are very sensitive to end-to-end delay, jitter, and (uncorrectable) packet loss; QoS considerations are critical. These networks tend to have fewer hops, and pruning may be somewhat trivially implemented by making use of a simplified network topology.

At the logical level, there are three types of communication between systems in a(n IP) network:

  • Unicast: Here, one system communicates directly to another system.
  • Broadcast: Here, one system communicates to all systems.
  • Multicast: Here, one system communicates to a select group of other systems.

In traditional IP networks, a packet is typically sent by a source to a single destination (unicast); alternatively, the packet can be sent to all devices on the network (broadcast). There are business and multimedia (entertainment) applications that require a multicast transmission mechanism to enable bandwidth-efficient communication between groups of devices, where information is transmitted to a single multicast address and received by any device that wishes to obtain such information. In traditional IP networks, it is not possible to generate a single transmission of data when this data is destined for a (large) group of remote devices. There are classes of applications that require distribution of information to a defined (but possibly dynamic) set of users. IP Multicast, an extension to IP, is required to properly address these communications needs. As the term implies, IP Multicast has been developed to support efficient communication between a source and multiple remote destinations.

Multicast applications include, among others, datacasting—for example, for distribution of real-time financial data—entertainment digital television over an IP network (commercial-grade IPTV), Internet radio, multipoint video conferencing, distance-learning, streaming media applications, and corporate communications. Other applications include distributed interactive simulation, cloud/grid computing, and distributed video gaming (where most receivers are also senders). IP Multicast protocols and underlying technologies enable efficient distribution of data, voice, and video streams to a large population of users, ranging from hundreds to thousands to millions of users. IP Multicast technology enjoys intrinsic scalability, which is critical for these types of applications.

As an example in the IPTV arena, with the current trend toward the delivery of HDTV signals, each requiring bandwidth in the 12 Mbps range, and the consumers’ desire for a large number of channels (200–300 being typical), there has to be an efficient mechanism of delivering a signal of 1–2 Gbps aggregate to a large number of remote users. If a source had to deliver 1 Gbps of signal to, say, 1 million receivers by transmitting all of this bandwidth across the core network, it would require a petabit per second network fabric; this is currently not possible. On the other hand, if the source could send the 1 Gbps of traffic to (say) 50 remote distribution points (for example, headends), each of which then makes use of a local distribution network to reach 20,000 subscribers, the core network only needs to support 50 Gbps, which is possible with proper design. For such reasons, IP Multicast is seen as a bandwidth-conserving technology that optimizes traffic management by simultaneously delivering a stream of information to a large population of recipients, including corporate enterprise users and residential customers. IPTV uses IP-based basic transport (where IP packets contain MPEG-4 TSs) and IP Multicast for service control and content acquisition (group membership). See Fig. 5.1 for a pictorial example.
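
The arithmetic behind this example can be spelled out in a few lines; the figures are the ones used in the text, and the script itself is only illustrative.

```python
# Back-of-the-envelope comparison of unicast vs. two-tier multicast delivery
# for the 1 Gbps channel lineup discussed in the text.
lineup_gbps = 1.0            # aggregate bandwidth of the channel lineup
subscribers = 1_000_000      # total receivers
headends = 50                # regional distribution points
subs_per_headend = subscribers // headends        # 20,000 subscribers each

unicast_core_gbps = lineup_gbps * subscribers     # one copy per receiver
multicast_core_gbps = lineup_gbps * headends      # one copy per headend

print(f"Pure unicast core load:      {unicast_core_gbps:,.0f} Gbps (about 1 Pbps)")
print(f"Multicast-to-headend load:   {multicast_core_gbps:,.0f} Gbps")
```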

One important design principle of IP Multicast is to allow receiver-initiated attachment (joins) to information streams, thus supporting a distributed informatics model. A second important principle is the ability to support optimal pruning such that the distribution of the content is streamlined by pushing replication as close to the receiver as possible. These principles enable bandwidth-efficient use of underlying network infrastructure.
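
The receiver-initiated join is visible directly at the socket API level. The following minimal sketch shows a host asking to be added to a multicast group, which causes its IP stack to issue an IGMP membership report toward the local router; the group address and port are arbitrary example values, not ones defined by any IPTV or 3DTV specification.

```python
# Minimal multicast receiver: the IP_ADD_MEMBERSHIP option is the
# receiver-initiated "join" that triggers an IGMP membership report.
import socket
import struct

GROUP = "239.1.1.1"   # administratively scoped example group
PORT = 5004           # arbitrary example port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# ip_mreq: group address plus local interface (INADDR_ANY lets the kernel pick one).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, sender = sock.recvfrom(2048)   # blocks until a datagram for the group arrives
print(f"received {len(data)} bytes from {sender}")
```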

The issue of security in multicast environments is addressed via Conditional Access Systems (CAS) that provide per-program encryption (typically, but not always, symmetric encryption; also known as inner encryption) or aggregate IP-level encryption (again typically, but not always, symmetric encryption; also known as outer encryption).

Figure 5.1. Bandwidth advantage of IP Multicast.

Carriers have been upgrading their network infrastructure in the past few years to enhance their capability to provide QoS-managed services, such as IPTV. Specifically, legacy remote access platforms, implemented largely to support basic DSL service roll-outs—for example, supporting ATM aggregation and DSL termination—are being replaced by new broadband network gateway access technologies optimized around IP, Ethernet, and VDSL2 (Very High Bitrate Digital Subscriber Line 2). These services and capabilities are delivered with multiservice routers on the network edge. Viewer-initiated program selection is achieved using IGMP, specifically with a Membership Report (join) message. (IGMP v2 messages comprise the Membership Query, the Version 2 Membership Report, and the Leave Group (LG) message.) Multicast communication is based on the construct of a group of receivers (hosts) that have an interest in receiving a particular stream of information, be it voice, video, or data. There are no physical or geographical constraints or boundaries on belonging to a group, as long as the hosts have (broadband) network connectivity. The connectivity of the receivers can be heterogeneous in nature, in terms of bandwidth and connecting infrastructure (for example, receivers connected over the Internet), or homogeneous (for example, IPTV or DVB-H users). Hosts that are desirous of receiving data intended for a particular group join the group using a group management protocol: hosts/receivers must become explicit members of the group to receive the data stream, but such membership may be ephemeral and/or dynamic. Groups of IP hosts that have joined the group and wish to receive traffic sent to this specific group are identified by multicast addresses.

Multicast routing protocols belong to one of two categories: Dense-Mode (DM) protocols and Sparse-Mode (SM) protocols.

  • DM protocols are designed on the assumption that the majority of routers in the network will need to distribute multicast traffic for each multicast group. DM protocols build distribution trees by initially flooding the entire network and then pruning out the (presumably small number of) paths without active receivers. DM protocols are used in LAN environments, where bandwidth considerations are less important, but can also be used in WANs in special cases (for example, where the backbone is a one-hop broadcast medium such as a satellite beam with wide geographic illumination, as in some IPTV applications).
  • SM protocols are designed on the assumption that only few routers in the network will need to distribute multicast traffic for each multicast group. SM protocols start out with an empty distribution tree and add drop-off branches only upon explicit requests from receivers to join the distribution. SM protocols are generally used in WAN environments, where bandwidth considerations are important.

For IP Multicast there are several multicast routing protocols that can be employed to acquire real-time topological and membership information for active groups. Routing protocols that may be utilized include Protocol-Independent Multicast (PIM), the Distance Vector Multicast Routing Protocol (DVMRP), MOSPF (Multicast Open Shortest Path First), and Core-Based Trees (CBT). Multicast routing protocols build distribution trees by examining a routing/forwarding table that contains unicast reachability information. PIM and CBT use the unicast forwarding table of the router. Other protocols use their own unicast reachability tables; for example, DVMRP uses its distance vector routing protocol to determine how to create source-based distribution trees, while MOSPF utilizes its link state table to create source-based distribution trees. MOSPF, DVMRP, and PIM-DM are dense-mode routing protocols, while CBT and PIM-SM are sparse-mode routing protocols. PIM is currently the most widely used protocol.

As noted, IGMP (versions 1, 2, and 3) is the protocol used by Internet Protocol Version 4 (IPv4) hosts to communicate multicast group membership states to multicast routers. IGMP is used to dynamically register individual hosts/receivers on a particular local subnet (for example, a LAN) to a multicast group.

Figure 5.2. IGMP v2 message format.

IGMP version 1 defined the basic mechanism. It supports a Membership Query (MQ) message and a Membership Report (MR) message. Most implementations at press time employed IGMP version 2, which adds LG messages. Version 3 adds source awareness, allowing the inclusion or exclusion of sources. IGMP allows group membership lists to be dynamically maintained. The host (user) sends an IGMP “report,” or join, to the router to be included in the group. Periodically, the router sends a “query” to learn which hosts (users) are still part of a group. If a host wishes to continue its group membership, it responds to the query with a “report.” If the host does not send a “report,” the router prunes the group list to delete this host; this eliminates unnecessary network transmissions. With IGMP v2, a host may send an LG message to alert the router that it is no longer participating in a multicast group; this allows the router to prune the group list to delete this host before the next query is scheduled, thereby minimizing the time period during which unneeded transmissions are forwarded to the network.

The IGMP messages for IGMP version 2 are shown in Fig. 5.2. The message comprises an eight-octet structure. During transmission, IGMP messages are encapsulated in IP datagrams; to indicate that an IGMP packet is being carried, the IP header carries a protocol number of 2 in its Protocol Type field (IGMP is one of many protocols that can be specified in this field). An IGMP v2 PDU thus consists of a 20-byte IP header and 8 bytes of IGMP.
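
For reference, the eight octets are laid out as an 8-bit Type, an 8-bit Maximum Response Time, a 16-bit checksum, and a 32-bit group address (per RFC 2236). The sketch below assembles a Version 2 Membership Report for an arbitrary example group; it is an illustration only, not part of any IPTV specification.

```python
# Build an 8-octet IGMPv2 message (RFC 2236): Type, Max Response Time,
# Checksum, Group Address. The group address below is an arbitrary example.
import struct
import socket

def inet_checksum(data: bytes) -> int:
    # Standard 16-bit one's-complement Internet checksum.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    total = (total & 0xFFFF) + (total >> 16)
    total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def igmpv2_message(msg_type: int, max_resp_time: int, group: str) -> bytes:
    # Checksum is computed over the message with the checksum field set to zero.
    body = struct.pack("!BBH4s", msg_type, max_resp_time, 0, socket.inet_aton(group))
    return struct.pack("!BBH4s", msg_type, max_resp_time,
                       inet_checksum(body), socket.inet_aton(group))

MEMBERSHIP_REPORT_V2 = 0x16   # "join"; 0x11 = Membership Query, 0x17 = Leave Group
report = igmpv2_message(MEMBERSHIP_REPORT_V2, 0, "239.1.1.1")
assert len(report) == 8       # the whole IGMPv2 PDU is eight octets
```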

Some of the areas that require consideration and technical support to develop and deploy IPTV systems include the following, among many others:

  • content aggregation;
  • content encoding (e.g., AVC/H.264/MPEG-4 Part 10, MPEG-2, SD, HD, Serial Digital Interface (SDI), Asynchronous Serial Interface (ASI), Layer 1 switching/routing);
  • audio management;
  • digital rights management/CA: encryption (DVB-CSA, AES or Advanced Encryption Standard); key management schemes (basically, CAS); transport rights;
  • encapsulation (MPEG-2 transport stream distribution);
  • backbone distribution such as satellite or terrestrial (DVB-S2, QPSK, 8-PSK, FEC, turbo coding for satellite—SONET (Synchronous Optical Network)/SDH/OTN (Synchronous Digital Hierarchy/Optical Transport Network) for terrestrial);
  • metro-level distribution;
  • last-mile distribution (LAN/WAN/optics, GbE (Gigabit Ethernet), DSL/FTTH);
  • multicast protocol mechanisms (IP multicast);
  • QoS backbone distribution;
  • QoS, metro-level distribution;
  • QoS, last-mile distribution;
  • QoS, channel surfing;
  • Set-Top Box (STB)/middleware;
  • QoE;
  • Electronic Program Guide (EPG);
  • blackouts;
  • service provisioning/billing, service management;
  • advanced video services (e.g., PDR and VOD);
  • management and confidence monitoring;
  • triple play/quadruple play.