Diagnostics Tools

You can monitor performance and memory using diagnostics tools. Let’s look at a few of them.

Hi-Res-Stats

The Hi-Res-Stats class, from mrdoob, calculates the frame rate, the time to render each frame, the amount of memory used per frame, and the maximum frame rate and memory consumption. Import the library and add a new instance of Stats as a displayObject. This is simple and convenient to use (see https://github.com/bigfish):

[code]

import net.hires.debug.*;
var myStats:Stats = new Stats();
addChild(myStats);

[/code]

Because it needs to be added to the displayList and draws its progress visually, as shown in Figure 19-4, this tool may impact rendering slightly, or get in the way of other graphics. A trick I use is to toggle its visibility when pressing the native search key on my device:

[code]

import flash.ui.Keyboard;
import flash.events.KeyboardEvent;

stage.addEventListener(KeyboardEvent.KEY_DOWN, onKey);

function onKey(e:KeyboardEvent):void {
    switch (e.keyCode) {
        case Keyboard.SEARCH:
            e.preventDefault();
            myStats.visible = !myStats.visible;
            break;
    }
}

[/code]

Figure 19-4. Hi-Res-Stats display

Flash Builder Profiler

The premium version of Flash Builder comes with Flash Builder Profiler, which watches live data and samples your application at small, regular intervals and over time. It is well documented. Figure 19-5 shows the Configure Profiler screen.

Figure 19-5. The Configure Profiler screen in Flash Builder Profiler

When “Enable memory profiling” is selected, the profiler collects memory data and memory usage information. This is helpful for detecting memory leaks or the creation of large objects. It shows how many instances of each object are in use.

When “Watch live memory data” is selected, the profiler displays memory usage data for live objects. When “Generate object allocation stack traces” is selected, every new creation of an object is recorded.

When “Enable performance profiling” is selected, the profiler collects stack trace data at time intervals. You can use this information to determine where your application spends its execution time. It shows how much time is spent on a function or a process.

You can also take memory snapshots and performance profiles on demand and compare them to previous ones. When doing so, the garbage collector is first run implicitly. Garbage collection can also be monitored.

Flash Preload Profiler

The Flash Preload Profiler is an open source multipurpose profiler created by Jean-Philippe Auclair. This tool features a simple interface and provides data regarding frame rate history and memory history, both current and maximum.

Other more unusual and helpful features are the overdraw graph, mouse listener graph, internal events graph, displayObject life cycle graph, full sampler recording dump, memory allocation/collection dump, function performance dump, and the ability to run on both debug and release SWFs. More information on the Flash Preload Profiler is available at http://jpauclair.net/2010/12/23/complete-flash-profiler-its-getting-serious/.

Grant Skinner’s PerformanceTest

Grant’s PerformanceTest class is a tool for doing unit testing and formal test suites. Some of its core features are the ability to track time and memory usage for functions, and the ability to test rendering time for display objects. The class performs multiple iterations to get minimum, maximum, and deviation values, and it runs tests synchronously or queued asynchronously.

The class returns a MethodTest report as a text document or XML file. It can perform comparisons between different versions of Flash Player and different versions of the same code base. More information on this class is available at http://gskinner.com/blog/archives/2010/02/performancetest.html.

Native Tools

The Android Debug Bridge (ADB) logcat command grabs information from the device and dumps it onto a log screen via USB. A lot of information is provided. Some basic knowledge of the Android framework will help you understand it better.

Bitmap Size and Mip Mapping

The dimensions of your bitmap matter when bitmaps are downscaled. Mip mapping is the process of generating smaller versions of a bitmap in place of the original art to improve the quality of scaled images. The scaling is now done directly from the original to the desired size; in previous versions of Flash Player it was precomputed.

Both the width and the height must be even numbers. The ideal bitmap has dimensions that are a power of two, but it is enough if the dimensions are divisible by eight. The following numbers demonstrate downscaling a bitmap for the resolutions of a Nexus One and a Samsung Galaxy Tab. The mip levels, with values halved at each step, work out perfectly:

[code]

800×480 > 400×240 > 200×120 > 100×60
1024×600 > 512×300 > 256×150 > 128×75

[/code]

A bitmap at 1,000 by 200 would reduce to three mip levels, but a bitmap at 998 by 200 only reduces to one level.
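
If you want to check an asset in code, a small helper like the one below (my own sketch, not part of any ActionScript API) counts how many clean mip levels a bitmap produces by halving its dimensions until one of them turns odd:

[code]

// Counts how many times a bitmap can be halved to whole-number dimensions.
function mipLevels(width:int, height:int):int {
    var levels:int = 0;
    while (width > 1 && height > 1 && width % 2 == 0 && height % 2 == 0) {
        width /= 2;
        height /= 2;
        levels++;
    }
    return levels;
}

trace(mipLevels(1000, 200)); // 3 (500×100 > 250×50 > 125×25)
trace(mipLevels(998, 200));  // 1 (499×100 stops the halving)

[/code]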

The point to take away from this is that you should create assets that fit within these dimensions.

RTMFP UDP

Peer-to-peer (P2P) communication is the real-time transfer of data and media between clients.

Real Time Media Flow Protocol (RTMFP) is an Adobe proprietary protocol. It enables peer-to-peer communication between applications running in Flash Player or the AIR runtime. It is meant to provide a low-latency, secure, peering network experience.

Real Time Messaging Protocol (RTMP) uses Transmission Control Protocol (TCP). RTMFP uses User Datagram Protocol (UDP), which provides better latency, higher security (128-bit AES encryption), and scalability. RTMFP is faster than RTMP, but it does not guarantee perfect message ordering and delivery.

RTMFP was designed to connect via an RTMFP-capable server, such as the Cirrus service (RTMP uses Flash Media Server), but it can also be used over a local network using WiFi without the need for a server. Note that Flash Media Server 4.0 speaks RTMFP as well.
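
As a rough sketch of what the connection looks like in ActionScript (the Cirrus address and developer key below are placeholders you would replace with your own), you connect a NetConnection over rtmfp:// and then open a NetStream for direct peer connections:

[code]

import flash.events.NetStatusEvent;
import flash.net.NetConnection;
import flash.net.NetStream;

// Placeholder Cirrus address and developer key.
var connection:NetConnection = new NetConnection();
connection.addEventListener(NetStatusEvent.NET_STATUS, onStatus);
connection.connect("rtmfp://p2p.rtmfp.net/YOUR_DEVELOPER_KEY/");

function onStatus(event:NetStatusEvent):void {
    if (event.info.code == "NetConnection.Connect.Success") {
        // Publish a stream open to direct peer-to-peer connections.
        var outgoing:NetStream = new NetStream(connection, NetStream.DIRECT_CONNECTIONS);
        outgoing.publish("media");
    }
}

[/code]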

 

Rendering, or How Things Are Drawn to the Screen

As a developer, you trust that an object is represented as your code intended. From your ActionScript commands to the matching pixels appearing on the screen, a lot is done behind the scenes.

To better understand Flash Player and AIR for Android rendering, I recommend watching the MAX 2010 presentations “Deep Dive Into Flash Player Rendering” by Lee Thomason and “Developing Well-Behaved Mobile Applications For Adobe AIR” by David Knight and Renaun Erickson. You can find them at http://tv.adobe.com.

Flash is a frame-based system. The code execution happens first, followed by the rendering phase. The concept of the elastic racetrack describes this ongoing process whereby one phase must end before the other one begins, and so on.

The current rendering process for Flash Player and the AIR runtime comprises four steps, outlined in the subsections that follow. Annotations regarding the Enhanced Caching/GPU Model indicate the process, if different, for AIR for Android using cacheAsBitmapMatrix and GPU rendering.

Computation

Traditional and Enhanced Caching/GPU Model

The renderer, qualified as retained, traverses the display list and keeps a state of all objects. It looks at each object’s matrix and determines its presentation properties. When a change occurs, it records it and defines the dirty region (the areas that have changed) and the objects that need to be redrawn, to narrow changes down to a smaller area. It then concatenates matrices, computes all the transformations, and defines the new state.

Edge and Color Creation

Traditional Model

This step is the most challenging mathematically. It applies to vector objects and bitmaps with blends and filters.

An SObject is the displayObject equivalent in binary code, and is not accessible via ActionScript. It is made of layers with closed paths, with edges and colors. Edges are defined as quadratic Bezier curves and are used to prevent two closed paths from intersecting. Abutment is used to seam them together. More edges mean more calculation. Color applies to solids and gradients, but also to bitmaps, video, and masks (the color values of all display objects are combined to determine a final color).

Enhanced Caching/GPU Model

No calculation is needed for cached items. They may be rotated, scaled, and skewed using their bitmap representation.
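
For example, a display object is cached for this model by setting two properties (a minimal sketch; the GPU render mode itself is enabled in the application descriptor):

[code]

import flash.display.Sprite;
import flash.geom.Matrix;

// Draw some vector art.
var artwork:Sprite = new Sprite();
artwork.graphics.beginFill(0xFF9900);
artwork.graphics.drawCircle(50, 50, 50);
artwork.graphics.endFill();
addChild(artwork);

// Rasterize the vector art once; the cached bitmap can then be rotated,
// scaled, and skewed without recalculating edges and colors.
artwork.cacheAsBitmap = true;
artwork.cacheAsBitmapMatrix = new Matrix();

[/code]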

Rasterization

Traditional Model

This step is the most demanding in terms of memory. It uses multithreading for better performance. The screen is divided into dynamically allocated horizontal chunks. Taking one dirty rectangle at a time, lines are scanned from left to right to sort the edges and render a span, which is a horizontal row of pixels. Hidden objects, the alpha channel of a bitmap, and subpixel edges are all part of the calculation.

Enhanced Caching/GPU Model

Transformation cached rendering, also called processing, is used. It converts each vector graphic to a bitmap and keeps it in an off-screen buffer to be reused over time. Once that process is done, scene compositing takes all the bitmaps and rearranges them as needed.

Presentation

Traditional Model

Blitting, or pixel blit, is the transfer of the back buffer to the screen. Only the dirty rectangles are transferred.

Enhanced Caching/GPU Model

The process is similar to the traditional model, but the GPU is much faster at moving pixels.

HTTP Dynamic Streaming, Peer-to-Peer Communication, Controls, YouTube

HTTP Dynamic Streaming

Adobe’s adaptive streaming delivery method, called HTTP Dynamic Streaming, is another option for live or on-demand video playback. This method is similar to Apple’s Adaptive Streaming and Microsoft’s Smooth Streaming.

Because HTTP Dynamic Streaming plays back video files in chunks, additional complex logic needs to be built in to put the segments together in the right order and play them back. Therefore, it is recommended that an Open Source Media Framework (OSMF)-based player be used for HTTP Dynamic Streaming.

You can build one from scratch using OSMF, or use a prebuilt solution such as Flash Media Playback or Strobe Media Playback. Video content needs to be prepared for HTTP Dynamic Streaming in a simple post-processing step that creates the segments along with a manifest file that describes the video and its segments. Adobe provides free command-line tools for preparing both on-demand and live streams.
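
As a rough sketch of the OSMF route (this assumes OSMF’s MediaPlayerSprite class, and the manifest URL is a placeholder), pointing a player at the .f4m manifest produced by the packaging tools looks like this:

[code]

import org.osmf.media.MediaPlayerSprite;
import org.osmf.media.URLResource;

// Placeholder .f4m manifest created by the packaging tools.
var player:MediaPlayerSprite = new MediaPlayerSprite();
addChild(player);
player.resource = new URLResource("http://example.com/video/sample.f4m");

[/code]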

More information on HTTP Dynamic Streaming is available online at http://www.adobe.com/products/httpdynamicstreaming/ and http://www.flashstreamworks.com/archive.php?post_id=1278132166.

 Peer-to-Peer Communication

Real Time Media Flow Protocol (RTMFP) is an Adobe proprietary protocol that enables peer-to-peer communication in Flash Player and the AIR runtime. It opens up possibilities for your applications using video.

Controls

The methods for controlling the stream are play, pause, resume, seek, and close. You cannot stop the stream while it is still downloading; you need to close it completely.

You can check the progress of the stream by checking its time property. Use that value and increment it to seek ahead in the video. You can also seek to a specific time, of course. With progressive download, you can only seek ahead to points in the video file that have been downloaded. With embedded video and RTMP, you can seek anywhere within the video file at any time.
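
For instance, to jump 30 seconds ahead of the current position (assuming a NetStream instance named stream), you would write:

[code]stream.seek(stream.time + 30);[/code]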

To monitor these events, register for NetStatusEvent and check its info.code property:

[code]function onNetEvent(event:NetStatusEvent):void {
    switch (event.info.code) {
        case "NetStream.Play.Start":
            break;
        case "NetStream.Play.Stop":
            break;
        case "NetStream.Pause.Notify":
            break;
        case "NetStream.Unpause.Notify":
            break;
        case "NetStream.Buffer.Full":
            break;
        case "NetStream.Seek.Notify":
            break;
    }
}[/code]

You cannot use the Sound object to control the audio portion of the video. Use the NetStream soundTransform property instead. Here we are setting the volume to silent:

[code]import flash.media.SoundTransform;

var transform:SoundTransform = new SoundTransform();
transform.volume = 0;
stream.soundTransform = transform;[/code]

YouTube

It is almost impossible to talk about video without mentioning YouTube. As you most certainly know, YouTube is a video-sharing site that has changed the propagation of information via video, whether it is pop culture or current events.

YouTube videos can be played on Android devices if they are encoded with the H.264 video codec.

In both cases (regular and full-screen playback), you use a URLRequest:

[code]<uses-permission android:name="android.permission.INTERNET" />

import flash.net.navigateToURL;
import flash.net.URLRequest;

var youTubePath:String = "http://www.youtube.com/watch?v=";
var videoID:String = someID;
navigateToURL(new URLRequest(youTubePath + videoID));[/code]

To display the video at full screen, use a different path:

[code]var youTubePath:String = "http://www.youtube.com/watch/v/";[/code]

 

 

A Brief History

The first multitouch system designed for human input was developed in 1982 by Nimish Mehta of the University of Toronto. Bell Labs, followed by other research labs, soon picked up on Mehta’s idea. Apple’s 2007 launch of the iPhone, which is still the point of reference today for multitouch experiences and gestures, popularized a new form of user interaction.

More recently, Microsoft launched Windows 7, Adobe added multitouch and gesture capability to Flash Player and AIR, and a range of smartphones, tablets, and laptops that include multitouch sensing capability have become available or are just entering the market. Common devices such as ATMs, DVD rental kiosks, and even maps at the local mall are increasingly equipped with touch screens.

For an in-depth chronology, please read Bill Buxton’s article at http://www.billbuxton.com/multitouchOverview.html.

Exploring the use of alternate input devices in computing is often referred to as physical computing. Over the past 20 years, this research has advanced considerably, but until recently it was reserved for exhibitions or isolated equipment installations. These advances are now starting to crop up in many consumer electronics devices, such as the Nintendo Wii, Microsoft Xbox, and Sony PlayStation Move.

A playful example of a human–computer interface is the Mud Tub. Created by Tom Gerhardt in an effort to close the gap between our bodies and the digital world, the Mud Tub uses mud to control a computer. For more information, go to http://tomgerhardt.com/mudtub/.

For our purposes, we will only use clean fingers and fairly predictable touch sensors.

 

Mobile Utility Applications

Several mobile utility applications are available for AIR developers.

Launchpad

As its name indicates, this Adobe Labs beta tool gives Flex developers a head start in creating AIR desktop and mobile applications. Launchpad is an AIR application that generates a complete Flex project that is ready to import to Flash Builder.

The process consists of four steps. The Settings option is for the application descriptor and permissions. The Configuration option is for setting up listeners of various events, such as the application moving to the background. The Samples option is for importing sample code for the APIs, such as Geolocation or Microphone. The Generate option creates a ZIP file at the location of your choice, ready to be imported. Sample icons and assets are automatically included based on your selection.

For more information on Launchpad, go to http://labs.adobe.com/technologies/airlaunchpad/.

Device Central CS5

Adobe Device Central is not available for AIR development. At the time of this writing, it is targeted at Flash Player development only. It provides useful tools such as accelerometer simulation, which otherwise can only be tested on the device. ADL provides some, albeit more limited, functionality; it can simulate soft keys and device rotation.

Package Assistant Pro

Serge Jespers, from Adobe, wrote an AIR application that facilitates the packaging of AIR applications for Android. You can download the Mac OS X and Windows versions from http://www.webkitchen.be/package-assistant-pro/. Note that you need the .swf file, the application descriptor file, and the code-signing certificate.

The packager preferences store the location of the AIR SDK adt.jar (located in AIRsdk/lib/) and your code-signing certificate. The packaging process consists of a few easy steps, all accomplished via a user interface. Browse to select the required files, enter your certificate password, and choose a name for the APK file. You can choose to compile a device release or a debug version.

This tool is convenient if you use one of the tools we will discuss next, and is a nice GUI alternative to using the command-line tool.

De MonsterDebugger

De MonsterDebugger is an open source debugger for Flash, Flex, and AIR. Version 3.0 provides the ability to debug applications running on your device and send the output straight to an AIR desktop application. It lets you manipulate values, execute methods, and browse the display list. For more information, go to http://demonsterdebugger.com/.

 

Mobile Flash Player 10.1 Versus AIR 2.6 on Android

The Flash Player, version 10.1, first became available in Android 2.2 in June 2010. It runs within the device’s native browser. Developing applications for the mobile browser is beyond the scope of this book. However, understanding the similarities and differences between the two environments is important, especially if mobile development is new to you.

  • Both types of applications are cross-platform rich media applications. They both use the ActionScript language, but AIR for Android only supports ActionScript 3.
  • Both benefit from recent performance and optimization improvements, such as hardware acceleration for graphics and video, bitmap manipulation, battery and CPU optimization, better memory utilization, and scripting optimization.
  • Applications running in the Flash Player browser plug-in are typically located on a website and do not require installation. They rely on the Flash plug-in. AIR applications require packaging, certificates, and installation on the device. They rely on the AIR runtime.
  • Flash Player is subject to the browser sandbox and its restricted environment. The browser security is high because applications may come from many unknown websites. Persistent data is stored in the Flash Local Shared Object, but there is no access to the filesystem. AIR applications function as native applications and have access to local storage and system files. Persistent data may be stored in a local database. The user is informed upon installation of what data the application has access to via a list of permissions.
  • AIR has additional functionality unique to mobile devices, such as geolocation, accelerometer capability, and access to the camera.

 

Web Sockets

Real-time interaction has been something web developers have been trying to do for many years, but most of the implementations have involved using JavaScript to periodically hit the remote server to check for changes. HTTP is a stateless protocol, so a web browser makes a connection to a server, gets a response, and disconnects. Doing any kind of real-time work over a stateless protocol can be quite rough. The HTML5 specification introduced Web Sockets, which let the browser make a stateful connection to a remote server. We can use Web Sockets to build all kinds of great applications. One of the best ways to get a feel for how Web Sockets work is to write a chat client, which, coincidentally, AwesomeCo wants for its support site.

AwesomeCo wants to create a simple web-based chat interface on its support site that will let members of the support staff communicate internally, because the support staff is located in different cities. We’ll use Web Sockets to implement the web interface for the chat server.
Users can connect and send a message to the server. Every connected user will see the message. Our visitors can assign themselves a nickname by sending a message such as “/nick brian,” mimicking the IRC chat protocol. We won’t be writing the actual server for this, because that has thankfully already been written by another developer.

The Chat Interface

We’re looking to build a very simple chat interface that looks like Figure 10.2, with a form to change the user’s nickname, a large area where the messages will appear, and, finally, a form to post a message to the chat.

In a new HTML5 page, we’ll add the markup for the chat interface, which consists of two forms and a div that will contain the chat messages.

<div id="chat_wrapper">
  <h2>AwesomeCo Help!</h2>
  <form id="nick_form" action="#" method="post" accept-charset="utf-8">
    <p>
      <label>Nickname
        <input id="nickname" type="text" value="GuestUser"/>
      </label>
      <input type="submit" value="Change">
    </p>
  </form>
  <div id="chat">connecting....</div>
  <form id="chat_form" action="#" method="post" accept-charset="utf-8">
    <p>
      <label>Message
        <input id="message" type="text" />
      </label>
      <input type="submit" value="Send">
    </p>
  </form>
</div>

We’ll also need to add links to a style sheet and a JavaScript file that will contain our code to communicate with our Web Sockets server.

<script src='chat.js' type='text/javascript'></script>
<link rel="stylesheet" href="style.css" media="screen">

Our style sheet contains these style definitions:

#chat_wrapper {
  width: 320px;
  height: 440px;
  background-color: #ddd;
  padding: 10px;
}
#chat_wrapper h2 {
  margin: 0;
}
#chat {
  width: 300px;
  height: 300px;
  overflow: auto;
  background-color: #fff;
  padding: 10px;
}

In the #chat rule, we set the overflow property on the chat message area so that its height is fixed and any text that doesn’t fit is hidden but viewable with scrollbars.

With our interface in place, we can get to work on the JavaScript that will make it talk with our chat server.

Talking to the Server

No matter what Web Sockets server we’re working with, we’ll use the same pattern over and over. We’ll make a connection to the server, and then we’ll listen for events from the server and respond appropriately.

In our chat.js file, we first need to connect to our Web Sockets server, like this:

var webSocket = new WebSocket('ws://localhost:9394/');

When we connect to the server, we should let the user know. We define the onopen() method like this:

webSocket.onopen = function(event) {
  $('#chat').append('<br>Connected to the server');
};

When the browser opens the connection to the server, we put a message in the chat window. Next, we need to display the messages sent to the chat server. We do that by defining the onmessage() method like this:

webSocket.onmessage = function(event) {
  $('#chat').append("<br>" + event.data);
  $('#chat').animate({scrollTop: $('#chat').height()});
};

The message comes back to us via the event object’s data property. We just add it to our chat window. We’ll prepend a break so each response falls on its own line, but you could mark this up any way you wanted.

Next we’ll handle disconnections. The onclose() method fires whenever the connection is closed.

webSocket.onclose = function(event) {
  $("#chat").append('<br>Connection closed');
};

Now we just need to hook up the text area for the chat form so we can send our messages to the chat server.

$(function() {
  $("form#chat_form").submit(function(e) {
    e.preventDefault();
    var textfield = $("#message");
    webSocket.send(textfield.val());
    textfield.val("");
  });
});

We hook into the form submit event, grab the value of the form field, and send it to the chat server using the send() method.

We implement the nickname-changing feature the same way, except we prefix the message we’re sending with “/nick.” The chat server will see that and change the user’s name.

$("form#nick_form").submit(function(e) {
  e.preventDefault();
  var textfield = $("#nickname");
  webSocket.send("/nick " + textfield.val());
});

That’s all there is to it. Safari 5 and Chrome 5 users can immediately participate in real-time chats using this client. Of course, we still need to support browsers without native Web  Sockets support. We’ll do that using Flash.

Falling Back

Browsers may not all have support for making socket connections, but Adobe Flash has had it for quite some time. We can use Flash to act as our socket communication layer, and thanks to the web-socket-js library, implementing a Flash fallback is a piece of cake.

We can download a copy of the plug-in and place it within our project. We then need to include the three JavaScript files on our page:

<script type="text/javascript" src="websocket_js/swfobject.js"></script>
<script type="text/javascript" src="websocket_js/FABridge.js"></script>
<script type="text/javascript" src="websocket_js/web_socket.js"></script>
<script src='chat.js' type='text/javascript'></script>
<link rel="stylesheet" href="style.css" media="screen">

The only change we need to make to our chat.js file is to set a variable that specifies the location of the WebSocketMain.swf file:

WEB_SOCKET_SWF_LOCATION = "websocket_js/WebSocketMain.swf";

With that in place, our chat application will work on all major browsers, provided that the server hosting your chat server also serves a Flash Socket Policy file.

Flash Socket Policy What?

For security purposes, Flash Player will only communicate via sockets with servers that allow connections to Flash Player. Flash Player attempts to retrieve a Flash Socket Policy file first on port 843 and then on the same port your server uses. It will expect the server to return a response like this:

<cross-domain-policy>
  <allow-access-from domain="*" to-ports="*" />
</cross-domain-policy>

This is a very generic policy file that allows everyone to connect to this service. You’d want to make the policy more restrictive if you were working with more sensitive data. Just remember that you have to serve this file from the same server that’s serving your Web Sockets server, on either the same port or port 843.

The example code for this section contains a simple Flash Socket Policy server written in Ruby that you can use for testing. See Section 25, Servers, for more on how to set that up in your own environment for testing.

Chat servers are just the beginning. With Web Sockets, we now have a robust and simple way to push data to our visitors’ browsers.

Servers

The book’s source code distribution contains a version of the Web Sockets server we’re targeting. It’s written in Ruby, so you’ll need a Ruby interpreter. For instructions on getting Ruby working on your system, see the file RUBY_README.txt within the book’s source code files.

You can start it up by navigating to its containing folder and typing this:

ruby server.rb

In addition to the chat server, there are two other servers you may want to use while testing the examples in this chapter. The first server, client.rb, serves the chat interface and JavaScript files. The other server, flashpolicyserver, serves a Flash Policy file that our Flash-based Web Sockets fallback code will need to contact in order to connect to the actual chat server. Flash Player uses these policy files to determine whether it is allowed to talk to a remote domain.

If you’re running on a Mac or a Linux-based operating system, you can start all these servers at once with this:

rake start

from the html5_websockets folder.

 

 

Embedding Audio and Video

Audio and video are such an important part of the modern Internet.
Podcasts, audio previews, and even how-to videos are everywhere, and
until now, they’ve only been truly usable using browser plug-ins.
HTML5 introduces new methods to embed audio and video files into
a page. In this chapter, we’ll explore a few methods we can use to not
only embed the audio and video content but also to ensure that it is
available to people using older browsers.

We’ll discuss the following two elements in this chapter:

<audio> [<audio src="drums.mp3"></audio>]
Play audio natively in the browser. [C4, F3.6, IE9, S3.2, O10.1, IOS3, A2]

<video> [<video src="tutorial.m4v"></video>]
Play video natively in the browser. [C4, F3.6, IE9, S3.2, O10.5, IOS3, A2]

Before we do that, we need to talk about the history of audio and video
on the Web. After all, to understand where we’re going, we have to
understand where we’ve been.

A Bit of History

People have been trying to use audio and video on web pages for a long
time. It started with people embedding MIDI files on their home pages
and using the embed tag to reference the file, like this:

<embed src="awesome.mp3" autostart="true"
  loop="true" controller="true"></embed>

The embed tag never became a standard, so people started using the
object tag instead, which is an accepted W3C standard. To support
older browsers that don’t understand the object tag, you often see an
embed tag nested within the object tag, like this:

<object>
  <param name="src" value="simpsons.mp3">
  <param name="autoplay" value="false">
  <param name="controller" value="true">
  <embed src="awesome.mp3" autostart="false"
    loop="false" controller="true"></embed>
</object>

Not every browser could stream the content this way, though, and not
every server was configured properly to serve it correctly. Things got
even more complicated when video on the Web became more popular.
We went through lots of iterations of audio and video content on
the Web, from RealPlayer to Windows Media to QuickTime. Every company
had a video strategy, and it seemed like every site used a different
method and format for encoding their video on the Web.

Macromedia (now Adobe) realized early on that its Flash Player could
be the perfect vehicle for delivering audio and video content across platforms.
Flash is available on close to 97 percent of web browsers already.
Once content producers discovered they could encode once and play
anywhere, thousands of sites turned to Flash streaming for both audio
and video.

Then Apple came along in 2007 with the iPhone and iPod touch and
decided that Apple would not support Flash on those devices. Content
providers responded by making video streams available that would play
right in the Mobile Safari browser. These videos, using the H.264 codec,
were also playable via the normal Flash Player, which allowed content
providers to still encode once while targeting multiple platforms.

The creators of the HTML5 specification believe that the browser should
support audio and video natively rather than relying on a plug-in that
requires a lot of boilerplate HTML. This is where HTML5 audio and
video start to make more sense: by treating audio and video as first-class
citizens in terms of web content.

Containers and Codecs

When we talk about video on the Web, we talk in terms of containers
and codecs. You might think of a video you get off your digital camera as
an AVI or an MPEG file, but that’s actually an oversimplification. Containers
are like an envelope that contains audio streams, video streams,
and sometimes additional metadata such as subtitles. These audio and
video streams need to be encoded, and that’s where codecs come in.
Video and audio can be encoded in hundreds of different ways, but
when it comes to HTML5 video, only a few matter.

Video Codecs

When you watch a video, your video player has to decode it. Unfortunately,
the player you’re using might not be able to decode the video you
want to watch. Some players use software to decode video, which can
be slower or more CPU intensive. Other players use hardware decoders
and are thus limited to what they can play. Right now, there are three
video formats that you need to know about if you want to start using
the HTML5 video tag in your work today: H.264, Theora, and VP8.

Codec and Supported Browsers

H.264 [IE9, S4, C3, IOS]
Theora [F3.5, C4, O10]
VP8 [IE9 (if codec installed), F4, C5, O10.7]

H.264

H.264 is a high-quality codec that was standardized in 2003 and created
by the MPEG group. To support low-end devices such as mobile
phones, while at the same time handling video for high-definition devices,
the H.264 specification is split into various profiles. These profiles
share a set of common features, but higher-end profiles offer additional
options that improve quality. For example, the iPhone and Flash Player
can both play videos encoded with H.264, but the iPhone only supports
the lower-quality “baseline” profile, while Flash Player supports
higher-quality streams. It’s possible to encode a video one time and
embed multiple profiles so that it looks nice on various platforms.
H.264 is a de facto standard because of support from Microsoft and
Apple, which are licensees. On top of that, Google’s YouTube converted
its videos to the H.264 codec so it could play on the iPhone, and Adobe’s
Flash Player supports it as well. However, it’s not an open technology. It
is patented, and its use is subject to licensing terms. Content producers
must pay a royalty to encode videos using H.264, but these royalties do
not apply to content that is made freely available to end users.2
Proponents of free software are concerned that eventually, the rights
holders may begin demanding high royalties from content producers.
That concern has led to the creation and promotion of alternative
codecs.

Theora

Theora is a royalty-free codec developed by the Xiph.Org Foundation.
Although content producers can create videos of similar quality with
Theora, device manufacturers have been slow to adopt it. Firefox, Chrome, and Opera can play videos encoded with Theora on any platform
without additional software, but Internet Explorer, Safari, and
the iOS devices will not. Apple and Microsoft are wary of “submarine
patents,” a term used to describe patents whose applicants purposely
delay the publication and issuance of the patent in
order to lay low while others implement the technology. When the time
is right, the patent applicant “emerges” and begins demanding royalties
on an unsuspecting market.

VP8

Google’s VP8 is a completely open, royalty-free codec with quality similar
to H.264. It is supported by Mozilla, Google Chrome, and Opera,
and Microsoft’s Internet Explorer 9 promises to support VP8 as long as
the user has installed a codec already. It’s also supported in Adobe’s
Flash Player, making it an interesting alternative. It is not supported in
Safari or the iOS devices, which means that although this codec is free
to use, content producers wanting to deliver video content to iPhones
or iPads still need to use the H.264 codec.

Audio Codecs

As if competing standards for video weren’t complicating matters
enough, we also have to be concerned with competing standards for
audio.

Codec and Supported Browsers

AAC [S4, C3, IOS]
MP3 [IE9, S4, C3, IOS]
Vorbis (OGG) [F3, C4, O10]

Advanced Audio Coding (AAC)

This is the audio format that Apple uses in its iTunes Store. It is designed
to have better audio quality than MP3s for around the same file
size, and it also offers multiple audio profiles similar to H.264. Also,
like H.264, it’s not a free codec and does have associated licensing fees.
All Apple products play AAC files. So do Adobe’s Flash Player and the
open source VLC player.

Vorbis (OGG)

This open source royalty-free format is supported by Firefox, Opera,
and Chrome. You’ll find it used with the Theora and VP8 video codecs
as well. Vorbis files have very good audio quality but are not widely
supported by hardware music players.

MP3s

The MP3 format, although extremely common and popular, isn’t supported
in Firefox and Opera because it’s also patent-encumbered. It is
supported in Safari and Google Chrome.

Video codecs and audio codecs need to be packaged together for distribution
and playback. Let’s talk about video containers.

Containers and Codecs, Working Together

A container is a metadata file that identifies and interleaves audio or
video files. A container doesn’t actually contain any information about
how the information it contains is encoded. Essentially, a container
“wraps” audio and video streams. Containers can often hold any combination
of encoded media, but we’ll see these combinations when it
comes to working with video on the Web:

• The OGG container, with Theora video and Vorbis audio, which
will work in Firefox, Chrome, and Opera.
• The MP4 container, with H.264 video and AAC audio, which will
work in Safari and Chrome. It will also play through Adobe Flash
Player and on iPhones, iPods, and iPads.
• The WebM container, using VP8 video and Vorbis audio, which will
work in Firefox, Chrome, Opera, and Adobe Flash Player.

Given that Google and Mozilla are moving ahead with VP8 and WebM,
we can eliminate Theora from the mix eventually, but from the looks of
things, we’re still looking at encoding our videos twice—once for Apple
users (who have a small desktop share but a large mobile device share
in the United States) and then again for Firefox and Opera users, since
both of those browsers refuse to play H.264.
That’s a lot to take in, but now that you understand the history and the
limitations, let’s dig in to some actual implementation, starting with
audio.