Windows Phone Performance Optimization, Fast! to Faster!

0
213

Optimizing your game’s performance

Games belong to a class of real-time software. This means that they are not only expected to produce the correct result, but they must also complete this within a fixed time window. In general, game developers shoot for a minimum of displaying 30 frames per second in order to produce smooth, glitch-free animations; and most prefer 60 frames per second. This means that all of the game calculations getting the player input, implementing enemy AI, moving objects, collision detection and handling, and drawing each frame must be completed within 16.7 milliseconds! When you consider that most modern video games have hundreds, or even thousands, of objects that have to be updated and drawn within that time period, it is no wonder that programmers feel they have to optimize every line of code.

However, many XNA programmers are not familiar with the tools and methods for determining when, where, how, or even if, they should optimize their code. The point of this recipe is to help you answer these questions.

Getting ready

The following section will help you to optimize your game’s performances

Design versus implementation

A common response by those who question, or even outright disagree, with the idea that optimizing the code early is a bad idea, is to point out that it is far easier to change software early in its lifecycle than after it has been written. That is, of course, very true. That is why it is important to understand the difference between the design optimization and implementation optimization.

While designing a game (or any software), you must take into account the size and complexity of your game, and select the correct data structures and algorithms that can support it. A simple 2D shooter or a platformer with no more than a hundred objects interacting at any given time can probably get away with a brute force approach for handling movements and collisions. Maintaining a simple list or an array of objects and iterating through it each frame will most likely work fine, and will be very simple to implement and debug.

However, a more complex game world, with perhaps thousands of active objects, will need an efficient method of partitioning the game space to minimize the number of object interaction tests in each frame. Similarly, games requiring detailed enemy AI will need to rely on algorithms that can produce “intelligent” actions as quickly as possible.

There are many resources available that discuss game programming algorithms. Some of them are as follows:

  • The use of quadtrees and octrees for partitioning the game world to minimize collision detection tests
  • The minimax algorithm with alpha-beta pruning for efficiently finding the “best” move in two player strategy games (please check the wiki link for more information at http://en.wikipedia.org/wiki/Alpha-beta_pruning)
  • The A* algorithm for efficient path finding (for more detail about the A* algorithm, please check the wiki link at http://en.wikipedia.org/wiki/A*_search_ algorithm)

The selection of appropriate data structures and algorithms during the design phase has a far greater impact on the eventual performance of your game than any implementation optimization you will make, as your algorithms determine the maximum number of operations your game will have to perform during each frame.

In order to demonstrate this point, imagine that for your first game you write a simple 2D shooter that relies on a brute force approach to collision detection. In every frame, you simply test every active object against every other active object to see if they intersect. As you decide to have only a limited number of enemies active at a time, it works well and easily runs at 60 frames per second.

With that experience under your belt, you now want to write a second game that is far more ambitious. This time you decide to write a Zelda-like adventure game with a large scrolling game board and hundreds of objects moving around it simultaneously. (The Legend of Zelda, an NDS game from Nintendo. You can find out more about this game at: http:// en.wikipedia.org/wiki/The_Legend_of_Zelda.) Using your existing code as a starting point, you get well into the game’s implementation before you discover that the brute force approach that worked very well in your simple game does not work so well in this new game. In fact, you may be measuring screen draws in seconds per frame instead of frames per second!

The reason is that, comparing every object against every other object is what is known as an O(n2) algorithm (for more information on estimating the algorithm time complexity, please see the classic book Introduction to Algorithm second edition, http://www.amazon. com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844). That is, the number of operations that have to be performed is related to the square of the number of objects on which you are operating. If you have 10 objects in your game, you only have to perform a hundred tests to see if there are any collisions. If you have a hundred objects, you have to perform ten thousand tests, which may still be possible on a modern PC if each test can be done quickly enough. However, if you have five hundred just five times as many as the last example you will have to perform 250,000 collision tests. Even if each test took only 67 microseconds, you would still be using the entire 16.7 milliseconds frame time (usually at 60 frames per second) just for collision detection. The point is that it does not matter how efficiently you implement that algorithm in a code, its performance will still devolve exponentially with the number of objects in your game, and will therefore be the single greatest limiting factor to the size of your game.

Game runs slow?

Ok, so your game is playable with most of the features you want to be implemented. However, when you test the application, you find that the animation runs like a robot, the character should run, but it is crawling. What is wrong there? You might say, it is about the compiler features, such as the foreach keyword, or ask whether you need to pass the matrices by reference, not by values.

You have two choices: stop there and take a step back or fix it, and start going into each method trying to figure out how to find your way around the problem on a case-by-case basis. Maybe you will even succeed and get the game back into the runnable state that you had it in hours earlier. Maybe you are even lucky enough to have not introduced yet more bugs into the process. However, in all likelihood, you have not fixed the problem and now you have code that does not run any better than when you started, but is harder to understand, harder to debug, and has kludges in it to get around problems that you introduced trying to fix the wrong problem. Your time will be much better spent finding out where your problems are before you try to fix them.

Measuring the running time

A prototype is just a simplified version of software (in this case, your game), that focuses on one particular aspect of it. Prototypes are often used as proofs of concept to show that the software will be able to work as expected. As prototypes don’t have to deal with all of the details that the final software will, they can be written quickly so that, if necessary, different approaches can be evaluated.

Prototypes are frequently used to evaluate user interfaces, so that customers can provide early feedback. This can be useful for game programming too, as if you can implement a working display and control scheme, you may be able to find out what works and doesn’t work before you get too far along in the actual implementation of the game. However, the use of prototypes that we are concerned with here is to determine whether an algorithm is fast enough for the game we want to write. To do that, we will want to benchmark it. Benchmarking is just the process of timing how long an algorithm takes to run.

How to do it…

Fortunately, the .NET framework makes benchmarking very easy by providing the System. Debug.Stopwatch class. The Stopwatch class provides a Start and a Stop method. It keeps track of the total number of clock ticks that occur between calls to Start and Stop. Even better, like a real stopwatch, it keeps a running count of ticks between successive calls to Start and Stop. You can find out how much time has passed by querying its ElapsedTicks or ElapsedMilliseconds properties. A Reset() method lets us reset Stopwatch back to zero.

Now, follow the steps to take advantage of the Stopwatch class:

  1. As a showcase, the following code gives you a general picture on how to use the Stopwatch class for time measuring:
    [code]
    public abstract class Sprite
    {
    public Vector2 Position { get; set; }
    public Color Color { get; set; }
    // Sprite’s collision rectangle in screen coordinates.
    public BoundingRectangle BoundingBox { get; }
    public Sprite(
    string imageName,
    BoundingRectangle boundingBox);
    public virtual void Initialize();
    public virtual void LoadGraphicsContent(
    ContentManager content);
    public virtual void Update(GameTime time);
    public virtual void Draw(SpriteBatch spriteBatch);
    // Tests for collision with another Sprite. If a
    // collision occurs, it calls the Collide method for
    // both Sprites. Returns true if images collide.
    public bool TestCollision(Sprite item);
    // Called when the TestCollision method detects a
    // collision with another Sprite.
    //
    protected virtual void Collide(
    BoundingRectangle overlap,
    Sprite item);
    }
    [/code]
  2. As it is an abstract, it is intended to be used as a parent to other Sprite classes that will implement its behavior, so we will create our own TestSprite class. TestSprite will generate a random starting position, directional movement vector, and speed (in pixels per second), as shown here:
    [code]
    public override void Initialize()
    {
    // Set starting position.
    Position =
    new Vector2(
    random.Next(screenWidth),
    random.Next(screenHeight));
    // Create a random movement vector.
    direction.X = (float)random.NextDouble() * 2 – 1;
    direction.Y = (float)random.NextDouble() * 2 – 1;
    direction.Normalize();
    // Determine random speed in pixels per second.
    speed = (float)random.NextDouble() * 300 + 150;
    }
    [/code]
  3. In each frame, the following code will update its position based on its movement direction, speed, and the amount of time that has elapsed. It will also test to see if it has hit the edge of the screen, and deflect away from it:
    [code]
    public override void Update(GameTime time)
    {
    // Reset color back to white.
    Color = Microsoft.Xna.Framework.Graphics.Color.White;
    // Calculate movement vector.
    Vector2 move =
    (float)time.ElapsedGameTime.TotalSeconds *
    speed * direction;
    // Determine new position.
    UpdatePosition(move);
    }
    private void UpdatePosition(Vector2 move)
    {
    Position += move;
    if ((BoundingBox.Left < 0) ||
    (BoundingBox.Right > screenWidth))
    {
    direction.X = -direction.X;
    Position -= new Vector2(move.X, 0);
    }
    if ((BoundingBox.Top < 0) ||
    (BoundingBox.Bottom > screenHeight))
    {
    direction.Y = -direction.Y;
    Position -= new Vector2(0, move.Y);
    }
    }
    [/code]
  4. We will talk more about collision testing next. For now, we will see what it takes to time just moving our TestSprite around the screen. Inside our game, we will create a TestSprite object and call its Initialize() and LoadGraphicsContent() methods at appropriate places. And we will create SpriteBatch for our game and pass it to Draw(). Now all we need is to use Stopwatch to time it in the Update() method. In order to do this, we will create a couple of helper methods that start and stop Stopwatch, and print the amount of time it takes for each update:
    [code]
    private Stopwatch updateTimer;
    private int updates = 0;
    private int framesPerSecond;
    private void StartTimer()
    {
    updateTimer.Start();
    }
    private void StopTimer()
    {
    updateTimer.Stop();
    updates++;
    // Show the results every five seconds.
    if (updates == 5 * framesPerSecond)
    {
    Debug.WriteLine(
    updates + ” updates took ” +
    updateTimer.ElapsedTicks + ” ticks (” +
    updateTimer.ElapsedMilliseconds +
    ” milliseconds).”);
    int msPerUpdate =
    (int)updateTimer.ElapsedMilliseconds / updates;
    Debug.WriteLine(
    “Each update took ” +
    msPerUpdate + ” milliseconds.”);
    // Reset stopwatch.
    updates = 0;
    updateTimer.Reset();
    }
    }
    [/code]
  5. By putting calls to StartTimer and StopTimer around the calls to our sprite’s Update() method, we will get a report of the average time each call takes:
    [code]
    300 updates took 34931 ticks (9 milliseconds).
    Each update took 0.03 milliseconds.
    300 updates took 24445 ticks (6 milliseconds).
    Each update took 0.02 milliseconds.
    300 updates took 23541 ticks (6 milliseconds).
    Each update took 0.02 milliseconds.
    300 updates took 23583 ticks (6 milliseconds).
    Each update took 0.02 milliseconds.
    300 updates took 23963 ticks (6 milliseconds).
    Each update took 0.02 milliseconds.
    [/code]

How it works…

In step 1, the Initialize(), LoadGraphicsContent(), Update(), and Draw() methods are the standard methods for Windows Phone 7 XNA Game Programming. Additionally, it provides properties for getting and setting the position and color. For collision detection, the Collide() method called by TestCollision() tests for collision with another Sprite BoundingBox values intersect.

In step 3, an actual game may want to determine the actual point of intersection so it could deflect away from that point more realistically. If you need that level of realism, you would probably want to go ahead and implement your strategy here, so you could time it. However, all we are trying to prototype here is a basic update time, so that this version is fine for our needs.

Note that the Update() method does not test for collisions. We don’t want individual sprite testing for collisions because to do so, our Sprite class would have to know about other game objects and we would be severely limiting our design options for collision testing. Any change to our collision-testing algorithm could, and likely would, affect our Sprite class. We want to avoid anything that limits future design changes, so we will give our Sprite class the ability to test for collisions, but require another part of our code to determine what objects should be tested.

In step 6, each call took on average of 20 microseconds (on my development laptop your results will vary). However, notice that the very first set of updates took almost one and a half times as long to run as the others. That is because the first time these methods are called, the JIT compiler compiles the code and our Stopwatch is timing that as well. It is also possible, as this is a fairly small amount of code that is being called repeatedly, that some or all of it may be fitting in the cache, which will increase the speed of later calls.

These show some of the problems with benchmarking code. Another problem is that we are adding some time by using Stopwatch itself. Thus, benchmark times for prototype code can be used as a general guide, but cannot be relied upon for exact values. In fact, exact values of the time it takes for functions to run are very hard to determine. Although intended only to describe quantum phenomena, a variation of the Heisenberg Uncertainty Principle is at play here: the act of measuring something affects the thing being measured.

There’s more…

Now let’s expand our prototype to help us determine whether we can get away with a brute force approach to collision detection.

First, let’s look at the collision handling code that I have already placed in the Collide method. Remember that this gets called, for both sprites, whenever the TestCollision() method determines a collision between two sprites. All it does is set the Sprite’s color to red:

[code]
protected override void Collide(
BoundingRectangle overlap,
Sprite item)
{
// Turn the sprite red to indicate collision.
Color = Color.Red;
}
[/code]

Let’s give this a test by replacing our single TestSprite with an array of TestSprites. Every place we referenced TestSprite in the original code, we now have to loop through the array to handle all of our TestSprites. In order to make this a little easier to manage, we will refactor our original sprite.Update() call in the Update() method into a new UpdateSprites() method that updates every sprite. We will add a new HandleCollisions() method to our game to test for collisions. Finally, we will change the Update() method, so that it only calls StartTimer and StopTimer around the call to HandleCollisions(). The relevant sections look like the following code:
[code]
private TestSprite[] sprites = new TestSprite[10];
protected override void Update(GameTime gameTime)
{
if (Keyboard.GetState().IsKeyDown(Keys.Escape))
{
this.Exit();
}
UpdateSprites(gameTime);
StartTimer();
HandleCollisions();
StopTimer();
base.Update(gameTime);
}
private void UpdateSprites(GameTime gameTime)
{
foreach (Sprite sprite in sprites)
{
sprite.Update(gameTime);
}
}
private void HandleCollisions()
{
// This is brute force approach
for (int i = 0; i < sprites.Length; i++)
{
for (int j = i + 1; j < sprites.Length; j++)
{
sprites[i].TestCollision(sprites[j]);
}
}
}
[/code]

Looking at that, you may wonder why I am not using foreach for the HandleCollisions call. It is simply because with foreach, we have no way of knowing what sprites we already tested. This algorithm tests every sprite against every other sprite exactly once.

What are the results? On my machine, with 10 sprites, I get the following:
[code]
300 updates took 48827 ticks (13 milliseconds).
Each update took 0.04333333 milliseconds.
300 updates took 42466 ticks (11 milliseconds).
Each update took 0.03666667 milliseconds.
300 updates took 42371 ticks (11 milliseconds).
Each update took 0.03666667 milliseconds.
300 updates took 43086 ticks (12 milliseconds).
Each update took 0.04 milliseconds.
300 updates took 43449 ticks (12 milliseconds).
Each update took 0.04 milliseconds.
[/code]

Wow! Handling collisions for 10 sprites takes only twice as long as it did just to move one sprite. How could that be? It is partly due to the overhead of using the Stopwatch class and making method calls, and partly due to the fact that we are measuring very fast operations. Obviously, the closer you get to the resolution of the underlying timer, the more error you get in trying to time things.

Before we go on, notice also that the impact of the JIT compiler during our first set of updates is significantly less. This shows how effective the JIT compilation is and why we don’t need to worry about it affecting the performance of our game. We may take a performance hit the first time a section of code is running, but it is relatively miniscule to our overall performance.

Now let’s see what happens when we increase the number of sprites to 100:

[code]
300 updates took 2079460 ticks (580 milliseconds).
Each update took 1.933333 milliseconds.
300 updates took 2156954 ticks (602 milliseconds).
Each update took 2.006667 milliseconds.
300 updates took 2138909 ticks (597 milliseconds).
Each update took 1.99 milliseconds.
300 updates took 2150696 ticks (600 milliseconds).
Each update took 2 milliseconds.
300 updates took 2169919 ticks (606 milliseconds).
Each update took 2.02 milliseconds.
[/code]

Whether you should be impressed or dismayed depends on how you want to use this collision-handling algorithm. On one hand, averaging 2 milliseconds per frame is still a miniscule part of our 16.7 millisecond frame timing. If you are not planning to have more than a hundred sprites or so, this algorithm will suit your needs perfectly. However, looking at the relative time difference per sprite gives a completely different perspective. It takes us 50 times as long to handle 10 times the number of sprites.

How about when the number is increased to 500? I urge you to run this code, so that you can see the results for yourself!

[code]
300 updates took 28266113 ticks (7896 milliseconds).
Each update took 26.32 milliseconds.
300 updates took 28179606 ticks (7872 milliseconds).
Each update took 26.24 milliseconds.
300 updates took 28291296 ticks (7903 milliseconds).
Each update took 26.34333 milliseconds.
300 updates took 28199114 ticks (7877 milliseconds).
Each update took 26.25667 milliseconds.
300 updates took 28182787 ticks (7873 milliseconds).
Each update took 26.24333 milliseconds.
[/code]

At this time there is no way to hide the dismay. The movement is clearly getting far less than our desired 60 frames per second! In fact, just the HandleCollisions() call alone is taking almost twice our allotted 16.7 milliseconds per frame. Multiplying the number of objects by 5 increased our time by 13! The times are not increasing exactly in quadric, due to overhead, but the rate of increase is clear.

Does this mean we should never consider this algorithm? Hopefully, at this point the answer is obvious. Many games can easily get away with only having an order of a hundred or so objects active at a time, which we have clearly shown can be handled easily. The fact that the algorithm is trivial to implement and maintain makes it a no-brainer for a large number of games.

On the other hand, if you know you will need to have hundreds of objects, you will need another solution. You have two options: optimize this algorithm, or find a new one. Anyone who is experienced with code optimization will see several obvious ways to make both the algorithm and its implementation more efficient.

For starters, most games don’t actually need to test every object against every other object. Taking the Space Invasion game as an example, I don’t need to test invaders for collision with other invaders. In fact, it is almost crazy to do so.

Another obvious optimization is that the Sprite class’s BoundingBox property is adding the sprite’s current screen position to its internal BoundingRectangle every time TestCollision is called, this despite the fact that the position changes only once or twice per frame. TestCollision, on the other hand, is called once for every other sprite in the game.

In addition, the Sprite’s TestCollision code is computing the actual intersection rectangle even though we are not using it here. We could easily save some time by not computing it. However, we give ourselves more flexibility by going ahead and doing it. Remember that this is supposed to be a generic Sprite class that can be used for many games.

These suggestions don’t even get into implementation optimizations, such as always passing our BoundingBoxes by reference instead of value; and providing direct access to member variables instead of accessing them through properties. These are exactly the types of optimizations suggested by many efficiency proponents in the XNA forums. However, these also make the code less readable, harder to debug, and harder to maintain.

As Space Invasion never has more than around 60 objects on the screen at a time, the unoptimized brute force approach works just fine. In addition, that is undoubtedly true for many other games as well. However, what if your game does need more than 100 collidable objects? Should you not make those optimizations so you can handle them?

The answer is… maybe. By making some of these optimizations, we can get this same brute force algorithm to handle 500 objects at a far more reasonable 6.4 milliseconds per frame.

[code]
300 updates took 6682522 ticks (1866 milliseconds).
Each update took 6.22 milliseconds.
300 updates took 7038462 ticks (1966 milliseconds).
Each update took 6.553333 milliseconds.
300 updates took 7023610 ticks (1962 milliseconds).
Each update took 6.54 milliseconds.
300 updates took 6718281 ticks (1876 milliseconds).
Each update took 6.253334 milliseconds.
300 updates took 7136208 ticks (1993 milliseconds).
Each update took 6.643333 milliseconds.
[/code]

That is an impressive improvement and shows how significantly performance can be optimized through these techniques. However, the disadvantages mentioned earlier less maintainable and less flexible code should not be ignored. In addition, even if you do these sorts of implementation optimizations, keep in mind that this algorithm will still degrade exponentially as you add more objects. You may be able to move up from 100 to 500 objects, but it won’t get you to 1000. At some point, you need to recognize that you need a different algorithm to efficiently handle more objects, such as the one that partitions your game space, like quad trees.

Finally, remember that 6.4 milliseconds is still 40 percent of your entire frame time. If you are maintaining on the order of a thousand or more objects at a time, other parts of your code are almost certainly also going to be difficult to manage at a reasonable frame rate. Is optimizing your collision detection the best use of your time? How do you know in advance which ones to optimize? Optimizing all of them as you go will surely take you longer to write, not to mention make your code more difficult to debug and maintain.

If benchmarking shows your algorithm has problems without implementation optimizations, you are probably better off with a different algorithm.

Using the EQATEC Profiler to profile your game’s running time

Profiling your game performance is a significant part of the whole game development process. No matter how efficient the used algorithms are, or how powerful the hardware is, you still need to get sufficiently accurate CPU running time charts for different functions calling in different hardware conditions. Choosing a good profiling tool will help you to find the hot spots which consume most CPU game resources, and lead you to create the most efficient optimization. For Windows Phone, EQATEC is a good choice and in this recipe, you will learn how to use the EQATEC Profiler to profile your Window Phone game.

Getting ready

You can download the EQATEC Profiler from the official company website located at the following URL:

[code]
http://www.eqatec.com/Profiler/
[/code]

The following screenshot shows what website looks like:

EQATEC Profiler from the official company

After clicking on the Download EQATEC Profiler, a new page will let you choose the profiler version; the free version is fine for our needs. After filling out some basic information, the website will send a URL for downloading the profiler to your e-mail address. When you have installed the downloaded profiler, you are ready for profiling your Windows Phone 7 game.

How to do it…

Carry out the following steps:

  1. Run the EQATEC Profiler through the Start menu or through the root directory where the profiler binary application is located. If the profiler runs correctly, you should get the following screen:
    EQATEC Profiler through the Start menu
  2. The Browse button lets you locate the root directory of your Windows Phone 7 XAP file for profiling. When the directory is set, you will choose the XAP file which will be listed in the list box under the App path textbox. The testing XAP file and source code can be found in the bundle file of this chapter. After that, you should click on the Browse button for building the profile description file that processed the number of methods in the designated application and some application metadata.
  3. Then, after selecting the application you want to profile, click on the Run button to start the profiling. When the Run button is clicked, a prompt will pop up asking you about the device to be used for profiling, Windows Phone 7 Device or Windows Phone 7 Emulator. In the example, we chose the emulator as shown in the following screenshot:
    Windows Phone 7 Device or Windows Phone 7 Emulator
  4. Under the Run tab, if you are sure the Windows Phone 7 application is ready, it is time for profiling. The window should look similar to the following screenshot:
    Run tab, if you are sure the Windows Phone 7
  5. Now, click on the yellow Run app button. The profiler will automatically start a Windows Phone 7 emulator and connect to the emulator. Next, it will install the profiled Windows Phone 7 XAP file on the emulator. When this step is done, profiler will start to track and profile the designated application. At this moment, if you want to know the actual time of every method in your application costs, you need to click on the Take snapshot button with a timer symbol under the information list box, and a new snapshot report which includes the running time of every method will be generated. Then, click on the yellow View button once you have chosen the report you want to review.
  6. In the snapshot view window, you will find how many milliseconds every method takes. The windows will look similar to the following screenshot:
    many milliseconds every method

How it works…

The time of every method is listed in the list box:

  • Initialize() method: 693 MS
  • LoadContent() method: 671 MS
  • Draw() method: 122 MS
  • DrawModel() method: 50 MS
  • Update() method: 43 MS

You can find more details in the Details of Individual methods panel. This panel will tell you the percentage of called method costs on time of the caller. In this example, the LoadContent() method consumes 671 MS which occupies 97percent of the Initialize() method total time.

Reducing the game contents’ loading time

As you know, most of the time, before playing a game, a screen for loading game contents will show up with a running progress bar. Without this, you may feel the game is stuck and not responding. If you know that the game is loading and can see its progress, you know that all you have to do is wait. Usually, no one wants to wait too long for playing games, however. It is wasting time and can cause frustration to the user. For better user experiences, the following sections will tell you how to reduce the loading time of game contents.

Making good loading decisions

Often, the first step in reducing loading times is to understand where the current greatest expenses are. Highlighting the frequency and timing of content loading is an effective way to evaluate and adjust loading times, as well as validate that the right content (no more and no less) is being loaded in a given scenario. Consider instrumenting the following:

  • The time required to load an asset; the System.Diagnostics.Stopwatch object can be used for this
  • The frequency with which each asset has been loaded over multiple game levels, across sessions, and so on
  • The frequency with which each asset is freed
  • The average lifetime

Using the XNA content pipeline to reduce file size

Compressing the game contents into an XNB file will make a great file size reduction in building time.

For the XNA framework, assets are shared between PC, Xbox, and Windows Phone 7 platforms, and you can reuse the textures, models, and so on. If a texture is consistently scaled down for display on a Windows Phone 7, consider performing that scaling offline, rather than taking the processing penalty bandwidth and memory overhead when loading the content.

Developers may also want to exploit other texture types, such as PNG, where doing so would not contribute to already compressed assets. For sparse textures, PNG on Windows Phone will typically demonstrate superior compression to a DXT-compressed content that is brought through the XNA content pipeline. In order to use other texture types, the source files must be copied to the output directory and not compiled in the content pipeline.

Note that, while DXT-compressed assets can be used natively by Windows Phone 7 GPUs, many formats including PNG need to be expanded at runtime to a raw format of 32 bits per pixel. This expansion can lead to increased memory overhead compared to DXT compression.

In order to balance the runtime memory footprint of DXT with the loading time footprint of more aggressive compression formats, developers may choose to apply custom compression and runtime decompression to the DXT content (as built by the XNA pipeline into .xnb files), which can lead to a significant reduction in loading times. Developers should balance the loading time considerations with CPU requirements to decode their custom-encoded content, as well as with memory requirements to handle and manipulate the decompressed data. The offline custom compression and runtime title-managed decompression of the DXT content can offer a good balance of reduced size (and thus, reduced loading time) without large runtime memory costs.

Developers can also pack multiple images into a single texture, as demonstrated by the content processor in the spritesheet. We have already discussed in Chapter 4, Heads Up Display (HUD)—Your Phone Game User Interface, that spritesheets avoid DXT power-of-two restrictions imposed by the XNA content processor, and optimize the file loading (replacing many small files with one larger one).

In the realm of sound, if native audio assets from a console title are 48 kHz, consider down sampling them to 44.1 kHz (prior to applying the XNA pipeline’s own compression) for use on the phone. This will realize an immediate 8 percent savings (approximately) on storage and reading bandwidth, as well as mild CPU savings for running at the native sampling rate of the Windows Phone device (44.1 kHz).

Beyond compression, decreasing loading times can focus on data organization that focuses on the efforts of loading the content that is needed to drive to an initial interactive state, rather than preparing all possible loaded data. This is particularly important in avoiding the watchdog timer; a title that loads data for too long prior to drawing to the screen risks being terminated by the system. Developers should also give similar attention to the in-game content loading. Remember that returning to gameplay from interruptions (SMS, phone, app purchase, and so on) invalidates all the previously loaded content.

Evaluating the asynchronous background loading

Even if the game takes a substantial set-up time, there are numerous techniques to getting the user into some kind of interactive state sooner. Anything from a simplified arcade-style loading screen to cut-scenes, trivia, “did you know” facts, and other low-CPU-impact techniques can be leveraged to help smooth the setup and transition from loading to gameplay.

Loading to an initial menu state or a cut-scene, and then continuing to load additional assets in the background would seem to be appropriate strategies for masking loading times from the consumer. However, LoadContent() performs byte copies of each loaded texture asset that uses the XNA content pipeline, generating garbage. Moreover, LoadContent(), overall, will trigger the garbage collection at each megabyte of loaded data. Depending on the actual interactivity of foreground scenes, the potential CPU cost taken by garbage collection may be acceptable; playback of pre-rendered video cut-scenes takes advantage of purpose-built hardware, so the CPU utilization is typically negligible. Similarly, static or intermittently animated menu systems would likely have more success here than attempting to generate the CPU-intensive content rendered in-engine during the background loading.

Considering the custom serialization

Microsoft’s .NET framework provides an easy to use method for serializing data onto disks, using types present in the System.Xml.Serialization namespace. Simplicity always comes with tradeoffs, however; in this case, the tradeoff is the file size. The default serialization schema is verbose. The behavior of the XmlSerializer is trivially easy to change, however, and can result in significant savings in file sizes.

As an example, let’s consider the following class definition:

[code]
public class TestClass
{
public int count;
public float size;
public bool enabled;
public string
LongNameOfAMinorFieldThatDoesntNeedALongNameInTheFile = “test”;
}
[/code]

The preceding class definition, when serialized with the default XmlSerializer, produces the following XML:

[code]
<?xml version=”1.0″?>
<TestClass xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance
xmlns:xsd=”http://www.w3.org/2001/XMLSchema”>
<count>0</count>
<size>0</size>
<enabled>false</enabled>
<LongNameOfAMinorFieldThatDoesntNeedALongNameInTheFile>test
</LongNameOfAMinorFieldThatDoesntNeedALongNameInTheFile>
</TestClass>
[/code]

The default behavior of XmlSerializer is to treat each public field or property as an XML element. This generates quite a bit of extra data in the file; this XML file uses 332 bytes on the disk to serialize four fields. With a few simple changes, we can get significantly smaller files from XmlSerializer. Consider the following class declaration:

[code]
public class TestClass2
{
[XmlAttribute(AttributeName=”count”)]
public int count;
[XmlAttribute(AttributeName=”size”)]
public float size;
[XmlAttribute(AttributeName=”enable”)]
public bool enabled;
[XmlAttribute(AttributeName = “longName”)]
public string
LongNameOfAMinorFieldThatDoesntNeedALongNameInTheFile = “test”;
}
[/code]

With XmlAttribute added to properties, the XmlSerializer treats the field as attributes rather than elements, and gives the attributes alternative names. The resulting XML is the following:
[code]
<?xml version=”1.0″?>
<TestClass2 xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”
xmlns:xsd=”http://www.w3.org/2001/XMLSchema” count =”0″ size =”0″
enable =”false” longName =”test” />
[/code]

The serialized file has significantly less wasted text. The file size also shrank to 167 bytes. This is a saving of roughly 50 percent, and a more reasonable file size to serialize four fields. Modifying your serialization code to prefer the XML attributes to XML elements will often result in similar savings. Even if you don’t perform renaming, as we did in this example, you will generally get close to a 50 percent reduction, as every XmlElement has to have a closing tag, while attributes don’t.

Avoid using XmlAttribute for complex types, or for collections of types. The space savings are minimal in these cases, and the resulting file is considerably more difficult to read. For larger amounts of data, consider writing a custom binary serialization code. In all cases, ensure that you time any new code to confirm any realized performance gains over the default Serializer settings.

Improving game performance with garbage collection

Discussing the garbage collector (GC) that runs on Windows Phone 7 devices is helpful for the Windows Phone 7 game developer. Anyone who has programmed in XNA for Windows or Xbox 360 before knows the GC well.

Value types versus reference types

One of the first things you must understand is the difference between value types and reference types. Value types such as int, float, Vector3, Matrix, and struct (this includes nullable types; a nullable type such as BOOL is just a special struct) live on the stack. The GC does not care about the stack. Well, technically, it cares slightly, but only to the extent that the system begins to run low on memory, and you would have to be trying very hard to get enough items on the stack to cause the system to run low on memory. So don’t worry about calling “new Vector3()” or “Matrix.CreateTranslation()” in your methods that run regularly (such as Update and Draw) it is just a stack allocation and it won’t anger the GC.

Classes are an entirely different matter. Classes, arrays (including arrays of value types, for example, int[ ]), collections (List<>, Dictionary<>, and so on.), and strings (yes, strings) are all reference types and they live on the heap. The heap is the GC’s caring. It pays attention to everything that shows up on the heap and to everything that no longer has any business there, but is still hanging around.

Defining a true value checking method

Take a look at the following code listing:

[code]
void CheckForTrue(bool value)
{
string trueText = “The value is true.”;
string falseText = “The value is false.”;
if (value == true)
{
Console.WriteLine(trueText);
}
else
{
Console.WriteLine(falseText);
}
return;
}
[/code]

Every time this method runs, trueText and falseText will both be allocated on the heap and will “go out of scope” when the the method is run. In other words, “gone out of scope” simply means that there are no more references to an object. A string declared with const never goes out of scope, and thus does not matter to GC for all practical purposes. This is also true for any object declared as static readonly, as once it is created it exists forever. However, the same is not true for a normal static, though many might mistakenly assume so. A static object without the readonly keyword applied to it will generally exist for the life of a program. However, if it is ever set to null, then unless there is some other reference to it, it goes out of scope and is subject to garbage collection.

Technically, the GC runs for every 1 MB of heap allocation. Whenever the GC is running, it takes time to comb through the heap and destroy any objects that are no longer in scope. Depending on how many references you have and how complex nesting of objects is, this can take a bit of time. In XNA, the clock is on a fixed time-step by default and in Windows Phone 7, the default frame rate is 30 FPS. This means that there are 33.3333333 milliseconds available for Update() and Draw() methods to finish their CPU-side tasks. Draw prepares things on the CPU-side, then hands over the actual drawing to the GPU which, being a separate processor, does not usually affect the Update/Draw side of things, except for stalls, but those are beyond the scope of this book and most people will never run into them anyway. If they finish ahead of time, the CPU hangs out and waits until it is time to run Update() again. If not, then the system takes notice that it is running behind and will skip as many draws as necessary to catch back up.

This is where the GC comes in. Normally, your code will complete just fine within 33.33 milliseconds, thereby maintaining a nice even 30 FPS (if your code does not normally complete within that time, you will see serious constant performance problems that may even cause your game to crash after a little while if XNA gets so far behind that it throws up its hands and surrenders). However, when the GC runs, it eats into that time. If you have kept the heap nice and simple, the GC will run nice and fast and this likely won’t matter. However, keeping a simple heap that the GC can run through quickly is a difficult programming task that requires a lot of planning and/or rewriting, and even then is not fool proof (sometimes, you just have a lot of stuff on the heap in a complex game with many assets). A much simpler option assuming you can do it is to limit or even eliminate all allocations during gameplay. You will obviously be allocating heap memory when you first start the game (for example, when loading assets in the LoadContent() method), and you will be allocating memory when loading levels, if you have a game with levels and decide to load each one in an interstitial screen. You will also be allocating memory when changing game screens. However, a small stutter from a couple of dropped frames in between levels or while switching screens is not a big concern the player is not going to accidentally fall off a cliff or get hit by an enemy projectile or anything when those things are happening. In fact, sometimes it makes a lot of sense to intentionally trigger the GC right before the game is going to (re)start. Triggering the GC resets the 1 MB counter and can prevent situations where the counter is at .94 MB when the level begins, such that even a small number of minimal allocations that would otherwise be perfectly acceptable, can cause problems.

Therefore, the goal is to minimize heap allocations. How do we do that? Well, the biggest contributors are needlessly creating new objects in your Update or Draw cycle and boxing value types. First, a quick note on boxing; the simplest example of boxing is casting a value type like int or enum to object in order to pass it as a state. Boxing is a great feature of .NET, but not recommended for game programming because of the heap allocations that can trigger the GC. So keep an eye out for it and try not to do it.

Another big contributor is creating new reference types. Every new instance of an object causes a heap allocation and increases that counter ever so slightly. There are several coding practices that will help you to eliminate needless heap allocation and increase performance for your game.

Using StringBuilder for string operations

Make any strings that never change into const strings.

Where you need strings that change, consider using System.Text.StringBuilder (visit http://msdn.microsoft.com/en-us/library/system.text. stringbuilder.aspx for more information on StringBuilder). All XNA methods that take a string (for example, SpriteBatch.DrawString) will also take a StringBuilder object. Make sure to use one of the constructors which take a default capacity and set it to a value high enough to hold as many characters as you plan, plus a few extra for good measure. If the internal array is large, it will never have to resize itself, and thus will never generate any heap allocations after it is created!

Drawing integer in string without garbage

If you need to draw an int value, such as a score or the number of lives a player has, consider using the following block of code (thanks to Stephen Styrchak):

[code]
public static class SpriteBatchExtensions
{
private static string[] digits = { “0”, “1”, “2”, “3”, “4”, “5”,
“6”, “7”, “8”, “9” };
private static string[] charBuffer = new string[10];
private static float[] xposBuffer = new float[10];
private static readonly string minValue =
Int32.MinValue.ToString(CultureInfo.InvariantCulture);
// Extension method for SpriteBatch that draws an integer
// without allocating any memory. This function avoids garbage
// collections that are normally caused by calling
// Int32.ToString or String.Format. the equivalent of calling
// spriteFont.MeasureString on
// value.ToString(CultureInfo.InvariantCulture).
public static Vector2 DrawInt32(this SpriteBatch spriteBatch,
SpriteFont spriteFont, int value,
Vector2 position, Color color)
{
Vector2 nextPosition = position;
if (value == Int32.MinValue)
{
nextPosition.X = nextPosition.X +
spriteFont.MeasureString(minValue).X;
spriteBatch.DrawString(spriteFont, minValue, position,
color);
position = nextPosition;
}
else
{
if (value < 0)
{
nextPosition.X = nextPosition.X +
spriteFont.MeasureString(“-“).X;
spriteBatch.DrawString(spriteFont, “-“, position,
color);
value = -value;
position = nextPosition;
}
int index = 0;
do
{
int modulus = value % 10;
value = value / 10;
charBuffer[index] = digits[modulus];
xposBuffer[index] = spriteFont.MeasureString
(digits[modulus]).X;
index += 1;
}
while (value > 0);
for (int i = index – 1; i >= 0; –i)
{
nextPosition.X = nextPosition.X + xposBuffer[i];
spriteBatch.DrawString(spriteFont, charBuffer[i],
position, color);
position = nextPosition;
}
}
return position;
}
}
[/code]

Taking advantage of the list for sprites

If you have a Sprites class, for example, create an object pool to reuse it rather than letting it fall out of scope and creating a new one each time one ceases to exist in the game and each time you need a new one. As an example, create a generic List<> of your Sprites class (refer http://msdn.microsoft.com/en-us/library/6sh2ey19. aspx for more information on lists). Use the List<> constructor overload that takes a default capacity and make sure to set it to a value high enough to contain all the objects of that sort and will exist at one time in your game (for example, 300). Then, use a for loop to go through and create all of the objects in the list up to the capacity. Add a public bool IsAlive { get; set; } property to your class to keep track of which ones are being used at any particular time. When you need a new one, loop through the list until you find one, where IsAlive is false. Take that one, set IsAlive to true, set the other properties (such as its position, direction, and so on.) to their appropriate values, and continue. When doing collision detection, loop through using a for or a foreach loop and process only the objects for which IsAlive is true. The same approach should be followed for updating and drawing them. Whenever one is no longer needed (for example, when it collides with something or it goes off screen), simply set its IsAlive to false and it will now be available for reuse without any memory allocation. If you want to be creative, you can expand on this further in several different ways. You could keep a count of the number of live objects, so that once you have processed that number in your update and draw methods, you can use the break keyword to get out of the loop early, rather than go all the way to the end. Alternatively, you could keep two lists: one for storing live objects and one for dead objects, and move objects between the two lists as appropriate.

Preferring struct rather than class when just an instance is needed

If you do want to create something, you can just create a new “instance” of each Update() or Draw() method. Try creating a struct instead of a class. Structures can perform most of the things that classes can (the major limitation being that they cannot inherit from another structure, a class, or anything else, but they can implement interfaces). Moreover, structures live on the stack and not on the heap, so unless you have a reference type, like a string or a class, as a field or property of the structure, you will not generate any trash using a structure. Remember, though, that an array of structures is a reference type (as are all arrays), and thus lives on the heap and counts towards the GC trigger limit whenever created.

Avoiding use of LINQ in game developing

Don’t use LINQ. It looks cool. It makes your code shorter, simpler, and perhaps even easier to read. However, LINQ queries can easily become a big source of trash. They are fine in your startup code, as you are going to generate trash there anyway, just by loading assets and preparing game resources. However, don’t use it in Update(), Draw(), or any other method that gets called during the gameplay.

Minimizing the use of ToString()

Minimize use of ToString(). At a minimum, it creates a string, which lives on the heap (refer to the Drawing integer in string without garbage section discussed earlier in this chapter). If you do need to use ToString(), try to limit how often it is called. If the string only changes every level, then generate it only once at the beginning of the level. If it only changes when a certain value changes, then generate it only when that value changes. Any limits you can set are worth it. The amount of time it takes to check a Boolean condition is so small as to be almost non-existent. You could probably fit tens and even hundreds of thousands of true/false checks for the time it takes the GC to run on a complex heap.