Today is an exciting day because I finally get to share the dirty details on some really cool tech written for Planetary Annihilation. If you aren’t familiar with the project, it’s an upcoming real-time strategy game that had a wildly successful Kickstarter in August 2012 and just entered beta. You can pre-order directly ($40 retail, $60 retail + beta) or via Steam and play on Windows, OS X, or Linux[1]. Here’s our awesome beta launch trailer.
I’m a programmer at Uber Entertainment working on PA and we’re actually doing quite a few interesting things on the technology front: procedural planet generation via CSG[2], planet-sized virtual textures, and massive 40-player games, to name a few. For this post I’m going to focus on one specific feature that influences the entire codebase: the ChronoCam[3].
If a picture is worth a thousand words then a video is even better. Here is a brief clip showcasing the features of ChronoCam.
The ChronoCam is similar to a replay system except it’s in the live game. While you are mid-game you can jump back to look at the world from any point in time, play in slow/fast motion, scrub the timeline from start to finish, and even play in reverse.
If your scout gets destroyed and you weren’t paying attention, you can use the ChronoCam to find out how it died. If you’re playing with dual monitors, you can simultaneously view the game world from two entirely different points in time[4].
The game engine driving PA is custom code written from scratch[5]. The architecture that enables ChronoCam is the brainchild of Jon Mavor (CTO) and William Howe-Lott (Lead Architect). I did not come up with the initial design. However, I have worked on a lot of the pieces and understand the whole system well enough that I can write words about it on the internet.
In a lot of respects the systems I’m going to describe today are the natural conclusion of a single decision on how to structure our networked data. Before going into that it’s important to understand how other games work.
Many, if not most, strategy games are written with a lockstep synchronous engine. Each client runs the exact same game and the only data sent over the network is player input. Once each player has all the input from every other player, they can all simulate the next tick. These engines are often peer-to-peer but it isn’t a strict requirement[6].
This is how an hour long Starcraft game with hundreds of units can have a replay file under 100kb. The only data in that file is player input which is used to resimulate the game from scratch.
Unfortunately there are some downsides. First and foremost is that if a single client can’t simulate the game fast enough then the game slows down for everyone. This greatly limits the potential scope of the game. It also makes features such as join in progress and reconnection significantly more difficult and complicated to implement.
I wrote about synchronous engines at length with respect to Supreme Commander in a previous blog post, Synchronous RTS Engines and a Tale of Desyncs (2011). Additional articles on this topic worth reading are 1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond (2001) and Floating Point Determinism (2013).
A more common model is that of the client-server. All players connect to a server which continuously sends each player the current state of the world. The clients use this “current” data to interpolate, and sometimes extrapolate, their local view of the world. Servers do their best to keep players fully up to date with a minimal amount of latency.
A phenomenal talk on this subject is David Aldridge’s GDC presentation “I Shot You First! — Gameplay Networking in Halo: Reach” (2011). Bungie has made the slides available in a lightweight video free 7mb version (link) and a media rich 530mb version (link). I highly recommend taking the time to go through these slides.
For Planetary Annihilation we want huge games that scale with server power. A beefy rig with many cores should be able to support more players and more planets than a normal PC.
The synchronous model can run only as fast as the slowest player. Due to the popularity of laptops running integrated GPUs, that turns out to be remarkably slow. To support massive games spanning the solar system we have chosen to implement a client-server architecture[7]. This moves the heavy multi-planet simulation to the server and leaves clients largely responsible only for rendering what they can see.
The next decision is how to represent, store, and update data for game objects. Our novel approach, which enables ChronoCam, is curves. Every bit of data sent from the server to the client is represented by a curve[8]. You can think of a curve as a timeline that begins when the object is created and ends when the object is destroyed.
Understanding what a curve is and how it works is a bit tricky. If you understand keyframe interpolation, such as the kind used in 2d/3d animation, then you’re halfway there.
Let’s say you have a tank. This tank will have a curve representing its position and it has two keyframe values: (t=0, pos=1) and (t=1, pos=5). We clearly know the position at time t=0 and t=1. We can also calculate the position at t=.25, t=.5, and t=.75 with ease: linear interpolation gives pos=2, pos=3, and pos=4.
Every time the tank moves a new keyframe is added. If the tank does not move then no new keyframes are added and the last value is used (shown by the dotted line).
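To make the tank example concrete, here is a minimal Python sketch of sampling a linear curve. It is only an illustration, not the engine's actual code: the name `sample_linear`, the list-of-tuples layout, and the choice to clamp before the first keyframe are all assumptions.

```python
from bisect import bisect_right

def sample_linear(keyframes, t):
    """Sample a piecewise-linear curve stored as time-sorted (time, value) pairs."""
    times = [k[0] for k in keyframes]
    i = bisect_right(times, t)
    if i == 0:
        return keyframes[0][1]       # before the first keyframe: clamp (assumption)
    if i == len(keyframes):
        return keyframes[-1][1]      # past the last keyframe: hold the last value
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

tank_position = [(0.0, 1.0), (1.0, 5.0)]
for t in (0.25, 0.5, 0.75, 3.0):
    print(t, sample_linear(tank_position, t))   # 2.0, 3.0, 4.0, then 5.0 held
```

The last sample, at t=3.0, is the "tank has stopped moving" case: no new keyframe exists, so the final value is simply held.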
Units are composed of only a small handful of curves: position, orientation, health, built_fraction, vision_flags, weapon_target, weapon_angle, and a few others. Each curve is updated independently. If only position changes then only a position curve keyframe is sent over the network. If no curves change then zero bandwidth is consumed.
State of the World
What makes ChronoCam possible is storing all of the curves. Clients are almost “dumb” because all they do is render what the world looks like at a single point in time. ChronoCam is simply an interface to control where that point is.
If a tank is created at t=100 and destroyed two minutes later at t=220 and you have the curves for its position and orientation for that time range then you can show where the tank was at any point in its entire life. If you have those curves for all the tanks and all the objects then you can render the whole world as it existed at any point in time.
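A rough sketch of that idea, assuming each object stores its lifetime plus a position curve. The dictionary layout, the object names, and the 1-D positions are hypothetical and chosen only to keep the example short.

```python
def lerp_curve(keys, t):
    """Linearly sample time-sorted (time, value) keyframes, holding the end values."""
    if t <= keys[0][0]:
        return keys[0][1]
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return keys[-1][1]

# Hypothetical objects: lifetime plus a 1-D position curve each.
objects = {
    "tank": {"created": 100, "destroyed": 220,
             "position": [(100, 0.0), (160, 30.0), (220, 30.0)]},
    "scout": {"created": 150, "destroyed": 400,
              "position": [(150, 5.0), (400, 255.0)]},
}

def world_at(t):
    """Positions of every object that exists at time t."""
    return {name: lerp_curve(obj["position"], t)
            for name, obj in objects.items()
            if obj["created"] <= t < obj["destroyed"]}

print(world_at(130))   # {'tank': 15.0}
print(world_at(200))   # {'tank': 30.0, 'scout': 55.0}
print(world_at(300))   # {'scout': 155.0}
```

Note that `world_at` never looks at a previous frame. Any time value can be rendered directly, which is the property ChronoCam relies on.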
Predicting the Future
Storing data in curves lets the engine do all kinds of cool things that other games can’t do.
Let’s say you’re building a factory and it’s going to take 50 seconds to complete and there is a number from 0 to 1.0 to represent how complete it is. This number is used to show the progress bar, visual effects, etc. In a game like Halo the server would need to continuously send updates every server tick. 0.0 -> 0.02 -> 0.04 ... 0.96 -> 0.98 -> 1.0. That’s a lot of data!
With curves the server only needs to send two keyframes: (t=start_time, value=0.0) and (t=start_time+50, value=1.0). The client can then interpolate the exact completion percentage at any point in time. That’s way less data than the standard approach!
“Hold on a second!” said the astute reader. What happens if the build rate changes? Multiple units can help construct a factory and it will go faster! You are correct. In this case we need to reshape the curve by removing the original end keyframe, adding a new intermediate keyframe, and adding a new end point.
Let’s assume that after 20 seconds a second builder is added that doubles the build rate. Here’s what that would look like.
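In code form, a minimal sketch of the reshape might look like the following. The helper names `sample` and `reshape` are hypothetical, start_time is taken to be 0, and the build rate is expressed as completion fraction per second.

```python
def sample(keys, t):
    """Linearly sample time-sorted (time, value) keyframes, holding the end values."""
    if t <= keys[0][0]:
        return keys[0][1]
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return keys[-1][1]

def reshape(keys, now, new_rate):
    """Drop predicted keyframes at or after `now`, then re-predict completion."""
    value_now = sample(keys, now)
    end_time = now + (1.0 - value_now) / new_rate
    kept = [k for k in keys if k[0] < now]
    return kept + [(now, value_now), (end_time, 1.0)]

BUILD_SECONDS = 50.0
build = [(0.0, 0.0), (BUILD_SECONDS, 1.0)]        # the initial two-keyframe prediction
build = reshape(build, now=20.0, new_rate=2.0 / BUILD_SECONDS)
print(build)   # start, an intermediate keyframe at t=20 (40% done), new end near t=35
```

At t=20 the factory is 40% complete. The remaining 60% at double speed takes 15 seconds instead of 30, so the end keyframe moves from t=50 to roughly t=35.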
Tada! Problem solved. Alas this solution is not free. It comes with a cost. The server initially sent an end keyframe that ended up being wrong. Bandwidth was wasted sending the initial keyframe and then a little more to tell the client to delete the keyframe.
The rule of thumb is to predict things into the future as far as you can. However, if you’re wrong there is a bandwidth penalty. If you’re wrong too often then you’ve out-clevered yourself and it’s cheaper to send keyframes tick by tick.
In client-server architectures it’s common for the server to run at a reduced rate. The Planetary Annihilation server and Supreme Commander sim both tick at 10 frames per second even though clients can run a silky smooth 60.
This reduced tick rate is mostly seamless to the client player but it does have some problem cases. The classic example is that of a bouncing ball. If you were to drop a golf ball from 2 meters high straight down to the floor, its height over time should look like this.
With a 10 Hz tick rate the interpolated position is not smooth due to an ugly hitch. This is because the moment of impact occurs in the middle of a tick and the client only has data on 100 ms intervals.
With the curve system this issue is easily resolved. The server sends clients updated keyframes once per tick. However the server is also free to send as many keyframes as it wants, including intermediate values. The bouncing ball case is trivially solved in our engine by sending a single additional keyframe from the middle of the tick at the moment of impact.
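Here is a small sketch of the bouncing ball case under simple assumptions: gravity of 9.81 m/s², a made-up restitution of 0.7 for the rebound, and hypothetical helper names. It compares a curve built only from 10 Hz tick keyframes with one that has a single extra keyframe at the moment of impact.

```python
import math

G, DROP_HEIGHT, RESTITUTION = 9.81, 2.0, 0.7    # restitution is an assumed value
T_IMPACT = math.sqrt(2 * DROP_HEIGHT / G)        # about 0.64 s, between two ticks

def true_height(t):
    """Height of the ball: free fall, then one rebound."""
    if t <= T_IMPACT:
        return DROP_HEIGHT - 0.5 * G * t * t
    rebound_speed = RESTITUTION * G * T_IMPACT
    dt = t - T_IMPACT
    return rebound_speed * dt - 0.5 * G * dt * dt

def sample(keys, t):
    """Linearly sample time-sorted (time, value) keyframes, holding the end values."""
    if t <= keys[0][0]:
        return keys[0][1]
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * ((t - t0) / (t1 - t0))
    return keys[-1][1]

tick_keys = [(i / 10, true_height(i / 10)) for i in range(11)]   # 10 Hz server ticks
fixed_keys = sorted(tick_keys + [(T_IMPACT, 0.0)])               # one extra keyframe

print(sample(tick_keys, T_IMPACT))    # well above the floor: the hitch
print(sample(fixed_keys, T_IMPACT))   # 0.0: the ball actually touches the floor
```

With ticks alone the client interpolates straight from the t=0.6 sample to the t=0.7 sample, so the ball appears to turn around in mid-air. The extra keyframe costs one more value on the wire and removes the hitch.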
There is a general goal to send as few keyframes as is necessary to minimize bandwidth usage. Additional keyframes are easy to add and can be worth the cost when there is a clear improvement in quality.
So far I’ve only talked about one type of curve, linear. Currently the engine has two additional curve types, step and pulse.
Step curves are the opposite of linear curves. There is no interpolation. This is useful for integer values which you probably don’t want to interpolate as well as booleans which you can’t interpolate.
Here is an example using ammo capacity. A unit has a full clip with 10 shots. It then fires 5 shots in succession — pew pew pew! There is a pause while the unit reloads and capacity jumps back to 10.
Step curves allow any non-interpolated data types to be easily stored in the curve system. For example we have a step curve on units that stores a std::vector of order GUIDs the unit is following[9].
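A step curve can be sketched as "return the most recent keyframe, no blending". The function name and the shot timings below are invented for illustration; only the ammo counts come from the example above.

```python
from bisect import bisect_right

def sample_step(keyframes, t):
    """Return the value of the most recent keyframe at or before t (no blending)."""
    times = [k[0] for k in keyframes]
    i = bisect_right(times, t)
    return keyframes[i - 1][1] if i > 0 else None   # None: curve has not started

# Hypothetical timings: five shots 0.2 s apart, then a reload finishing at t=4.
ammo = [(0.0, 10), (1.0, 9), (1.2, 8), (1.4, 7), (1.6, 6), (1.8, 5), (4.0, 10)]
print(sample_step(ammo, 1.5))   # 7
print(sample_step(ammo, 3.0))   # 5
print(sample_step(ammo, 4.0))   # 10
```

Because the value is never blended, it can be anything at all: an integer, a boolean, or a whole list of orders.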
Pulse curves are for instantaneous events that have zero duration. These events are triggered by the server and used by the client to play effects and sounds. Here is an example for a unit that fires two short bursts before dying.
Technically speaking, clients do not show a single point in time. Instead each update represents a small slice of time. For a game running at 60fps that slice is about 16.7ms wide. When updating, the client finds all pulses within the time slice and processes them appropriately.
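One way to picture the slice lookup is the sketch below. It assumes pulses are kept as a time-sorted list and that slices are half-open, so a pulse sitting exactly on a frame boundary fires once rather than twice. Both choices, like the names and timings, are assumptions made for the example.

```python
from bisect import bisect_left

def pulses_in_slice(pulses, start, end):
    """Pulses with lo <= time < hi, where pulses is a time-sorted (time, event) list."""
    lo, hi = min(start, end), max(start, end)     # also accepts a reversed slice
    times = [p[0] for p in pulses]
    return pulses[bisect_left(times, lo):bisect_left(times, hi)]

# Hypothetical unit: two short bursts of fire, then it dies.
pulses = [(1.00, "fire"), (1.10, "fire"), (1.20, "fire"),
          (3.00, "fire"), (3.10, "fire"), (5.00, "death")]

frame = 1 / 60
print(pulses_in_slice(pulses, 1.09, 1.09 + frame))   # [(1.1, 'fire')]
print(pulses_in_slice(pulses, 2.00, 2.00 + frame))   # []
```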
A lot of readers have probably thought about compression by now. Why are curves linear? Would other curve forms look smoother? Could they save space? Aren’t splines better?
Well, yes and no. There are two distinct situations to consider: the live game and complete replays.
We’re Doing It Live
When playing live and updating there is less room for curve optimization than you’d think. Once data has been sent over the network that’s it, you’ve paid for it. When appending a single new keyframe to the end of a curve there is little to no room for cleverness. It is possible to erase and/or modify keyframes after they’ve been sent to the client. However doing so means you pay the initial bandwidth cost plus a penalty to send a remove keyframe message.
Most of the interesting server side optimization work occurs outside of the curve itself. The best curve is the one you never have to send. The second best curve is one with a minimal set of keyframes and an accurate prediction.
A place where there is room for compression is in the final replay file once a game is complete. Unfortunately we haven’t spent much time working in this area so I don’t have much to say just yet. We just released beta and this type of optimization won’t be high priority until closer to launch.
Seek and Advance
Clients have a world state that comes from sampling all the curves for all the objects at a point in time. There are a few ways this time value can be updated. Supporting all of them is a lot of work, but it’s also what makes ChronoCam possible.
The simplest method is advance and it’s pretty much what it sounds like. Calculate client world state for some value of time, render the world, advance time by dt, and do it all again.
The basic case is dt=1/60 (at 60fps) and it works like pretty much every video game ever. ChronoCam supports variable playback rate so dt is not directly tied to framerate. It can be scaled faster (1.5x, 2x, 3x) or slower (0.5x, 0.25x, 0.1x).
Where advance gets complicated is that dt can also be negative. Remember that ChronoCam can play in reverse! For values calculated from curves — position, orientation, etc — this is simple and “just works”. What’s not so simple is any client side data that is derived from sampled curve data.
Animations, sounds, effects, and more all need to support playing in reverse. For animation this is fairly easy. Particle effects are less easy. Not everything can trivially be updated backwards.
There are a variety of cases to handle and generalized solutions are an on-going conversation.
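As a rough sketch of the time bookkeeping only, a playback clock might look like this. The class name, the clamping to the range from zero to the newest time received, and the method names are all assumptions. The hard part described above, making every subsystem cope with a negative dt, is not shown.

```python
class PlaybackClock:
    """Tracks which point in time the client is rendering."""

    def __init__(self, time=0.0, rate=1.0):
        self.time = time
        self.rate = rate          # 1.0 normal, 0.5 slow motion, -1.0 reverse

    def advance(self, frame_seconds, newest_time):
        """Step by one rendered frame, scaled by the playback rate."""
        dt = frame_seconds * self.rate            # negative when rewinding
        self.time = min(max(self.time + dt, 0.0), newest_time)
        return dt

    def seek(self, target_time, newest_time):
        """Jump straight to an arbitrary time without stepping through frames."""
        self.time = min(max(target_time, 0.0), newest_time)

clock = PlaybackClock(time=10.0, rate=-2.0)       # rewinding at double speed
clock.advance(0.5, newest_time=60.0)
print(clock.time)    # 9.0
clock.rate = 0.25                                 # quarter-speed slow motion
clock.advance(1.0, newest_time=60.0)
print(clock.time)    # 9.25
```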
The other major method of updating time is seek. At the start of the post I said that ChronoCam influences the entire codebase and this is what I was referring to. Seek is the ability to sample world state at any arbitrary point in time. It’s what happens if you use ChronoCam to go back by 10 minutes.
For seek to work the complete client world state must be calculable from only curve data. More specifically this means nothing on the client can fully depend on the previous frame. This is crazy! Games are built on tried and true game loops. You update from frame A to frame B to frame C. For seek to work it must be possible to generate any frame from scratch.
Seek creates a lot of interesting edge cases. For example our server does not play animation. Bots walk and turn but the animation is purely client side. It would be a ridiculous waste of bandwidth for the server to send playback data down the wire.
What units do have is a position curve. They also have a velocity curve, which is calculated as a derivative of position[10]. This velocity value can then be used to start and stop a walk animation. When the client advances, not seeks, by dt it can also advance the animation tree by dt.
Now imagine the seek scenario. A bot army is destroyed and a player uses ChronoCam to go back in time to when the bots are alive and marching to their doom. This re-allocates the units and causes a seek on the anim tree[11]. Ordinarily the bots would be walking across the battlefield each on a unique animation frame. After a seek, every bot starts its walk animation at the same time, so they’re all perfectly in sync.
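For a linear position curve the derivative is just the slope of whichever segment contains the sample time. The sketch below uses a 1-D position and hypothetical function names to show how a client could pick between walking and idling without the server sending anything extra.

```python
def velocity_at(position_keys, t):
    """Slope of the linear segment containing t; zero where the value is held."""
    for (t0, p0), (t1, p1) in zip(position_keys, position_keys[1:]):
        if t0 <= t < t1:
            return (p1 - p0) / (t1 - t0)
    return 0.0

def animation_for(position_keys, t, epsilon=1e-6):
    """Choose an animation purely from data derived on the client."""
    return "walk" if abs(velocity_at(position_keys, t)) > epsilon else "idle"

bot_position = [(0.0, 0.0), (4.0, 8.0), (10.0, 8.0)]   # walks for 4 s, then stands
print(velocity_at(bot_position, 2.0), animation_for(bot_position, 2.0))   # 2.0 walk
print(velocity_at(bot_position, 6.0), animation_for(bot_position, 6.0))   # 0.0 idle
```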
This is going to sound silly, but we don’t actually want our robots to feel too robotic. How do we fix it? Tragically, on a case by case basis. When seeking into a walk animation the solution is to pick a random frame of the walk loop and advance from there.
Each client subsystem must work out for itself how it advances forward, advances backward, and seeks.
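The walk animation case could be sketched like this. It is only an illustration of the "pick a random frame on seek" idea: the class, the 1.2 second loop length, and the seeded random generator are assumptions.

```python
import random

WALK_LOOP_SECONDS = 1.2     # hypothetical length of one walk cycle

class WalkAnimation:
    def __init__(self):
        self.phase = 0.0    # seconds into the walk loop

    def advance(self, dt):
        """Step the loop; Python's % keeps the phase wrapped for negative dt too."""
        self.phase = (self.phase + dt) % WALK_LOOP_SECONDS

    def seek(self, rng):
        """No previous frame to build on, so start somewhere random in the loop."""
        self.phase = rng.random() * WALK_LOOP_SECONDS

rng = random.Random(2013)
army = [WalkAnimation() for _ in range(5)]
for bot in army:
    bot.seek(rng)           # every bot lands on its own frame of the walk cycle
print([round(bot.phase, 2) for bot in army])
```

After the seek, ordinary playback resumes with `advance(dt)`, including negative dt when the timeline is playing in reverse.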
Cheating in video games, especially PC games, is rampant. A particularly common hack in strategy games is the map hack. In a synchronous engine there honestly isn’t much a game dev can do to stop it. All clients simulate the entire game so all state information is in memory. It may require bit twiddling trickery but if an enemy unit location is in memory then cheaters will find it.
In Planetary Annihilation, with our client-server architecture, cheating just isn’t a major concern[12]. Curves do not offer anything new to devs to stop cheating, but they do make it a lot easier. Want to prevent a specific client from having position information on an enemy unit because they don’t have line of sight? Easy, don’t update the curves for that unit for that player. Problem solved.
What happens if a curve has missing data? Nothing. Sweet, sweet nothing. It’s a double win because players can’t cheat and bandwidth isn’t wasted on units a player can’t see. For the most part a player will only receive updates for their units and the few enemy units within their vision range.
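A minimal sketch of that filtering step, assuming the server already knows which units each player can see. The tuple layout, unit names, and function name are hypothetical.

```python
def keyframes_for_player(new_keyframes, visible_units):
    """Keep only keyframes for units this player can currently see.

    new_keyframes: list of (unit_id, curve_name, time, value) tuples.
    visible_units: set of unit ids (own units plus enemies in vision range).
    """
    return [kf for kf in new_keyframes if kf[0] in visible_units]

tick_updates = [
    ("red_tank", "position", 12.3, (40.0, 7.0)),
    ("blue_scout", "position", 12.3, (90.0, 2.0)),
    ("blue_factory", "built_fraction", 12.3, 0.4),
]
visibility = {
    "red": {"red_tank", "blue_scout"},        # red has eyes on the scout only
    "blue": {"blue_scout", "blue_factory"},   # blue cannot see the red tank
}
for player, visible in visibility.items():
    print(player, keyframes_for_player(tick_updates, visible))
```

Red never receives the factory keyframe, so there is nothing in the red client's memory for a map hack to find.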
Eventually cheaters will find a way to force rendering of units that are in the fog of war. It can’t be stopped. All that it will accomplish is showing where the unit was the last time it was updated, but not where it is. Clients don’t even know a unit is destroyed until they have visual confirmation[13].
If how cheats are written is of interest to you I wrote about another form of cheating in a previous blog post, Extravagant Cheating via Direct X.
One of the driving inspirations for ChronoCam is the robust replay support it enables. The ability to freely pause, jump back, rewind, and fast forward are features that every game wants but few can provide.
Entire online communities are built on top of replays. Replays are spectated, shared, streamed, commentated on, and more. The popularity of YouTube Let’s Play videos is nothing short of staggering[14]. Providing stellar support to these communities has been a major goal from day one.
Our curve system also provides a lot of flexibility when it comes to features such as live streaming. In a sense the server buffers the entire game and can serve data to spectators with an arbitrary tape delay. Curves can also be distributed to additional servers to spread bandwidth loads if there is sufficiently high demand[15].
An issue with many replay systems is old replays being invalidated any time there is a patch. This is particularly problematic with synchronous engines where the only contents of a replay file are player input. Some games, such as Starcraft 2, work around this issue by actually loading an old copy of the game when playing an old replay.
In client-server games what often causes the problem is a change in the network protocol. Binary packet data definitions get altered making it impossible to parse old versions. Part of the solution is to use a backwards compatible serialization structure such as Protocol Buffers or Cap’n Proto.
In a dream world old replay files will “just work”. To make this dream a reality we are developing our own twist on Protocol Buffers with a custom library called UberProto. Once completed it will let clients play back replays from any previous game version. It’s a bit too early to share all the details of UberProto so I’ll report back when I am able.
The elephant in the room at this point is bandwidth. Do curves consume a lot of bandwidth? Do they take up a lot of memory? How much bandwidth do clients and servers need? How much do they want?
Because we’re client-server the most demanding data stream is the server upload. Clients mostly download with minimal upload. Servers have minimal download but must upload data to every client.
Now let’s pause for a moment and talk about advances in technology. Age of Empires was designed to run on a 28.8 modem in 1997. Halo: Reach was designed for 16-player games hosted on cheap consumer broadband in 2010[16]. Planetary Annihilation is designed to run on dedicated servers in the cloud in 2013.
We have our own infrastructure, UberNet, which runs all back-end services for all our games. It provides all of the features needed for a modern game: patching, matchmaking, server hosting, replay storing, stat tracking, payment processing, etc. Our servers run on a variety of sources spread across the globe. We primarily use dedicated boxes and spin up dynamic instances to handle overflow[17].
Because our dedicated servers run in large data centers they have access to a 1 gigabit upstream connection. The only throttle we’ll hit is the one we set manually. This lets us focus on core functionality and gameplay systems during early development. Network optimization can then be delayed until later in the dev cycle.
What I find to be a delightfully elegant situation is that we pay for bandwidth based on how much we use. It makes for a very strong incentive to keep servers fast on the CPU and light on the upstream, which improves the experience for everyone involved[18].
What is the bandwidth cost of a unit? It depends! A unit doing nothing consumes zero bytes of bandwidth. On the other hand units engaging in combat have multiple curves all updating in an unpredictable and non-linear manner.
Planetary Annihilation is targeting a modest 1 megabit connection for players[19]. This is a number that is low enough to not be overly burdensome to either clients or servers, but also high enough to allow for exciting games that are epic in scale.
Servers will need to support 1 Mbit per connected player. For large games with many players this can add up quickly. This is why Uber is running dedicated servers so players never have to worry about it.
It’s worth noting that the bandwidth target is for late game with a lot of units and action. When the game first starts and each player has a tiny base it uses only a few kilobytes per second.
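For a sense of scale, here is some back-of-the-envelope arithmetic based on the 1 Mbit per player target and the 1 gigabit upstream mentioned above. It assumes every player sits at the full target rate the whole time, which overstates real usage since early game traffic is far lower. The function names are made up for the example.

```python
TARGET_BITS_PER_SECOND = 1_000_000        # 1 Mbit per connected player
LINK_BITS_PER_SECOND = 1_000_000_000      # 1 gigabit data center upstream

def server_upstream_bits(players):
    """Worst-case server upload if every player receives the full target rate."""
    return players * TARGET_BITS_PER_SECOND

def megabytes_per_hour(bits_per_second):
    """Convert a sustained bit rate into megabytes transferred per hour."""
    return bits_per_second / 8 * 3600 / 1_000_000

for players in (2, 10, 40):
    up = server_upstream_bits(players)
    print(players, "players:", up / 1e6, "Mbit/s,",
          f"{up / LINK_BITS_PER_SECOND:.1%} of the link")
print(megabytes_per_hour(TARGET_BITS_PER_SECOND), "MB per player per hour")
```

Under those assumptions a full 40-player game needs 40 Mbit/s of upstream, about 4% of a gigabit link, and each player downloads at most 450 MB per hour.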
I think that just about does it. Phew. I’m exhausted.
This has been a fairly in-depth look into a large portion of the Planetary Annihilation code base to the best of my ability. I think ChronoCam is cool as hell and I’d like to think I’m not the only one. I hope it all made sense and I hope you enjoyed reading.
If you have any questions please ask away in the comments. If there is enough interest I’ll round up questions to more publicly answer in a sequel post.
There are also plenty of other exciting things to talk about on the Planetary Annihilation technology front. If there’s any topic that strikes your fancy please say so in the comments and I’ll see what I can put together.
1. It’s my blog where I have reserved the right to shill.
2. Constructive Solid Geometry. Additional reading: Bending Solid Geometry in PA, PA Engine Terrain
3. The Kickstarter community on our backer forums actually suggested this name.
4. This feature isn’t available yet, but it will be. It will require more RAM to support both views.
5. No licensable engine on the market meets the needs of a large scale RTS.
6. A middle-man or server is often needed to resolve NAT connection issues anyway.
7. Bonus Fact: Total Annihilation used an asynchronous peer-to-peer architecture.
8. Technically not every bit. Constant data, such as object id, isn’t a curve for obvious reasons.
9. Unit orders are input commands such as move, attack, assist, repair, patrol, build, etc.
10. Derivatives are calculated entirely on the client and consume no bandwidth. Hooray!
11. Curves persist in memory but the objects they represent do not.
12. Did I just throw down the gauntlet? Does it even make a difference if I did?
13. Many units will be created and destroyed without being seen by a given player. They will never know the unit ever existed. That’s actually kind of sad.
14. YouTube has well over 100 gaming specific channels with over 100,000,000 views each.
15. This won’t be needed until well after launch, but it’s nice to know it can be done.
16. Min Host Upstream: 16 kbps/player, 250 kbps/game. Max Host Upstream: 45 kbps/player, 675 kbps/game.
17. Mostly SoftLayer for dedicated and Amazon EC2 for elastic. We have a generic interface and plug in servers from a variety of providers.
18. The game is DRM free and server binaries will be released with retail so players can host their own servers. This includes LAN support where bandwidth is of little concern.
19. Current usage in a ~6 player game after an hour is at or under 1 Mbit. We haven’t even begun major optimization work which will let us hit this rate for much, much, much larger games.