Production 101 - #13 How to Build a Milestone Schedule
Building timelines that survive contact with a development team
A milestone schedule is a communication tool, not a prediction machine. It tells the team what they’re building to and when, tells leadership what to expect, and tells publishers and stakeholders when money moves.
Working backwards from a ship date is the right starting point. Find the hard external gates first, then fill in backwards from there.
Contingency placed evenly across a schedule tends to disappear. A single protected buffer late in the project is harder to erode than padding spread through every iteration.
Probabilistic forecasting, done well, lets you say “there’s a 75% chance we ship before March 22” instead of “we’ll ship on March 15.” The first statement is more honest and gives stakeholders something they can actually plan around.
This is part of the Production 101 series.
A schedule is a theory. The moment the project starts, reality runs an experiment to test it. The producers who understand this build schedules that break gracefully, in predictable places, with enough room to recover. The producers who treat the schedule as a fixed plan spend the back half of every project explaining why the theory was wrong.
What a milestone schedule is actually for
Most people, when they think about a milestone schedule, think about prediction. The schedule will tell us when things will be done. That framing leads to disappointment almost immediately, because the schedule is not a prediction device. It’s a communication device.
The schedule talks to different audiences simultaneously, and each of them wants something different from it. The development team wants to know what they’re building to and when. They need enough structure to make decisions week to week without waiting for permission on every call. Leadership wants confidence in delivery. They need to see a credible path from where the project is now to a shipped product, with enough milestones along the way that they can track progress without reading every iteration update. Publishers and external stakeholders want to know when money moves, when they can plan marketing, when platform certification becomes a real conversation.
The same document has to serve all of them, which is part of why schedules are difficult to build well. You’re not writing one story. You’re writing a document that reads differently depending on who’s looking at it, and if you lose track of that, you end up with a schedule that tells the team too much and leadership too little, or vice versa.
A schedule that serves the team but not the stakeholders will get ignored by the people who fund the project. A schedule that serves the stakeholders but not the team will be invisible in the room where the work actually happens. You need both.
Working backwards from the ship date
The right starting point for any milestone schedule is the end. Find the hard external gates first: platform certification submission, marketing lock, gold master, press review build, the dates written into the contract. Those dates are not moveable, or they’re moveable only at significant cost. Everything else in the schedule fills in backwards from there.
This is how you find the critical path. The critical path is the longest chain of dependent work in the project, which makes it the chain that determines the earliest possible ship date. It’s not the whole project. Most of the work on any project runs in parallel and only a subset of it directly controls when you can ship. Finding that chain is the first job, because until you know it, you don’t know where schedule pressure actually lands.
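To make the idea concrete, here is a minimal Python sketch that finds the longest dependency chain in a small task graph. The task names, durations, and dependencies are invented for illustration, and the sketch assumes the graph has no circular dependencies.

```python
from functools import lru_cache

# Hypothetical tasks: name -> (duration in weeks, prerequisite task names)
tasks = {
    "core_gameplay": (10, []),
    "art_pipeline": (6, []),
    "level_content": (8, ["core_gameplay", "art_pipeline"]),
    "localisation": (4, ["level_content"]),
    "audio": (5, ["core_gameplay"]),
    "cert_build": (2, ["level_content", "audio", "localisation"]),
}

def critical_path(tasks):
    """Return (total_weeks, task_names) for the longest dependency chain."""

    @lru_cache(maxsize=None)
    def longest_to(name):
        # Longest chain that ends with `name`, including its own duration.
        duration, deps = tasks[name]
        best_len, best_path = 0, ()
        for dep in deps:
            length, path = longest_to(dep)
            if length > best_len:
                best_len, best_path = length, path
        return best_len + duration, best_path + (name,)

    return max((longest_to(name) for name in tasks), key=lambda r: r[0])

length, path = critical_path(tasks)
print(length, " -> ".join(path))
```

In this made-up graph, audio and the art pipeline have slack, so a slip there costs nothing at first. A slip anywhere on the returned chain moves the ship date by the same amount.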
A common mistake is building the schedule by filling forward from the start date, estimating how long things will take, and seeing where you end up. The problem with this approach is that you find out late when there’s a problem, and by then the external gates haven’t moved but the available time has shrunk. Working backwards surfaces those conflicts early, while you still have options.
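As a rough illustration of the backwards pass, here is a minimal Python sketch. The gate date, phase names, durations, and start date are all assumptions made up for the example, and it treats phases as running strictly one after another, which real projects rarely do.

```python
from datetime import date, timedelta

def schedule_backwards(gate, phases):
    """Lay phases out back-to-back so the last one ends on the gate date.

    `phases` is an ordered list of (name, duration_in_weeks).
    Returns a list of (name, start, end) in chronological order.
    """
    plan = []
    end = gate
    for name, weeks in reversed(phases):
        start = end - timedelta(weeks=weeks)
        plan.append((name, start, end))
        end = start
    plan.reverse()
    return plan

# Hypothetical inputs
cert_submission = date(2025, 3, 3)
project_start = date(2024, 9, 2)
phases = [("Vertical slice", 8), ("Alpha", 12), ("Beta", 8), ("Contingency", 3)]

plan = schedule_backwards(cert_submission, phases)
required_start = plan[0][1]
shortfall_days = (project_start - required_start).days

for name, start, end in plan:
    print(f"{name:15} {start} to {end}")
if shortfall_days > 0:
    print(f"Conflict: the work needs to start {shortfall_days} days before the project does.")
```

With these invented numbers the conflict shows up on day one. The options at that point are to cut scope, shorten a phase, or renegotiate the gate, and all three are cheaper at the start than in month six.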
I’ve started projects both ways. The forward-fill approach always felt more natural at the start because you’re building the picture of the project rather than reasoning back from a date that feels arbitrary. But every time I built a schedule that way, I spent the first month in gentle denial about a conflict I’d have seen immediately if I’d worked backwards. The backwards approach forces honesty about whether the external constraints are even compatible with the work involved. That’s uncomfortable at the start and useful three months later.
Estimating without deceiving yourself
Teams have two competing instincts when it comes to estimation, and both create dysfunction.
The first instinct is inflation as protection. Teams pad their estimates because they’ve been burned before: they gave a realistic estimate, it was cut, and they missed the deadline. The padding is a rational response to a bad system. But a schedule built on padded estimates gives leadership a false picture of where time is actually going, and when everything looks like it’s on track right up until it isn’t, nobody is positioned to respond.
The second instinct is compression as optimism. Studios, and sometimes producers, compress estimates because they want the schedule to be possible. The work has to fit in the time available, so the estimates adjust until it does. A schedule built this way is one that nobody on the team actually believes, and when nobody believes the schedule, it stops having any effect on behaviour.
The fundamental error is conflating estimation with commitment. When an estimate becomes a commitment before any work has started, both honesty and accuracy disappear. An estimate is a forecast. A commitment is something else, and it belongs later in the process.
What historical velocity data gives you is a check on both failure modes. If your team has averaged 18 stories per iteration for the last six months and your new schedule assumes 30, something needs to change before you start. The data doesn’t tell you what the right estimate is. It tells you what the team has actually done, which is a much better starting point than asking people to say how long things will take under pressure.
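The check itself is simple arithmetic. A minimal Python sketch follows, assuming velocity is recorded as stories completed per iteration. The sample history and the 15% tolerance are assumptions chosen for illustration, not recommended values.

```python
from statistics import mean

def velocity_check(history, planned_per_iteration, tolerance=0.15):
    """Compare the velocity a schedule assumes with what the team has done.

    `tolerance` is an illustrative allowance for how far above the
    historical average a plan may sit before it is flagged.
    """
    avg = mean(history)
    ratio = planned_per_iteration / avg
    return {
        "historical_avg": avg,
        "ratio": ratio,
        "plausible": ratio <= 1 + tolerance,
    }

# Hypothetical history: stories completed in each of the last six iterations
history = [17, 19, 18, 16, 20, 18]
result = velocity_check(history, planned_per_iteration=30)
print(result)
```

A failed check doesn’t say the plan is impossible. It says the plan depends on something changing, and the schedule should name what that is.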
To make the distinction concrete: an estimate says “this is our best current picture of the time this work will take.” A commitment says “we will deliver this thing by this date.” Teams make commitments. Schedules contain estimates. Treating them as interchangeable is where the dysfunction starts.
Where to put contingency
Every experienced producer has been told to include contingency in their schedule. Most of them have watched that contingency disappear. The question is where to put it.
Spreading contingency evenly across the schedule, a few days of buffer at the end of every iteration or milestone, sounds reasonable. In my experience it rarely holds. The buffer disappears into Parkinson’s Law: work expands to fill the time available. A team with a week of buffer at the end of an iteration will unconsciously calibrate to use it. By the time the external milestone arrives, the buffer has been spent on tasks that felt urgent in the moment.
Student’s Syndrome adds a second problem. When people know that buffer time exists, they tend to start work later. The buffer becomes the safety net, and the safety net gets used not for genuine overruns but for the delay in starting. End result: the buffer is gone, but the tasks finished roughly when they would have anyway.
The alternative that holds up better in practice is a single protected contingency block late in the project. Not distributed throughout; concentrated at the end, explicitly labelled, and defended against anything that isn’t a genuine schedule threat. Teams can see it. They know it exists and they know why it exists. The social contract is that it doesn’t get raided for scope creep or features that should have been cut earlier.
The single-buffer approach works best when the producer is explicit about it. Not a hidden reserve, but a declared piece of the schedule: “We have three weeks of contingency before certification. We’re not spending it unless something in the critical path goes wrong.” Making it visible makes it harder to erode. Teams don’t spend a budget they can see nearly as readily as they spend time they can’t.
The other thing the single buffer does is concentrate schedule pressure. If every iteration has a cushion, nobody feels the urgency of the critical path. If the only buffer sits at the end, the cost of slipping any critical-path item is immediately visible. That’s uncomfortable, but it’s accurate, and accurate discomfort is more useful than comfortable imprecision.
Probabilistic forecasting
There is a better answer to “when will we ship?” than a single date. That answer is a probability range, and it comes from the team’s actual throughput history rather than from estimates made at the start of the project.
Monte Carlo simulation treats your historical throughput data, how many tasks the team has completed per iteration or per week over the last several months, as the input and produces a distribution of likely completion dates. Instead of saying “we will ship on March 15,” you can say “based on current throughput, there’s a 75% probability we ship before March 22.” That second statement is more honest and, in practice, more useful for planning. A publisher who knows there’s a 25% chance of slipping past a date can make contingency plans. A publisher who was told “March 15” and finds out in February that it’s now April has nothing to work with.
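A minimal Python sketch of the idea, using only the standard library. It assumes the remaining work can be counted in the same units as the throughput history, and that future iterations will resemble past ones. The backlog size, history, start date, and two-week iteration length are all made up for the example.

```python
import math
import random
from datetime import date, timedelta

def simulate_iterations(throughput_history, backlog, runs=10_000, seed=0):
    """Resample past throughput to see how many iterations the backlog takes.

    Returns the iteration count from every run, sorted ascending.
    """
    if max(throughput_history) <= 0:
        raise ValueError("need at least one iteration with positive throughput")
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    results = []
    for _ in range(runs):
        remaining, iterations = backlog, 0
        while remaining > 0:
            remaining -= rng.choice(throughput_history)
            iterations += 1
        results.append(iterations)
    return sorted(results)

def iterations_at_confidence(sorted_results, confidence):
    """Smallest iteration count that at least `confidence` of runs met."""
    index = max(0, math.ceil(confidence * len(sorted_results)) - 1)
    return sorted_results[index]

def forecast_date(start, iteration_days, sorted_results, confidence):
    """Translate an iteration count at a confidence level into a date."""
    count = iterations_at_confidence(sorted_results, confidence)
    return start + timedelta(days=iteration_days * count)

# Hypothetical inputs: tasks completed in each of the last eight iterations
history = [14, 22, 18, 16, 20, 19, 12, 21]
results = simulate_iterations(history, backlog=180)

for confidence in (0.50, 0.75, 0.85):
    when = forecast_date(date(2025, 1, 6), 14, results, confidence)
    print(f"{confidence:.0%} chance of finishing by {when}")
```

The output is only as good as the history behind it. If the team, the scope, or the nature of the work has changed sharply, older iterations should probably be dropped from the input.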
I covered the mechanics of this approach in detail, including the specific tools and how to read the outputs, in From Meteorology to Project Management: The Power of Predictive Modelling, Part 2. If probabilistic forecasting is new territory, that post is the right place to start. What I’ll add here is that the confidence level you quote matters, and it should vary by audience. When you’re talking to the team about iteration planning, 50% confidence is a useful working target. When you’re quoting a date to a publisher or committing to a certification window, you want to be at 85% or higher. The difference is a matter of calibrating risk to consequence: the more it costs to miss a date, the more confident you need to be before you quote it.
Presenting the schedule to different audiences
The version of the schedule you show to a publisher is not the version on the project board. The distinction is about signal-to-noise.
Technical teams need granularity. Which features are in which iteration. Which dependencies are in flight and where the handoffs happen. What’s been cut, what’s been added, and what’s still under discussion. The project board is their instrument panel and it needs to be accurate to be useful. A summary-level view strips out the information they need to do their jobs.
Executives need the headline range and the key dates. They are reading the schedule to understand whether the project is on track and whether the external commitments are safe. They do not need to see every task; they need to see the milestones, the current health indicator, and whether anything has changed since last time. The more detail you put in front of an executive, the less they see the signal. They start asking questions about the tasks rather than the delivery risk, and the conversation moves away from where it needs to be.
Publishers and platform stakeholders have a third set of requirements. They’re looking at certification windows, marketing beats, and whether the date in the contract still has a reasonable probability of holding. What they need is a clear statement of the key dates, the confidence level behind them, and a brief account of any risks that could move those dates. A publisher who receives a three-page Gantt chart learns less from it than a publisher who receives a single page with four dates, three risk items, and a current confidence rating.
The mistake producers make is building one schedule and sending it to everyone. The project board is not a stakeholder communication. It’s an internal instrument. Build the communication layer on top of it rather than pointing people at the board and hoping they’ll find what they need.
The practical approach is to maintain one source of truth at the task level, which the team uses, and to build audience-specific views on top of it for every other conversation. That separation takes about twenty minutes to set up and saves hours of misaligned expectations.
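In most tools this is a saved filter or a dashboard rather than code, but the shape of the separation is easy to show. The Python sketch below is illustrative only: the `Task` fields, task names, and dates are hypothetical, and a publisher view would also need confidence levels and risk items, which are not modelled here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Task:
    name: str
    milestone: str
    due: date
    done: bool

def team_view(tasks):
    """Full granularity: every task, in due-date order."""
    return sorted(tasks, key=lambda t: t.due)

def executive_view(tasks):
    """Milestone-level rollup: latest due date and completion counts."""
    summary = {}
    for t in tasks:
        m = summary.setdefault(t.milestone, {"due": t.due, "total": 0, "done": 0})
        m["due"] = max(m["due"], t.due)
        m["total"] += 1
        m["done"] += int(t.done)
    return summary

# Hypothetical task-level source of truth
tasks = [
    Task("Save system", "Alpha", date(2025, 2, 28), False),
    Task("Combat prototype", "Alpha", date(2025, 2, 14), True),
    Task("Store assets", "Beta", date(2025, 4, 11), False),
]

for milestone, m in executive_view(tasks).items():
    print(f"{milestone}: due {m['due']}, {m['done']}/{m['total']} tasks done")
```

Both views read from the same list, so they can’t drift apart. Only the level of detail changes.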
Reading the warning signs
By the time a milestone review tells you the schedule is in trouble, you’ve already missed your window to respond. The signals come earlier, and they’re readable if you know what to look for.
Velocity dropping is the most visible indicator. If the team has been averaging 20 stories per iteration and the last two iterations came in at 12 and 11, that’s not a bad week. That’s a pattern, and it’s pointing at something. The cause might be technical: something in the codebase has become harder to work with and estimates are growing. It might be people: someone key is distracted, stuck, or burning out. It might be process: a dependency with another team has created a bottleneck that’s not showing up in the task list. Whatever the cause, a sustained drop like that is a schedule problem in progress, and you want to know about it before the milestone review does.
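A spreadsheet handles this fine, but the rule is short enough to write down. The Python sketch below flags a run of iterations that all fall well below the preceding average. The window sizes and the 25% threshold are assumptions chosen for illustration, not recommended values.

```python
from statistics import mean

def velocity_alert(velocities, baseline_window=6, recent_window=2, drop_threshold=0.25):
    """True if every recent iteration is well below the prior average.

    The baseline is the mean of the `baseline_window` iterations that
    come immediately before the `recent_window` most recent ones.
    """
    if len(velocities) < baseline_window + recent_window:
        return False  # not enough history to say anything
    baseline = mean(velocities[-(baseline_window + recent_window):-recent_window])
    floor = baseline * (1 - drop_threshold)
    return all(v < floor for v in velocities[-recent_window:])

# Hypothetical history: a steady average of 20, then 12 and 11
print(velocity_alert([20, 21, 19, 20, 22, 18, 12, 11]))
```

Requiring every recent iteration to fall below the floor is what separates a pattern from a single bad week.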
Tasks reopening is a close second. When work that was marked done comes back as incomplete, it means either the definition of done is unclear or quality issues are emerging that weren’t caught on the first pass. Either way, the time estimate for that work was wrong, and everything downstream of it needs adjusting.
Estimates lengthening late in the project is harder to spot but important. When the team starts giving longer estimates for work in the current iteration than they did for equivalent work two months ago, it means they’re experiencing something that’s making the work harder. That could be scope creep absorbed into existing tasks, technical debt accumulating, or simply the fact that late-project integration work is almost always harder than early-project greenfield work. The estimates are trying to tell you something.
The leading indicator I’ve learned to watch most carefully is QA finding significantly more defects than expected in the last third of an iteration. That pattern, early progress followed by a late spike in issues, tends to mean that tasks are being marked complete before they’re genuinely integrated. The work is done in isolation but it hasn’t been tested against the rest of the build. When I see that pattern two or three iterations in a row, I start the investigation before the milestone review, not during it.
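One way to put a number on that pattern is the share of an iteration’s defects that QA logs in its final third. The Python sketch below is an approximation rather than a standard metric: it uses share as a stand-in for “more than expected”, and the 50% threshold, the two-iteration run, and the daily counts are all assumptions for the example.

```python
def late_defect_share(defects_by_day):
    """Fraction of an iteration's defects logged in its last third."""
    total = sum(defects_by_day)
    if total == 0:
        return 0.0
    last_third = max(1, len(defects_by_day) // 3)
    return sum(defects_by_day[-last_third:]) / total

def sustained_late_spike(iterations, share_threshold=0.5, consecutive=2):
    """True if the most recent `consecutive` iterations all spiked late."""
    recent = iterations[-consecutive:]
    return len(recent) == consecutive and all(
        late_defect_share(days) > share_threshold for days in recent
    )

# Hypothetical defect counts per working day, three ten-day iterations
iterations = [
    [1, 2, 1, 2, 1, 2, 1, 1, 2, 1],  # spread evenly
    [0, 1, 0, 1, 1, 0, 2, 4, 6, 5],  # late spike
    [1, 0, 1, 0, 1, 1, 1, 5, 4, 6],  # late spike again
]

print([round(late_defect_share(days), 2) for days in iterations])
print(sustained_late_spike(iterations))
```

Some late clustering is normal, because QA can only test what has been integrated. The signal is the share staying high across consecutive iterations.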
None of these signals require sophisticated tooling to spot. They require looking at the same numbers week over week and noticing when something changes. The status report discipline I covered in Production 101 #9 is part of what makes this possible: when you’re tracking the same metrics every week, the deviations are visible. When you’re writing status reports reactively, reconstructing the week from memory, the early signals tend to get smoothed over.
The question is not what to do when a milestone review confirms the schedule has slipped. The question is what to do when the leading indicators appear, when there’s still enough of the schedule left to respond. That’s a harder problem to solve, but it’s the one that actually matters.



