At each stage of the modeling process, the forecaster must make assumptions about future changes in the population and economic characteristics of the region, and how people will respond to changes in travel times and costs. The output of one step in the modeling process is the input to the next. Thus, even small differences in assumptions (or math errors or typos) can be magnified with each step, having a large effect on the total ridership estimate.
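To see how quickly such compounding can overwhelm a forecast, consider a deliberately simplified sketch in Python. The four stage names and the error rates below are illustrative assumptions, not figures drawn from any actual model:

    # Illustrative only: a chain of modeling stages, each passing its output
    # to the next. Stage names and error rates are assumptions for this sketch.
    stages = {
        "trip generation": 1.05,       # each factor is (1 + fractional error)
        "trip distribution": 1.04,
        "mode choice": 1.06,
        "route assignment": 1.03,
    }

    true_ridership = 20_000            # hypothetical "correct" daily riders
    estimate = true_ridership
    for error_factor in stages.values():
        estimate *= error_factor       # each stage's error scales the running total

    print(f"forecast: {estimate:,.0f} riders "
          f"({(estimate / true_ridership - 1) * 100:.0f}% high)")
    # Four errors of 3 to 6 percent each compound to a forecast roughly 19 percent high.

Even with modest errors at each stage, the final estimate drifts far from the true value, and nothing in the output itself flags that drift.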
Troublingly, not all forecast errors are honest technical mistakes. Forecasters may intentionally introduce error into forecasts in response to implicit or explicit pressure from clients or employers who wish to see proposed projects cast in a favorable light. In some cases, the distinction between mistakes and deliberate distortions might be unclear. If the ridership forecast for a project is surprisingly low, the forecaster (whose client or employer might hope the forecast will justify the project) can analyze the model to determine whether the low ridership forecast is the result of an error. However, if a ridership forecast is surprisingly high, the forecaster might simply accept it as good news rather than expend resources looking for an error. According to one forecaster I interviewed:
Despite our best efforts, sometimes there are errors. … As we’re doing these projects, even though they take years to go through the planning process, it seems like every time … we need a decision made and we’re putting together the data, things get rushed. … And it seems like every time we do a new model run, we find something that we were missing before. … They’re not just tweaks, but they’re catching omissions or errors. … Those are the types of quick things that should be done regardless of where you are, but just because of time constraints, you may not focus on them unless you’re running into issues with [low ridership].
Transit planners may lack the motivation or resources to rigorously detect and correct modeling errors, and these potentially flawed models are often reused for many projects over time. Given these factors, Pickrell’s finding that ridership forecasts were often wildly inaccurate comes as no surprise.
Looking beyond ridership
In reaction to the demonstrated failures of ridership forecasts and subsequent attempts to improve their accuracy, some practitioners argued that emphasizing ridership as the sole or central measure of a project’s potential benefits might be misplaced. Couldn’t new transit projects generate benefits beyond just attracting new riders? Certainly. Transit projects might contribute to economic development or congestion relief, for example. But the economic development benefits of transit flow largely from people riding transit to get to work or to shop at local businesses. Congestion relief is likewise achieved when travelers choose to ride transit rather than drive their own vehicles.
But not all transit project benefits flow from added riders. Transit projects can improve service for existing riders, which isn’t measured by the number of new riders. In 2001, the FTA introduced Transportation System User Benefits (TSUB) to replace ridership as the primary measure of benefits in proposed transit projects. This new measure combined the projected travel time savings for existing riders with the number of new riders to produce a dollar value of the total project benefits. Although the logic behind this calculation had a firm theoretical basis in microeconomics, it was less intuitive for those without an economics background. Relative to a simple ridership metric, the TSUB metric was hard to understand or succinctly explain.
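To give a flavor of the arithmetic involved, here is a minimal sketch of a user-benefit calculation in that spirit, assuming the conventional “rule of a half” from consumer-surplus theory for new riders. The rider counts, time savings, and value-of-time figure are invented, and this is not FTA’s actual formula:

    # Illustrative sketch of a user-benefit calculation in the spirit of TSUB.
    # All values (riders, minutes saved, value of time) are invented assumptions.
    existing_riders = 10_000    # daily riders who would use transit anyway
    new_riders = 2_000          # daily riders attracted by the project
    minutes_saved = 8           # travel time saved per trip by the project
    value_of_time = 15.0        # assumed dollars per hour of travel time

    hours_saved_per_trip = minutes_saved / 60

    # Existing riders are credited with the full time saving; new riders are
    # conventionally credited with half of it (the "rule of a half").
    benefit_hours = (existing_riders + 0.5 * new_riders) * hours_saved_per_trip
    benefit_dollars = benefit_hours * value_of_time

    print(f"{benefit_hours:,.0f} hours of user benefit per day, "
          f"worth roughly ${benefit_dollars:,.0f}")

The calculation itself is short, but explaining why new riders count for only half, and why hours of other people’s travel time should decide which projects get built, is exactly the kind of conversation that proved difficult outside economics circles.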
One transit manager described how the ranking of transit projects changed when travel time savings were incorporated into the measure of project benefits:
When I started working in project development, it was a pretty simple calculation of cost versus ridership. … Then FTA changed that to look at user benefits. And user benefit measured whether there were travel-time savings that happen from the project. When they went from just riders to that travel-time saving measure, it really changed the kinds of projects that could qualify for New Starts funds. It really benefitted long-haul light-rail projects. It benefitted commuter rail projects. Streetcar projects didn’t really show particularly well because they’re not really saving anybody travel time.
On the other hand, a consultant observed that, perhaps surprisingly, projects performing well by one measure generally performed well by the other:
We did a little exercise to see how cost per hour of user benefit … correlated with cost per project trip, just to see if it really changed the playing field. To my surprise, it really didn’t. The ones that were good under the old measure are still good under the new measure. So maybe it’s okay.
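The exercise the consultant describes is simple in principle: compute both cost-effectiveness ratios for a set of candidate projects and see whether they rank the projects the same way. A sketch with entirely hypothetical projects and numbers:

    # Hypothetical comparison of two cost-effectiveness measures across projects.
    # Every project and every number below is invented for illustration.
    import statistics

    # (capital cost in $M, annual hours of user benefit, annual new trips)
    projects = {
        "Light rail A":    (1_000, 800_000, 3_000_000),
        "Commuter rail B": (600, 300_000, 1_200_000),
        "Streetcar C":     (250, 80_000, 350_000),
        "BRT D":           (400, 250_000, 900_000),
    }

    cost_per_hour = [cost / hours for cost, hours, _ in projects.values()]
    cost_per_trip = [cost / trips for cost, _, trips in projects.values()]

    # A high correlation means the two measures rank projects in much the same order.
    r = statistics.correlation(cost_per_hour, cost_per_trip)
    print(f"correlation between the two measures: {r:.2f}")
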
If selecting projects based on the projected number of new riders produces roughly the same outcome as a complicated user benefits measure, then why bother with the more complicated measure? Ultimately, the complexity of the TSUB metric was its undoing. One transit manager I interviewed described how he never really understood what the TSUB measure was supposed to represent, even after sitting down with economists to have them explain it to him. Another explained that the concepts behind the TSUB measure were so complicated that when Congress passed its 2012 surface transportation bill, the Moving Ahead for Progress in the 21st Century Act (MAP-21), it required FTA to abandon TSUB in favor of a return to a simpler measure of ridership.
I think the switch to user benefits was significant because it did try to capture all of the transportation benefits of a project, not just new riders. But in part it was the seeds of its own undoing because it got really complicated and it required sophisticated modeling. … In MAP-21, Congress said “Enough of that! That’s too complicated!” I think this administration was also trying to step away from that kind of a measure. We ended up with cost per project trip, which to me is a step backwards because I don’t think that measure is a particularly good indicator of benefit at all. You can have a lot of people riding on a project, but are they better off?
In the end, TSUB may have been a more complete measure of project benefits than ridership alone, but it was too complicated to convey those benefits in a meaningful way. On the other hand, it was complicated enough to address a major fault of using ridership as a measure of future project benefits: a lack of confidence in ridership forecasts.
Summit saves the day
The TSUB metric wasn’t just complicated to understand — it was complicated to calculate, especially for a forecaster who had become accustomed to treating a travel demand model as a “black box.” To make things easier, the FTA introduced a software package called Summit in 2003 to assist project sponsors in calculating the TSUB metric.
An FTA staff member described how the introduction of the Summit software package was a watershed in evaluating the underlying assumptions used in ridership forecasts. Although the purpose of Summit was to assist project sponsors in computing travel time savings, it had the additional effect of providing greater transparency about a travel demand model’s underlying assumptions:
An ancillary, but it turned out — in my view anyway — a more important result of the Summit software was that, for the very first time, it produced detailed reporting of the ridership forecasts. That was the equivalent of shining a light into a really dark box, and there was all sorts of pretty ugly stuff going on that you would normally have a very hard time finding because of the complex nature of ridership forecasting. … There were all sorts of unintended things happening. And all of a sudden, the ridership stuff got a lot more rigorous.
A consultant confirmed that, although Summit fell out of use after FTA discontinued the requirement to report travel time savings, it had a lasting effect on the accuracy of ridership forecasts. Summit allowed modelers to correct persistent errors in their models and deepen their knowledge and understanding of travel-demand modeling:
[Summit] was a big game-changer because you could actually identify and describe the major problems that the model had. Previously, you would just be lost and swimming in too many numbers, and you couldn’t actually figure out what the hell was going on except at a very deep level. Summit allowed you to look at it … and actually see that the model’s not doing a very good job at all. It found that the model-development practice and model-application practice was pretty bad. And I would say bad to the point of being almost criminal or fraudulent. … That was a complete watershed moment. We unlearned more about what we knew than I had ever learned. I’ve learned more about forecasting in the last 13 years than I had in the previous 13, by a country mile.