Your Watch Doesn’t Know How Much Recovery You Need

Updated July 10, 2025 02:26PM

A recurring joke among my training partners these days is the advice their GPS watches give them when we finish an easy run. “You need 42 hours of recovery,” the algorithm tells them, even though they’re not even breathing hard. Hard interval workouts, on the other hand, don’t seem to impress the watches.

Where do these numbers come from? The algorithms are proprietary (Garmin’s, for example, are provided by a Finnish company called Firstbeat Analytics, which it bought in 2020), but they’re based on a concept that’s taking on greater importance in the age of ubiquitous wearable devices: training load. Athletes used to track their training with a hodgepodge of different variables: miles run, duration of session, average power, heart rate, and so on. Now that detailed second-by-second training data can be recorded and analyzed with minimal effort, they’re combining multiple variables into a single measure of how much stress a given workout imposed on their body. The approach has great potential to fine-tune the delicate balance between training and recovery—but, according to a recent study in the International Journal of Sports Physiology and Performance, it also has some fundamental flaws.

It’s clear that neither duration nor intensity, on its own, paints a complete picture of how hard a workout is. As University of Calgary sports scientist Louis Passfield and colleagues from Calgary and the University of Rome Foro Italico explain in the new paper, the idea of combining them in a single measure of training load dates back to research in the 1970s. A given dose of training, the theory went, should produce corresponding increases in both fitness and fatigue that would rise and fall at characteristic rates after each workout. Add up the accumulated fitness from all your previous workouts, subtract the accumulated fatigue, and you’ve got an estimate of how fast you can race at that moment, as well as what dose of training or rest you need next. This is the fundamental insight that underlies most modern training software.
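To make that fitness-minus-fatigue bookkeeping concrete, here is a minimal Python sketch. The exponential decay, the time constants (42 and 7 days), and the weighting factors are illustrative assumptions, not values taken from the paper or from any watch’s proprietary algorithm.

```python
import math

def fitness_and_fatigue(daily_loads, tau_fitness=42.0, tau_fatigue=7.0):
    """Accumulate fitness and fatigue from a list of daily training doses.

    Each day's dose is added to both totals, and both decay exponentially
    from one day to the next; fatigue decays faster than fitness.
    The time constants (in days) are assumed for illustration.
    """
    fitness = 0.0
    fatigue = 0.0
    for load in daily_loads:
        # Let yesterday's totals decay by one day, then add today's dose.
        fitness = fitness * math.exp(-1.0 / tau_fitness) + load
        fatigue = fatigue * math.exp(-1.0 / tau_fatigue) + load
    return fitness, fatigue

def readiness(daily_loads, k_fitness=1.0, k_fatigue=2.0):
    """Weighted fitness minus weighted fatigue (weights are assumed)."""
    fitness, fatigue = fitness_and_fatigue(daily_loads)
    return k_fitness * fitness - k_fatigue * fatigue

# One hard session (a dose of 100 in arbitrary units), then two weeks of rest.
right_after = readiness([100])
after_two_weeks = readiness([100] + [0] * 14)
print(right_after, after_two_weeks)
```

With these made-up settings, the estimate is negative right after the session and turns positive after the rest days, because fatigue fades faster than fitness. Everything downstream depends on what number you feed in as the “dose.”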

The simplest example of a training load calculation is to multiply duration (in minutes) by intensity (your subjective evaluation of how hard the workout was, on a scale of one to ten). There are lots of alternatives that use metrics like power or pace or heart rate instead of subjective effort, that calculate dose on a second-by-second basis instead of averaged over the whole workout, or that break training down into different intensity zones. Various studies have compared different versions of training load to each other. The problem, according to Passfield and his colleagues, is that all these comparisons are circular: no one knows how to express the real training load experienced by the body, so figuring out whether one definition agrees with another doesn’t answer the most important question.

There are indeed some problems with the simple duration-times-intensity view of training load. A memorably titled paper (“Would You Rather Have One Big Rock or Lots of Little Rocks Dropped on Your Foot?”) published last year by Andrew Renfree of the University of Worcester and his colleagues illustrated some of them. For example, a demanding interval workout of two all-out 400-meter repeats (2 minutes x 10 out of 10 effort = 20) might have a lower training load than the subsequent ten minutes of easy cool-down jogging (10 minutes x 3 out of 10 effort = 30). Even with tweaks that give more weight to higher-intensity training, many versions of training load (including the ones loaded on my training partners’ sports watches) seem to overvalue duration.
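The duration-times-intensity arithmetic is simple enough to check in a few lines of Python. The function name is hypothetical; the numbers are the ones from the example above.

```python
def session_load(duration_min, effort_1_to_10):
    """Simplest training load: minutes multiplied by subjective effort (1-10)."""
    if not 1 <= effort_1_to_10 <= 10:
        raise ValueError("effort must be between 1 and 10")
    return duration_min * effort_1_to_10

intervals = session_load(2, 10)   # two all-out 400s, about 2 minutes in total
cooldown = session_load(10, 3)    # ten minutes of easy jogging
print(intervals, cooldown)        # prints: 20 30
```

By this measure the cool-down counts for more than the intervals, which is exactly the quirk Renfree and his colleagues point out.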

So what is the real measure of how hard a workout is? Passfield and his colleagues make the case for what they call acute performance decrement, or APD. If you do some sort of exercise test, like a five-minute time trial, then do a workout, then repeat the five-minute time trial, the performance difference between the two time trials tells you exactly how much the workout took out of you. Obviously this isn’t a practical way of monitoring your training, because you can’t do a mini-race before and after every training session. But in the lab, this technique gives researchers a way of comparing the impact of different types, intensities, and durations of workout.
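As a rough sketch, APD boils down to a percent change between two test results. The version below assumes the time trial is scored with a higher-is-better number such as average power; the function name and the wattages are hypothetical, not data from the study.

```python
def acute_performance_decrement(pre_score, post_score):
    """Percent drop in time-trial performance from before to after a workout.

    Assumes a higher-is-better score, such as average power in watts.
    """
    if pre_score <= 0:
        raise ValueError("pre_score must be positive")
    return 100.0 * (pre_score - post_score) / pre_score

# Hypothetical rider: 400 watts before the workout, 380 watts after.
print(acute_performance_decrement(400, 380))  # prints: 5.0
```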

Here’s an example from some prior research by Passfield and his colleagues. They compared four different cycling workouts: five minutes all-out; 20 minutes all-out; 20 minutes at a submaximal pace; and 40 minutes at an even easier submaximal pace. A typical way of comparing the training load of these workouts would be to look at the total work done (TWD), which involves multiplying cycling power by time. Here’s what TWD looks like for each workout:

Even though the five-minute all-out workout was as hard as the subjects could go, it doesn’t accumulate much total work because it’s so short. The other three workouts, with varying duration and intensity, have fairly similar TWD.
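A quick sketch of the TWD calculation shows why short sessions score so low. The wattages here are hypothetical round numbers chosen for illustration, not the values measured in the study.

```python
def total_work_kj(average_power_w, duration_min):
    """Total work in kilojoules: average power (watts) times time (seconds)."""
    return average_power_w * duration_min * 60 / 1000.0

# Hypothetical wattages, for illustration only.
short_all_out = total_work_kj(400, 5)   # five minutes, very hard
long_easy = total_work_kj(200, 40)      # forty minutes, much easier
print(short_all_out, long_easy)         # prints: 120.0 480.0
```

Even at double the power, the five-minute effort racks up only a quarter of the total work in this made-up comparison, because duration dominates the product.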

In contrast, here is the APD, which is the percent slowdown in a five-minute time trial from before to after each workout:

Now the results correspond better to our intuition. The two all-out sessions, regardless of whether they’re 5 or 20 minutes, cause a large slowdown. The two submaximal sessions have a much smaller impact—a convincing illustration of the potential drawbacks of overly simple training load metrics like TWD.

This still doesn’t solve the practical problem of how to track training in the real world. The authors propose another metric called %tmax, which compares how long the workout lasted to how long you could have sustained that pace before reaching exhaustion. A half-hour ride at a pace you could have sustained for five hours has a %tmax of 10 percent; a 30-minute run at your 40-minute all-out pace has a %tmax of 75 percent. Here’s more data from the same study, comparing the %tmax for the four workouts:

The %tmax data gives a pretty good approximation of the APD data. Since race pace at different distances follows a predictable curve, you (or, preferably, an algorithm in your training software) can use race performances at any three distances to estimate your %tmax for workouts at any given pace. You can also generalize the approach to interval workouts: if you do ten repeats at a pace you could have sustained for 15 repeats, your %tmax for the session is 67 percent. But the approach breaks down for more complex multi-pace workouts, or for progression runs where the pace keeps changing.
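Here is a minimal sketch of how software might do that calculation. The %tmax function follows the definition given above. The curve-fitting part is an assumption: the paper’s exact method isn’t described here, so this version fits a simple power law (time = a × distance^b) to three race results, and the race times and workout are hypothetical.

```python
import math

def pct_tmax(amount_done, amount_sustainable):
    """Workout amount as a percentage of the most you could sustain at that pace."""
    if amount_sustainable <= 0:
        raise ValueError("amount_sustainable must be positive")
    return 100.0 * amount_done / amount_sustainable

def fit_power_law(races):
    """Least-squares fit of time = a * distance**b in log-log space.

    races: list of (distance_m, time_s) pairs. The power-law shape is an
    assumed stand-in for the "predictable curve" of race performances.
    """
    xs = [math.log(d) for d, _ in races]
    ys = [math.log(t) for _, t in races]
    n = len(races)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def time_to_exhaustion(speed_m_per_s, a, b):
    """Longest time (seconds) the fitted curve says you can hold this speed.

    Solves distance / time = speed together with time = a * distance**b,
    which gives time = (a * speed**b) ** (1 / (1 - b)); requires b != 1.
    """
    return (a * speed_m_per_s ** b) ** (1.0 / (1.0 - b))

# The three examples from the text.
print(pct_tmax(30, 300))        # half-hour ride at five-hour pace: 10.0
print(pct_tmax(30, 40))         # 30-minute run at 40-minute pace: 75.0
print(round(pct_tmax(10, 15)))  # ten repeats out of a possible 15: 67

# Hypothetical runner: 1,500 m in 5:00, 5,000 m in 18:00, 10,000 m in 37:30.
a, b = fit_power_law([(1500, 300), (5000, 1080), (10000, 2250)])
tmax_s = time_to_exhaustion(4.5, a, b)  # a workout run at 4.5 m/s
print(pct_tmax(20 * 60, tmax_s))        # %tmax for 20 minutes at that speed
```

The single-pace assumption is built into every function here, which is why nothing in this sketch can handle a progression run.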

The researchers also float some simpler options, like your subjective sense of effort or your breathing rate (which is a good proxy for subjective effort) at the end of the session. In theory, the chest straps used for heart rate monitors can measure breathing rate, and that gives an objective metric for those who don’t like relying on the seeming imprecision of feelings. In both cases, the assumption baked in is that duration on its own isn’t all that important: what matters is how breathless you are at the end thanks to whatever combination of duration and intensity you just did. That assumption may work for track cyclists, but I’d be cautious about making the same assumption for an ultrarunner returning from a five-hour run. There are forms of fatigue—and of training stimulus—that might not leave you out of breath.

Whether any of these alternate training load metrics will do a better job of guiding training remains to be seen. For now, Passfield and his colleagues’ advice is to track the duration and intensity of your training separately, and to treat with caution any insights derived from combining them into a single training load metric. I’m not quite as skeptical as they are about the usefulness of current training load models—but then again, I don’t have a fancy watch that keeps telling me to take three days off every time I run.
