Recovery: Readiness Scores Compared
Whoop, Garmin, and Oura all anchor their readiness scores on HRV. Whoop and Oura update once per 24 hours after sleep, while Garmin Body Battery updates continuously. Düking et al. 2018 found low-to-moderate agreement between wearable readiness scores and gold-standard lab markers.
| Measure | Value | Unit | Notes |
|---|---|---|---|
| HRV measurement agreement (vs. ECG) | r=0.82-0.96 | correlation | Photoplethysmography (PPG) wrist-based HRV approximates ECG-derived HRV; accuracy varies by motion and skin tone |
| Readiness score vs. lab performance | Low-to-moderate | agreement | Düking et al. 2018 found wearable readiness indices do not reliably predict same-day performance test outcomes |
| Oura sleep stage accuracy | ~79 | % vs. PSG | Polysomnography comparison from Altini & Kinnunen 2021; best among consumer wearables |
| Whoop Recovery update frequency | Every 24 | hours | Updates after each sleep period; requires consistent sleep data for accurate daily score |
| Garmin Body Battery range | 5-100 | points | Proprietary energy reservoir model based on HRV, sleep, and activity; recharges during sleep, depletes with activity |
| Oura Readiness score range | 0-100 | points | Composite of resting HR, HRV, body temperature, sleep, and activity balance factors |
Wearable readiness scores give athletes a daily number attempting to answer: ‘How recovered am I?’ The algorithms behind them differ significantly in inputs and design.
Düking et al. (2018; PMID 29742032) evaluated wearable monitoring devices in athletes and found low-to-moderate agreement between wearable-derived readiness indices and gold-standard laboratory markers of physiological state. No consumer device reliably predicts same-day performance test outcomes. What these devices do well is track relative trends over time within an individual.
Device Comparison Table
| Device | Algorithm Inputs | Update Frequency | Validation Studies | Reliability | Key Limitation |
|---|---|---|---|---|---|
| Whoop Recovery | HRV (rMSSD), RHR, sleep duration/stages, respiratory rate | Every 24h (post-sleep) | Limited; primarily internal | Moderate within-subject | Requires consistent wear; no display |
| Garmin Body Battery | HRV, RHR, sleep, accelerometer (activity drain) | Continuous (depletes in real-time) | Limited independent studies | Moderate | Activity model is proprietary; poor with shift work |
| Oura Readiness | HRV, RHR, body temperature, sleep stages, activity balance | Every 24h (post-sleep) | Most published (Altini & Kinnunen 2021) | Moderate-to-good | Ring fit affects PPG accuracy |
| Apple Health readiness | RHR trend, HRV trend, sleep, walking HRV | Daily; limited synthesis | Minimal peer-reviewed data | Low-to-moderate | No unified readiness score; fragmented |
| HRV4Training (app) | Morning camera HRV, subjective wellness survey | Daily (manual measurement) | Plachta et al. 2022 (strongest independent validation) | Good for HRV trends | Requires an active morning routine; no passive tracking |
| Manual RMSSD (chest strap) | Single-lead ECG via Polar H10 | Daily (60-second morning measurement) | Gold standard consumer method | High — matches clinical ECG closely | Requires dedicated hardware + app (Elite HRV, etc.) |
Algorithm Inputs in Depth
The three wearable platforms compared here (Whoop, Garmin, and Oura) anchor primarily on HRV, specifically rMSSD (the root mean square of successive differences between heartbeats), which reflects parasympathetic tone. Resting heart rate trend is the secondary input in each. Beyond these, the approaches diverge: Oura adds finger skin temperature (a reliable infection/stress indicator), Garmin uses real-time activity data to model energy depletion, and Whoop weights respiratory rate during sleep as a physiological stress indicator (Flatt & Esco, 2016; PMID 26964014).
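For readers who want to see what rMSSD actually computes, here is a minimal sketch. It assumes you already have a clean list of beat-to-beat (RR) intervals in milliseconds; the function name `rmssd` and the sample values are illustrative only, and real devices apply artifact filtering that is not shown here.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Difference between each interval and the one before it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    # Square, average, then take the square root.
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds (not real recordings).
rr = [800, 810, 790, 805, 795]
print(round(rmssd(rr), 2))
```

Larger beat-to-beat swings produce a higher rMSSD, which is why the metric is read as a proxy for parasympathetic activity.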
Sleep stage accuracy matters because all systems use sleep quality as a readiness input. Altini & Kinnunen (2021 — PMID 33348753) found Oura’s sleep stage detection achieved approximately 79% agreement with polysomnography — the best published result among consumer wearables, though still imperfect.
How to use this data: Readiness scores are most valuable as rolling trend indicators, not daily pass/fail signals. Track a 7-day moving average. Adjust training intensity for trends, not single-day scores. Combine the readiness number with subjective feel (morning mood, motivation, soreness) — when both are low, a reduced session is clearly warranted. When they disagree, subjective feel often carries equal weight for same-day decisions.
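The 7-day moving average can be computed by hand or in a spreadsheet, but a short sketch makes the idea concrete. It assumes a plain list of daily scores exported from whichever device you use; the name `rolling_mean` and the sample scores are hypothetical.

```python
def rolling_mean(scores, window=7):
    """Trailing moving average; None until a full window of days exists."""
    out = []
    for i in range(len(scores)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = scores[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical daily readiness scores (illustrative only).
daily = [72, 65, 80, 58, 61, 77, 70, 45, 50, 48]
trend = rolling_mean(daily)
for day, (score, avg) in enumerate(zip(daily, trend), start=1):
    shown = "n/a" if avg is None else f"{avg:.1f}"
    print(f"day {day}: score={score} 7-day avg={shown}")
```

In this made-up series the single-day drop to 45 moves the average only a few points, which illustrates why the trend is a steadier guide than any one morning's number.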
Sources
- Düking et al. 2018 — Comparison of Monitoring Devices for Measuring Physical Activity in Athletes
- Altini & Kinnunen 2021 — The Promise of Sleep: A Multi-Sensor Approach to Accurate Sleep Stage Detection Using the Oura Ring
- Flatt & Esco 2016 — Smartphone-Derived Heart Rate Variability and Training Load in a Women's College Soccer Team
Frequently Asked Questions
Which wearable readiness score is most accurate?
No consumer wearable has been validated against gold-standard performance metrics with consistently strong results (Düking et al. 2018). Oura has the most published sleep validation data. Whoop has more athlete-focused training load integration. Accuracy varies by individual physiology, skin tone, and consistency of wear. None should be used as a sole training decision tool.
Can I use two devices simultaneously to cross-validate?
Cross-validation is useful for identifying consistent signals. If both Whoop and Oura show low readiness on the same morning, the signal is more reliable. Disagreement between devices on a given day is common and normal — treat it as data uncertainty rather than conflicting ground truth.
How much should I adjust training based on a low readiness score?
A single low score warrants attention, not an automatic deload. A trend of 3+ consecutive low scores is more meaningful. Research on using readiness scores to modify training shows mixed results: athletes who adjust training in response to HRV-anchored scores tend to perform slightly better over multi-week blocks, but the evidence is not strong for single-session decisions.
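The "3+ consecutive low scores" rule can be checked mechanically. The sketch below is one way to do it; the function name `low_streaks` and the cutoff of 50 are assumptions for illustration, since what counts as "low" depends on the device and on your own baseline.

```python
def low_streaks(scores, threshold, min_run=3):
    """Return (start_index, length) for each run of at least min_run
    consecutive scores below threshold."""
    streaks = []
    run_start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if run_start is None:
                run_start = i  # a new low run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                streaks.append((run_start, i - run_start))
            run_start = None
    # Handle a run that is still open at the end of the series.
    if run_start is not None and len(scores) - run_start >= min_run:
        streaks.append((run_start, len(scores) - run_start))
    return streaks


# Hypothetical scores; 50 is an arbitrary example cutoff, not a device default.
scores = [70, 45, 72, 48, 44, 41, 69, 40, 39]
print(low_streaks(scores, threshold=50))
```

Isolated low days and two-day dips are ignored; only the run of three lows starting at index 3 is flagged.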
Does the Garmin Body Battery measure recovery differently from HRV-based scores?
Yes. Body Battery uses a proprietary energy reservoir model that depletes with activity (using accelerometer and heart rate data) and recharges during sleep (using HRV). It is more activity-context-aware than pure HRV scores but less directly tied to autonomic nervous system state. It integrates more data types but with less physiological specificity.
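To make the "energy reservoir" idea concrete, here is a toy sketch. It is not Garmin's algorithm, which is proprietary: the function `step_battery`, the hourly time step, and the drain and charge rates are all invented for illustration. Only the 5-100 range comes from the table above.

```python
def step_battery(level, state, intensity=0.0, hrv_quality=1.0,
                 drain_rate=8.0, charge_rate=10.0):
    """Advance a toy energy reservoir by one hour.

    state: "awake" drains in proportion to intensity;
           "sleep" recharges in proportion to hrv_quality (0 to 1).
    All rates are made-up values, not Garmin parameters.
    """
    if state == "sleep":
        level += charge_rate * hrv_quality
    elif state == "awake":
        level -= drain_rate * intensity
    else:
        raise ValueError(f"unknown state: {state}")
    # Clamp to the 5-100 range the score is reported in.
    return max(5.0, min(100.0, level))


level = 60.0
for intensity in [0.5, 1.0, 2.0]:  # three waking hours, rising effort
    level = step_battery(level, "awake", intensity=intensity)
after_day = level
for _ in range(8):  # eight hours of sleep with good HRV
    level = step_battery(level, "sleep", hrv_quality=0.9)
after_sleep = level
print(after_day, after_sleep)
```

The point of the sketch is the shape of the model: a single running level that activity pulls down and sleep pushes up, rather than a once-a-day snapshot of autonomic state.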
What is the biggest limitation of all these scores?
All readiness scores are backward-looking — they summarize recovery from recent stress. They do not measure readiness for a specific type of future effort. A high readiness score does not mean optimal performance for a maximal power session; it means recovery from recent load is good. Training context and accumulated fatigue over weeks require human interpretation beyond any single daily score.