IWRI Methodology
Last updated: 7 May 2026
The Invisible Work Risk Index (IWRI) is a screening tool. It analyses sprint and issue data already in Jira to surface Scrum teams whose workflow patterns suggest invisible work, and points you to where to start a conversation: which team, which sprint, which signal. The goal is to help you ask better questions about effort that is not showing up in the numbers.
IWRI rests on three principles:
- Relative, not absolute. Every score is measured against the population on your own site, not an industry average.
- Screening, not diagnosis. A flag indicates a pattern worth investigating. It is not a verdict.
- Conversation starter. Flags are designed to start conversations, not drive performance decisions.
1. Where the seven signals came from
Every team does work that never makes it into Jira: mentoring juniors, responding to incidents, fixing things before they become tickets, answering questions in Slack. That “invisible work” drains capacity, skews estimates, and burns people out. The research question was: can you detect it from the data teams already have?
We studied hundreds of Scrum teams across multiple organisations in banking, telecommunications, and other sectors. Each team rated their own volume of invisible work through structured surveys. Team leads were then interviewed to map what kinds of invisible work they experienced and where it showed up in their workflow. Over 20 candidate indicators (patterns in sprint data, ticket behaviour, and activity logs) were tested against this self-reported data. Indicators that did not show a consistent, repeatable pattern were removed.
Seven indicators survived. Each captures a different facet of how invisible work manifests in team data, and each has a documented split-half reliability score between 53 and 86 percent.
2. Pipeline at a glance
IWRI transforms raw Jira data into risk signals through a six-stage pipeline. Each stage builds on the previous, ensuring that the final risk assessment is grounded in measurable, standardised data rather than subjective judgement.
- Discover projects on the Jira site.
- Apply eligibility filters (see the Eligibility page).
- Pull sprint and issue data for each eligible project.
- Compute the seven raw indicators per team.
- Standardise each indicator into a z-score against the site population.
- Translate z-scores into flags and risk levels.
3. What we analyse
IWRI examines completed sprint data for eligible projects. The analysis window and filters ensure statistical relevance while capturing recent team behaviour.
| Parameter | Value |
|---|---|
| Sprint window | Last 8 completed sprints within 90 days |
| Minimum sprint duration | 7 days per sprint |
| Issue types | Story, Task, Bug (not sub-tasks or epics) |
| Commitment window | 2 calendar days from sprint start |
| Population minimum | 20 eligible projects required for scoring |
| Team size | 3 or more distinct assignees |
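
The sprint-window parameters above can be read as a simple filter. The sketch below is illustrative only: the `Sprint` record and `select_sprint_window` function are hypothetical names, not part of IWRI or the Jira API, and it assumes that "within 90 days" is measured from the sprint end date and that duration is the difference between end and start dates.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Sprint:
    """Hypothetical sprint record; field names are illustrative, not Jira's API."""
    name: str
    start: date
    end: date
    completed: bool


def select_sprint_window(sprints, today, max_sprints=8, window_days=90,
                         min_duration_days=7):
    """Return the most recent completed sprints that satisfy the window rules."""
    cutoff = today - timedelta(days=window_days)
    eligible = [
        s for s in sprints
        if s.completed
        and s.end >= cutoff                              # assumption: window measured by end date
        and (s.end - s.start).days >= min_duration_days  # minimum sprint duration
    ]
    eligible.sort(key=lambda s: s.end, reverse=True)     # newest first
    return eligible[:max_sprints]


sprints = [
    Sprint("A", date(2026, 4, 20), date(2026, 5, 4), True),
    Sprint("B", date(2026, 4, 27), date(2026, 4, 30), True),   # shorter than 7 days
    Sprint("C", date(2026, 1, 1), date(2026, 1, 15), True),    # ended more than 90 days ago
    Sprint("D", date(2026, 5, 4), date(2026, 5, 18), False),   # not completed
]
window = select_sprint_window(sprints, today=date(2026, 5, 7))
print([s.name for s in window])  # ['A']
```
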
4. Eligibility
Only projects that meet every eligibility criterion are scored. Projects that fail one or more are excluded from the scored population. They still appear in filter dropdowns (greyed out, marked “Excluded”) but do not contribute to or receive scores.
The full criteria set, including the population minimum of 20 eligible projects, is documented on the Eligibility page.
5. The seven indicators
Each indicator measures a specific dimension of team workflow behaviour. The z-scores of the first three are averaged into a composite called Completing Less Than Planned. The remaining four are assessed independently, and each powers its own flag.
Sprint-delivery indicators (feed Completing Less Than Planned)
| Indicator | What it measures | Formula |
|---|---|---|
| Velocity Stability | How inconsistent a team’s completed work is relative to total demand. A high residual (velocity variation minus demand variation) means velocity swings more than demand, a sign of capacity instability. | CV(velocity) − CV(demand) |
| Commitment Gaps | The average shortfall between what a team commits to and what they deliver. Over-delivery is clamped to zero; this only measures under-delivery. | mean( max(0, (committed − completed) / committed) ) |
| Work Left Undone | The proportion of committed work that was never started by the time the sprint closed. A high value suggests overcommitment or displacement by unplanned work. | mean( notStartedCount / committedCount ) |
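
As a rough illustration of the three formulas above, the following sketch computes each raw indicator from per-sprint lists. The function names are hypothetical. It assumes CV means population standard deviation divided by the mean (the methodology does not say whether the sample or population form is used), that "demand" is supplied as one value per sprint, and that every sprint has a non-zero commitment.

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean.
    Assumption: population standard deviation; the source does not specify."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def velocity_stability(velocity, demand):
    """CV(velocity) - CV(demand), one value per sprint in each list."""
    return cv(velocity) - cv(demand)


def commitment_gaps(committed, completed):
    """Mean per-sprint shortfall; over-delivery is clamped to zero."""
    return mean(max(0.0, (c - d) / c) for c, d in zip(committed, completed))


def work_left_undone(not_started, committed):
    """Mean share of committed issues never started by sprint close."""
    return mean(n / c for n, c in zip(not_started, committed))


# Made-up numbers for four sprints.
committed = [10, 10, 10, 10]
completed = [8, 12, 10, 5]     # second sprint over-delivers and is clamped to 0
not_started = [1, 0, 2, 1]

print(round(commitment_gaps(committed, completed), 3))    # 0.175
print(round(work_left_undone(not_started, committed), 3)) # 0.1
print(velocity_stability([5, 15], [10, 10]))              # 0.5
```

Note how the over-delivering sprint contributes zero to Commitment Gaps instead of offsetting the shortfalls in other sprints.
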
Independent signals (each fires its own flag)
| Indicator | What it measures | Formula |
|---|---|---|
| Tracking per Person | Issues tracked per person per week. Teams tracking fewer items per developer than peers may have significant work happening outside Jira. Sign-flipped: low values indicate higher risk. | mean( totalIssues / teamSize / sprintWeeks ) |
| Daily Jira Activity | The proportion of working days with zero Jira activity. Higher values suggest potential disengagement or work happening elsewhere. | mean( zeroActivityDays / totalWorkingDays ) |
| Ticket Descriptions | The proportion of completed tickets with empty descriptions. Sparse descriptions often indicate rushed creation or verbal-only requirements. | mean( emptyDescriptionCount / completedCount ) |
| Mid-Sprint Changes | How unpredictable the volume of mid-sprint additions is across sprints. High volatility suggests inconsistent sources of interruption that destabilise planning. | CV( midSprintAdditions ) |
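
The four independent indicators can be sketched the same way. The `SprintStats` record and the function names below are hypothetical; the fields simply mirror the terms in the formulas above. As before, CV is assumed to use the population standard deviation, and denominators are assumed to be non-zero.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class SprintStats:
    """Hypothetical per-sprint counts; field names mirror the formulas above."""
    total_issues: int
    team_size: int
    sprint_weeks: float
    zero_activity_days: int
    total_working_days: int
    empty_description_count: int
    completed_count: int
    mid_sprint_additions: int


def tracking_per_person(sprints):
    """Issues tracked per person per week, averaged over sprints."""
    return mean(s.total_issues / s.team_size / s.sprint_weeks for s in sprints)


def daily_jira_activity(sprints):
    """Share of working days with zero Jira activity, averaged over sprints."""
    return mean(s.zero_activity_days / s.total_working_days for s in sprints)


def ticket_descriptions(sprints):
    """Share of completed tickets with empty descriptions, averaged over sprints."""
    return mean(s.empty_description_count / s.completed_count for s in sprints)


def mid_sprint_changes(sprints):
    """CV of mid-sprint additions (assumption: population standard deviation)."""
    additions = [s.mid_sprint_additions for s in sprints]
    m = mean(additions)
    return pstdev(additions) / m if m else 0.0


# Made-up numbers for two two-week sprints of a four-person team.
history = [
    SprintStats(24, 4, 2, 2, 10, 3, 12, 2),
    SprintStats(16, 4, 2, 0, 10, 1, 10, 6),
]
print(tracking_per_person(history))             # 2.5
print(round(daily_jira_activity(history), 3))   # 0.1
print(round(ticket_descriptions(history), 3))   # 0.175
print(mid_sprint_changes(history))              # 0.5
```
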
6. Z-score standardisation
Raw indicator values are measured in different units: percentages, ratios, coefficients of variation. Z-score standardisation makes them comparable by expressing each value as standard deviations from the population mean.
z = (raw_value − population_mean) / population_stddev
A z-score of 0 means the team is exactly at the population mean. A z-score of 1.5 means the team is 1.5 standard deviations above the population mean, in the direction of higher risk.
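
A minimal sketch of the standardisation step, assuming the population standard deviation is used and that a population with zero spread yields z = 0 for every team (neither detail is stated in the methodology). The function name `z_scores` and the team names are illustrative. A real run needs at least 20 eligible projects; three are used here only to keep the arithmetic visible.

```python
from statistics import mean, pstdev


def z_scores(raw_by_team):
    """Standardise one indicator across the scored population."""
    values = list(raw_by_team.values())
    mu = mean(values)
    sigma = pstdev(values)   # assumption: population standard deviation
    if sigma == 0:
        # Assumption: with no spread, every team is treated as average.
        return {team: 0.0 for team in raw_by_team}
    return {team: (value - mu) / sigma for team, value in raw_by_team.items()}


# Made-up Commitment Gaps values for three teams.
raw = {"Team A": 0.1, "Team B": 0.2, "Team C": 0.3}
for team, z in z_scores(raw).items():
    print(team, round(z, 2))
# Team A -1.22
# Team B 0.0
# Team C 1.22
```
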
| Band | Range | Interpretation |
|---|---|---|
| Normal | z ≤ 0 | At or below the population average. No flag. |
| Low concern | 0 < z ≤ 1 | Above average but below the flag threshold. Informational only. |
| Moderate | 1 < z ≤ 2 | Flag fires. The team is meaningfully above the population average. |
| High | z > 2 | Flag fires. The team is in the top few percent on this indicator. |
Note on Tracking per Person: this is the only indicator where the z-score is sign-flipped. Fewer tracked issues per person means higher risk (work may be happening outside Jira), so the z-score is multiplied by −1 to maintain the convention that positive z = higher risk.
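
The band table and the sign-flip convention can be expressed in a few lines. This is a sketch with hypothetical function names, not IWRI's implementation; the band boundaries are taken directly from the table above.

```python
def risk_oriented_z(z, sign_flipped=False):
    """Flip the sign for indicators where low raw values mean higher risk."""
    return -z if sign_flipped else z


def band(z):
    """Map a risk-oriented z-score to its band."""
    if z <= 0:
        return "Normal"
    if z <= 1:
        return "Low concern"
    if z <= 2:
        return "Moderate"
    return "High"


print(band(0.4))                                       # Low concern
print(band(1.0))                                       # Low concern (the flag needs z > 1)
print(band(risk_oriented_z(-1.4, sign_flipped=True)))  # Moderate
```

In the last line, a team sitting 1.4 standard deviations below the population mean on Tracking per Person ends up with a risk-oriented z of 1.4, which falls in the Moderate band.
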
7. How Completing Less Than Planned is computed
Completing Less Than Planned is the only signal built from a composite of three z-scores: Velocity Stability, Commitment Gaps, and Work Left Undone. The three are averaged with equal weight. The same value serves two purposes: it appears as an indicator on the team drill-down, and it fires as a flag when it exceeds 1.0.
CompletingLessThanPlanned = mean( z_velocityStability, z_commitmentGaps, z_workLeftUndone )
Partial computation: if only one or two of the three components are available (for example, due to insufficient sprint data for one indicator), Completing Less Than Planned is computed from the available components. The UI shows “N of 3 components” so you can interpret accordingly.
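
The composite and its partial-computation rule might look like the sketch below. The function name is hypothetical, missing components are represented as `None`, and returning `None` when no component is available is an assumption; the methodology does not describe that case.

```python
from statistics import mean


def completing_less_than_planned(z_velocity_stability, z_commitment_gaps,
                                 z_work_left_undone):
    """Equal-weight mean of the available component z-scores.

    Returns (score, n_components) so a caller can show "N of 3 components".
    """
    available = [
        z for z in (z_velocity_stability, z_commitment_gaps, z_work_left_undone)
        if z is not None
    ]
    if not available:
        return None, 0   # assumption: no score when nothing is available
    return mean(available), len(available)


score, n = completing_less_than_planned(1.2, 0.6, 1.5)
print(round(score, 2), f"{n} of 3 components")   # 1.1 3 of 3 components

score, n = completing_less_than_planned(1.2, None, 0.6)
print(round(score, 2), f"{n} of 3 components")   # 0.9 2 of 3 components
```
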
8. The five risk flags
Flags are binary signals derived from z-scores. A flag is activated when the relevant z-score exceeds 1.0, meaning the team is more than one standard deviation above the population average for that risk dimension (roughly the 84th percentile if scores are approximately normally distributed). The threshold is deliberately strict: requiring a team to be well above the mean (not just above it) ensures that multiple converging flags produce a meaningful risk signal, not just noise.
| Flag | Trigger | What it suggests |
|---|---|---|
| Capacity Drain | Completing Less Than Planned > 1 | The team consistently completes less than planned: velocity is unstable, commitments are missed, and planned work is left unstarted. |
| Low Ticket Volume per Person | z(Tracking per Person) > 1 (sign-flipped) | The team tracks fewer items per developer than peers, suggesting significant work may be happening outside Jira. |
| Low Day-to-Day Jira Activity | z(Daily Jira Activity) > 1 | The team has more zero-activity days than peers. May indicate disengagement, burnout, or work in other tools. |
| Tickets Lacking Descriptions | z(Ticket Descriptions) > 1 | A higher proportion of completed tickets lack descriptions, suggesting rushed creation or verbal-only requirements. |
| Unpredictable Mid-Sprint Changes | z(Mid-Sprint Changes) > 1 | The volume of unplanned work added mid-sprint varies unpredictably, undermining sprint planning. |
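
The trigger column above amounts to one threshold check per flag. The sketch below assumes the input is a dictionary of risk-oriented scores keyed by indicator name (with Tracking per Person already sign-flipped) and that a missing score simply does not fire its flag. The names `FLAG_SOURCES` and `active_flags` are hypothetical.

```python
FLAG_THRESHOLD = 1.0

# Flag name -> the score that triggers it (names taken from the table above).
FLAG_SOURCES = {
    "Capacity Drain": "Completing Less Than Planned",
    "Low Ticket Volume per Person": "Tracking per Person",
    "Low Day-to-Day Jira Activity": "Daily Jira Activity",
    "Tickets Lacking Descriptions": "Ticket Descriptions",
    "Unpredictable Mid-Sprint Changes": "Mid-Sprint Changes",
}


def active_flags(scores):
    """Return the flags whose risk-oriented score strictly exceeds the threshold."""
    fired = []
    for flag, source in FLAG_SOURCES.items():
        z = scores.get(source)
        if z is not None and z > FLAG_THRESHOLD:   # assumption: missing score, no flag
            fired.append(flag)
    return fired


team_scores = {
    "Completing Less Than Planned": 1.3,
    "Tracking per Person": 1.0,       # exactly 1.0 does not fire
    "Daily Jira Activity": -0.2,
    "Ticket Descriptions": 2.4,
    "Mid-Sprint Changes": None,       # indicator unavailable
}
print(active_flags(team_scores))
# ['Capacity Drain', 'Tickets Lacking Descriptions']
```
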
9. Risk classification
Risk level is determined solely by the number of active flags. The more flags triggered simultaneously, the less likely all of them are caused by benign confounders. Converging signals increase confidence that invisible work patterns are present.
| Active flags | Risk level |
|---|---|
| 0 | Healthy |
| 1 | Watch |
| 2 | Elevated |
| 3 | Concerning |
| 4 | At Risk |
| 5 | High Risk |
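
Because the risk level depends only on the flag count, the table above reduces to a lookup. A minimal sketch, with a hypothetical function name:

```python
RISK_LEVELS = ("Healthy", "Watch", "Elevated", "Concerning", "At Risk", "High Risk")


def risk_level(active_flag_count):
    """Map the number of active flags (0 to 5) to a risk level."""
    if not 0 <= active_flag_count < len(RISK_LEVELS):
        raise ValueError("active_flag_count must be between 0 and 5")
    return RISK_LEVELS[active_flag_count]


print(risk_level(0))   # Healthy
print(risk_level(2))   # Elevated
print(risk_level(5))   # High Risk
```
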
10. Known limitations
IWRI is designed for transparency. These are the known limitations and assumptions built into the methodology.
| Limitation | What it means |
|---|---|
| Screening, not diagnosing | Every indicator has alternative explanations. A flag indicates a pattern worth investigating. It does not confirm invisible work is present. |
| Relative, not absolute | Z-scores are relative within your scored population. A team with z = 1.5 is above average for your organisation, not the industry. Cross-organisation comparison is not supported. |
| Adapted teams may be missed | Teams that have already accommodated invisible work into their practices (for example by inflating estimates) may not trigger flags, because their patterns appear “normal” within the population. |
| Persistence, not prediction | Split-half reliability shows that patterns persist across sprint halves. It does not establish predictive or construct validity. |
| Uniform sizing assumption | Issue-count-based indicators assume approximately uniform issue sizing. Teams with highly variable story point distributions may see skewed results. |
| Team-size confound | Daily Jira Activity correlates with team size. Smaller teams naturally have more zero-activity days. This known confound is flagged in the UI but not adjusted for. |
| Internal comparison only | IWRI scores are meaningful within a single organisation’s scored population. Comparing scores across different organisations or Jira instances is not valid. |