Likelihood Ratios vs Relative Likelihood Ratios
These two terms are often used interchangeably — but they mean different things. Getting the distinction right matters, particularly when presenting evidence in professional or legal contexts.
Likelihood Ratio (LR)
A Likelihood Ratio expresses the probability of an outcome for a given group: the number of people experiencing the outcome divided by the population at risk. It is the group's rate.
Each group has its own LR. On its own, it tells you how common the outcome is within that group — but not how that compares to anyone else.
Relative Likelihood Ratio (RLR)
The Relative Likelihood Ratio is the comparison between two groups' likelihood ratios. It answers the question: how many times more likely is this outcome for Group A than for the reference group?
The reference group (Group B) is typically the majority or comparator group — e.g. White British, Male, Non-disabled. The RLR is the number that appears in equality reports, tribunal evidence, and PSED monitoring data.
Reading the RLR — Direction Matters
An RLR can fall above or below 1.0, and whether that is a problem depends entirely on the outcome being measured. For a negative outcome (disciplinary, use of force, dismissal), a high RLR signals the focus group is disadvantaged. For a positive outcome (promotion, training access, commendations), a low RLR signals the same thing. Always ask: which direction represents harm for this group?
For negative outcomes (e.g. disciplinary action, use of force, dismissal):

| RLR Value | What It Means | Concern Level |
|---|---|---|
| < 0.80 | Focus group substantially less likely — worth checking; may reflect under-recording or genuine under-use | Note |
| 0.80 – 1.25 | Within the four-fifths rule tolerance — broadly proportionate (see note below) | Acceptable range |
| 1.25 – 1.5 | Exceeds four-fifths threshold — adverse impact identified, warrants investigation | Investigate |
| 1.5 – 2.0 | 50–100% more likely — moderate to significant disparity | Significant |
| 2.0 – 3.0 | 2–3× more likely — serious disparity, action required | Serious |
| 3.0+ | 3× or more likely — severe disparity, urgent review needed | Critical |
For positive outcomes (e.g. promotion, training access):

| RLR Value | What It Means | Concern Level |
|---|---|---|
| > 1.25 | Focus group more likely to access the positive outcome — may indicate over-representation worth monitoring | Note |
| 0.80 – 1.25 | Within the four-fifths rule tolerance — broadly proportionate | Acceptable range |
| 0.50 – 0.80 | Exceeds four-fifths threshold — focus group substantially less likely to benefit; adverse impact identified | Significant |
| < 0.50 | Focus group less than half as likely to access the positive outcome — serious barrier indicated | Serious |
⚖️ The Four-Fifths Rule (0.80 / 1.25)
The most widely used rubric in UK workforce equality monitoring is the four-fifths rule, formalised in the NHS Workforce Race Equality Standard (WRES). It defines adverse impact as any RLR below 0.80 or above 1.25. Note that 1.25 = 1 ÷ 0.80 — the thresholds are reciprocals, making them symmetrical around 1.0.
These are rules of thumb, not legal thresholds. An RLR of 1.24 is not automatically safe, and an RLR of 1.26 is not automatically unlawful. What the rubric does is provide a consistent, defensible trigger for further investigation — and a common language across organisations.
Different sectors use slightly different thresholds (some policing contexts use 2.0 as the trigger for formal scrutiny), so always clarify which rubric applies in your context.
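The thresholds above translate directly into a simple screening check. A minimal Python sketch (the function name and wording are illustrative, and the direction caveat still applies: whether a flag represents harm depends on whether the outcome measured is positive or negative):

```python
def four_fifths_flag(rlr: float, low: float = 0.80, high: float = 1.25) -> str:
    """Screen an RLR against the four-fifths thresholds (0.80 / 1.25).

    A rule of thumb, not a legal test. Some sectors use different
    trigger values, so the thresholds are parameters.
    """
    if rlr < low:
        return "below tolerance: focus group substantially less likely"
    if rlr > high:
        return "above tolerance: focus group substantially more likely"
    return "within tolerance: broadly proportionate"
```

Note that `four_fifths_flag(1.24)` and `four_fifths_flag(1.26)` land either side of the line, which is exactly why a flag should trigger investigation rather than serve as a verdict.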
📊 Statistical Significance: Don't Overlook Small Numbers
A large RLR can be misleading when the underlying numbers are small. If a group has only 10 members and 3 face a particular outcome, the RLR may appear alarming — but with such a small sample, the result is highly volatile and unreliable. A different year, or a single additional case, could change the figure dramatically.
This is where chi-square (χ²) analysis is valuable. A chi-square test assesses whether the difference between observed and expected frequencies across groups is statistically significant — i.e. whether it is likely to reflect a real pattern rather than random variation.
Best practice is to report both the RLR and the statistical significance: an RLR of 2.0 that is not statistically significant (perhaps because n=15 in one cell) tells a very different story to an RLR of 1.4 that is highly significant across 500 cases. Chi-square is accessible in practice — it can be calculated directly in Excel using CHISQ.TEST on a simple 2×2 table of outcomes vs non-outcomes for each group. Some statistical software will also produce confidence intervals around the RLR, which communicate the same uncertainty more explicitly if needed.
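For readers outside Excel, the same test is short enough to write out. A minimal pure-Python sketch of a Pearson chi-square test on a 2×2 table (one degree of freedom, no continuity correction; the function name is illustrative), using the disciplinary figures that appear later in this explainer:

```python
import math

def chi_square_2x2(a_out, a_total, b_out, b_total):
    """Pearson chi-square test on a 2x2 table of outcome vs non-outcome
    for two groups (1 degree of freedom, no continuity correction).
    Returns (chi-square statistic, p-value)."""
    n = a_total + b_total
    col_totals = (a_out + b_out, n - a_out - b_out)
    chi2 = 0.0
    for row_total, out in ((a_total, a_out), (b_total, b_out)):
        for col_total, observed in zip(col_totals, (out, row_total - out)):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Illustration: 7 disciplinaries in a group of 60 vs 16 in a group of 320
# (an RLR of ~2.3) is only just significant at the 5% level
chi2, p = chi_square_2x2(7, 60, 16, 320)  # chi2 ~ 3.95, p ~ 0.047
```

The point of the example is the pairing: a striking RLR of roughly 2.3, yet a p-value sitting right at the conventional threshold because one cell holds only 7 cases.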
Equality Examples
Relative Likelihood Ratios appear across the full spectrum of equality data — from policing outcomes to workplace progression. Here are three common contexts.
Stop & Search Disproportionality
Stop and search rates in England and Wales are recorded by ethnicity per 1,000 population. The ratio of a group's rate to the White rate is the Relative Likelihood Ratio (RLR) — or in this context, the Disproportionality Rate.
White LR: 7 stops per 1,000 population
Black LR: 38 stops per 1,000 population
RLR (Black ÷ White) = 38 ÷ 7 ≈ 5.4×
Figures illustrative, based on published Home Office data patterns. Reference group = White. RLR = Black LR ÷ White LR.
What does this RLR tell us?
A Black person is approximately 5–6× more likely to be stopped than a White person. This doesn't tell us whether individual stops were justified, but it does tell us the distribution of policing activity is highly unequal and that a lawful, proportionate explanation is required. In the absence of one, the disparity may constitute evidence of indirect discrimination under the Equality Act 2010.
Misconduct & Disciplinary Proceedings
In organisations, Relative Likelihood Ratios can reveal whether employees from certain groups face disciplinary action at disproportionate rates — even when controlling for performance or seniority.
| Group | Employee Count | Disciplinaries | Rate per 100 | RLR |
|---|---|---|---|---|
| White | 320 | 16 | 5.0% | 1.0 (reference LR) |
| Asian | 80 | 6 | 7.5% | 1.5× |
| Black | 60 | 7 | 11.7% | 2.3× |
| Mixed / Other | 40 | 4 | 10.0% | 2.0× |
Promotion & Progression
RLRs can be applied in reverse — when the outcome is positive (e.g. promotion), an RLR below 1.0 for a group means they are less likely to progress, revealing potential barriers.
| Group | Applied | Promoted | Success Rate | RLR |
|---|---|---|---|---|
| Male (non-disabled) | 80 | 32 | 40% | 1.0 (reference LR) |
| Female (non-disabled) | 70 | 21 | 30% | 0.75× |
| Male (disabled) | 30 | 9 | 30% | 0.75× |
| Female (disabled) | 20 | 4 | 20% | 0.50× |
Relative Likelihood Ratio Calculator
Enter the outcome counts and population sizes for two groups. The calculator derives each group's Likelihood Ratio, then divides them to produce the Relative Likelihood Ratio.
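The calculator's logic is two divisions, so it is easy to reproduce directly. A minimal Python sketch (function names are illustrative), using the stop and search figures from earlier:

```python
def likelihood_ratio(outcomes: int, population: int) -> float:
    """A group's LR: outcomes per head of the population at risk."""
    return outcomes / population

def relative_likelihood_ratio(focus_outcomes, focus_pop,
                              ref_outcomes, ref_pop) -> float:
    """RLR: the focus group's LR divided by the reference group's LR."""
    return (likelihood_ratio(focus_outcomes, focus_pop)
            / likelihood_ratio(ref_outcomes, ref_pop))

# Stop & search illustration from earlier: 38 vs 7 stops per 1,000
rlr = relative_likelihood_ratio(38, 1000, 7, 1000)  # ~5.4
```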
Simpson's Paradox
A statistical phenomenon where a trend that appears in aggregated data reverses — or disappears — when the data is broken down into subgroups. In equality work, it can both hide and reveal discrimination.
The Classic Problem
Imagine an organisation where, in aggregate, women appear to be promoted at a much lower rate than men. Does this mean women are being treated unfairly within the organisation's promotion processes? Not necessarily — and the answer matters, because it determines the right intervention.
If women are concentrated in departments with lower overall promotion rates, and men are concentrated in departments with higher overall promotion rates, the aggregate can make the overall picture look worse than what is happening in any individual department — even if women are actually favoured within every department.
| Department | Gender | Applied | Promoted | Rate | RLR (W vs M) |
|---|---|---|---|---|---|
| Operations (lower promo rate) | Men | 10 | 1 | 10% | |
| Operations (lower promo rate) | Women | 40 | 6 | 15% | 1.50 ✓ Women favoured |
| Strategy (higher promo rate) | Men | 40 | 20 | 50% | |
| Strategy (higher promo rate) | Women | 10 | 6 | 60% | 1.20 ✓ Women favoured |
| All combined | Men | 50 | 21 | 42% | |
| All combined | Women | 50 | 12 | 24% | 0.57 ✗ Women appear disadvantaged |
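The reversal can be reproduced directly from the table's numbers. A short Python sketch (variable names are illustrative):

```python
def rate(promoted, applied):
    """Promotion rate for a subgroup."""
    return promoted / applied

# (promoted, applied) per department, taken from the table above
men   = {"Operations": (1, 10),  "Strategy": (20, 40)}
women = {"Operations": (6, 40),  "Strategy": (6, 10)}

# Within every department, women's rate exceeds men's
per_dept = {d: rate(*women[d]) / rate(*men[d]) for d in men}
# per_dept ~ {"Operations": 1.5, "Strategy": 1.2}

# Yet the aggregate comparison reverses
men_prom = sum(v[0] for v in men.values())
men_app = sum(v[1] for v in men.values())
women_prom = sum(v[0] for v in women.values())
women_app = sum(v[1] for v in women.values())
aggregate = rate(women_prom, women_app) / rate(men_prom, men_app)  # ~0.57
```

Nothing in the arithmetic is exotic: the aggregate simply weights each department by where applicants are concentrated, and women are concentrated in the low-promotion department.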
Interactive Demonstration: Gender Pay Gap in Policing
In policing, officers are generally paid more than police staff, and officers are more likely to be men while staff are more likely to be women. Comparing the aggregate view with the officer/staff split shows how this composition shapes the headline pay gap figure.
📅 A Real-World Twist: Year-on-Year, Everything Got Worse — Yet the Headline Improved
The officers/staff dynamic creates a further paradox when tracked over time. In the following illustrative example — based on a real pattern observed in force-level GPG reporting — the pay gap worsened within both officers and staff year-on-year, yet the aggregate headline figure improved.
Year 1

| Group | Headcount | Men avg pay | Women avg pay | GPG |
|---|---|---|---|---|
| Officers | 300 (75% M, 25% F) | £22.00 | £21.00 | 4.5% |
| Staff | 200 (30% M, 70% F) | £14.00 | £13.00 | 7.1% |
| All combined | 500 | £20.32 | £15.79 | 22.3% |
Year 2

| Group | Headcount | Men avg pay | Women avg pay | GPG |
|---|---|---|---|---|
| Officers | 330 (68% M, 32% F) | £22.50 | £20.79 ↓ | 7.6% ↑ worse |
| Staff | 200 (30% M, 70% F) | £14.50 | £13.20 | 9.0% ↑ worse |
| All combined | 530 | £20.82 | £16.45 | 21.0% ↓ better |
How Can This Be True?
The officer pay gap widened because 30 new women joined at entry-level officer pay (£19.00), pulling the average women's officer pay down — even though each individual woman is paid in line with her male equivalent at the same point on the scale. The staff gap widened for separate reasons. Yet the aggregate gap narrowed, because those 30 new female officers — despite being at the bottom of the officer pay scale — still earn more than most staff. Their arrival shifted the overall composition of the female workforce upward, improving the aggregate figure while masking deterioration in both underlying groups.
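The aggregate figures are nothing more than headcount-weighted blends of the subgroup averages. A short sketch of the Year 1 arithmetic (headcounts are derived from the percentages in the table: 300 officers at 75% men, 200 staff at 30% men; names are illustrative):

```python
# Year 1 figures from the table above: (headcount, avg hourly pay)
officers = {"men": (225, 22.00), "women": (75, 21.00)}
staff    = {"men": (60, 14.00),  "women": (140, 13.00)}

def combined_avg(gender, groups):
    """Headcount-weighted average pay for one gender across all groups."""
    heads = sum(g[gender][0] for g in groups)
    total_pay = sum(g[gender][0] * g[gender][1] for g in groups)
    return total_pay / heads

men_avg = combined_avg("men", [officers, staff])      # ~20.32
women_avg = combined_avg("women", [officers, staff])  # ~15.79
gpg = 100 * (men_avg - women_avg) / men_avg           # ~22.3%
```

Because the blend is weighted by headcount, any change in who sits where (such as 30 new female officers) moves the aggregate even when every within-group gap is widening.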
A headline improvement can therefore coexist with — and actively conceal — worsening conditions in every subgroup. This is precisely why aggregate GPG reporting, mandated under the Equality Act 2010, tells only part of the story.
Simpson's Paradox in Policing: Use of Force
The same paradox can appear in use-of-force data. The key is understanding not just the rates, but the volumes behind them — specifically, which groups are concentrated in which encounter types.
In this illustrative example, vehicle stops involve higher force rates for everyone, and Black people are concentrated in foot patrol; White people are concentrated in vehicle stops.
| Context | Group | Encounters | Force used | Rate (LR) | RLR |
|---|---|---|---|---|---|
| Foot patrol (lower force-rate type) | White | 200 | 10 | 5.0% | |
| Foot patrol (lower force-rate type) | Black | 800 | 72 | 9.0% | 1.8× Black higher |
| Vehicle stops (higher force-rate type) | White | 800 | 120 | 15.0% | |
| Vehicle stops (higher force-rate type) | Black | 200 | 40 | 20.0% | 1.33× Black higher |
| All combined | White | 1,000 | 130 | 13.0% | |
| All combined | Black | 1,000 | 112 | 11.2% | 0.86× Black appears LOWER |
Key Takeaways
What practitioners need to know when working with relative likelihood ratios and aggregated equality data.
✅ On Relative Likelihood Ratios
An RLR quantifies disparity — it is the ratio of two groups' Likelihood Ratios, telling you how many times more (or less) likely an outcome is for one group compared to a reference group.
It is a starting point, not a verdict. An RLR of 3.0 demands explanation. It can be lawfully justified if it is a proportionate means of achieving a legitimate aim, but the burden to demonstrate this increases with the size of the ratio.
Intersectionality matters. Run RLRs for combinations of protected characteristics, not just single-axis comparisons. The greatest disparities often appear at the intersections.
RLRs below 1.0 matter too. When the outcome is positive (promotion, development, pay), an RLR below 1.0 signals under-representation, not safety.
⚠️ On Simpson's Paradox
Never trust aggregate-only data in equality analysis. Always ask: what confounding variable might be lurking in the structure of the data?
Common confounders in equality data include: job level/grade, length of service, department, geographic location, shift pattern, and contract type.
The paradox can work both ways. It can make discrimination invisible (the GPG example), or it can make a neutral system appear discriminatory. Both distortions are dangerous.
Causation vs. composition. Always ask: is this gap because of how people are treated within groups, or because of how groups are distributed across categories? The answer determines the intervention.
The Practitioner's Rule
Report both the aggregate and the disaggregated figures. When they tell different stories, that difference is itself the finding — and often the most important one.
Figures used in this explainer are illustrative and based on published data patterns. They are intended to demonstrate analytical concepts, not to represent exact current statistics.