What Gets Measured Gets Managed — Except in Leadership Development: Why Most Executive Assessments Measure the Wrong Things

A few years ago I sat in on a post-programme debrief with the head of talent at a professional services firm. The firm had just completed a twelve-month leadership development initiative: external facilitators, 360-degree feedback, psychometric profiling, a residential module, and a final presentation to the executive committee. The cost was substantial. The question on the table was whether it had worked.

The talent director pulled up a slide. Net Promoter Score: 8.2. Participant satisfaction: 94%. "Would recommend to a colleague": 91%. She looked pleased. The executive committee looked reassured. The meeting moved on.

I asked one question: "What were the performance outcomes you were trying to change, and how did you measure them before and after?"

There was a pause. Then: "We didn't have a baseline."

The Measurement Problem Nobody Names

Leadership development is a £366 billion global industry. It is also one of the few industries in which the primary success metric is how much participants enjoyed the experience. We measure satisfaction because satisfaction is easy to measure. We do not measure the things that matter because they are hard to measure, and because measuring them would require us to confront the possibility that what we are doing is not working.

This is not a new observation. McKinsey published research in 2019 showing that fewer than 25% of organisations report that their leadership development programmes have a measurable impact on business performance. The Brandon Hall Group found that only 8% of organisations say their leadership development programmes are "very effective." The numbers have not improved meaningfully in a decade.

The problem is not the quality of the programmes. Many of them are genuinely excellent — well-designed, well-facilitated, intellectually rigorous. The problem is that we are measuring the wrong things, at the wrong time, in the wrong way.

What the Kirkpatrick Model Gets Right and What It Misses

Most organisations that attempt to measure training effectiveness use some version of the Kirkpatrick model: reaction (did they like it?), learning (did they acquire knowledge?), behaviour (did they change how they work?), and results (did it affect business outcomes?). The model is sound in principle. In practice, most organisations stop at Level 1 or Level 2 and call it done.

The reason is not laziness. It is that Levels 3 and 4 require something most organisations do not have: a baseline. You cannot measure behaviour change without knowing what the behaviour was before the intervention. You cannot measure results impact without knowing what the results were before the intervention. And establishing a baseline requires admitting, at the outset, that the current state is the problem — which is politically uncomfortable in ways that post-programme satisfaction surveys are not.

There is also a timing problem. Most post-programme assessments happen immediately after the programme ends, when participants are energised, motivated, and full of good intentions. The research on behaviour change is unambiguous: intentions measured immediately after an intervention are a poor predictor of sustained behaviour change. The Ebbinghaus forgetting curve suggests that without active reinforcement, much of what people learn fades within days. Measuring immediately afterwards captures the best possible moment, which is precisely the moment least representative of what will actually happen.

The Specificity Gap

The deeper problem is that most leadership assessments measure general competencies rather than specific performance constraints. They tell you that someone scores 3.2 out of 5 on "strategic thinking" or 4.1 on "communication." These scores are not useless, but they are not actionable in the way that matters.

What actually limits a senior leader's performance is almost never a general competency deficit. It is a specific constraint operating in a specific context. A CFO who struggles to influence the board is not suffering from a general communication deficit. They are suffering from a specific gap between their technical credibility and their ability to translate financial complexity into strategic narrative for a non-financial audience. Those are different problems requiring different interventions.

The assessment tools most organisations use are not designed to identify specific constraints. They are designed to produce normal distributions across a set of pre-defined competencies — which is useful for benchmarking populations but not for designing individual interventions. The result is that the assessment tells you where someone sits relative to a norm, but not what is actually limiting their performance or what would change it.

The Neuroscience of Why Self-Assessment Fails

There is a further complication that most assessment frameworks do not account for: the neurological relationship between stress, cognitive load, and self-awareness.

The prefrontal cortex, the region responsible for self-monitoring, metacognition, and accurate self-assessment, is also the region most compromised by chronic stress and cognitive overload. Senior leaders operating under sustained pressure are therefore often the people whose self-assessment is least reliable. The 360-degree feedback process attempts to compensate for this by incorporating external perspectives, but it introduces its own distortions: rater bias, political considerations, and the tendency of direct reports to moderate their feedback in ways that protect the relationship.

Research by Tasha Eurich and her team at the University of Colorado found that while 95% of people believe they are self-aware, only 10–15% actually demonstrate the behaviours associated with genuine self-awareness. The gap is not random — it is systematically larger for people in senior positions, where the social feedback mechanisms that normally calibrate self-perception are weaker. People tell senior leaders what they want to hear. The data they receive is curated. Their internal model of their own performance drifts from reality in ways they cannot detect.

This means that any assessment process that relies primarily on self-report or on feedback from people with a relationship stake in the outcome is measuring a distorted signal. The more senior the leader, the more distorted.

What Effective Measurement Actually Requires

Effective leadership assessment requires four things that most current approaches do not provide.

A specific performance hypothesis. Before any assessment begins, there should be a clear statement of what performance outcome is being targeted and why. Not "we want to develop strategic thinking" but "we believe that the CFO's ability to influence the board on capital allocation decisions is currently limited by X, and that improving X will change Y." The hypothesis should be falsifiable. If you cannot specify in advance what evidence would tell you the intervention failed, you are not measuring — you are hoping.

A genuine baseline. This means measuring the target performance indicator before the intervention, not only after it. For behavioural outcomes, this requires observation data: not self-report, not 360 feedback, but direct evidence of how the leader actually behaves in the specific contexts that matter. This is harder to collect than a survey, but it is the only way to know whether anything changed.

A time-appropriate follow-up. The research on behaviour change suggests that meaningful change takes a minimum of 60–90 days to consolidate. Assessments conducted at 30 days are measuring early-stage adoption, not sustained change. Assessments conducted at 6 months are measuring whether the change has survived the inevitable regression to baseline that occurs when the programme ends and the environment reasserts itself.

Specificity of attribution. Even when performance does improve, the question of whether the improvement is attributable to the intervention is rarely asked. Leaders improve for many reasons — new experiences, changed circumstances, natural development over time. Without a control group or a counterfactual, attributing improvement to the programme is an assumption, not a finding.
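
To make the baseline, follow-up, and attribution requirements concrete, here is a minimal sketch of the arithmetic involved. It compares the change in a target indicator among programme participants with the change in a comparison group over the same period, a simple difference-in-differences estimate. Everything in it is an assumption made for illustration: the Observation record, the function names, the 1-to-5 observer rating, and the four data points are hypothetical, and with groups this small the result is a rough indication rather than a finding.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Observation:
    """One leader's score on the target indicator (hypothetical structure)."""
    leader_id: str
    in_programme: bool   # True for participants, False for the comparison group
    baseline: float      # indicator measured before the intervention
    follow_up: float     # same indicator measured at the follow-up point


def mean_change(observations):
    """Average of (follow-up minus baseline) across a group."""
    return mean(o.follow_up - o.baseline for o in observations)


def difference_in_differences(observations):
    """Participants' average change minus the comparison group's average change."""
    participants = [o for o in observations if o.in_programme]
    comparison = [o for o in observations if not o.in_programme]
    if not participants or not comparison:
        raise ValueError("need both participants and a comparison group")
    return mean_change(participants) - mean_change(comparison)


# Invented example data: observer ratings on a 1-5 scale,
# collected before the programme and again six months later.
observations = [
    Observation("A", True, 2.0, 3.5),
    Observation("B", True, 3.0, 4.0),
    Observation("C", False, 2.5, 3.0),
    Observation("D", False, 3.0, 3.5),
]

participants = [o for o in observations if o.in_programme]
comparison = [o for o in observations if not o.in_programme]
estimate = difference_in_differences(observations)

print(f"Participants changed by {mean_change(participants):.2f}")
print(f"Comparison group changed by {mean_change(comparison):.2f}")
print(f"Change not explained by the comparison group: {estimate:.2f}")
```

In this invented example the participants improve by 1.25 points, but the comparison group also improves by 0.50 without any intervention, so only 0.75 of the change is a candidate for attribution to the programme. Without the comparison group, the full 1.25 would have been credited to it.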

The Assessment That Precedes Development

The most consequential measurement in leadership development is not the post-programme evaluation. It is the assessment that happens before development begins — the one that determines whether the right problem is being addressed.

Most organisations skip this step. They identify a development need based on a performance review, a 360 result, or a manager's observation, and they commission a programme to address it. The programme addresses the stated need. The performance problem persists. The conclusion drawn is that the programme was insufficient — when the actual conclusion should be that the diagnosis was wrong.

The presenting problem in leadership performance is rarely the actual problem. A leader who appears to struggle with delegation is often not struggling with delegation — they are struggling with a specific anxiety about loss of control that delegation triggers, which has a different cause and requires a different intervention. A leader who appears to struggle with strategic thinking is often not struggling with strategic thinking — they are struggling with the cognitive load of their current operational responsibilities, which is consuming the working memory capacity that strategic thinking requires.

Getting the diagnosis right before the intervention begins is not a luxury. It is the only way to ensure that what follows is addressing the actual constraint rather than its most visible symptom.

The Question Worth Asking

The question most organisations ask at the end of a leadership development programme is: "Did people find it valuable?" The question they should be asking is: "What specific performance constraint were we trying to address, how did we measure it before we started, and how has it changed?"

If you cannot answer that question, you have not measured the impact of your investment. You have measured how people felt about spending time away from their desks.

The organisations that get the most from their leadership development investment are not the ones with the most sophisticated programmes. They are the ones that are most honest about what they are trying to change, most rigorous about measuring whether it changed, and most willing to act on the answer — including the answer that says the intervention did not work.

That kind of honesty requires a different relationship with assessment than most organisations currently have. It requires treating measurement not as a validation exercise but as a genuine inquiry into whether the work is having the effect it is supposed to have.

Most organisations are not ready for that. The ones that are tend to be the ones whose leadership development actually works.