For comparability’s sake, let’s really measure outcomes

Designing meaningful, appropriate and comparable impact investment performance instruments that do not stop at outputs, but get all the way to outcomes, was never going to be easy. But it is not impossible.

by Tom Adams

Commonly used measurement tools do a reasonable job of measuring outputs – the number of people reached, for example – but a poor job of measuring outcomes. ‘Output’ metrics are essentially repurposed sales or operational metrics. Meanwhile, there is a scarcity of data on the strength or depth of impact at the individual or household level – so-called ‘outcome’ metrics. It’s high time this changed.

A brief history of how we got here

The purpose of impact measurement is to understand impact performance, and to use that understanding to improve it. Meaningful impact management requires judging ourselves against our peers: comparison is what separates good impact from great impact.

That comparability is essential has been well recognised since the early days of impact investing. The Global Impact Investing Network (GIIN) established IRIS+, for example, specifically to provide a taxonomy of indicators for comparison.

However, something important was overlooked: it is easier to define ‘what’ to measure than it is to solve the ‘how’ of measurement itself.

‘What’ is a question of listing metrics. ‘How’ requires a repeatable approach to quality social research – the act of sampling and surveying the people who experience impact – that balances cost with complexity and rigor with speed. This balance is essential if measurement is to be widely adopted and folded into the budgetary and decision-making processes of investors and enterprises alike.
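
As a rough illustration of that cost-versus-rigor trade-off, consider the standard sample-size formula for estimating a proportion from a survey. This is a generic statistics sketch in Python, not a method described in this piece, and the figures are illustrative only:

```python
# Minimal sketch: how survey rigor (margin of error) drives cost (sample size).
# Generic statistics, not a 60 Decibels methodology.
import math

def sample_size(margin_of_error: float, z: float = 1.96, p: float = 0.5) -> int:
    """Respondents needed to estimate a proportion within +/- margin_of_error
    at ~95% confidence (z = 1.96), assuming worst-case variance (p = 0.5)."""
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

print(sample_size(0.10))  # ~97 respondents for a +/-10% margin of error
print(sample_size(0.05))  # ~385 respondents for +/-5% – roughly 4x the cost
```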

In the absence of a good ‘how’, the social metrics that are trickier to define and measure have been overlooked. As a result, the ‘what’ encapsulated by IRIS+ has tended to gravitate toward indicators that are easy to measure rather than those that are most meaningful.

Comparing the right things

One of the main risks to comparability is that, in a rush to build it, we develop metrics and scores that provide misleading comparisons.

We made this argument when we first reviewed the GIIN’s flagship performance report, “Understanding Impact Performance: Financial Inclusion Investments.” One would expect such a report to be anchored on (1) the types of impacts that matter to a client of a microfinance or fintech organization; and (2), as a “performance report,” to allow one to understand the relative impact performance of financial inclusion impact investors and companies. At the time, it accomplished neither objective.

The GIIN has made progress on the second point. Its recent impact performance benchmarks do begin to look at relative performance. However, they do so almost entirely by counting the number of people reached. Consequently, they still say little (if anything) about impact on people’s real, lived experience – that is, whether or not borrowers’ lives are actually improved by borrowing.

This is surely the essence of measuring social impact. Glossing over it does us a serious disservice in how we understand, track and compare our progress toward optimizing social impact.

Talking to customers

Part of the problem might be a lack of attention to the fidelity of the metrics themselves. That they exist and have been codified within the IRIS+ taxonomy into a (dizzyingly long) list appears to be considered enough. But take the time to look at the actual metrics listed for, say, financial services, and it’s easy to see why this system has struggled to scratch beneath the surface of social impact.

In theory, easy-to-measure metrics – such as the number of active borrowers per loan officer (metric P19250) or the number of claims rejected (metric P13383) – could influence the social impact experienced by the borrower. But these are, at best, very rough proxies. Though the environmental measures within the taxonomy are better, much of it puts the comparability cart before the usefulness horse.

If we are to move toward a more meaningful standard of impact performance comparison, we need to change tack. Rather than consolidating whatever metrics happen to be easily available from the top down, we have to do the hard work of building meaningful measures from the ground up.

Journalist Simon Clark makes the point forcefully in his book The Key Man, arguing that investors who talk about impact without including the voice and views of the people who do (or don’t) experience that impact have “about as much integrity as a room full of men discussing how to improve gender equity.”

To adopt such a bottom-up approach, we first need to listen to the people we hope to positively impact, so that they can tell us what changes in their lives and which of those changes are most material to their wellbeing (concepts captured well by the impact dimensions of the Impact Management Project).

We also need to build and deploy appropriate metrics and survey instruments that can be repeated across whole sectors, year in, year out.

With respect to the ‘how’ of impact measurement, this is where the rubber really hits the road. Such an approach was at the core of 60 Decibels’ MFI Index, which included multiple material indicators – on the depth of household impact, business impact, financial resilience, financial management and financial inclusion – and consolidated them into a single impact index.
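
To make the idea of consolidation concrete, here is a minimal sketch of how several indicator scores might be rolled up into one composite index. The indicator names, scores and equal weighting are hypothetical illustrations for this article, not the actual MFI Index methodology:

```python
# Hypothetical sketch: rolling up several outcome indicators into one index.
# Names, scores and weights are illustrative, not the real MFI Index method.

# Each indicator scored 0-100, e.g. the share of surveyed borrowers reporting
# a meaningful improvement on that dimension.
indicator_scores = {
    "household_impact": 62.0,
    "business_impact": 55.0,
    "financial_resilience": 48.0,
    "financial_management": 70.0,
    "financial_inclusion": 66.0,
}

# Equal weights assumed here; a real index would choose weights deliberately.
weight = 1 / len(indicator_scores)

impact_index = sum(score * weight for score in indicator_scores.values())
print(f"Composite impact index: {impact_index:.1f}")  # 60.2 on this toy data
```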

In our first year, 72 MFIs adopted the index and we spoke with 17,956 borrowers. In Year Two, we expect almost all of the original MFIs to repeat, plus an even greater number of new adopters. This data has helped us build a better understanding of the sector as a whole and correct unfair misrepresentations based on anecdote.

Recalibrating for impact

Given its importance, the practical implementation of the shift from outputs to outcomes warrants considerably more of the industry’s attention, alignment and commitment. That should start with honest reflection on the strengths and limitations of tools such as IRIS+, and a clear roadmap for recalibrating them to list fewer output indicators and many more meaningful outcome measures.

The tools we’re currently using, such as IRIS+, cannot hope to properly measure social performance without root-and-branch recalibration. IRIS+ is a list of indicators, most of which have nothing to do with social impact, yet it is presented as a complete system for optimizing impact performance.

Once the whole ecosystem – from asset owners (and their advisors) to asset managers, investors and enterprises – gets behind, and expects, proper measures of impact performance as the essence of impact management, the results will be huge.

We will all get much better – and much faster – at identifying companies and investment strategies that deliver quality social impact performance. The result of all this: greater impact.

This piece was originally published on Impact Alpha.

Tom Adams is co-founder and chief strategy officer at 60 Decibels.
