MRP-SMART

Estimating place-based and cross-sectional population outcomes from everyday transactions and behaviours

Author

Roger Beecham

Background / Summary

Figure 1

MRP-SMART is a collaboration between the Financial Data Service (FINDS) and the Healthy and Sustainable Places (HASP) Data Service. The project will demonstrate how reliable, de-biased estimates of neighbourhood-level population outcomes can be generated from non-representative smart data – individual observations describing everyday behaviours and transactions. It will do so via Multilevel Regression and Poststratification (MRP) (Gelman and Little 1997), a technique for small area estimation (SAE) that is little used outside political polling but has clear potential for addressing issues of representativeness in smart data research.

MRP-SMART will:

  1. Contribute a collection of data products estimating the financial well-being of populations living in GB neighbourhoods. The data will form a much-needed (JRF 2024) profile of local financial vulnerability, addressing issues of misallocation and poor targeting in recent large-scale interventions to alleviate financial hardship (Ray-Chaudhuri et al. 2023; DWP 2025).

  2. Pioneer the application of MRP for smart data research. A flagship methodological paper will establish a case and framework for implementing MRP on smart datasets and, separately, peer-reviewed open-source code will lower the barrier to entry for MRP modelling. Together these products will contribute principled yet practical tools to generate de-biased population estimates and communicate them in a responsible, transparent and policy-relevant way.

Why MRP for smart data research?

In social research, samples from surveys are routinely used to estimate the characteristics of a population – e.g. levels of financial distress, diet and physical activity, or environmental attitudes. With a sufficiently large sample, we can directly estimate the prevalence of target outcomes at subnational level. At smaller-scale geographies, however, too few respondents fall within each area for direct estimation to be reliable, and we must turn to model-based approaches, or small area estimation (SAE).

In MRP, the outcome of interest is first modelled (MR) from survey microdata using individual (respondent) demographics and group (area-level) context variables. Next, a ‘poststratification frame’ is constructed, whereby for every small-area unit, joint counts are derived, mainly from Census data, for the different demographics and area-level variables used in the model. At poststratification (P), predicted probabilities of the outcome are extracted for each row of the poststratification frame. Multiplying these probabilities by the joint counts and summing within each area, we can estimate the extent to which the target outcome occurs in the small area.
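The poststratification step described above can be sketched in a few lines of Python. This is a minimal illustration only: the frame, its column names, the counts and the predicted probabilities (`p_hat`) are all hypothetical, and in practice the probabilities would come from the fitted multilevel model rather than being typed in by hand.

```python
from collections import defaultdict

# Hypothetical poststratification frame: one row per (area, age, sex) cell.
# `count` stands in for a Census joint count and `p_hat` for the model's
# predicted probability of the outcome in that cell. All values are made up.
frame = [
    {"area": "A", "age": "18-34", "sex": "F", "count": 120, "p_hat": 0.30},
    {"area": "A", "age": "18-34", "sex": "M", "count": 100, "p_hat": 0.25},
    {"area": "A", "age": "35+", "sex": "F", "count": 200, "p_hat": 0.10},
    {"area": "A", "age": "35+", "sex": "M", "count": 180, "p_hat": 0.15},
    {"area": "B", "age": "18-34", "sex": "F", "count": 50, "p_hat": 0.40},
    {"area": "B", "age": "18-34", "sex": "M", "count": 50, "p_hat": 0.30},
    {"area": "B", "age": "35+", "sex": "F", "count": 100, "p_hat": 0.20},
    {"area": "B", "age": "35+", "sex": "M", "count": 100, "p_hat": 0.25},
]


def poststratify(rows):
    """Return the population-weighted outcome prevalence for each area."""
    expected = defaultdict(float)  # expected number of people with the outcome
    population = defaultdict(int)  # total population of the area
    for row in rows:
        expected[row["area"]] += row["count"] * row["p_hat"]
        population[row["area"]] += row["count"]
    return {area: expected[area] / population[area] for area in expected}


estimates = poststratify(frame)
for area, prevalence in estimates.items():
    print(f"Area {area}: estimated prevalence {prevalence:.3f}")
```

Because each cell is weighted by its population count rather than by its share of the sample, a group that is over-represented in the microdata does not dominate the area-level estimate.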

The key advantage of MRP lies in its use of Bayesian hierarchical models. It is common in SAE that certain demographic combinations have little or no representation in sample data. In these cases, the model borrows information from related groups to stabilise its estimates, a process known as partial pooling. In a notable example, Gelman et al. (2016) used MRP on a succession of highly unrepresentative polls of Microsoft Xbox gaming console customers, and documented how Xbox-derived forecasts of the 2012 US presidential election were superior to those of leading pollsters using traditionally sampled surveys.
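The effect of partial pooling can be illustrated with a deliberately simplified stand-in for a full hierarchical model. The sketch below assumes a Beta prior centred on the overall rate, so each group's estimate is its posterior mean under that prior. The group names, counts and the `PRIOR_STRENGTH` value are hypothetical; in a fitted multilevel model the amount of pooling is learned from the data rather than fixed in advance.

```python
# Hypothetical (outcome count, sample size) for three demographic groups.
groups = {
    "well_sampled": (30, 200),
    "sparse": (3, 4),
    "unobserved": (0, 0),
}

# Hypothetical prior strength, expressed as a pseudo-sample size.
# A hierarchical model would estimate the equivalent quantity from the data.
PRIOR_STRENGTH = 20

total_successes = sum(s for s, _ in groups.values())
total_n = sum(n for _, n in groups.values())
overall_rate = total_successes / total_n


def partially_pooled_rate(successes, n, overall, prior_strength):
    """Posterior mean of a rate under a Beta prior centred on `overall`."""
    return (successes + prior_strength * overall) / (n + prior_strength)


pooled = {
    name: partially_pooled_rate(s, n, overall_rate, PRIOR_STRENGTH)
    for name, (s, n) in groups.items()
}

for name, (s, n) in groups.items():
    raw = f"{s / n:.3f}" if n else "n/a"
    print(f"{name}: raw rate {raw}, partially pooled rate {pooled[name]:.3f}")
```

The well-sampled group barely moves from its raw rate, the sparse group is pulled strongly towards the overall rate, and the unobserved group, for which no direct estimate exists, falls back on the overall rate entirely.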

Why MRP for FINDS?

FINDS works with bank transaction records of over 5 million GB-based customers, and from here derives important ‘financial health’ indicators of those customers: people with persistently overdrawn accounts, living beyond their means, with low emergency resilience. These indicators are available to researchers on the FINDS data catalogue, and also presented to end-users via an interactive web-based dashboard, the Economic Wellbeing Explorer. The tool allows users to subset by geography, demographics and individual financial indicators. While 5 million customers is undoubtedly a large dataset, it is not representative at the sub-population level. There is no guarantee that the customers recorded accurately reflect the financial health of the target sub-populations. Additionally, since direct estimation is used, the sample becomes surprisingly small, unreliable and potentially disclosive at particular combinations of demographics and small-area units.

MRP-SMART would be a significant addition to the FINDS data catalogue. Unique estimates of ‘financial health’ could be generated for any combination of small-area spatial unit and demographic group, without risking disclosure and with interpretable uncertainty ranges for each point estimate.
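One common way to obtain an uncertainty range for a poststratified estimate, assuming posterior draws of each cell's outcome probability are available, is to poststratify every draw and summarise the resulting distribution. The sketch below simulates such draws from Beta distributions purely for illustration: the cell counts and Beta parameters are hypothetical, and the proposal does not specify how its intervals will be computed.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical cells for one small area. `count` stands in for a Census joint
# count; `alpha` and `beta` define a Beta distribution that stands in for the
# posterior of the cell's outcome probability. In practice the draws would be
# taken from the fitted multilevel model, not simulated like this.
cells = [
    {"count": 120, "alpha": 30, "beta": 70},
    {"count": 100, "alpha": 5, "beta": 15},  # sparse cell, so a wider posterior
    {"count": 200, "alpha": 20, "beta": 180},
]
N_DRAWS = 2000
total_population = sum(cell["count"] for cell in cells)

# Poststratify each posterior draw to get a distribution of area-level estimates.
area_draws = []
for _ in range(N_DRAWS):
    expected = sum(
        cell["count"] * random.betavariate(cell["alpha"], cell["beta"])
        for cell in cells
    )
    area_draws.append(expected / total_population)

# With n=40, the first and last cut points are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(area_draws, n=40)
lower, upper = cuts[0], cuts[-1]
point = statistics.median(area_draws)

print(f"Estimated prevalence {point:.3f} (95% interval {lower:.3f} to {upper:.3f})")
```

Reporting the interval alongside the point estimate makes it visible when an estimate rests on sparse cells, since those cells contribute wider posteriors and therefore a wider area-level range.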

Why MRP for HASP and SDR UK?

MRP-SMART presents a real methodological opportunity for SDR UK. The current standard for SAE, spatial microsimulation (SPM), relies on strong assumptions around individual sample data and their ability to discriminate target outcomes. MRP not only addresses these assumptions directly (Beecham and Clark 2025), but is well-suited to the general characteristics of smart datasets. While representativeness issues in smart data usually come from a lack of demographic information, geographic context is consistently recorded – as with FINDS transaction data. This is significant as it means that neighbourhood-level variables, which in MRP practice are highly predictive of population outcomes (Gelman et al. 2016; Hanretty 2019), can be incorporated into models. Unlike with SPM, this is made possible without losing information, via partial pooling.

Built into the project is an exchange of learning between data scientists at FINDS and those at HASP, who will implement MRP on HASP’s SPM-derived place-based estimates on the Inclusive Economy and Consumer Vulnerability index (Adcock et al. 2020), as well as datasets describing mobility and physical activity behaviours that are currently being acquired by HASP.

Project team

Roger Beecham (RB), University of Leeds (HASP)

RB is experienced in MRP and Bayesian modelling, and is an international expert in uncertainty visualization and analysis (see Beecham 2025). RB will guide Smart Data Foundry (SDF) data scientists in building the MRP models on FINDS data and incorporating visualization techniques for their communication. He will develop with colleagues at HASP open-source code demonstrations for MRP, and prepare the key paper output – pioneering MRP for smart data research. RB is costed-in at 22% over a 12-month period.

Data Scientists, University of Leeds (HASP)

HASP data scientists will be involved in the exchange of learning with RB and SDF data scientists. Included in our costing is a travel budget for three HASP data scientists to visit SDF in Edinburgh.

Data Scientists (DS, SDS, CDS) and Engineer (SWE), Smart Data Foundry (FINDS)

SDF will contribute a Data Scientist (DS; 33% over 9 months), a Senior Data Scientist (SDS; 33% over 9 months) and a Chief Data Scientist (CDS; 5% over 9 months). The DS and SDS will work under the guidance of RB and the CDS to implement the MRP models and develop front-end tools (i.e. via the Economic Wellbeing Explorer dashboard). The SWE (10% over 6 months) will support pipeline design, handling transformations between ‘raw’ banking data, the derived financial indicators and MRP estimates.

Product Manager (PM) and Digital Content and Storytelling Specialist (DC&S), Smart Data Foundry (FINDS)

SDF’s PM (10% over 4 months) will support the delivery of data products on the FINDS data catalogue, including engaging with end-users on how best to explain MRP-SMART data products, and the DC&S (20% over 2 months) will produce blog posts and data stories.

Aims and objectives

The project delivers on SDR UK objectives insofar as it aims to:

1. Address a methodological challenge in smart data research

MRP has revolutionised political polling. In showing that reliable estimates of population outcomes can be generated from messy and entirely non-representative samples (Gelman et al. 2016), it has potential to do the same for smart data research.

2. Promote trustworthy and responsible use of sensitive data

Each data point will be a modelled value, with interpretable uncertainty ranges, rather than a direct estimate from a sample. Not only does this mean that the small-area estimates are not disclosive of individuals, but coupled with the use of cutting-edge uncertainty visualization, MRP-SMART will be an exemplar for how uncertainty reasoning can be made intrinsic to smart data products.

3. Generate high-impact data products that enhance existing data assets

The enhanced FINDS datasets – de-biased and non-disclosive estimates describing the financial realities of populations at neighbourhood and cross-sectional level – will be a unique resource for analysts, policymakers and service providers.

Expected outputs and deliverables

1. Data products and tool

Estimates of population-level financial outcomes, administered on the FINDS data catalogue and displayed in the Economic Wellbeing Explorer dashboard (e.g. Figure 2).
Alongside the data products will be data stories that explain how MRP-based estimates are generated. These efforts will assist reliable interpretation of model outputs.

Figure 2: Economic Wellbeing Explorer dashboard and uncertainty encodings (Beecham 2025) that will be explored for communicating modelled estimates.

2. Methodological framework paper

A flagship journal paper, in Royal Society Open Science, pioneering MRP for smart data research.

  • Provisional title: From smart data to reliable estimates: A framework for de-biasing and validating estimates of unknown outcomes at small-area level
  • Keywords: Multilevel regression and poststratification; smart data research; non-representative samples; small-area estimation.

3. Peer-reviewed code products

Transferable code products, with tailored guidance on constructing poststratification frames.
A demo for end-to-end MRP modelling has been initiated as part of this proposal. The code ‘primer’ for MRP-SMART, developed by RB and HASP data scientists, will be published via the Data:Code section of Environment & Planning B (EPB) (Arribas-Bel et al. 2021), and will form the basis of a HASP training programme course.

Expected impacts

De-biased population estimates for responsible policymaking

In reliably estimating local financial vulnerability, the FINDS data products will fill an identified evidence gap (JRF 2024) and lead to better targeting and evaluation of interventions to address cost-of-living pressures (Ray-Chaudhuri et al. 2023; DWP 2025).

Ground-breaking methods for smart data research

MRP-SMART establishes a foundation – a framework paper and peer-reviewed code products – to de-bias smart data. By giving researchers the tools to apply MRP across new domains, it will stimulate a new sub-field in smart data research.

Total cost and justification of resources

The planned activity is organised into four phases (Figure 3), with substantive work running over 12 months, from October 2025 into 2026.

Costs are detailed in the attached spreadsheets, completed separately by HASP (University of Leeds) and FINDS (SDF). The full economic cost of the project is £112,640: £52,407 claimed from University of Leeds; £60,233 from SDF, sub-contracted by University of Edinburgh. In addition to staff time are travel costs: for RB to make three visits to SDF in Edinburgh; the HASP data scientists to make two visits each; and RB to attend AAG 2026, a gathering of c.8,000 international geographers and an important community for adopting MRP and furthering smart data research.

Figure 3: Project workplan.

Dependencies

FINDS microdata

Work on FINDS individual-level data will be performed within the SDF TRE. All SDF data scientists have approval to work in this environment. RB and HASP data scientists will also require TRE access.

SDF are responsible for this dependency and have an established process in place. Each applicant needs to pass a “Standard Disclosure” or “DBS Check”, attend an online training session and sign a TRE user agreement.

Census and area-level context data

The most restrictive Census data will require Safeguarded access. All included researchers already have this level of access. Any other context variables will be collected at LSOA-level. It is very unlikely that dependencies on these data will be prohibitive.

Risks

Code libraries for MRP cannot be used within SDF’s TRE

Mitigation: SDF data scientists have consulted our demo for MRP modelling. The libraries explained in that demo are compatible with SDF’s TRE, but we will further mitigate this risk in Phase 2 through code translation and validation (supported by CDS and SWE).

FINDS microdata is not suitable for MRP modelling

Mitigation: Although FINDS microdata is not attribute-rich in its demographics, containing only age and sex, the fine-grained neighbourhood (LSOA) in which customers live is consistently recorded. MRP is ideally practised in situations where this level of geographic context is known (Hanretty 2019), so we are confident that MRP can be successfully applied to FINDS microdata.

Ethics

Ethical approval for use of data

All FINDS research leverages best practice frameworks for ethical use of financial data. SDF will complete a ‘Research in the Public Interest’ Data Protection Impact Assessment (DPIA) and seek ethical approval for the entire project through University of Edinburgh.

Data privacy and disclosure

Models using FINDS microdata will be built end-to-end within the SDF TRE and the resulting data products published via the SDF FINDS data catalogue, which by default has Safeguarded access. As the data products are derived from model probabilities, they are entirely synthetic and cannot be disclosive of individuals.

References

Adcock, Michael, Nik Lomax, Stephen Clark, Oliver Clark, and Francesca Pontin. 2020. Consumer Vulnerability. Dataset. Version 1.0. Healthy and Sustainable Places Data Service. https://doi.org/10.82147/001.
Arribas-Bel, Dani, Seraphim Alvanides, Michael Batty, Andrew Crooks, Linda See, and Levi Wolf. 2021. “Urban Data/Code: A New EP-b Section.” Environment and Planning B: Urban Analytics and City Science 48 (9): 2517–19. https://doi.org/10.1177/23998083211059670.
Beecham, R., and S. Clark. 2025. “Should We Use Multilevel Regression and Post-Stratification When Simulating Area-Level Population Outcomes?” Proceedings of 33rd GISRUK Conference 2025 (Bristol, UK), April. https://eprints.whiterose.ac.uk/id/eprint/226795/.
Beecham, Roger. 2025. Visualization for Social Data Science. CRC Press. https://doi.org/10.1201/9781003292760.
DWP. 2025. Cost of Living Payments Evaluation: Executive Summary. Department for Work and Pensions. https://www.gov.uk/government/publications/cost-of-living-payments-evaluation/executive-summary.
Gelman, Andrew, Sharad Goel, David Rothschild, and Wei Wang. 2016. “Forecasting Elections with Non-Representative Polls.” International Journal of Forecasting 32 (4): 980–91.
Gelman, Andrew, and Thomas C Little. 1997. “Poststratification into Many Categories Using Hierarchical Logistic Regression.” Survey Methodology 23 (2): 127–35.
Hanretty, Chris. 2019. “An Introduction to Multilevel Regression and Post-Stratification for Estimating Constituency Opinion.” Political Studies Review, 1–16.
JRF. 2024. UK Poverty 2024: The Essential Guide to Understanding Poverty in the UK. Joseph Rowntree Foundation. https://www.jrf.org.uk/uk-poverty-2024-the-essential-guide-to-understanding-poverty-in-the-uk.
Ray-Chaudhuri, Sam, Tom Waters, and Xiaowei Xu. 2023. Lump-Sum Cost of Living Payments Poorly Designed to Alleviate Deprivation. Institute for Fiscal Studies. https://ifs.org.uk/news/lump-sum-cost-living-payments-poorly-designed-alleviate-deprivation.