Materials from class on Monday, August 15, 2022


Session Outcomes

By the end of this homework session you should be able to:

  • Calculate grouped statistics and use those statistics for ordering plot outputs.
  • Extract parameters from linear regression models and explore their structure graphically.
  • Make use of functional programming techniques for working over datasets.


This homework requires you to apply the concepts and skills developed in the class session on model building 1.

Task 1: Calculate and plot

Recreate the scatterplots in the class materials that display Leave against each candidate explanatory variable, faceted by explanatory variable and with the facets ordered by correlation coefficient.

You will need to use the data_for_models data frame. First, calculate the correlation coefficient between Leave and each explanatory variable, then generate a vector of variable names, stored in an object, ordered according to correlation coefficient. To help you out, I’ve provided a template of dplyr and tidyr commands that you will want to use and some indicative output. In order to summarise over each explanatory variable in a group_by(), you first need to pivot_longer() so that all explanatory variables are represented in a single column. Of course, you are welcome to take a different approach.

expl_cors <- data_for_models %>%
  pivot_longer(...) %>%
  group_by(...) %>%
  summarise(...) %>%
  arrange(...)

# # A tibble: 10 x 2
#    expl_var           cor
#    <chr>            <dbl>
#  1 degree          -0.772
#  2 professional    -0.565
#  3 younger         -0.538
#  4 eu_born         -0.483
#  5 no_car          -0.396
#  6 white            0.411
#  7 own_home         0.430
#  8 christian        0.487
#  9 not_good_health  0.562
# 10 heavy_industry   0.710

# You will then want to extract a vector of ordered
# variable names and store it as an object.

# [1] "degree"          "professional"    "younger"         "eu_born"
# [5] "no_car"          "white"           "own_home"        "christian"
# [9] "not_good_health" "heavy_industry"
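
As a rough illustration of the pivot, group and summarise pattern described above, here is a minimal sketch on a made-up data frame. The object names toy, toy_cors and ordered_vars and all of the values are assumptions for illustration only; they are not the class data, and this is one possible approach rather than the expected answer.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for data_for_models (values are invented).
toy <- tibble(
  leave  = c(0.30, 0.45, 0.52, 0.61, 0.70),
  degree = c(0.50, 0.38, 0.30, 0.22, 0.15),
  no_car = c(0.40, 0.20, 0.35, 0.15, 0.25),
  white  = c(0.60, 0.80, 0.85, 0.90, 0.97)
)

toy_cors <- toy %>%
  # One row per observation-variable pair; leave stays in its own column.
  pivot_longer(cols = -leave, names_to = "expl_var", values_to = "value") %>%
  group_by(expl_var) %>%
  # Correlation between leave and each explanatory variable.
  summarise(cor = cor(leave, value)) %>%
  # Most negative correlation first.
  arrange(cor)

# Vector of variable names in correlation order.
ordered_vars <- toy_cors %>% pull(expl_var)
```

Because the rows are arranged before pull(), the resulting character vector carries the ordering with it and can be reused wherever that ordering is needed.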

Now generate the scatterplots with Leave against each explanatory variable, faceted by explanatory variable and with the facets ordered according to correlation coefficient.
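
One common way to control facet order in ggplot2 is to convert the faceting variable to a factor whose levels are in the order you want. The sketch below shows this on made-up data; toy, ordered_vars and plot_data are hypothetical names, and the ordering vector is typed in by hand here purely for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data standing in for data_for_models (values are invented).
toy <- tibble(
  leave  = c(0.30, 0.45, 0.52, 0.61, 0.70),
  degree = c(0.50, 0.38, 0.30, 0.22, 0.15),
  no_car = c(0.40, 0.20, 0.35, 0.15, 0.25),
  white  = c(0.60, 0.80, 0.85, 0.90, 0.97)
)

# Assumed ordering of variable names (in practice, derived from the correlations).
ordered_vars <- c("degree", "no_car", "white")

plot_data <- toy %>%
  pivot_longer(cols = -leave, names_to = "expl_var", values_to = "value") %>%
  # Factor levels determine the order in which facets are drawn.
  mutate(expl_var = factor(expl_var, levels = ordered_vars))

p <- ggplot(plot_data, aes(x = value, y = leave)) +
  geom_point() +
  # Free x scales because the explanatory variables have different ranges.
  facet_wrap(~expl_var, scales = "free_x")
```

Printing p draws the facets left to right in the order given by ordered_vars rather than alphabetically.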

Task 2: Explore, model and evaluate

A variable that is intuitively relevant to this study, but that behaves counter to expectation, is EU-born: the proportion of residents in a constituency born outside of the UK but within the EU. In the technical element to the session, you generated a data frame of model outputs regressing Leave on each candidate explanatory variable. This is stored in single_model_fits. The second set of scatterplots uses information contained in single_model_fits to colour observations according to their residuals from each of these model objects. The scatterplot for EU-born is interesting: some observations have very large negative residuals and some have very large positive residuals.

Extract the residuals from the EU-born model object contained in single_model_fits. Then map them in a similar way to the map line-up from the session, that is, using a red-blue colour scheme centred on 0. You do not need to produce the line-up itself.
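
The sketch below illustrates, on made-up data, one way to pull residuals out of a model stored in a list-column and to centre a diverging colour scale on 0. The structure of toy_fits (columns expl_var, data and model) is an assumption and may not match single_model_fits; all names and values are hypothetical. It colours points in a scatterplot rather than polygons on a map, since no geometry is included here.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

# Toy data (values are invented).
toy <- tibble(
  leave   = c(0.30, 0.45, 0.52, 0.61, 0.70, 0.40),
  eu_born = c(0.10, 0.02, 0.03, 0.01, 0.02, 0.08),
  degree  = c(0.50, 0.38, 0.30, 0.22, 0.15, 0.42)
)

# One row per explanatory variable, each with its data and a fitted lm object.
toy_fits <- toy %>%
  pivot_longer(cols = -leave, names_to = "expl_var", values_to = "value") %>%
  nest(data = -expl_var) %>%
  mutate(model = map(data, ~ lm(leave ~ value, data = .x)))

# Attach residuals from the eu_born model to the rows that model was fitted on.
eu_resids <- toy_fits %>%
  filter(expl_var == "eu_born") %>%
  mutate(data = map2(data, model, ~ mutate(.x, resid = unname(residuals(.y))))) %>%
  select(expl_var, data) %>%
  unnest(data)

p <- ggplot(eu_resids, aes(x = value, y = leave, colour = resid)) +
  geom_point() +
  # Diverging red-blue scale with its neutral midpoint at a residual of 0.
  scale_colour_gradient2(
    low = "#b2182b", mid = "#f7f7f7", high = "#2166ac", midpoint = 0
  )
```

For a choropleth, the same idea would presumably apply with the fill aesthetic and scale_fill_gradient2() on a spatial layer, assuming the residuals have been joined to the constituency boundaries.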

What do you notice about the geography of these residuals? Make a few short observations in the template provided.

