Model building 1

Materials from class on Monday, August 15, 2022

Session Outcomes

By the end of this homework session you should be able to:

  • Calculate grouped statistics and use those statistics for ordering plot outputs.
  • Extract parameters from linear regression models and explore their structure graphically.
  • Make use of functional programming techniques for working over datasets.

Introduction

This homework requires you to apply the concepts and skills developed in the class session on model building 1.

Task 1: Calculate and plot

Recreate the scatterplots from the class materials that display Leave against each candidate explanatory variable, faceted by explanatory variable and with the facets ordered by correlation coefficient.

You will need to use the data_for_models data frame. First, calculate the correlation coefficient between Leave and each explanatory variable. Then generate a vector of variable names, stored in an object, ordered according to correlation coefficient. To help you out, I’ve provided a template of dplyr commands that you will want to use and some indicative output. In order to summarise over each explanatory variable in a group_by(), you first need to pivot_longer() so that all explanatory variables are represented in a single column. Of course, you are welcome to take a different approach.

# Place your code for calculating correlation coefficients by explanatory
# variable here.

expl_cors <- data_for_models %>%
  pivot_longer(...) %>%
  group_by(...) %>%
  summarise(...) %>%
  arrange(...)

expl_cors
# # A tibble: 10 x 2
#    expl_var           cor
#    <chr>            <dbl>
#  1 degree          -0.772
#  2 professional    -0.565
#  3 younger         -0.538
#  4 eu_born         -0.483
#  5 no_car          -0.396
#  6 white            0.411
#  7 own_home         0.430
#  8 christian        0.487
#  9 not_good_health  0.562
# 10 heavy_industry   0.710

# You will then want to extract a vector of ordered
# variable names as an object.

expl_cors_order
# [1] "degree"          "professional"    "younger"         "eu_born"
# [5] "no_car"          "white"           "own_home"        "christian"
# [9] "not_good_health" "heavy_industry"
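
If the pivot-then-summarise pattern is unfamiliar, the sketch below shows it on a small made-up data frame. Everything in it is an assumption for illustration only: the toy data frame, its columns y, x1 and x2, and the object names toy_cors and toy_order are not part of the class data. The same shape of pipeline should carry over to data_for_models.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one outcome (y) and two explanatory variables.
toy <- tibble(
  y  = c(1, 2, 3, 4, 5),
  x1 = c(2, 4, 6, 8, 10),  # rises with y
  x2 = c(5, 4, 3, 2, 1)    # falls as y rises
)

toy_cors <- toy %>%
  # Stack the explanatory variables into a single column.
  pivot_longer(cols = c(x1, x2), names_to = "expl_var", values_to = "value") %>%
  group_by(expl_var) %>%
  # One correlation with the outcome per explanatory variable.
  summarise(cor = cor(y, value)) %>%
  # Most negative correlation first.
  arrange(cor)

# Ordered vector of variable names, extracted as its own object.
toy_order <- toy_cors %>% pull(expl_var)
```

pull() returns a plain character vector, which is the form you need later when setting the order of facets.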

Now generate the scatterplots with Leave against each explanatory variable, faceted by explanatory variable and with the facets ordered according to correlation coefficient.

#######################
# Enter your code in the chunk provided.
######################
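
One way to control facet order in ggplot2 is to convert the faceting variable to a factor whose levels follow your ordered vector of names. The sketch below does this on made-up data; the toy data frame, its columns and the toy_order vector are hypothetical stand-ins, not the class data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data and a hypothetical ordering of its variables.
toy <- tibble(
  y  = c(1, 2, 3, 4, 5),
  x1 = c(2, 4, 6, 8, 10),
  x2 = c(5, 4, 3, 2, 1)
)
toy_order <- c("x2", "x1")

toy_plot_data <- toy %>%
  pivot_longer(cols = c(x1, x2), names_to = "expl_var", values_to = "value") %>%
  # Facets are drawn in factor-level order, so set the levels explicitly.
  mutate(expl_var = factor(expl_var, levels = toy_order))

toy_plot <- ggplot(toy_plot_data, aes(x = value, y = y)) +
  geom_point() +
  facet_wrap(~expl_var)
```

Without the factor() step, facet_wrap() orders a character variable alphabetically, which would ignore the correlation ordering.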

Task 2: Explore, model and evaluate

A variable that is intuitively relevant to this study, but that behaves counter to expectation, is EU-born: the proportion of residents in a constituency born outside of the UK but within the EU. In the technical element of the session, you generated a data frame of model outputs by regressing Leave on each candidate explanatory variable in turn. This is stored in single_model_fits. The second set of scatterplots uses information contained in single_model_fits to colour observations according to their residuals from each of these model objects. The scatterplot for EU-born is interesting: some observations have very large negative residuals and some have very large positive residuals.

Extract the residuals from the EU-born model object contained in single_model_fits. Then map them in a similar way to the map line-up, that is, using a red-blue colour scheme centred on 0 (you do not need to recreate the line-up itself).

#######################
# Enter your code in the chunk provided.
######################
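
As a rough guide, the sketch below shows one way to pull residuals out of a data frame of nested models and to build a red-blue colour scale centred on 0. It rests on assumptions: toy_fits is a hypothetical stand-in with one row per explanatory variable and a list-column called model, which may not match the structure of your single_model_fits, and the toy data and column names are made up. It colours points in a scatterplot; on a map, the same symmetric limits could be passed to the corresponding fill scale.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggplot2)

# Hypothetical toy data with an id for each observation.
toy <- tibble(
  id = 1:6,
  y  = c(1, 3, 2, 5, 4, 6),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(6, 4, 5, 2, 3, 1)
)

# Hypothetical stand-in for a data frame of single-variable models:
# one row per explanatory variable, fitted lm objects in a list-column.
toy_fits <- toy %>%
  pivot_longer(cols = c(x1, x2), names_to = "expl_var", values_to = "value") %>%
  nest(data = -expl_var) %>%
  mutate(model = map(data, ~ lm(y ~ value, data = .x)))

# Select one model and attach its residuals (.resid) to the data it was fit on.
x1_row <- toy_fits %>% filter(expl_var == "x1")
x1_resids <- augment(x1_row$model[[1]], data = x1_row$data[[1]])

# Symmetric limits keep the midpoint of the diverging palette at 0.
max_abs <- max(abs(x1_resids$.resid))
toy_plot <- ggplot(x1_resids, aes(x = value, y = y, colour = .resid)) +
  geom_point() +
  scale_colour_distiller(palette = "RdBu", limits = c(-max_abs, max_abs))
```

Setting the limits to plus and minus the largest absolute residual is what centres the scheme on 0. Left to its defaults, the scale would stretch from the smallest to the largest residual and the neutral colour would drift away from 0.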

What do you notice about the geography of these residuals? Make a few short observations in the template provided.

  • Insight 1
    • Enter your answer in the template
  • Insight 2
    • Enter your answer in the template
  • Insight 3
    • Enter your answer in the template
