Skip to contents

Introduction

This article reproduces ideas discussed in Beecham and Slingsby 2019, using ggplot2 to create gridmaps analysing the geography of travel-to-work between London boroughs. Gridmaps are encoded in the same way as thematic maps (using geom_sf()); as more information-rich data graphics, using ggplot2’s layered geoms and facet_grid(row~col) to effect the gridmap arrangement; and as gridmap-arranged thematic maps (using facet_grid(row~col) and geom_sf()) to create OD Maps.

Setup

The following libraries are required to run this vignette:

Data

The travel-to-work data contains origin-destination pairs representing every borough-to-borough commute combination: a tibble containing 9,801 observations and five variables:

  • o_bor Borough origin.
  • d_bor Borough destination.
  • occ_type Occupation type of commute: 1_managers_senior, 2_professional, 3_associate_professional, 4_administrative, 5_trade, 6_caring_leisure, 7_sales_customer, 8_machine_operatives, 9_elementary.
  • count Count of number of commutes of occupation type between origin-destination borough pair.
  • is_prof Whether the occupation type is professional or non-professional.
london_ttw <- read_csv("london_ttw.csv")

To analyse commuting between London boroughs by occupation type we differentiate between:

  • jobs located in a borough, accessed by workers living in that or other boroughs in London; and
  • workers living in a borough, who access jobs in that or other boroughs in London
# Jobs located in borough.
jobs <- london_ttw |>
  group_by(d_bor, occ_type) |>
  summarise(count = sum(count), is_prof = first(is_prof), type = "jobs") |>
  ungroup() |>
  select(type, bor = d_bor, occ_type, is_prof, count)
# Workers living in borough.
workers <- london_ttw |>
  group_by(o_bor, occ_type) |>
  summarise(count = sum(count), is_prof = first(is_prof), type = "workers") |>
  ungroup() |>
  select(type, bor = o_bor, occ_type, is_prof, count)

Generate allocation and make grid

We use points_to_grid(), specified with targeted spacers, to get a solution close to the LondonSquared layout. See the vignette Generate gridmaps for a fuller explanation of points_to_grid() and make_grid().

n_row <- 7
n_col <- 8
spacers <- list(
  c(1, 3), c(1, 5), c(1, 6),
  c(2, 2), c(2, 7),
  c(3, 1),
  c(6, 1), c(6, 2), c(6, 7), c(6, 8),
  c(7, 2), c(7, 3), c(7, 4), c(7, 6), c(7, 7)
)
pts <- london_boroughs |>
  st_drop_geometry() |>
  select(area_name, x = easting, y = northing)
solution <- points_to_grid(pts, n_row, n_col, compactness = 1, spacers)

grid <- make_grid(london_boroughs, n_row, n_col) |>
  inner_join(solution)

Gridmap as thematic map (geom_sf)

Since we have created a simple features object for the layout with make_grid(), it is straightforward to generate thematic gridmaps using geom_sf() alongside ggplot2’s standard geom layers and grammar. In the map below we encode counts of jobs and workers by borough as proportional symbols, with geom_point(), and facet_grid() the plot according to two categorical variables: occupation type (collapsed to professional or non-professional) and whether jobs or workers in the borough are counted over.

gridmap |>
  inner_join(bind_rows(jobs, workers), by = c("area_name" = "bor")) |>
  group_by(area_name, is_prof, type) |>
  summarise(count = sum(count), x = first(x), y = first(y)) |>
  mutate(
    is_prof = factor(if_else(is_prof, "professional", "non-professional"), levels = c("professional", "non-professional")),
    type = factor(type, levels = c("workers", "jobs"))
  ) |>
  ggplot(aes(x = x, y = y)) +
  geom_sf(fill = "#EEEEEE") +
  geom_point(aes(size = count, colour = is_prof), alpha = .3) +
  geom_point(aes(size = count, colour = is_prof), fill = "transparent", pch = 21, stroke = .5) +
  facet_grid(is_prof ~ type) +
  scale_fill_manual(values = c("#4d004b", "#00441b"), guide = "none") +
  scale_colour_manual(values = c("#4d004b", "#00441b"), guide = "none")

A quick explanation of the ggplot2 spec:

  1. Data: Counts of workers and jobs (type) by borough, collapsed over professional or non-professional occupation types (is_prof). Casting is_prof and type to factor variables gives us control over the order in which they appear in the plot.
  2. Encoding: the proportional symbols are positioned at the centroids of borough grid cells (x, y), sized according to count of jobs or workers and coloured according to occupation type (is_prof).
  3. Marks: geom_point() for proportional symbols and geom_sf() for grid outline.
  4. Scale: scale_fill/colour_manual() for associating occupation type and scale_size() for controlling size of points that encode counts.
  5. Facets: facet_wrap() on workers/jobs summary type and high-level occupation type (is_prof).

Gridmap as geographically-arranged geoms with facet_grid

With standard ggplot2, we can incorporate more detailed graphics into a geographic arrangement. In the example below are bar charts summarising the number of workers (left-pointing bars) and jobs (right-pointing bars) by occupation type, the full nine standard classes are shown, in each borough with a gridmap arrangement.

This time we create a staged dataset for plotting (plot_data). In the graphic above, jobs and workers are differentiated by varying the direction of bars: pointing to the right for jobs, to the left for workers. This is achieved in data staging by a slight hack – changing the polarity of counts by occupation depending on the summary type. The counts are further locally (borough-level) scaled. For each borough we find its modal category count of jobs/workers irrespective of occupation type and scale all other counts of jobs and workers relative to this modal category. This allows us to distinguish between job-rich boroughs with longer bars pointing to the right (Westminster); resident/worker-rich boroughs with longer bars pointing to the left (Wandsworth); and outer London boroughs that are more self-contained (Hillingdon).

# Combine jobs and workers summaries.
plot_data <- bind_rows(jobs, workers) |>
  # Borough-scaled counts (maximum).
  group_by(bor) |>
  mutate(count = count / max(count)) |>
  ungroup() |>
  # Selectively change sign and reverse factor for bars of descending occ status.
  mutate(count = if_else(type == "jobs", count, -count), occ_type = fct_rev(factor(occ_type)))

The ggplot2 spec for creating the bar charts is:

bor_select <- "Westminster"
plot_data |>
  filter(bor == bor_select) |>
  ggplot() +
  geom_col(aes(x = count, y = occ_type, fill = is_prof), alpha = .5, width = 1) +
  geom_vline(xintercept = 0, linewidth = .4, colour = "#ffffff") +
  annotate("text", x = 1, y = 5, label = str_extract(bor_select, "^.{3}"), alpha = .7, hjust = 1, size = 4) +
  scale_x_continuous(limits = c(-1, 1)) +
  scale_fill_manual(values = c("#00441b", "#4d004b"), guide = "none")
  1. Data: The staged (plot_data) object, filtered on selected boroughs for exploration purposes (identified with bor_select)
  2. Encoding: Bars whose horizontal position (x=) varies according to count and vertical position (y=) according occ_type, filled on high-level (professional / non-professionalis_prof) occupation type.
  3. Marks: geom_col() for bars.
  4. Scale: scale_fill_manual() for associating occupation type, scale_x_continuous() for making sure workers/jobs bars use the same scale.
  5. Setting: annotate() to provide a borough label within the plot space – the middle right of the plot. We use stringr::str_extract() to pull out the first three letters of the borough name. `

Remembering that gridmap layouts created by points_to_grid() define row and column identifiers for each spatial unit in the original geography, it is easy to imagine how this can be turned into a gridmap using ggplot2’s built-in faceting. The only substantive update to the spec is that we supply row and col identifiers to facet_grid(), with a slight hack on the col variable as gridmappr’s origin [min-row, min-col] is the bottom-left cell in the grid whereas for facet_grid() the origin is the top-left. Generating the graphic entirely with standard (built-in) ggplot2 gives many options for plot customisation – adding layers, annotations and other elements as in the example graphic.

plot_data |>
  # Join on solution to IDs for gridmap layout.
  left_join(solution, by = c("bor" = "area_name")) |>
  ggplot() +
  geom_col(aes(x = count, y = occ_type, fill = is_prof), alpha = .5, width = 1) +
  geom_vline(xintercept = 0, linewidth = .4, colour = "#ffffff") +
  geom_text(data = solution, aes(x = 1, y = 5, label = str_extract(area_name, "^.{3}")), alpha = .7, hjust = 1, size = 4) +
  # Pass row col IDs from solution for gridmap layout.
  facet_grid(-row ~ col) +
  scale_x_continuous(limits = c(-1, 1)) +
  scale_fill_manual(values = c("#00441b", "#4d004b"), guide = "none") +
  theme(strip.text = element_blank())

Gridmap as geographically arranged geom_sfs (OD Maps)

So far we have generated gridmap graphics as standard thematic maps using a polygon object and geom_sf(); and as geographically arranged plot objects (glyphmaps) with facet_grid(). Combining both approaches – placing standard thematic maps with a further geographical arrangement (using facet_grid()) – allows the creation of OD Maps. With this map-within-map layout, we can look at flows of commuters between London boroughs.

Again we create a staged dataset (plot_data). Counts of commuter flows by occupation type (professional and non-professional) are calculated along with the proportion of jobs that are professional (51%, global_prop). If we were to randomly sample an OD (borough-borough) commute pair, we might expect this proportion to appear when counting up the number of professional and non-professional occupation types present in that commute. For each origin-destination pair (OD), we generate expected counts by multiplying the total number of commutes present in an OD pair by this global_prop, and from here signed residuals (resid) identifying whether there are greater or fewer professionals commuting that OD pair than would be expected. Note that these are like signed chi-scores in that rather than expressing differences in observed counts as a straight proportion of expected counts (dividing by expected counts), we apply a power transform to the denominator. This has the effect of also giving saliency to differences that are large in absolute terms. You could try varying this exponent (maybe between 0.5-1.5) to see its effect on residuals encoded in the OD Maps.

plot_data <- london_ttw |>
  group_by(o_bor, d_bor, is_prof) |>
  summarise(count = sum(count)) |>
  pivot_wider(names_from = is_prof, values_from = count) |>
  rename(non_prof = `FALSE`, prof = `TRUE`) |>
  ungroup() |>
  mutate(global_prof = sum(prof) / sum(prof + non_prof)) |>
  group_by(d_bor) |>
  mutate(
    count = prof + non_prof + .0001,
    obs = prof + .0001,
    exp = (global_prof * count),
    resid = (obs - exp) / (exp^.7)
  ) |>
  ungroup()

The ggplot2 spec for creating the OD map:

bbox_grid <- st_bbox(gridmap)
width <- bbox_grid$xmax - bbox_grid$xmin
height <- bbox_grid$ymax - bbox_grid$ymin
range_resid <- max(abs(plot_data$resid))

plot <- plot_data |>
  left_join(gridmap |> st_drop_geometry() |> select(area_name, d_col = col, d_row = row), by = c("d_bor" = "area_name")) |>
  left_join(gridmap |> select(o_x = x, o_y = y, area_name), by = c("o_bor" = "area_name")) |>
  mutate(
    bor_label = if_else(o_bor == d_bor, d_bor, ""),
    bor_focus = o_bor == d_bor
  ) |>
  st_as_sf() |>
  ggplot() +
  geom_sf(aes(fill = resid), colour = "#616161", size = 0.15, alpha = 0.9) +
  geom_text(
    data = plot_data %>% filter(bor_focus),
    aes(x = o_x, y = o_y, label = str_extract(o_bor, "^.{1}")),
    colour = "#252525", alpha = 0.9, size = 2.1,
    hjust = "centre", vjust = "middle"
  ) +
  geom_text(
    data = plot_data %>% filter(bor_focus),
    aes(x = bbox_grid$xmax, y = bbox_grid$ymin, label = str_extract(o_bor, "^.{3}")),
    colour = "#252525", alpha = 0.6, size = 3.5,
    hjust = "right", vjust = "bottom"
  ) +
  coord_sf(crs = st_crs(plot_data), datum = NA) +
  facet_grid(-d_row ~ d_col, shrink = FALSE) +
  scale_fill_distiller(
    palette = "PRGn", direction = -1, 
    breaks = c(-range_resid, 0, range_resid), labels = c("low prof", "avg", "high prof"),
    limits = c(-range_resid, range_resid)
    ) +
  theme(
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_blank(), strip.text.y = element_blank(),
  )
  • 1. Data:
    • Take the staged dataset and join twice on the gridmap dataset.
    • The first join, on d_bor. The map in the figure is a D-OD map. The larger (focus) grid cells correspond to destinations; the smaller cells are origins coloured according to commuter flows into the larger (focus) cells. When joining plot_data on destination (d_bor) we do not need the geometry data, so drop it, but we do need the cell IDs (col, row) to supply to the ggplot2 facet. Note that these variables are renamed in the select().
    • The second join , on o_bor. As origins are the small maps, we retain the geometry data as well as the geographic centroid of origin borough grid cells (x, y). Again these variables are renamed in the select().
    • Finally in the mutate() we generate a new variable identifying the borough in focus (bor_focus), destination in this case, and a text label variable for annotating plots on this (bor_label).
  • 2. Encoding:
    • Gridmap cells are coloured according to the calculated resdiuals (fill=resid).
    • Text labels for destination (focus) boroughs are drawn in the bottom-right corner of larger cells. Note that the coordinate space here is that from the gridmap dataset and so the x,y location of borough labels are derived from the bounding box object (bbox_grid), calculated during data staging. Single letter annotations are also positioned where small origin cells match the focus (destination) borough. These additional labels are positioned using the grid centroids o_x, o_y from plot_data.
  • 3. Marks: geom_sf() for drawing the small gridcell maps; geom_text() for drawing the labels.
  • 4. Scale: scale_fill_distiller() for a diverging colour scheme using the ColorBrewer PRGn palette and made symmetrical on 0 by manually setting limits().
  • 5. Facets: facet_grid() for effecting the map-within-map layout.