Syllabus

Overview

In modern data analysis, graphics and computational statistics are increasingly used together to explore and identify complex patterns in data and to make and communicate claims under uncertainty. This course will go beyond traditional ideas of charts, graphs, maps (and also statistics!) to equip you with the critical analysis, design and technical skills to analyse and communicate with social science datasets.

The course emphasises real-world applications. You will work with both new, large-scale behavioural datasets and more traditional administrative datasets from several social science domains: Political Science, Crime Science, Urban and Transport Planning. As well as learning how to use graphics and statistics to explore patterns in these data, you will learn, by implementing recent ideas from data journalism, how to communicate research findings – how to tell stories with data.

Course objectives

By the end of the course you will be able to:

  1. Describe, process and combine social science datasets from a range of sources
  2. Design statistical graphics that expose structure in social science data and that are underpinned by established principles in information visualization and cartography
  3. Use modern data science and visualization frameworks to produce coherent, reproducible data analyses
  4. Apply modern statistical techniques for analysing, representing and communicating data and model uncertainty

Assessment

For those taking the course for credits, there are two forms of assessment:

  1. In-class exam. A set of questions on core concepts with short written answers to be completed within a single 1-hour sitting. There are no software/coding elements to the in-class exam. This will take place during the afternoon of Thursday 18th August 2022.
  2. Take-home exam. A computational notebook to complete, containing a set of applied exercises that are relevant to the course but constrained and specific. There will be specific data to work with, outputs to produce and interpretations to be made in the context of the methods applied. The notebook should be submitted within two weeks of the course finishing, by Friday 2nd September 2022. Expect to spend no more than a few hours on this element.

Software

All work in the course – data collection, analysis and reporting – will be completed using R and the RStudio Integrated Development Environment (IDE). Along with Python, R is one of the leading programming environments for modern data analysis.

Course breakdown

Session 1 Introduction
Session 2 Data fundamentals
Session 3 Visualization fundamentals
Session 4 Exploratory data analysis
Session 5 Exploring spatial networks
Session 6 Model building 1
Session 7 Model building 2
Session 8 Uncertainty analysis
Session 9 Data storytelling

Self-guided learning

The bad news is whenever you’re learning a new tool, for a long time you’re going to suck. It’s going to be very frustrating. But, the good news is that that is typical, it’s something that happens to everyone, and it’s only temporary … [T]here is no way to go from knowing nothing about a subject to knowing something about a subject and being an expert in it without going through a period of great frustration.

Hadley Wickham

From the course overview and outline you will have got the sense that this is a reasonably technical course. You will be introduced to the key components of modern data analysis (Data Science) through doing – the course inevitably requires you to do a fair amount of “coding”, in this case in R.

It is understandable if this feels like a daunting prospect. The barrier to entry is greater than with point-and-click interfaces such as ArcGIS and SPSS. So do expect that this course may require a degree of patience and persistence – but isn’t that true of all things that are worth learning?

In order to reduce the pain, I’ve tried to include within the course a balance of content between programming fundamentals, theoretical/conceptual learning and more procedural ‘grunt-work’ with datasets. I have also carefully considered and incorporated ideas from some of the really high-quality resources out there aimed at lowering the barrier to doing Data Science in R.

Slack

A key mechanism through which you can participate is by contributing to the discussion forum. Engaging fully with this will help to foster a sort of collegiate atmosphere on the course that will maximise your learning. I have set up a course Slack, which should provide a useful mechanism for sharing information, resources and importantly posting and discussing code.

If you’ve not used Slack before, then follow these pages on getting started with Slack. You should post all substantive questions associated with the course to Slack. These will get answered. If you wish to discuss more personal matters around your completing the course, then send those directly to me via e-mail.

Create an account on the comp-sds Slack. If you run into any problems, try getting started with Slack.

Asking questions

As some of you will be learning to program in R for the first time, you should expect to be baffled at times and to routinely encounter scary-looking ERROR messages. This is actually to be welcomed. You need to be making mistakes and hitting obstacles on a regular basis if you are to progress.

How you respond to these obstacles is important. In a face-to-face lab, the temptation when hitting a problem is to raise your hand, gesture towards your screen and have a demonstrator ‘de-bug’ for you. Whilst this may initially seem like an efficient solution, you risk learning very little if this is your only course of action.

If you are studying online-only, try to force yourself to spend some time troubleshooting individually when you encounter problems working through the material in this course. Google your problem or try StackOverflow. If you are not able to resolve the problem on your own, then post your question to the course Slack, which I will monitor regularly. When doing this, make an effort to be specific and unambiguous about your problem. You might wish to consult StackOverflow’s guidance on how to ask a good question.