Syllabus

Overview

In modern data analysis, graphics and computational statistics are increasingly used together to explore and identify complex patterns in data and to make and communicate claims under uncertainty. This course will go beyond traditional ideas of charts, graphs, maps (and also statistics!) to equip you with the critical analysis, design and technical skills to analyse geographic datasets and communicate findings from them.

The course emphasises real-world applications. You will work with both new, large-scale behavioural datasets and more traditional administrative datasets drawn from various social science domains: Political Science, Crime Science, and Urban and Transport Planning. As well as learning how to use graphics and statistics to explore patterns in these data, you will draw on recent ideas from data journalism to learn how to communicate research findings: how to tell stories with data.

Module objectives

By the end of the course you will:

  1. Describe, process and combine geographic datasets from a range of sources
  2. Design statistical graphics that expose structure in geographic data and that are underpinned by established principles in information visualization and cartography
  3. Use modern data science and visualization frameworks to produce coherent, reproducible data analyses
  4. Apply modern statistical techniques for analysing, representing and communicating data and model uncertainty

Assessment

A detailed and formal description of the Assessment for this module can be found on the Minerva pages. The summative assessment consists of:

  • 30% - Portfolio of work from completing session homeworks 2-5 (1500 word equivalent)
  • 70% - Visual data analysis report (2500 word equivalent)

Software

All work in the module (data collection, analysis and reporting) will be completed using R and the RStudio Integrated Development Environment (IDE). Along with Python, R is one of the leading programming environments for modern data analysis.

Module breakdown

Week 1 Introduction
Week 2 Data fundamentals
Week 3 Visualization fundamentals
Week 4 Exploratory data analysis
Week 5 Exploring spatial networks
Week 6 Model building 1
Week 7 Model building 2
Week 8 Uncertainty analysis
Week 9 Data storytelling

Self-guided learning

“The bad news is whenever you’re learning a new tool, for a long time you’re going to suck. It’s going to be very frustrating. But, the good news is that that is typical, it’s something that happens to everyone, and it’s only temporary … [T]here is no way to go from knowing nothing about a subject to knowing something about a subject and being an expert in it without going through a period of great frustration.”

Hadley Wickham

From the module overview and outline you will have got the sense that this is a reasonably technical module. You will be introduced to the key components of modern data analysis (Data Science) through doing – the course inevitably requires you to do a fair amount of “coding”, in this case in R.

It is understandable if this feels like a daunting prospect. The barrier to entry is greater than with point-and-click interfaces such as ArcGIS and SPSS. So do expect that this module may require a degree of patience and persistence – but isn’t that true of all things that are worth learning?

In order to reduce the pain, I’ve tried to balance the module’s content between programming fundamentals, theoretical/conceptual learning and more procedural ‘grunt-work’ with datasets. I have also carefully considered and incorporated ideas from some of the high-quality Resources out there aimed at lowering the barrier to doing Data Science in R.

In return, I expect you to:

  • Read all course materials
  • Complete the class session tasks, homework and coursework assignment
  • Participate in the discussion forum

Slack

A key way to participate is by contributing to the discussion forum. Engaging fully with it will help foster a collegiate atmosphere on the module and help you get the most from your learning. I have set up a course Slack, which should provide a useful mechanism for sharing information and resources and, importantly, for posting and discussing code.

If you’ve not used Slack before, follow these pages on getting started with Slack. You should post all substantive questions about the module to Slack, where they will get answered. If you wish to discuss more personal matters relating to your completion of the course, send those directly to me by e-mail.

Create an account on the vis-for-gds Slack. Be sure to register with your .leeds.ac.uk e-mail address. If you run into any problems, try getting started with Slack.

Asking questions

As many of you will be learning to program in R for the first time, you should expect to be baffled at times and to routinely encounter scary-looking ERROR messages. Counterintuitively, this is actually to be welcomed. You need to be making mistakes and hitting obstacles on a regular basis if you are to progress.

How you respond to these obstacles is important. In a face-to-face lab, the temptation when hitting a problem is to raise your hand, gesture towards your screen and have a demonstrator ‘debug’ it for you. Whilst this may initially seem like an efficient solution, you risk learning very little if this is your only course of action.

When you encounter problems working through the material in this course, try to force yourself to spend 15-20 minutes troubleshooting the problem independently. Google your problem or search Stack Overflow. If you cannot resolve the problem on your own, post your question to the course Slack, which I will monitor regularly. When doing this, make an effort to be specific and unambiguous about your problem. You might wish to consult Stack Overflow’s guidance on how to ask a good question.