class: center, middle, title-slide

# GEOG5927 Predictive Analytics : Introduction
### Roger Beecham
### 28 Feb 2022

---
class: center middle

.center[
<img src = "img/geog5927.png", width = 40%></img>
]

???

Looked at module website
Also on Minerva

1. Introduce Module
2. Say how it will run
3. Say a bit on timings
4. Look at the first bit of substantive work -- practical 1

---

## Module content and philosophy

* Spatial modelling to simulate and predict consumer behaviour
  + Exploratory analysis
  + Microsimulation
  + Agent-based modelling

<br>

* Research case studies to evaluate modelling techniques in practice
  + Practical sessions
  + Individual data science report
  + Group data science presentation

???

* Introduces some key theory and techniques in modern data analysis.
* Has both technical and applied aspects.
* Techniques in which Geog dept specialises
* Doing so with datasets and domains relevant to marketing science

---

## Outcomes

By the end of this module you should be able to:

1. **Explain** and **critically evaluate** the role of spatial analytics in simulating and predicting consumer behaviours
2. **Apply** geocomputational modelling and simulation techniques on real data sets
3. **Devise** and **employ** spatial modelling tools to address business problems, presenting and justifying recommendations in an appropriate context

---
???
1 : theory 2 : application 3 : both --- ## Module team <table> <thead> <tr> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> img </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> name </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> role </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> sessions </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> assignments </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> <html><body><img src="img/profile_roger.jpg" width="80" height="80"></body></html> </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Roger Beecham </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Convenor </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 1,2 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 1 </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> <html><body><img src="img/profile_rachel.jpeg" width="80" height="80"></body></html> </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Rachel Oldroyd </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Lecturer </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 3 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> NA </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> <html><body><img src="img/profile_jg.jpg" width="80" height="80"></body></html> </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Jiaqi Ge </td> <td 
style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Lecturer </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 4,5 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 2 </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> <html><body><img src="img/profile_nik.jpg" width="80" height="80"></body></html> </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Nick Malleson </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Lecturer </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 4,5 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> NA </td> </tr> </tbody> </table> --- ## Scheduling <table> <thead> <tr> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> session </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> wc </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> lecture </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> lab </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> recap </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 1 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 28 Feb </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> mon 1500-1600 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> wed 1100-1800 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> fri 1700-1800 </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 
2 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 7 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> mon 1500-1600 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> wed 1100-1800 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> fri 1700-1800 </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 3 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 14 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> mon 1500-1600 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> wed 1100-1800 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> fri 1700-1800 </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 4 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 21 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> mon 1500-1600 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> wed 1100-1800 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> fri 1700-1800 </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 5 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 25 Apr </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> mon 1500-1600 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> NA - pres </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> NA </td> </tr> </tbody> </table> .small-font[ Additional drop-in sessions via Teams: <br> 
-- Tuesdays 1:00pm–2:00pm and led by [Will James](https://environment.leeds.ac.uk/geography/staff/1054/dr-will-james). ] ??? Breakdown of activity : short and fat --- ## Outline : Lectures <table> <thead> <tr> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> session </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> wc </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> academic </th> <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> lecture </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 1 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 28 Feb </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> RB </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Simulating behaviour </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 2 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 7 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> RB </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Targeted marketing </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 3 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 14 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> RO </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Machine learning </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 4 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 21 Mar 
</td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> JG/NM </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Agent-based modelling </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 5 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 25 Apr </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> JG/NM </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Close </td> </tr> </tbody> </table> <br> .small-font[ RB: Roger Beecham, RO: Rachel Oldroyd, JG: Jiaqi Ge, NM: Nick Malleson ] .small-font[ Additional drop-in sessions via Teams: <br> -- Tuesdays 1:00pm–2:00pm and led by [Will James](https://environment.leeds.ac.uk/geography/staff/1054/dr-will-james). ] --- ## Assessment * Assignment 1: 75% + Individual data analysis based on labs 1, 2 (and 3) + 2,000 words, 4 figures + Thu 24th Mar 2022 by 2pm <br> * Assignment 2: 25% + Group presentations based on lab 4 + Presentations for times session Wed 27th Apr 2022 --- ## Assessment <table> <thead> <tr> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> session </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> wc </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> academic </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> lecture </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> deadline </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 1 </td> <td style="text-align:left;background-color: #ffffff 
!important;font-size: 20px;color: #616161 !important;"> 28 Feb </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> RB </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Simulating behaviour </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 2 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 7 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> RB </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Targeted marketing </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 3 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 14 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> RO </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Machine learning </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 4 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 21 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> JG/NM </td> <td 
style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Agent-based models </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Ass 1 </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 5 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 25 Apr </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> JG/NM </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Close </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Ass 2 </td> </tr> </tbody> </table> <br> .small-font[ RB: Roger Beecham, RO: Rachel Oldroyd, JG: Jiaqi Ge, NM: Nick Malleson ] .small-font[ Additional drop-in sessions via Teams: <br> -- Tuesdays 1:00pm–2:00pm and led by [Will James](https://environment.leeds.ac.uk/geography/staff/1054/dr-will-james). 
] --- ## Assessment <table> <thead> <tr> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> session </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> wc </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> academic </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> lecture </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> deadline </th> <th style="text-align:left;color: #616161 !important;background-color: #ffffff !important;font-size: 18px;"> progress </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 1 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 28 Feb </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> RB </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Simulating behaviour </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> #1 data </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 2 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 7 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> RB </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Targeted marketing </td> <td 
style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> #1 analysis </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 3 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 14 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> RO </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Machine learning </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> #1 analysis </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 4 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 21 Mar </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> JG/NM </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> Agent-based models </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Ass #1 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> #2 material </td> </tr> <tr> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 5 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 25 Apr </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> JG/NM </td> <td style="text-align:left;background-color: 
#ffffff !important;font-size: 20px;color: #616161 !important;"> Close </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> Ass #2 </td> <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> #2 surgery </td> </tr> </tbody> </table> <br> .small-font[ RB: Roger Beecham, RO: Rachel Oldroyd, JG: Jiaqi Ge, NM: Nick Malleson ] .small-font[ Additional drop-in sessions via Teams: <br> -- Tuesdays 1:00pm–2:00pm and led by [Will James](https://environment.leeds.ac.uk/geography/staff/1054/dr-will-james). ] --- ## How <img src = "img/why_r.png", width = 60%></img> --- ## R Resources .small-font[ * H. Wickham and G. Grolemund, R for Data Science, O’Reilly Media, 2017. + *The primer for doing data analysis with R.* + [Free online](https://r4ds.had.co.nz). * R. Lovelace et al., Geocomputation with R, CRC Press, 2019. + *Comprehensively introduces spatial data handling in R.* + [Free online](https://geocompr.robinlovelace.net). * K. Healy, Data Visualization: A Practical Introduction, Princeton University Press, 2018. + *Integrates ggplot2 code with key Information Visualization theory and using real social science datasets.* + [Free online -- draft version](https://socviz.co). ] --- ## Module Resources * [Module website](https://www.roger-beecham.com/predictive-analytics/) <br><br> * [Module handbook](https://minerva.leeds.ac.uk/ultra/courses/_533807_1/cl/outline) <br><br> * [Assignment briefs](https://minerva.leeds.ac.uk/ultra/courses/_533807_1/cl/outline) <br><br> * [Assignment submission page](https://minerva.leeds.ac.uk/ultra/courses/_533807_1/cl/outline) ??? If not already, spend time on Module Handbook. 
--- ## Module Website <!-- <img src = "img/web1.png", width = 40%, style = "position:absolute; top: 20%; left: 8%; box-shadow: 3px 5px 3px 1px #00000080;"></img> <img src = "img/web2.png", width = 45%, style = "position:absolute; top: 30%; left: 18%; box-shadow: 3px 5px 3px 1px #00000080;"></img> <img src = "img/web3.png", width = 38%, style = "position:absolute; top: 40%; left: 28%; box-shadow: 3px 5px 3px 1px #00000080;"></img> --> --- ## How to Learn -- * *Come to "remote" lectures and labs* <br> -- * *Engage: Try stuff out, ask questions* <br> -- * *Contribute to the module [Slack](https://predictive-analytics.slack.com)* <br> -- * *Coursework throughout: lectures and labs* --- class: center middle # Introduction to Predictive Analytics ??? * That's introduction to module... * Push on with a bit of theory… * Why Predictive Analytics --- ## Data-driven science **"big data"** and **"data science"** on Google Trends, Oct 2019 <img src = "img/bigdata_trends.png", width = 70%, style = "position:relative; top: 25%; left: 0%; box-shadow: 3px 5px 3px 1px #00000080;"></img> ??? Know already, but from 2010s -- big data is a big industry -- transforming the way businesses, societies run Calling upon new ways of doing science --- ## Data-driven science .small-font[ * 1000 years ago -- **experimental science** + Description of natural phenomena * 100s years ago -- **theoretical science** + Newton’s laws, Maxwell’s Equations * <50 years ago -- **computational science** + Simulate complex phenomena * Today -- **data-intensive science** + Generate knowledge through observation ] <img src = "img/fourth_paradigm.jpeg", width = 35%, style = "position:absolute; top: 25%; left: 55%; box-shadow: 3px 5px 3px 1px #00000080;"></img> ??? 
Tony Hey :

* Experimental science : all we could know about was what we observed directly
* Theory : abstracting above observations to generate fundamental laws about how things are
* Simulation : Test theories by simulating the real world
* Today : Back to observation, but deriving knowledge in new ways

---

## Data-driven science

.small-font[
> *The next generation of scientific discovery will be data-driven as previously unrecognised patterns are discovered by analysing massive and mixed datasets.*
>
> David Willetts MP, 2013, then Minister for Universities and Science
]

--

<img src = "img/end_of_theory.png", width = 50%, style = "position:absolute; top: 45%; left:10%;"></img>

---

## Data-driven science : example

<div class="embed-responsive embed-responsive-16by9">
<iframe width="500" height="350" class="embed-responsive-item" src="https://www.youtube.com/embed/6111nS66Dpk" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>

???

Generate knowledge in new ways

* Real-time Google searches tell us something about what is happening in the world.
* Association between how many people search for flu-related topics and how many people actually have flu symptoms.
* Can we use these to estimate how much flu is circulating and predict an outbreak?

---

## Data-driven science : example

<img src = "img/google_flu_traps.png", width = 60%></img>

[`Lazer et al. 2014`](https://science.sciencemag.org/content/343/6176/1203.full)

???

* Biases need to be more thoroughly thought through

---

## Data-driven science : example

<img src = "img/epj_title.png", width = 40%, style = "position:relative; top: 5%;"></img>
<img src = "img/epj_pic.png", width = 40%, style = "position:relative; top: 15%;"></img>

???

* Generate new types of knowledge -- newly opened up for analysis

*How* people choose to decorate their homes -- is it culturally and geographically distinctive?
Or, in a globalised world, is a typical home in London decorated the same as a typical home in Tokyo?

Computer Vision algorithms to automatically detect the presence of certain ornaments and stylistic objects -- see if there is spatial consistency in these…

---

## Data-driven science : example

.small-font[
[`fivethirtyeight.com/parole-assessment-simulator/`](https://projects.fivethirtyeight.com/parole-assessment-simulator/)
]

<img src = "img/fivethirtyeight.png", width = 11%, style = "position:absolute; top: 30%; left: 8%;"></img>
<img src = "img/science_of_sentencing.png", width = 37%, style = "position:absolute; top: 30%; left: 18%;"></img>

???

* Sentencing is typically based on a criminal’s past crimes…
* But here on *likely* crimes that individuals might commit in the future.
* Based on statistical associations in a past offender dataset
* Using statistical probabilities based on factors such as age, employment history and prior criminal record
* *Targeted* interventions

---

## Data-driven science : this module

<br>
.pull-left[.small-font[
**Data mining and machine learning** -- <br>
Detect hidden patterns in data

**Information Visualization** -- <br>
Explore complex structure and patterns in data

**Predictive analytics** -- <br>
Predict, under uncertainty, what will happen in future
] ]

<img src = "img/geog5927.png", width = 26%, style = "position:absolute; top: 30%; left: 55%;"></img>

---

## Data-driven science : assignments

<br>
.pull-left[.small-font[ .small-font[
**Assignment 1** -- <br>
Generate a large synthetic dataset of customers <br>
and look for behavioural and demographic associations between individuals <br>
to better *target* marketing activity. <br>

**Assignment 2** -- <br>
Use data and heuristics to explore and predict <br>
how customers behave and respond to different store formats.
]] ]

<img src = "img/geog5927.png", width = 26%, style = "position:absolute; top: 30%; left: 55%;"></img>

---
class: center middle

# Session 1 : Simulating Behaviour

---

## Simulating behaviour practical

<img src = "img/sim_behaviour_prac.png", width = 60%, style = "position:absolute; top: 30%;"></img>

---

## Spatial microsimulation

.pull-left[.right[
`Survey data`
.small-font[
individual-level and rich in detail <br>
small sample and may be biased
]]
]

<img src = "img/icon_survey.jpg", width = 20%, style = "position:absolute; top: 28%; left: 52%;"></img>

<br><br><br><br><br><br><br>

.pull-right[.left[
`Census data`
.small-font[
high-level and low in detail <br>
population-level and complete
]]
]

<img src = "img/icon_census.jpeg", width = 20%, style = "position:absolute; top: 65%; left: 28%;"></img>

???

Many situations where we are interested in knowing about the population that lives in an area: interests, preferences, spending patterns.

There are population-level datasets: the Census is amazing for counting people according to high-level characteristics... But we only have a limited set of attribute information. Lots is missed off, especially what matters in a commercial setting: interests and preferences.

Instead -- rely on comparatively small sample survey data for studying interests and preferences.

Spatial microsimulation allows us to match rich individual-level data to a population we know less about.

---

## Spatial microsimulation

<br><br><br>
.small-font[
> *The creation, analysis and modelling of individual-level data allocated to geographic zones.*
>
> Lovelace & Dumont 2016
]

---

## Spatial microsimulation

<br><br><br>
<img src = "img/individuals_areas.jpg", width = 50%, style = "position:relative"></img>

???

1. Start: survey of individuals and a geographic area. Each one of these spatial units is a small area of Leeds.
2. Allocate individuals from the survey to the small spatial units of Leeds. Lots of copies of individuals, such that they sum to the entire population in that area.
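
The allocation in step 2 can be sketched in a few lines. This is only an illustration in Python, not the method used in the practical: the `survey` records, the zone names, the population counts and the `allocate_randomly` helper are all made up. It copies survey individuals, drawn at random with replacement, into each zone until the number of copies equals that zone's population.

```python
import random

# Hypothetical survey: one record per respondent (values are made up).
survey = [
    {"id": 1, "colour": "orange"},
    {"id": 2, "colour": "black"},
    {"id": 3, "colour": "blue"},
]

# Hypothetical population count for each small area.
zone_populations = {"zone_a": 10, "zone_b": 25}


def allocate_randomly(survey, zone_populations, seed=0):
    """Copy survey individuals into each zone until the zone is full."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return {
        # Sample with replacement: the same respondent can be copied many times.
        zone: rng.choices(survey, k=population)
        for zone, population in zone_populations.items()
    }


synthetic = allocate_randomly(survey, zone_populations)
for zone, people in synthetic.items():
    print(zone, len(people))
```

Each zone ends up with exactly as many synthetic individuals as its population, but nothing yet ties the mix of individuals to what is known about the zone.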
---

## Spatial microsimulation

<br><br><br>
<img src = "img/individuals_areas_constraints.jpg", width = 60%, style = "position:relative"></img>

???

Random allocation is not very sensible. In microsimulation, allocate using prior information on those spatial units that you know from a real dataset -- the census in this case.

If the Census says that 60% are orange, 30% black and 10% blue, copy individuals from the survey data in those same proportions... Notice the arrows vary in thickness to communicate this.

The Census acts as a set of constraints -- it gives us a better approximation of the populations than just randomly allocating.

---

## Spatial microsimulation

<br><br><br><br><br><br>
<img src = "img/individuals_areas_2.jpg", width = 70%, style = "position:relative"></img>

---

## Spatial microsimulation

<br><br><br><br>
<img src = "img/individuals_areas_consraints_2.jpg", width = 75%, style = "position:relative"></img>

???

Blue: 40%
Black: 40%
Orange: 20%

---

## Spatial microsimulation

<br>

.pull-left[.right[
Microsimulation does not<br> generate **new data** <br>
`-----`
]]

--

<br><br>
.pull-right[.left[
`-----` <br>
But **copies of existing data**
]]

---

## Spatial microsimulation: Examples

Health : Smoking <br>
.small-font[[Tomintz et al. 2008](https://rgs-ibg.onlinelibrary.wiley.com/doi/abs/10.1111/j.1475-4762.2008.00837.x)]

--

.small-font[
.pull-left[.right[
**Why?** <br>
Reported in individual surveys, but not at population level and not from place to place
]]]

--

.small-font[.pull-right[.left[
**Benefits** <br>
Could be used to target/locate smoking support clinics
]]]

--

.small-font[.pull-right[.left[
**"Benefits"** <br>
Could be used by a tobacco company for targeting investment
]] ]

???

Estimate levels of smoking in a population.

---

## Spatial microsimulation: Examples

Economics : Policy Evaluation <br>
.small-font[[De Agostini et al.
2016](https://www.econstor.eu/bitstream/10419/197592/1/868840475.pdf)]

--

.small-font[
.pull-left[.right[
**Why?** <br>
Simulate / spread impacts inferred from individual-level data over an entire country
]]]

--

.small-font[
.pull-right[.left[
**Benefits** <br>
Quantify (under uncertainty) the impacts of a regressive welfare reform at the country level
]]]

--

.small-font[.pull-right[.left[
**Benefits** <br>
Evidence-based decision-making
]]]

---

## Spatial microsimulation: Examples

Transport : Simulating travel behaviour <br>
.small-font[[Lovelace et al. 2014](https://www.sciencedirect.com/science/article/pii/S0966692313001361)]

--

.small-font[
.pull-left[.right[
**Why?** <br>
When designing infrastructure, we want to know about the distribution of individuals meeting a particular set of characteristics
]]]

--

.small-font[
.pull-right[.left[
**Benefits** <br>
Provide evidence around likely winners and losers of a new infrastructure investment
]] ]

---

## Spatial microsimulation: Assumptions

.pull-left[
.small-font[
1. Individual-level microdata are representative of the study area <br>
2. Target variable is dependent on the constraint variables in a way that is relatively constant over space and time <br>
3. Input microdataset and constraints are sufficiently rich and detailed to reproduce the full diversity of individuals and areas in the study region
]]

???

Want to establish some link between the phenomena that you’re studying and the demographics that you’re using as constraints: points 2 and 3.

For example, you might be studying cycling propensity. There’s an association between cycling and socio-demographics. But is the relationship between socio-demographics and propensity to cycle the same in London as it is in Newcastle, if we’re simulating from survey data over a whole country?

Final point: you want census constraints that discriminate people well.
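
One common way to respect several census constraints at once is iterative proportional fitting (IPF). The slides do not say which algorithm the practical uses, so treat the following as an illustrative Python sketch under stated assumptions: the four survey records, the `age` and `sex` categories, the census counts and the `ipf_weights` helper are all hypothetical.

```python
def ipf_weights(individuals, constraints, n_iter=10):
    """Reweight survey individuals so weighted totals match census counts.

    individuals: list of dicts, one per survey respondent.
    constraints: {variable: {category: census count for one zone}}.
    Returns one weight per individual for that zone.
    """
    weights = [1.0] * len(individuals)
    for _ in range(n_iter):
        for variable, targets in constraints.items():
            # Current weighted total in each category of this variable.
            totals = {category: 0.0 for category in targets}
            for person, weight in zip(individuals, weights):
                totals[person[variable]] += weight
            # Rescale so this variable's totals equal the census counts.
            weights = [
                weight * targets[person[variable]] / totals[person[variable]]
                for person, weight in zip(individuals, weights)
            ]
    return weights


# Hypothetical survey and census counts for a single zone of 100 people.
survey = [
    {"age": "young", "sex": "f"},
    {"age": "young", "sex": "m"},
    {"age": "old", "sex": "f"},
    {"age": "old", "sex": "m"},
]
census = {
    "age": {"young": 40, "old": 60},
    "sex": {"f": 55, "m": 45},
}

weights = ipf_weights(survey, census)
for person, weight in zip(survey, weights):
    print(person, round(weight, 2))
```

The weights are fractional, so turning them into whole copies of individuals needs a further integerisation step that is not shown here. If a census category has no matching survey respondents, no reweighting can reach its target, which is assumption 3 in practice.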
---

## Simulating behaviour practical

.pull-left[
<br> <img src = "img/sim_behaviour_prac.png", width = 100%, style = "position:relative; top: 40%;"></img>
]

.pull-right[
<br>
.small-font[
`individuals.csv : 15,189 records` <br> `-------` <br> spatial microsimulation <br> `-------` <br> `simulated_oac_age_sex.csv : 320,596 records`
]]

---

## Simulating behaviour practical: survey dataset

<br> `individuals.csv` `15,189 records`

<table>
 <thead>
  <tr>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> var_name </th>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> var_values </th>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> var_type </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> age_band </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> a24under, ... </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> demographic </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> income_band </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> 11-15k, ... </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> demographic </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> oac_grp </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> 1,2,3,... </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> geodemographic </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> uk_airport </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> MAN, DSA, ... </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> preference </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> overseas_airport </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> TFS, EFL, ... </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> preference </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> satisfaction_overall </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> 1_poor, ... </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 16px;"> preference/attitude </td>
  </tr>
</tbody>
</table>

???

People holidaying to destinations…

---

## Simulating behaviour practical : use case

<img src = "img/beach.jpeg", width = 50%, style = "position:absolute; top: 30%; left: 10%;"></img>

???

There are different types of holiday destination.

---

## Simulating behaviour practical : use case

<img src = "img/beach_people.jpeg", width = 50%, style = "position:absolute; top: 30%; left: 10%;"></img>

???

Different categories of people to different destinations

---

## Simulating behaviour practical : use case

.small-font[
`individuals.csv` `15,189 records` <br> `--------`
]

<img src = "img/geogs.png", width = 90%, style = "position:relative;"></img>

???
This is just a survey…

Imagine we’re a Leeds-based company…

Want to know how the whole of Leeds is likely to behave.

Use microsimulation to get to this level.

---

## Module Schedule

<table>
 <thead>
  <tr>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> session </th>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> wc </th>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> lecture </th>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> lab </th>
   <th style="text-align:left;background-color: #ffffff !important;font-size: 18px;"> recap </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 1 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> 28 Feb </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> mon 1500-1600 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> wed 1100-1800 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;"> fri 1700-1800 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 2 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 7 Mar </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> mon 1500-1600 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> wed 1100-1800 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> fri 1700-1800 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 3 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 14 Mar </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> mon 1500-1600 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> wed 1100-1800 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> fri 1700-1800 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 4 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 21 Mar </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> mon 1500-1600 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> wed 1100-1800 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> fri 1700-1800 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 5 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> 25 Apr </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> mon 1500-1600 </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> NA - pres </td>
   <td style="text-align:left;background-color: #ffffff !important;font-size: 20px;color: #616161 !important;"> NA </td>
  </tr>
</tbody>
</table>

.small-font[
Additional drop-in sessions via Teams: <br> -- Tuesdays 1:00pm–2:00pm, led by [Will James](https://environment.leeds.ac.uk/geography/staff/1054/dr-will-james).
]