Reproducible Geographical Information Systems and Science

Andy MacLachlan
The Bartlett Centre for Advanced Spatial Analysis, UCL

Talk overview

  • Rationale for teaching and research agenda points

  • Why bother with it?

  • Module and assessment changes

  • Outcomes

  • Future work

Who am I

  • Lecturer in Spatial Data Science and Visualization at CASA, UCL

  • Human Geographer

  • Run our main MSc programme

  • Lead MSc modules in:

    • Geographic information systems and science
    • Remotely sensing cities and environments

Agendas

Teaching

  • Empower students to become leading activists in urban spatial science

  • Equip students with spatial skills for the interpretation of urban data leading to data-informed decisions

  • Instill critical thought, based on a solid theoretical and technical foundation, to see through the buzzwords and hype often associated with smart cities / urban analytics.

Being able to reproduce content is essential to these

What led me here?

  • Lecture with Carl Howe

2017, 90% of the data in the world today has been created in the last two years alone, at 2.5 quintillion bytes of data a day! - IBM

Data is often misrepresented….

Ok, what about geographic data

A shifting landscape

Paper: Opening practice: supporting reproducibility and critical spatial data science
  • A comparison of Geographically Weighted Regression (GWR) across:
    • 4 open software packages
    • 2 black box / commercial implementations

All of the implementations were tested with the same input data. They all gave the same results except the ESRI/ArcGIS implementation (Li 2018), and although ESRI provide help for the GWR tools, the actual coding is closed—the underlying code is not revealed

A slight tangent

Who has made our boundary data?

Who has manipulated our boundary data?

Redlining

  • 1930s – the American Home Owners' Loan Corporation – prevent missed payments… residential security maps based on race
    • People abandon redlined areas
    • Can’t refinance
    • Less property tax for services

Los Angeles Redlining

Who has manipulated our boundary data?

Gerrymandering

Every 10 years electoral districts are redrawn (“redistricting”) – Thomas Hofeller (Republican) = PACK and CRACK

  • PACK = put all the Democratic voters in one district
  • CRACK = sprinkle them out so they never have a majority
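
A minimal sketch of how pack and crack can change an outcome. The voter counts, the party labels "D" and "R", and the helper name seats_won are all hypothetical and chosen only for illustration: 15 voters (9 D, 6 R) are split into three districts of five in two different ways.

```python
from collections import Counter


def seats_won(districts):
    """Return how many districts each party wins by simple plurality.

    Each district is a list of voter labels; district sizes are odd in
    this example, so ties cannot occur.
    """
    winners = [Counter(voters).most_common(1)[0][0] for voters in districts]
    return Counter(winners)


# Hypothetical plan A: the 9 D and 6 R voters are spread evenly.
neutral_plan = [["D"] * 3 + ["R"] * 2 for _ in range(3)]

# Hypothetical plan B: PACK five D voters into one district,
# then CRACK the remaining four D voters across the other two.
packed_cracked_plan = [
    ["D"] * 5,
    ["D"] * 2 + ["R"] * 3,
    ["D"] * 2 + ["R"] * 3,
]

print(seats_won(neutral_plan))
print(seats_won(packed_cracked_plan))
```

With the same 15 voters, D wins all three districts under the first plan, while R wins two of three under the second despite having only 6 of the 15 votes. Only the boundaries changed.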

“Redistricting is democracy at work” - Tom Hofeller

How do I teach reproducibility and data bias awareness


1. Lead by example


1b. Listen to Alumni / employers


1c. Learn by doing


2. Don’t assess it, make it mandatory for the assessment*

Lead by example

  • Traditional labs were distributed as PDFs, Word documents and PowerPoints.

  • Used ArcGIS 💰

1. Lead by example

Problems:

  • Static
  • Require updates with software / data
  • Material is in a pdf
  • Not easily searchable, limits it to a one time use
  • A lot of time invested
  • Only used by students on the module

Alumni, academia and industry value programmatic and, importantly, reproducible GIS analysis

1b. Listen to Alumni / employers

Design and outputs

  • Course divided into two parts:
    • Part 1: GIS tools
    • Part 2: GIS analysis

In part 2 we shift from subject based to problem based learning….

Each practical answers a question….

What are the factors that might lead to variation in Average GCSE point scores across the city?

1c. Design and outputs

It is not enough to simply give students some material and hope they will immediately learn it. Learning happens by doing.


Weekly homework that we dedicate time to discussing

  • Week 1-5 tasks
  • Week 6-9 practice exam

1c. Design and output

Part 1: GIS tools…

You need to calculate the average percent of science students (in all grades) per county meeting the required standards and produce a map to show where the county averages are above or below the State of Washington average.
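
A rough sketch of the tabular half of this task, using only the Python standard library. The records, percentages and the function name county_vs_state are hypothetical, and the state average is assumed here to be the mean of all records rather than the mean of the county means. The mapping step (joining the result to county boundaries) is not shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (county, percent of students meeting the standard).
records = [
    ("King", 60.0), ("King", 70.0),
    ("Pierce", 40.0), ("Pierce", 50.0),
    ("Spokane", 58.0),
]


def county_vs_state(rows):
    """Label each county as above or below the state-wide average."""
    by_county = defaultdict(list)
    for county, pct in rows:
        by_county[county].append(pct)

    # Assumption: the state average is the mean over every record.
    state_avg = mean(pct for _, pct in rows)

    return {
        county: "above" if mean(values) > state_avg else "below"
        for county, values in by_county.items()
    }


comparison = county_vs_state(records)
print(comparison)
```

The resulting above/below label per county is the attribute that would then be joined to the county polygons and mapped.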

Source: Allison Horst

2. Make it mandatory for the assessment

Previous iterations

Your task is to carry out a mini research project which answers a pertinent or topical geo-spatial question

Note

Students struggled to come up with a question, source data and apply it

  • Plagiarism

  • Poor applications

  • Very unbalanced reports

What are we assessing?

Can they apply the tools / methods to different scenarios and data?

Practice exam (weeks 5-10), 10%

Final exam (6 hour open book), 90%

2. Make it mandatory for the assessment

Part 2: GIS analysis


Example practice question

New York City wishes to conduct a study that aims to prevent people being evicted through understanding possible related factors. You have been enlisted as a consultant and tasked with conducting an analysis of their data from 2020.

Data:

2. Make it mandatory for the assessment

GitHub classroom - setup

  • Create a template repository “submission”

  • Import to an “organisation” - shared folder that staff can access

  • GitHub classroom creates a URL for the template

2. Make it mandatory for the assessment

Students

  • Click the URL, which generates a new repository for them

  • Staff can see their work and when they make edits (commit / push)

2. Make it mandatory for the assessment

End of exam

  • Bulk download through the GitHub classroom desktop application

  • Run the student’s code locally

  • See how often the student edited the code
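
One possible way to summarise how often a student edited their code once the repositories are downloaded, sketched under assumptions: the function names and the sample dates are hypothetical, and the repository path is whatever folder the bulk download produced. It relies only on the standard `git log --format=%ad --date=short` command, which prints one author date per commit.

```python
import subprocess
from collections import Counter


def read_git_log_dates(repo_path):
    """Return one YYYY-MM-DD author date per commit for a local repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ad", "--date=short"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def commits_per_day(git_log_dates):
    """Count commits per day from the text returned by read_git_log_dates."""
    days = [line.strip() for line in git_log_dates.splitlines() if line.strip()]
    return Counter(days)


# Hypothetical output for one student's repository.
sample_output = """2023-01-10
2023-01-10
2023-01-09
"""

print(commits_per_day(sample_output))
```

In use, commits_per_day(read_git_log_dates(path)) would be called once per downloaded student folder. For a single-day exam, a finer date format (for example by hour) may be more informative.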

Outcomes

Less (almost no) plagiarism

Marks have improved

All students learn to use reproducibility tools

The tool is not directly assessed

Global use and interaction with the resource

Current advancements


New term 2 module - Remotely sensing cities and environments


Presentation hosted online - Xaringan

Weekly online portfolio - Quarto

Tools are expected, not assessed

Conclusion


Lead by example


Learn by doing


Tools become the norm


Respond to industry requirements


Consider what needs assessing