class: center, title-slide, middle
background-image: url("img/CASA_Logo_no_text_trans_17.png")
background-size: cover
background-position: center

<style>
.title-slide .remark-slide-number {
  display: none;
}
</style>

# Remotely Sensing Cities and Environments

### Lecture 7: Classification - The Big Questions (Lecture 6 continued) and Accuracy

### 28/06/2022 (updated: 28/02/2024)
[a.maclachlan@ucl.ac.uk](mailto:a.maclachlan@ucl.ac.uk)
[andymaclachlan](https://twitter.com/andymaclachlan)
[andrewmaclachlan](https://github.com/andrewmaclachlan)
[Centre for Advanced Spatial Analysis, UCL](https://www.ucl.ac.uk/bartlett/casa/)
[PDF presentation](https://github.com/andrewmaclachlan/CASA0023-lecture-7/blob/main/index.pdf)

<a href="https://github.com/andrewmaclachlan" class="github-corner" aria-label="View source on GitHub"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#fff; color:#151513; position: absolute; top: 0; border: 0; left: 0; transform: scale(-1, 1);" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style>

---
# How to use the lectures

- Slides are made with [xaringan](https://slides.yihui.org/xaringan/#1)
- In the bottom left there is a search tool that will search all content of the presentation
- Control + F will also search
- Press enter to move to the next result
- The tool in the top right lets you draw on the slides, although these drawings aren't saved.
- Pressing the letter `o` (for overview) will allow you to see an overview of the whole presentation and go to a slide
- Alternatively, just typing the slide number e.g. 10 on the website will take you to that slide
- Pressing alt+F will fit the slide to the screen, which is useful if you have resized the window and have another window open side by side.

---
# Lecture outline

.pull-left[

### Part 1: Landcover classification (continued)

### Part 2: Accuracy

]

.pull-right[

<img src="img/satellite.png" width="100%" />

.small[Source:[Original from the British Library. Digitally enhanced by rawpixel.](https://www.rawpixel.com/image/571789/solar-generator-vintage-style) ]
]

---
class: inverse, center, middle

# What do we need (current or historic) landcover data for?

---
class: inverse, center, middle

# Can we just use pre-classified data?

---
# Pre-classified data

* GlobeLand30 - 30m for 2000, 2010 and 2020: http://www.globallandcover.com/home_en.html?type=data
* European Space Agency’s (ESA) Climate Change Initiative (CCI) annual global land cover (300 m) (1992-2015): https://climate.esa.int/en/projects/land-cover/data/
* Dynamic World - near real time 10m: https://www.dynamicworld.app/explore/
  * A major benefit of an AI-powered approach is that the model looks at an incoming Sentinel-2 satellite image and, for every pixel in the image, estimates the degree of tree cover, how built up a particular area is, or snow coverage if there’s been a recent snowstorm, for example
* MODIS: https://modis.gsfc.nasa.gov/data/dataprod/mod12.php
* Google building data: https://sites.research.google/open-buildings/

---
# Dynamic World

.panelset[

.panel[.panel-name[Data]

* Semi-supervised approach
* Divided the world into regions (Western Hemisphere (160°W to 20°W), Eastern Hemisphere-1 (20°W to 100°E), and Eastern Hemisphere-2 (100°E to 160°W))
* Divided them into 14 biomes
* Stratified samples based on NASA MCD12Q1 land cover for 2017 + others

]

.panel[.panel-name[Training]

* Expert group labeled approximately 4,000 image tiles
* Non-expert group labeled approximately 20,000 tiles
* 409 image tiles were held back
* A minimum mapping unit of 50 × 50 m (5 × 5 pixels) was used in Labelbox
* Annotators had to label at least 70% of a tile within 20 to 60 minutes
* There was a skill differential between the non-expert and expert groups
* Per-pixel labels were made by linearly interpolating the distributions from their one-hot encoding, with a weight of 0.2 for experts and 0.3 for non-experts
* ~82% confidence on the true class for experts and ~73% confidence on the true class for non-experts

]

.panel[.panel-name[Pre-processing]

* Used SR for labelling BUT used TOA (level L1C) for the model, as SR is only available from 2017
* Masked clouds and shadows
* Weights for each pixel (I think these are the probabilities for each pixel based on the user weights)
* Augmentations - rotating the images and band ratioing to improve the model

]

.panel[.panel-name[Normalisation]

* First log-transform the raw reflectance values to equalize the long tail of highly reflective surfaces
* Remap percentiles of the log-transformed values to points on a sigmoid function
* Use these output values, which reduce the value ranges

]

.panel[.panel-name[Classification]

* Fully Convolutional Neural Network (FCNN)
* Learns a mapping from estimated probabilities back to the input reflectances (synthesis model gradient)
* This basically means it is re-learning from the output data - "the backward" model
* Pass all normalized bands except B1, B8A, B9 and B10 after bilinear upscaling to ee.Model.predictImage.
* Runs automatically after each new image
* It looks blobby as the training data is 50 x 50 m and also because of the CNN (see the next tabs)

]

.panel[.panel-name[CNN]

* A Convolutional Neural Network (ConvNet/CNN) is a form of **deep learning**
* Deep learning is a subsection of machine learning focused on neural networks with big datasets
* The potential issue here is with the convolution = a moving window filter (see next tab)
* This is the start of the CNN process
* Similar to texture, using a moving window. Further reading: [A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)

]

.panel[.panel-name[CNN 2]

<img src="img/conv.gif" width="50%" />

.small[Source:[Original from the British Library. Digitally enhanced by rawpixel.](https://www.rawpixel.com/image/571789/solar-generator-vintage-style)]

]

.panel[.panel-name[Accuracy]

* Accuracy is assessed through a **confusion matrix** - see the next slides
* This is a common approach in classification
* However, they note that this might not be appropriate because of:
  * Different products
  * Live updates

]

.panel[.panel-name[Example]

<img src="img/dynamic_world.PNG" width="40%" style="display: block; margin: auto;" />

.small[Visual comparison of Dynamic World (DW) to other global and regional LULC datasets for validation tile locations in (A) Brazil (−11.437°, −61.460°), (B) Norway, (61.724°, 6.484°), and (C) the United States (39.973°, −123.441°). Datasets used for comparison include 300 m European Space Agency (ESA) Climate Change Initiative (CCI); 100 m Copernicus Global Land Service (CGLS) ProbaV Land Cover dataset; 10 m ESA Sentinel-2 Global Land Cover (S2GLC) Europe 2019; 30 m MapBiomas Brazil dataset; and 30 m USGS National Land Cover Dataset (NLCD). Each map chip represents a 5.1 km by 5.1 km area with corresponding true-color (RGB) Sentinel-2 image shown in the first column. All products have been standardized to the same legend used for DW. Note differences in resolution as well as differences in the spatial distribution and coverage of land use land cover classes. Source:[Brown et al. 2022](https://www.nature.com/articles/s41597-022-01307-4#code-availability)]

]

.panel[.panel-name[Notes]

* The training data is online and also used in the ESRI LULC 2020 map: https://doi.pangaea.de/10.1594/PANGAEA.933475?format=html#download
* The code is online - see the paper.

]

.panel[.panel-name[Radiant MLHub]

* At the same time (sadly) [Radiant MLHub](https://mlhub.earth/) launched the first open library dedicated to EO training data for machine learning...

<img src="img/landcovernet.png" width="45%" style="display: block; margin: auto;" />

.small[Source:[Radiant MLHub](https://mlhub.earth/datasets?search=landcovernet)]

]
]

---
class: inverse, center, middle

# Before we progress....thoughts on this?

--

### What was the data (SR, TOA)?

### How was it trained?

### What are the issues?

### Do you think it's any good?

---
class: inverse, center, middle

# Next up

# Object based image analysis and sub pixel analysis

---
# Object based image analysis (OBIA)

.pull-left[

* Does a raster cell represent an object on the ground?
* Instead of considering cells we consider shapes based on the similarity (homogeneity) or difference (heterogeneity) of the cells = **superpixels**
* The **SLIC** (Simple Linear Iterative Clustering) algorithm for superpixel generation is the [most common method](doi:10.1109/TPAMI.2012.120)
  * regular points on the image
  * work out spatial distance (from point to centre of pixel) = **closeness to centre**
  * colour difference (RGB vs RGB to centre point) = **homogeneity of colours**

]

.pull-right[

<img src="img/supercells.gif" width="40%" style="display: block; margin: auto;" />

.small[Supercells Source:[Nowosad 2021](https://jakubnowosad.com/ogh2021/#10)]

]

---
# Object based image analysis (OBIA) 2

.pull-left[

* Each iteration the centre moves - 4-10 iterations is best (based on the original paper)
* The values can change and the borders move (like k-means?)
* Doesn't consider connectivity = very small cells
* Can enforce connectivity (remove small areas and merge them)
* `\(S\)` = distance between initial points
* `\(m\)` = compactness = balance between physical distance (larger value) and colour (spectral distance, then smaller `\(m\)`)

]

.pull-right[

* Can only use Euclidean distance in SLIC

<img src="img/SLIC.png" width="100%" style="display: block; margin: auto;" />

.small[Supercells Source:[Darshita Jain](https://darshita1405.medium.com/superpixels-and-slic-6b2d8a6e4f08)]

]

---
# Object based image analysis (OBIA) 3

.pull-left[

* The **supercells** package can use any distance measure (e.g. dissimilarity)
* `\(k\)` = number of superpixels
* `\(compactness\)` = impact of spatial distance (higher value = squares) vs colour (lower value)
* `\(transform\)` = not on raw data, but transformed to the LAB colour space (L\*a\*b\*, a colour space defined by the International Commission on Illumination, abbreviated CIE)
* We can then take the [average values per object](https://jakubnowosad.com/ogh2021/#24) and classify them using methods we've seen (see the R sketch two slides on)
* Other metrics can also be computed - e.g. length to width ratio (see Jensen p.418)

]

.pull-right[

<img src="img/supercells2.png" width="100%" style="display: block; margin: auto;" />

.small[Supercells Source:[Nowosad 2021](https://jakubnowosad.com/ogh2021/#10)]

]

---
# Object based image analysis (OBIA) 4

Note that there are many OBIA classifiers; they all do similar, but slightly different, processes - see Jensen page 415

A more advanced package would be [**SegOptim**](https://segoptim.bitbucket.io/docs/) that can use algorithms from other software

<img src="img/segOptim.jpg" width="55%" style="display: block; margin: auto;" />

.small[SegOptim Source:[João Gonçalves 2020](https://segoptim.bitbucket.io/docs/)]
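
---
# OBIA in R - a minimal sketch

A minimal sketch of the superpixel workflow from the last few slides, assuming the **supercells** and **terra** packages; the input file name is hypothetical and the arguments should be checked against the package documentation and Nowosad's OGH materials linked earlier.

```r
library(terra)       # raster handling
library(supercells)  # SLIC-style superpixels on rasters (Nowosad)

# hypothetical multi-band image, e.g. a Sentinel-2 subset
img <- rast("data/s2_subset.tif")

# k = number of superpixels to aim for,
# compactness = balance of spatial closeness vs spectral similarity
sp <- supercells(img, k = 2000, compactness = 10)

# mean band values per object, ready for any classifier we have already seen
obj_means <- extract(img, vect(sp), fun = mean, na.rm = TRUE)
head(obj_means)
```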

---
# Sub pixel analysis

If you have a pixel composed of a range of land cover types should it **be classified as one landcover** or should **we calculate the proportions?**

<img src="img/high_res_medium_res.PNG" width="40%" style="display: block; margin: auto;" />

.small[Comparison of true colour high spatial resolution data (a) (acquired on 14 March 2007) and Landsat surface reflectance (b) (acquired on 6 October 2007 [path 112]), highlighting the spatial detail captured by high-resolution imagery (c) and the same areas as observed by Landsat (d) for the subset East Beechboro used within this study Source:[MacLachlan et al. 2017](https://www.tandfonline.com/doi/pdf/10.1080/01431161.2017.1346403?needAccess=true&)]

---
# Sub pixel analysis

Termed (all the same): sub pixel classification, Spectral Mixture Analysis (SMA), linear spectral unmixing

.pull-left[

* SMA determines the **proportion** or **abundance** of landcover per pixel
* The assumption is that the reflectance measured at each pixel is the linear sum of endmembers weighted by their associated endmember fractions
* Typically we have a few endmembers that are **spectrally pure**

]

.pull-right[

<img src="img/Perfect-decomposition-with-a-Linear-Spectral-Mixture-Model-LSMM-on-a-30-m-pixel-formed.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[Machado and Small (2013)](https://www.researchgate.net/figure/Perfect-decomposition-with-a-Linear-Spectral-Mixture-Model-LSMM-on-a-30-m-pixel-formed_fig6_259715697)]

]

In R we can [use MESMA from the package RStoolbox](https://jakob.schwalb-willmann.de/blog/spectral-unmixing-using-rstoolbox/)

---
# Sub pixel analysis 2

.pull-left[

* The sum of each endmember's reflectance multiplied by its fractional contribution gives the best-fit mixed spectrum

`$$p_\lambda=\sum_{i=1}^{n} (p_{i\lambda} * f_i) + e_\lambda$$`

`\(p_\lambda\)` = the pixel reflectance

`\(p_{i\lambda}\)` = reflectance of endmember `\(i\)`

`\(f_i\)` = fractional cover of endmember `\(i\)`

`\(n\)` = number of endmembers

`\(e_\lambda\)` = model error

See Jensen page 480 - the following example is taken from there

]

.pull-right[

<img src="img/mixture.PNG" width="100%" style="display: block; margin: auto;" />

.small[Source:[Plaza et al. (2002)](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1046852)]

]

---
# Sub pixel analysis 3

Not as complicated as it looks...here are some endmembers for bands 3 and 4

<table> <thead> <tr> <th style="text-align:right;"> Band </th> <th style="text-align:right;"> Water </th> <th style="text-align:right;"> Vegetation </th> <th style="text-align:right;"> Soil </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 70 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 80 </td> <td style="text-align:right;"> 60 </td> </tr> </tbody> </table>

On the left below is the pixel we are trying to model = endmembers * their fractions (proportions)

`$$\begin{bmatrix}\text{band 3}\\ \text{band 4} \\ \text{sum to 1} \end{bmatrix} =\begin{bmatrix}p_{water} & p_{veg} & p_{soil}\\ p_{water} & p_{veg} & p_{soil}\\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix}f_{water}\\ f_{veg}\\ f_{soil} \end{bmatrix}$$`

We take the [inverse matrix (^-1)](https://www.mathsisfun.com/algebra/matrix-inverse.html) of the endmembers ...

`$$\begin{bmatrix}13 & 22 & 70\\ 5 & 80 & 60\\ 1 & 1 & 1 \end{bmatrix} \rightarrow \begin{bmatrix}-0.0053 & -0.0127 & 1.1322\\ -0.0145 & 0.0150 & 0.1137\\ 0.0198 & -0.0024 & -0.2460 \end{bmatrix}$$`

---
# Sub pixel analysis 4

Then solve ...
if our values for the pixel are **25** (band 3) and **57** (band 4), the rows of the first matrix are multiplied by the columns of the second one

`$$\begin{bmatrix}f_{water}\\ f_{veg}\\ f_{soil} \end{bmatrix}=\begin{bmatrix}-0.0053 & -0.0127 & 1.1322\\ -0.0145 & 0.0150 & 0.1137\\ 0.0198 & -0.0024 & -0.2460 \end{bmatrix} \begin{bmatrix}25\\ 57\\ 1 \end{bmatrix}$$`

This looks like...(from [matrix calculator](https://matrixcalc.org/en/))

`$$\left(\begin{matrix} \frac{-53}{10000}*25+\frac{-127}{10000}*57+\frac{5661}{5000}*1 \\ \frac{-29}{2000}*25+\frac{3}{200}*57+\frac{1137}{10000}*1 \\ \frac{99}{5000}*25+\frac{-3}{1250}*57+\frac{-123}{500}*1 \end{matrix}\right)$$`

* The fractions here are just the exact (unrounded) versions of the decimal values in the inverse matrix

---
# Sub pixel analysis 5

And gives...

`$$\begin{bmatrix}0.27\\ 0.61\\ 0.11 \end{bmatrix}=\begin{bmatrix}-0.0053 & -0.0127 & 1.1322\\ -0.0145 & 0.0150 & 0.1137\\ 0.0198 & -0.0024 & -0.2460 \end{bmatrix} \begin{bmatrix}25\\ 57\\ 1 \end{bmatrix}$$`

This means that within this pixel we have:

* 27% water
* 61% vegetation
* 11% soil
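
---
# Sub pixel analysis in R - a minimal sketch

The worked example on the previous slides can be reproduced in a few lines of base R - `solve()` inverts the endmember matrix for us. The numbers below are the band 3 / band 4 endmembers from the table, nothing new.

```r
# endmember reflectances (columns) for each band (rows), plus the sum-to-one row
em <- matrix(c(13, 22, 70,
                5, 80, 60,
                1,  1,  1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("band3", "band4", "sum"),
                             c("water", "veg", "soil")))

pixel <- c(25, 57, 1)            # observed band 3, band 4 and the constraint

fractions <- solve(em) %*% pixel # invert the endmember matrix, then multiply
round(fractions, 2)
# water ~0.28, veg ~0.61, soil ~0.11 (the previous slide rounds water down to 0.27)
```

For real imagery the same idea is wrapped up for us, e.g. `mesma()` in **RStoolbox** (see the link a few slides back).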

---
# Sub pixel analysis 6

Issues / considerations:

.pull-left[

* Pixel purity?
* Number of endmembers
  * simplify the process and use the **V-I-S model** in urban areas: Vegetation-Impervious surface-Soil (V-I-S) fractions
* Multiple Endmember Spectral Mixture Analysis (MESMA)
  * increases computation
  * or use a spectral library

]

.pull-right[

<img src="img/VIS.PNG" width="100%" style="display: block; margin: auto;" />

.small[Source:[Phinn et al. (2002) Monitoring the composition of urban environments based on the vegetation-impervious surface-soil (VIS) model by subpixel analysis techniques,](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1046852)]

]

---
# Accuracy assessment

After producing an output we need to assign an accuracy value to it (common to machine learning).

.pull-left[

In remote sensing we focus on:

* PA Producer's accuracy (recall, true positive rate or sensitivity) - vertical - **how well the classification results meet the expectation of the creator**
* UA User's accuracy (consumer's accuracy, precision or positive predictive value) - horizontal - **how often pixels are incorrectly classified as a known class when they should have been classified as something else**
* OA the (overall) accuracy

]

.pull-right[

<img src="img/matrix.PNG" width="100%" style="display: block; margin: auto;" />

.small[Source:[Barsi et al. 2018 Accuracy Dimensions in Remote Sensing](https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-3/61/2018/isprs-archives-XLII-3-61-2018.pdf)]

]

---
# Accuracy assessment 2

<img src="img/matrix.PNG" width="45%" style="display: block; margin: auto;" />

.small[Source:[Barsi et al. 2018 Accuracy Dimensions in Remote Sensing](https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-3/61/2018/isprs-archives-XLII-3-61-2018.pdf)]

.pull-left[

**Where the model is correct**

* True positive = model predicts the positive class correctly
* True negative = model predicts the negative class correctly

]

.pull-right[

**Where the model is incorrect**

* False positive = model predicts positive, but it is negative
* False negative = model predicts negative, but it is positive

]

---
# Accuracy assessment 3

.pull-left[

* **producer's accuracy**, defined as the fraction of correctly classified pixels (TP) compared to the ground truth data (TP+FN): `\(\frac{TP}{TP+FN}\)`
* **user's accuracy**, defined as the fraction of correctly classified pixels (TP) relative to all pixels classified as a particular land cover (TP+FP): `\(\frac{TP}{TP+FP}\)` - note that **FP** (not FN) is in the denominator here
* **overall accuracy**, the combined fraction of correctly classified pixels (TP+TN) across all land cover types (TP+FP+FN+TN): `\(\frac{TP +TN}{TP+FP+FN+TN}\)`

]

.pull-right[

<img src="img/matrix.PNG" width="100%" style="display: block; margin: auto;" />

.small[Source:[Barsi et al. 2018 Accuracy Dimensions in Remote Sensing](https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-3/61/2018/isprs-archives-XLII-3-61-2018.pdf)]

]

---
# Accuracy assessment 4

Example

<img src="img/example_matrix.PNG" width="100%" style="display: block; margin: auto;" />

.small[Source:[Brown et al. 2022 Dynamic World, Near real-time global 10m land use land cover mapping](https://www.nature.com/articles/s41597-022-01307-4.pdf)]

---
# Accuracy assessment 5

.pull-left[

* Errors of omission (100 - producer's accuracy)
  * landcover omitted from its correct class
  * Type II error (false negatives)
  * Urban = 22/(22+1) `\(\frac{TP}{TP+FN}\)`
  * Urban producer's accuracy = 22/23 = 95.65%
* Errors of commission (100 - user's accuracy)
  * sites classified as a class they do not belong to
  * Urban user's accuracy = 22/(22+9) = 70.97% `\(\frac{TP}{TP+FP}\)`
* Kappa coefficient

]

.pull-right[

<img src="img/accuracy_2.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[Earth Systems Science and Remote Sensing](https://medium.com/@wenzhao.li1989/accuracy-assessment-d164e492274b)]]

---
# Accuracy assessment 6

Producer's accuracy ...

.pull-left[

> I am pleased that 95.65% of the urban area that was identified in the reference is urban in the classification

User's accuracy ...

> as a user I find that only 70.97% of the time when I visit an urban area is it actually urban

Overall accuracy is 77.89%

* is this acceptable for the user?
* there is no single right choice for accuracy measurements

]

.pull-right[

<img src="img/accuracy_2.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[Earth Systems Science and Remote Sensing](https://medium.com/@wenzhao.li1989/accuracy-assessment-d164e492274b)]

* This can also be changed to a fuzzy matrix (e.g. deciduous forest classified as evergreen forest - see Jensen p.575)

]
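
---
# Accuracy assessment in R - a minimal sketch

A small sketch of the producer's, user's and overall accuracy calculations. The confusion matrix below is illustrative - only the urban row and column are set up to reproduce the 95.65% and 70.97% figures from the previous slides; the other values are made up.

```r
# rows = classified (map), columns = reference (ground truth)
cm <- matrix(c(22,  5,  4,
                1, 30,  4,
                0,  6, 28),
             nrow = 3, byrow = TRUE,
             dimnames = list(classified = c("urban", "grass", "forest"),
                             reference  = c("urban", "grass", "forest")))

overall  <- sum(diag(cm)) / sum(cm)   # (TP + TN) / all pixels
producer <- diag(cm) / colSums(cm)    # down the reference columns: TP / (TP + FN)
user     <- diag(cm) / rowSums(cm)    # along the classified rows:  TP / (TP + FP)

round(overall, 4)
round(rbind(producer, user), 4)       # urban: producer 0.9565, user 0.7097
```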

---
# Accuracy assessment 7

### To Kappa or not to Kappa?

* Designed to express the accuracy of an image compared to the results expected by chance
* Ranges from 0 to 1

> "Sadly the calls to abandon the use of the kappa coefficient in accuracy assessment seem to have fallen on deaf ears. It may be that the kappa coefficient is still widely used because it has become ingrained in practice and there may be a sense of obligation to use it"

.center[
`\(k=\frac{p_o - p_e}{1- p_e}\)`
]

`\(p_o\)` is the proportion of cases correctly classified (the overall accuracy) `\(\frac{TP +TN}{TP+FP+FN+TN}\)`

`\(p_e\)` is the proportion of cases expected to be correctly classified by chance (further equations in [Foody 2020](https://www.sciencedirect.com/science/article/pii/S0034425719306509)) or [Earth Systems Science and Remote Sensing](https://medium.com/@wenzhao.li1989/accuracy-assessment-d164e492274b)

.small[Source:[Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification. Foody 2020](https://www.sciencedirect.com/science/article/pii/S0034425719306509)]

---
# Kappa example

<img src="img/kappa_example.png" width="50%" style="display: block; margin: auto;" />

.small[[Earth Systems Science and Remote Sensing](https://medium.com/@wenzhao.li1989/accuracy-assessment-d164e492274b)]

---
# Kappa issues

.pull-left[

* What is a good value?

<img src="img/Kappa_issue1.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

* How many Kappa values can we have for different levels of accuracy (on the x axis)?

<img src="img/Kappa_issue2.png" width="100%" style="display: block; margin: auto;" />

]

.small[Source:[Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification. Foody 2020](https://www.sciencedirect.com/science/article/pii/S0034425719306509)]
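
---
# Kappa in R - a minimal sketch

Continuing the illustrative confusion matrix from the accuracy sketch a few slides back, `\(p_o\)` and `\(p_e\)` drop straight out of the row and column totals - a sketch of the formula only, not an endorsement of Kappa.

```r
cm <- matrix(c(22, 5, 4, 1, 30, 4, 0, 6, 28), nrow = 3, byrow = TRUE)

p_o   <- sum(diag(cm)) / sum(cm)                     # observed (overall) accuracy
p_e   <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2  # agreement expected by chance
kappa <- (p_o - p_e) / (1 - p_e)

round(c(p_o = p_o, p_e = p_e, kappa = kappa), 3)
```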

---
class: inverse, center, middle

# Have I used Kappa?

--

### See Jensen page 570

---
class: inverse, center, middle

# In remote sensing this is typically where we'd stop...but not necessarily in machine learning

---
class: inverse, center, middle

# A brief overview...

---
# Beyond remote sensing

Beyond **traditional** remote sensing accuracy assessment...

.pull-left[

The problem is recall (producer's accuracy) vs precision (user's accuracy): are false positives (which lower precision / user's accuracy) or false negatives (which lower recall / producer's accuracy) more important?

* A model with high recall (producer's accuracy) = captures the true positives but allows some false positives (land cover predicted as urban that isn't urban)
* A model with high precision (user's accuracy) = what is predicted as urban is actually urban, but some actual urban is predicted as other landcover (false negatives)
* See [MLU-explain](https://mlu-explain.github.io/precision-recall/) for an interactive example and the next slide...

]

.pull-right[

<img src="img/matrix.PNG" width="100%" style="display: block; margin: auto;" />

.small[Source:[Barsi et al. 2018 Accuracy Dimensions in Remote Sensing](https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-3/61/2018/isprs-archives-XLII-3-61-2018.pdf)]

]

---
class: center, middle

## In the next few slides we will change a decision threshold to see the effects on user's and producer's accuracy.

## Focus on the changing decision threshold...

---
# Our data is not balanced....

We can't have both a high producer's accuracy (recall) **and** a high user's accuracy (precision)

.pull-left[

* **user's accuracy** (precision) = the ratio of correctly predicted positive classes (TP) to all items predicted to be positive (TP+FP): `\(\frac{TP}{TP+FP}\)`
* 1/1 = 100% **or** 8/19 = 42% (this is on the next slide)

> how precise the model is at positive predictions

> as a user I find that only x% of the time when I visit a positive point is it actually positive...

]

.pull-right[

<img src="img/precision.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[MLU-explain](https://mlu-explain.github.io/precision-recall/)]

]

---
# Our data is not balanced....2

We can't have both a high producer's accuracy (recall) **and** a high user's accuracy (precision)

.pull-left[

* **producer's accuracy** (recall) = the ratio of correctly predicted positive classes (TP) to all items that are actually positive (TP+FN): `\(\frac{TP}{TP+FN}\)`
* 1/8 = 13% (from the previous slide) **or** 8/8 = 100%

> how many of the positive points are correctly captured

> I am pleased that [as a producer] x% of points are correct compared to the reference

]

.pull-right[

<img src="img/recall.png" width="80%" style="display: block; margin: auto;" />

.small[Source:[MLU-explain](https://mlu-explain.github.io/precision-recall/)]

]

---
# Our data is not balanced....3

**user's accuracy** (precision)

> I have gone to a site, the model predicted it to be urban, it is not urban...

* How well can the user use the data / classification?

**producer's accuracy** (recall)

> I have gone to all the urban sites, they were urban. **BUT** I can see in the distance a site that was predicted to be GRASS but is actually URBAN

* How well did the producer make the data / classification?

---
class: inverse, center, middle

## Combine them both...into a F1 score

---
# F1 🚗

.pull-left[

The F1-Score (or F Measure) combines both recall (producer's accuracy) and precision (user's accuracy):

* `\(F1 = \frac{2 * Precision * Recall}{Precision+Recall}\)`

Which equals....

* `\(F1 = \frac{TP}{TP + \frac{1}{2}*(FP+FN)}\)`

* Value from 0 to 1, where 1 is better performance

.small[Source:[MLU-EXPLAIN](https://mlu-explain.github.io/precision-recall/)]

]

.pull-right[

<img src="img/matrix.PNG" width="80%" style="display: block; margin: auto;" />

.small[Source:[Barsi et al. 2018 Accuracy Dimensions in Remote Sensing](https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-3/61/2018/isprs-archives-XLII-3-61-2018.pdf)]

]

---
# F1 🚗 issues

.pull-left[

* No True Negatives (TN) in the equation
  * negative categories that are correctly classified as negative
* Are precision and recall equally important?
  * precision (user's accuracy): how precise the model is at positive predictions
  * recall (producer's accuracy): how many of the positive points are correctly captured
* What if our data is very unbalanced?
  * More negatives than positives?
]

.pull-right[

<img src="img/trade_off.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[MLU-EXPLAIN](https://mlu-explain.github.io/precision-recall/)]

]

---
# Receiver Operating Characteristic Curve

Receiver Operating Characteristic Curve (the ROC Curve)

.pull-left[

* Originates from WW2 - the USA wanted to separate radar signal from noise, to identify aircraft (true positives) and not miss any, while minimizing false positives (clouds)
* **Changing the threshold value of the classifier** will change the True Positive rate
  * the probability that a positive sample is correctly predicted in the positive class...planes predicted to be planes

]

.pull-right[

* False positive rate: the probability that a negative sample is incorrectly predicted in the positive class...predicted planes...but they are clouds
* Maximise true positives (1) and minimise false positives (0)

<img src="img/ROC_planes.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[MLU-EXPLAIN](https://mlu-explain.github.io/roc-auc/)]

]

---
# Receiver Operating Characteristic Curve

.pull-left[

**Vertical columns here** - uses the whole matrix

* First is the True positive rate
  * True positive rate = TP/(TP+FN)
* Second is the False positive rate
  * False positive rate = FP/(FP+TN)

]

.pull-right[

<img src="img/matrix.PNG" width="80%" style="display: block; margin: auto;" />

.small[Source:[Barsi et al. 2018 Accuracy Dimensions in Remote Sensing](https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-3/61/2018/isprs-archives-XLII-3-61-2018.pdf)]

]

---
# Receiver Operating Characteristic Curve

Receiver Operating Characteristic Curve (the ROC Curve)

.pull-left[

* True positive rate = good = every plane is predicted as a plane?
* False positive rate = good = every cloud is predicted as noise (not a plane)?

when the threshold is 0 (everything classified as a plane)

> all planes are planes = TPR = 1 but ...

> all clouds are planes = FPR = 1

]

.pull-right[

* Maximise true positives (1) and minimise false positives (0)

<img src="img/ROC_planes.png" width="80%" style="display: block; margin: auto;" />

.small[Source:[MLU-EXPLAIN](https://mlu-explain.github.io/roc-auc/)]

]

---
# Receiver Operating Characteristic Curve

> Goal: Maximise true positives and minimise false positives...(e.g. urban vs other landcover)

<img src="img/ROC.png" width="80%" style="display: block; margin: auto;" />

.small[Source:[MLU-EXPLAIN](https://mlu-explain.github.io/roc-auc/)]

---
# Area Under the ROC Curve

Area Under the ROC Curve (AUC, or AUROC)

.pull-left[

* Simply the area under the ROC curve
* Compare models easily (no need to look at the ROC curve)
* A perfect value will be 1, random will be 0.5

> "The AUC is the probability that the model will rank a randomly chosen positive example more highly than a randomly chosen negative example."

...e.g. a model that always predicts positive for the true negatives (so is always wrong) = AUC of 0

]

.pull-right[

<img src="img/AUC.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[MLU-EXPLAIN](https://mlu-explain.github.io/roc-auc/)]

* used to determine how well your **binary** classifier stands up to a [gold standard binary classifier. There is also a multi-class method that follows one vs one or one vs rest, discussed last time](https://towardsdatascience.com/multiclass-classification-evaluation-with-roc-curves-and-roc-auc-294fd4617e3a)

]
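
---
# Precision, recall, F1 and ROC in R - a minimal sketch

A sketch of the metrics from the last few slides, sweeping the decision threshold - the `truth` and `scores` values are simulated purely for illustration.

```r
set.seed(1)
truth  <- rbinom(200, 1, 0.3)                       # 1 = urban, 0 = other landcover
scores <- runif(200)^ifelse(truth == 1, 0.4, 1.6)   # urban tends to score higher

metrics_at <- function(threshold) {
  pred <- as.integer(scores >= threshold)
  tp <- sum(pred == 1 & truth == 1); fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1); tn <- sum(pred == 0 & truth == 0)
  c(threshold = threshold,
    precision = tp / (tp + fp),              # user's accuracy
    recall    = tp / (tp + fn),              # producer's accuracy = TPR (ROC y-axis)
    f1        = tp / (tp + 0.5 * (fp + fn)),
    fpr       = fp / (fp + tn))              # ROC x-axis
}

round(t(sapply(c(0.2, 0.5, 0.8), metrics_at)), 2)   # the trade-off as the threshold moves
```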

---
class: inverse, center, middle

# Next time (?)

# How do we get test data for the accuracy assessment?

---
# Remote sensing approach (sometimes)

.pull-left[

The same process for all:

* class definition
* pre-processing
* training
* pixel assignment
* accuracy assessment

**Guidelines**

* Collect training data - suggested as around 250 pixels per class (Foody and Mather, 2006)
* Simply go and collect (or use Google Earth) ground truth data - 50 per class (Congalton, 2001).
* Produce an error matrix

]

.pull-right[

<img src="img/supervised-diagram.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[GIS Geography](https://gisgeography.com/supervised-unsupervised-classification-arcgis/) ]

* You would need to consider a **sampling strategy**
  * Random sampling
  * Systematic sampling
  * Stratified sampling
  * Jensen p. 565

]

---
class: inverse, center, middle

# Problems?

--

### Chapter 13 in Jensen (p.557) covers accuracy assessment...but not the following

---
# Good approach - train and test split

* This is simply holding back a % of the original data used to train the model to then test it at the end
* See the [validation section (10.6.7)](https://andrewmaclachlan.github.io/CASA0005repo_20192020/advanced-r-maup-and-more-regression.html) for an example in linear regression

<img src="img/train_test_split.png" width="100%" style="display: block; margin: auto;" />

.small[Source:[Michael Galarnyk](https://towardsdatascience.com/understanding-train-test-split-scikit-learn-python-ea676d5e3d1) ]

---
# Best approach - cross validation

.pull-left[

Really, classification of imagery is a machine learning task...

..so why can't we apply the same methods? Perhaps because it is meant to be iterative...

...e.g. if the classifier underpredicts urban then you can go and adjust the training data...

We might take the mean accuracy from the cross validation.

Leave one out cross validation is an extreme version where the number of folds (often 10 in standard cross validation) equals the number of samples, so each model is trained on all the data minus one observation...next slide...

]

.pull-right[

<img src="img/cross.jpg" width="80%" style="display: block; margin: auto;" />

.small[Source:[Wikipedia](https://en.wikipedia.org/wiki/Cross-validation_%28statistics%29) ]

]

---
# Leave one out cross validation

* An extreme version of cross validation
* Only for smaller datasets
* Uses all the data except one observation, which becomes the testing set
* Repeats through all of the data

<img src="img/leave_one_out_CV.png" width="60%" style="display: block; margin: auto;" />

.small[Source:[Rahil Shaikh](https://towardsdatascience.com/cross-validation-explained-evaluating-estimator-performance-e51e5430ff85) ]
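
---
# Cross validation in R - a minimal sketch

A sketch of how the folds are built - no model fitting, just the resampling logic; `n` and the number of folds are arbitrary values chosen for illustration.

```r
set.seed(42)
n     <- 20                                   # observations in our training data
folds <- sample(rep(1:5, length.out = n))     # random assignment to 5 folds

for (k in 1:5) {
  test_idx  <- which(folds == k)              # held-out fold
  train_idx <- which(folds != k)              # everything else trains the model
  # fit the classifier on train_idx, assess it on test_idx, keep the accuracy...
}

# leave-one-out CV is the extreme case: as many folds as observations
for (i in 1:n) {
  test_idx  <- i
  train_idx <- setdiff(1:n, i)
}
```

We would then report the mean accuracy across the folds, as on the previous slides.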

---
class: inverse, center, middle

# BUT..."Spatial autocorrelation between training and test sets"

--

## Remember spatial autocorrelation?

--

## A measure of similarity between nearby data...

---
# Best approach - cross validation

Waldo Tobler's first Law of Geography...

> "everything is related to everything else, but near things are more related than distant things."

* Are training and testing points too close in geographic space?
* How can we deal with taking a sample of training data for testing when they are possibly from the same polygon of training data...

> ‘Training’ observations near the ‘test’ observations can provide a kind of ‘sneak preview’: information that should be unavailable to the training dataset.

.small[Source:[Lovelace et al. 2022](https://geocompr.robinlovelace.net/spatial-cv.html#intro-cv)]

---

<img src="img/more_related.jpg" width="60%" style="display: block; margin: auto;" />

.small[Source:[Spatial is Special](https://www.e-education.psu.edu/maps/l2_p2.html)]

---
# Spatial dependence....

Karasiak et al. 2022, [Spatial dependence between training and test sets: another pitfall of classification accuracy assessment in remote sensing](https://link.springer.com/article/10.1007/s10994-021-05972-1)

<img src="img/spatial_dependance.png" width="55%" style="display: block; margin: auto;" />

.small[Average overall accuracy based on the RF classifier for each cross-validation strategy (k-fold CV, LOO CV, SLOO CV) at pixel and object levels. Models were fitted with reference samples of Herault-34 and repeated 10 times (i.e. the y-axis provides the average OA value ± standard deviation). The premature stopping of the pixel-based LOO and SLOO CV approaches was due to excessive computational time. Source:[Karasiak et al. 2022](https://link.springer.com/article/10.1007/s10994-021-05972-1) ]

(1) a k-fold cross-validation (k-fold CV) based on random splitting

(2) a non-spatial leave-one-out cross-validation (LOO CV)

(3) a spatial leave-one-out cross-validation (SLOO CV) using a distance-based buffer relying on Moran's I statistics

---
class: inverse, center, middle

# Why do the pixels perform better than the objects?

---
class: inverse, center, middle

# Spatial cross validation

---
# Spatial cross validation

.pull-left[

* Spatially partition the folded data - the folds are from cross validation
* Disjoint (no common boundary), using k-means clustering (a number of points and a distance)
* The same as cross validation but with spatial clustering to make the folds...
* Stops our training data and testing data being near each other...

> in other words this makes sure all the points (or pixels) we train the model with are far away from the points (or pixels) we test the model with

]

.pull-right[

<img src="img/13_partitioning.png" width="100%" style="display: block; margin: auto;" />

.small[Spatial visualization of selected test and training observations for cross-validation of one repetition. Random (upper row) and spatial partitioning (lower row). Source:[Lovelace et al. 2022](https://geocompr.robinlovelace.net/spatial-cv.html) ]

]

---
class: inverse, center, middle

## How does this compare to ideas we saw in CASA0005 (e.g. lag, error, GWR)?

## Did we test our models there?

## How did we deal with nearby points being related?

## Did we need to generalise for new data or not?

--

### If not then we don't need to test the model

---
# Spatial cross validation 2

Lovelace et al. (2022) use a Support Vector Machine classifier that requires hyperparameters (set before the classification)

With a standard SVM the classifier will try to **overfit** = perfect for the current data but useless for anything else...

Cortes and Vapnik - **soft margin**, permit misclassifications = controlled with **C**

.pull-left[

<img src="img/overfit.png" width="60%" style="display: block; margin: auto;" />

.small[Source:[Soner Yildirim](https://towardsdatascience.com/hyperparameter-tuning-for-support-vector-machines-c-and-gamma-parameters-6a5097416167) ]

]

.pull-right[

<img src="img/soft_SVM.png" width="60%" style="display: block; margin: auto;" />

.small[Source:[Soner Yildirim](https://towardsdatascience.com/hyperparameter-tuning-for-support-vector-machines-c-and-gamma-parameters-6a5097416167) ]

]

* **C** = adds a penalty (proportional to distance from the decision line) for each misclassified point. Small = image on the right, large = image on the left -
**changes the slope**

.small[Source:[Soner Yildirim](https://towardsdatascience.com/hyperparameter-tuning-for-support-vector-machines-c-and-gamma-parameters-6a5097416167) ]

---
# Spatial cross validation 3

Lovelace et al. (2022) use a Support Vector Machine classifier that requires hyperparameters (set before the classification)

* **Gamma (also called Sigma)** = controls the influence of a training point within the classified data
  * low = big radius and many points in the same group
  * high = low radius and many groups

.pull-left[

<img src="img/low_gamma.png" width="60%" style="display: block; margin: auto;" />

.small[Source:[Soner Yildirim](https://towardsdatascience.com/hyperparameter-tuning-for-support-vector-machines-c-and-gamma-parameters-6a5097416167) ]

]

.pull-right[

<img src="img/high_gamma.png" width="60%" style="display: block; margin: auto;" />

.small[Source:[Soner Yildirim](https://towardsdatascience.com/hyperparameter-tuning-for-support-vector-machines-c-and-gamma-parameters-6a5097416167) ]

]

.small[Source:[Soner Yildirim](https://towardsdatascience.com/hyperparameter-tuning-for-support-vector-machines-c-and-gamma-parameters-6a5097416167) ]

---
# Spatial cross validation 4

* **Performance level** - each spatial fold (taken from our first k-means cross validation fold division) = top row below, a typical cross validation fold...
* **Tuning level** - each (outer) fold is then divided into 5 again (inner folds) = bottom row below. The test data here comes from the inner fold, not the main test data.
* **Performance estimation** - use the 50 randomly selected hyperparameter combinations in each of these inner subfolds, i.e. fit 250 models with random **C** and **Gamma**, then use the best values in the outer fold, chosen based on **AUROC** with the inner fold testing data. Note, this is not the main testing data.

<img src="img/13_cv.png" width="55%" style="display: block; margin: auto;" />

.small[Schematic of hyperparameter tuning and performance estimation levels in CV. (Figure taken from Schratz et al. (2019)). Source:[Lovelace et al. 2022](https://geocompr.robinlovelace.net/spatial-cv.html) ]

---
class: inverse, center, middle

## See the figure on the previous slide..."Using the same data for the performance assessment and the tuning would potentially lead to overoptimistic results"

## ...this means tuning on a "normal cross validation fold" is not representative...

## Here tuning of parameters is made on a different subset of the data within each fold...

---
# Spatial cross validation 5

.pull-left[

**Here we have...**

* 1 outer fold has 5 inner folds with 50 randomly selected hyperparameter combinations = 250 models for **C** and **Gamma**, and there are 5 outer folds
* Each repetition = 1,250 models (250 * 5), repeated 100 times (100-repeated 5-fold cross validation)
* = 125,500 models in total (125,000 for tuning plus 500 fitted with the best hyperparameters)

]

.pull-right[

**So .... what?**

<img src="img/boxplot-cv-1.png" width="100%" style="display: block; margin: auto;" />

.small[Boxplot showing the difference in GLM AUROC values on spatial and conventional 100-repeated 5-fold cross-validation. Source:[Lovelace et al. 2022](https://geocompr.robinlovelace.net/spatial-cv.html) ]

]

---
class: inverse, center, middle

# Question: What happens if a classification model doesn't consider spatial autocorrelation?

--

## The model will appear to have better accuracy than it actually does

---
class: inverse, center, middle

# Question: What methods can we use to deal with it?
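
---
# Spatial cross validation in R - a minimal sketch

One possible answer in code: build the folds by k-means clustering the coordinates (as in the spatial partitioning figure earlier) rather than at random, so training and test points are kept apart. This is a hand-rolled sketch on simulated coordinates; in practice you would reach for e.g. the **mlr3** ecosystem used by Lovelace et al. (2022) or a dedicated spatial resampling package.

```r
set.seed(42)
pts <- data.frame(x = runif(500), y = runif(500))   # simulated point locations

# random (non-spatial) folds vs spatially clustered folds
random_fold  <- sample(rep(1:5, length.out = nrow(pts)))
spatial_fold <- kmeans(pts[, c("x", "y")], centers = 5)$cluster

for (k in 1:5) {
  test  <- pts[spatial_fold == k, ]   # one spatially coherent block held out
  train <- pts[spatial_fold != k, ]   # the rest, kept away from the test block
  # fit and assess the classifier here; repeat and average as before
}
```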

---
# Summary

* How should we consider / process EO data: **objects, pixels, mixels (mixed pixels), mixed objects?**
* What **data should we use to assess the accuracy** of our classification models?
  * A new dataset to test the output with
  * Train / test split of the training data
  * Cross validation
* When we have a test dataset, **how do we assess the accuracy?**
  * Error matrix
  * Kappa
* When training and testing our classification models do we need to **consider spatial autocorrelation?** Do the following help?
  * Object based image analysis
  * Spatial cross validation

---
# Reading

* [Land Use Cover Datasets and Validation Tools](https://link.springer.com/content/pdf/10.1007/978-3-030-90998-7.pdf)
* https://doodles.mountainmath.ca/blog/2019/10/07/spatial-autocorrelation-co/
* https://geocompr.robinlovelace.net/spatial-cv.html
* https://link.springer.com/article/10.1007/s10994-021-05972-1
* https://machinelearningmastery.com/loocv-for-evaluating-machine-learning-algorithms/
* https://machinelearningmastery.com/k-fold-cross-validation/

---