+ - 0:00:00
Notes for current slide
Notes for next slide

Introduction to Spatial Data Analysis in R

useR! 2022

Adithi R Upadhya, Meenakshi Kushwaha, and Pratyush Agrawal

ILK Labs

20th June 2022

1 / 79

Hi, Logistics alert

GitHub

Slides

Read Me / Notes / Questions

We will start in 5 minutes.

2 / 79

A very brief intro to Map Projections

3 / 79

We will start with a very brief intro to map projections. Hopefully, it is just enough to help you get comfortable with basic concepts of spatial analysis in R

Map Projections

2-D representation of 3-D earth

https://gistbok.ucgis.org/bok-topics/map-projections
4 / 79

Earth is a 3D object and when we are making a map we need to represent this in 2-D form i.e. on a flat surface. This is called MAP projections. As you can see in the figure here. There are several ways to do this. Think of keeping a paper either on the poles and project each point on that paper. Depending on how you project, your final map will be different.

Coordinate Reference Systems (CRS)

Tell us how a map is related to real locations on earth

Geographic vs Projected CRS

Source:ESRI

5 / 79

For example, a location of (130, 12) is not meaningful if you do know where the origin is and what the units are.

Broadly, there are two categories of CRS. Geographic or unprojected, and it defines where the data is located on the earth’s surface as on the left. A projected CRS tells us how to draw on a 2-D flat surface, like on a paper map or a computer screen.

A GCS is round, and so records locations in angular units (usually degrees). A PCS is flat, so it records locations in linear units (usually meters).

CRS in R

  • We need to know if the data is projected
    • which projection system is used
  • if not, you need to project it
    • use the correct system
  • most common projection is WGS84
    • good for mapping global data
  • When data with different CRS are combined
    • need to transform them to a common CRS so they align with one another
6 / 79

CRS in R

  • We need to know if the data is projected
    • which projection system is used
  • if not, you need to project it
    • use the correct system
  • most common projection is WGS84
    • good for mapping global data
  • When data with different CRS are combined
    • need to transform them to a common CRS so they align with one another

Why so many CRS?

  • There are many different models of earth and so different algorithms for projection
  • Choice often based on minimum distortion for region of interest/parameter

https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/

6 / 79

Commonly used by organizations that provide GIS data for the entire globe or many countries. CRS used by Google Earth So, how to think about CRS in when you are doing your spatial analysis. . This is similar to making sure that units are the same when measuring volume or distances. It is usually impossible to preserve all characteristics at the same time in a map projection. This means that when you want to carry out your analysis, you need to use a projection that provides the best characteristics for your analyses.

CRS in R

Components of a CRS

  • projection
  • ellipsoid
    • defines general shape of earth
  • datum
    • defines origin

https://www.nceas.ucsb.edu/

7 / 79

So, that sums up the theory part for this session. With this knowledge we hope that you'll have a good sense of handling projection of your data.

Types of Spatial Data

8 / 79

A different kind of data structure

Vector data is stored in shapefiles

which is a set of files

that store information about

  • location
  • shape
  • attributes of features

file extensions

  • .shp, .shx, .dbf
  • .xml, .prj, .sbn, and .sbx
    • optional
10 / 79

A different kind of data structure

Vector data is stored in shapefiles

which is a set of files

that store information about

  • location
  • shape
  • attributes of features

file extensions

  • .shp, .shx, .dbf
  • .xml, .prj, .sbn, and .sbx
    • optional

`

Image: reddit

10 / 79

In this format, information is distributed in multiple files. And as a beginner i have always done this.

Raster data

"gridded" data

  • saved in pixels

file extensions

  • .jpeg, .tiff, .png

Source: National Ecological Observatory Network (NEON)

11 / 79

On the right, is a terresterial image, and we zoom in we see that each pixel repsents 1m2 and has a value.

Quiz

Which of this is NOT a shape file

  • plants.csv
  • plants.shp
  • plants.dbf
  • plants.shx
12 / 79

Quiz

Which of this is NOT a shape file

  • plants.csv
  • plants.shp
  • plants.dbf
  • plants.shx
13 / 79

Quiz

Which of these is NOT a raster file

  • plants.xlsx
  • plants.png
  • plants.tiff
  • plants.jpeg
14 / 79

Quiz

Which of these is NOT a raster file

  • plants.xlsx
  • plants.png
  • plants.tiff
  • plants.jpeg
15 / 79

Summary of Theory

  • Map Projections

  • Coordinate Reference Systems (CRS)

  • Types of spatial data

    • Vector
    • Raster
16 / 79

R packages

sf

raster

stars

17 / 79

sf

18 / 79

sf package

Simple features are implemented using simple data structures (S3 classes, lists, matrix, vector).

Similar to PostGIS, all functions and methods in sf that operate on spatial data are prefixed by st_.

Typical use involves reading, manipulating and writing of sets of features, with attributes and geometries.

19 / 79

Use the sf package for simpler spatial data analysis with geometries that stick to attributes. As attributes are typically stored in data.frame objects (or the very similar tbl_df), we will also store feature geometries in a data.frame column. Since geometries are not single-valued, they are put in a list-column, a list of length equal to the number of records in the data.frame, with each list element holding the simple feature geometry of that feature. The three classes used to represent simple features are:

sf, the table (data.frame) with feature attributes and feature geometries, which contains sfc, the list-column with the geometries for each feature (record), which is composed of sfg, the feature geometry of an individual simple feature.

Geometry types

POINT: a single point

MULTIPOINT: multiple points

LINESTRING: sequence of two or more points connected by straight lines

MULTILINESTRING: multiple lines

POLYGON: a closed ring with zero or more interior holes

MULTIPOLYGON: multiple polygons

GEOMETRYCOLLECTION: any combination of the above types

20 / 79

Advantages of sf

Is treated as a data.frame

Can be used with a pipe operator

Can be linked directly to the GDAL, GEOS, and PROJ libraries that provide the back end for reading spatial data, making geographic calculations, and handling coordinate reference systems

21 / 79

Load the libraries

library(sf)
library(raster)
library(tidyverse)
library(stars)
library(leaflet)
library(ggplot2)
library(ggspatial)
22 / 79

Data used here

  • Vector data collected from lizards spread across urban Bangalore in 2022.
  • Raster data is temperature data downloaded from Google Earth Engine for the month of March 2022.
23 / 79

st_as_sf

Convert foreign object to an sf object.

data <- read.csv("data/spotted.csv", sep = ",")
prj4string <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
my_projection <- st_crs(prj4string)
# Create sf object
data_sf <- st_as_sf(data, coords = c("Longitude", "Latitude"), crs = my_projection)

The order of Longitude, Latitude, and the exact column names are important.
PROJ4 formatted string list - https://proj.org/apps/proj.html. EPSG numeric code - https://proj.org/apps/proj.html.

24 / 79

Demo

25 / 79

Map

ggplot() +
geom_sf(data = data_sf) +
coord_sf()
26 / 79

Demo

27 / 79

st_is_valid

Checks whether a geometry is valid, or makes an invalid geometry valid.

st_is_valid(data_sf)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
28 / 79

st_write and st_read

Write simple features object to file or database.

Read a simple features object.

st_write(data_sf, "data/spotted_sf.shp")
# To save the XY or the coordinates use layer_options
st_write(data_sf, "data/spotted_sf.shp", layer_options = "GEOMETRY=AS_XY")
data_sf <- st_read("data/spotted_sf.shp")
29 / 79

st_point

Create simple feature from a numeric vector, matrix or list.

p1 <- st_point(c(2, 3))
p2 <- st_point(c(1, 4))
class(p1)
## [1] "XY" "POINT" "sfg"
30 / 79

Geometry types

st_point numeric vector (or one-row matrix)

st_multipoint numeric matrix with points in row

st_linestring numeric matrix with points in row

st_multilinestring list with numeric matrices with points in rows

st_polygon list with numeric matrices with points in rows

st_multipolygon list of lists with numeric matrices

st_geometrycollection list with (non-geometrycollection) simple feature objects

31 / 79

st_crs and st_geometry_type

Retrieve coordinate reference system from sf object.

Set or replace retrieve coordinate reference system from object.

st_crs(data_sf)
## Coordinate Reference System:
## User input: +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
## wkt:
## GEOGCRS["unknown",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]],
## CS[ellipsoidal,2],
## AXIS["longitude",east,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]],
## AXIS["latitude",north,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]]]
st_geometry_type(data_sf)
## [1] POINT POINT POINT POINT POINT POINT POINT POINT POINT
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
32 / 79

st_transform

Transform or convert coordinates of simple feature.

data_sf_projected <- st_transform(data_sf, 32643)
33 / 79

32643 is UTM projection with units m.

Demo

34 / 79

Geometric measurements

35 / 79

st_distance

Compute Euclidian or great circle distance between pairs of geometries; compute, the area or the length of a set of geometries.

If the coordinate reference system of x was set, these functions return values with unit of measurement.

st_distance(data_sf_projected[c(1), ], data_sf_projected[c(5), ])
## Units: [m]
## [,1]
## [1,] 17016.08
36 / 79

st_area

Returns the area of a geometry, in the coordinate reference system used; in case x is in degrees longitude/latitude, st_geod_area is used for area calculation.

Here we will use projected data to calculate area.

37 / 79

Geometric unary operations

38 / 79

st_buffer

Buffer is determination of a zone around a geographic feature containing location that are within a specified distance of that feature.

Computes a buffer around this geometry/each geometry.

Returns a new geometry.

Specify the radius of the buffer.

buf <- st_buffer(data_sf_projected, dist = 3000)
39 / 79

Demo

40 / 79

More

st_boundary returns the boundary of a geometry

st_centroid gives the centroid of a geometry

st_reverse reverses the nodes in a line

st_simplify simplifies lines by removing vertices

st_triangulate triangulates set of points (not constrained)

st_polygonize creates polygon from lines that form a closed ring. In case of st_polygonize, x must be an object of class LINESTRING or MULTILINESTRING, or an sfc geometry list-column object containing these

st_segmentize adds points to straight lines

st_union combines several feature geometries into one, without unioning or resolving internal boundaries

41 / 79

Geometric binary predicates

42 / 79

st_intersects

To find whether pairs of simple feature geometries intersect.

Returns TRUE or FALSE.

st_intersects(data_sf_projected[1, ], data_sf_projected[2, ])
## Sparse geometry binary predicate list of length 1, where the predicate
## was `intersects'
## 1: (empty)
43 / 79

Demo

44 / 79

More

Geometric binary predicates on pairs of simple feature geometry sets. Return a sparse matrix with matching (TRUE) indexes, or a full logical matrix.

st_intersects test whether Geometries/Geography spatially intersect

st_disjoint returns TRUE if the Geometries do not spatially intersect - if they do not share any space together

st_touches returns TRUE if the geometries have at least one point in common, but their interiors do not intersect

st_crosses returns TRUE if the supplied geometries have some, but not all, interior points in common

st_within returns TRUE if the first geometry is completely within the second geometry

st_contains returns TRUE if the second geometry is completely contained by the first geometry

st_overlaps returns TRUE if the Geometries share space, are of the same dimension, but are not completely contained by each other

st_equals returns true if the given geometries represent the same geometry. Directionality is ignored

st_is_within_distance returns TRUE if the geometries are within the specified distance (radius) of one another

45 / 79

Other operations

46 / 79

st_join

data_join <- read.csv("data/spotted_join.csv", sep = ",")
# Create sf object
data_join_sf <- st_as_sf(data_join, coords = c("Longitude", "Latitude"), crs = my_projection)
# Join using predicate
joined <- st_join(data_sf, data_join_sf, join = st_equals)
47 / 79

Demo

48 / 79

49 / 79

Summary of the vector data analysis till now.

BREAK - 5 minutes

50 / 79

raster

51 / 79

Raster

A RasterLayer can easily be created from scratch using the function raster. The default settings will create a global raster data structure with a longitude/latitude coordinate reference system and 1 by 1 degree cells. You can change these settings by providing additional arguments such as xmn, nrow, ncol, res, and/or crs, to the function. You can also change these parameters after creating the object.

temp <- raster("data/Ban_Temp_Mar2022.tif")
y <- raster(ncol = 36, nrow = 18, xmn = -1000, xmx = 1000, ymn = -100, ymx = 900)
52 / 79

Demo

53 / 79

Summary and Resolution

aggregate - aggregates a Raster object to create a new RasterLayer or RasterBrick with a lower resolution (larger cells).

res(temp)
## [1] 0.1000005 0.1000005
summary(temp)
## Ban_Temp_Mar2022
## Min. 297.7071
## 1st Qu. 298.3609
## Median 298.6290
## 3rd Qu. 299.0802
## Max. 300.2286
## NA's 0.0000
temp_agg <- aggregate(temp, 2)
res(temp_agg)
## [1] 0.2000009 0.2000009
res(y)
## [1] 55.55556 55.55556
summary(y)
## layer
## Min. NA
## 1st Qu. NA
## Median NA
## 3rd Qu. NA
## Max. NA
## NA's NA
54 / 79

projectRaster

To transform a RasterLayer to another coordinate reference system (projection)

# To transform
temp_proj <- projectRaster(temp, crs = "+proj=utm +zone=43 +datum=WGS84")
# To set the coordinate system
projection(y) <- "+proj=utm +zone=48 +datum=WGS84"
y
## class : RasterLayer
## dimensions : 18, 36, 648 (nrow, ncol, ncell)
## resolution : 55.55556, 55.55556 (x, y)
## extent : -1000, 1000, -100, 900 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=48 +datum=WGS84 +units=m +no_defs
# To transform
y_prof <- projectRaster(y, crs = "+proj=longlat +datum=WGS84 +no_defs")
55 / 79

Demo

56 / 79

Summarizing functions

Use cellStats to obtain a summary for all cells of a single raster object.

cellStats(temp, mean)
## [1] 298.7602
57 / 79

Raster algebra

When used with a raster object as first argument, normal summary statistics functions such as min, max, and mean return a RasterLayer (usually for multiple layers).

r1 <- temp
r2 <- r1
s <- stack(r1, r2)
plot(max(s, na.rm = TRUE))

58 / 79

Extract

Extract values from a raster object at the locations of spatial vector data

raster::extract(temp, data_sf)
## [1] 298.2247 298.2657 298.2442 298.2657 298.4747 298.2442 298.2442 298.6700
## [9] 298.2657
59 / 79

Mask by buffer

Mask values from a raster object at the locations of spatial vector data

data_sf_subset <- data_sf[1, ]
data_sf_buffer <- st_buffer(data_sf_subset, dist = 5000)
masked_raster <- mask(x = temp, mask = data_sf_buffer)
60 / 79

Demo

61 / 79

Raster operation

Returns a geographic subset of an object as specified by an Extent object (or object from which an extent object can be extracted/created)

62 / 79

Calculate area

  • Computes the approximate surface area of cells in an unprojected (longitude/latitude) raster object
  • returns all values or the values for a number of rows of a raster object. Values returned for a RasterLayer are a vector
sum(getValues(area(temp_proj)))
## [1] 21818160000
63 / 79

Demo

64 / 79

Rasterize using stars package

raster_data <- st_rasterize(data_sf)
65 / 79

Demo

66 / 79

Vectorize using stars

tif <- read_stars("data/Ban_Temp_Mar2022.tif")
summary(tif)
class(tif)
# Convert a foreign object to sf
temp_sf <- st_as_sf(tif, as_points = TRUE)
67 / 79

Demo

68 / 79

69 / 79

Static Map

bangalore_boundary <- st_read("data/BBMP_Boundary.shp")
bangalore_boundary <- st_transform(bangalore_boundary, crs = my_projection)
ggplot() + geom_sf(data = bangalore_boundary, color = "green", size = 2, alpha = 0.1, fill = "green") +
geom_sf(data = data_sf, color = "blue", size = 2) + theme_bw() +
annotation_scale(location = "bl", width_hint = 0.5) +
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
style = north_arrow_fancy_orienteering) +
coord_sf(xlim = c(77.4, 77.8), ylim = c(12.8, 13.2))
70 / 79

Demo

71 / 79

Interactive Map

leaflet(bangalore_boundary) %>%
addTiles() %>%
addPolygons(color = "green") %>%
addCircles(
data = data,
lng = ~ Longitude,
lat = ~ Latitude,
radius = 20,
weight = 20,
fill = TRUE)
72 / 79

Demo

73 / 79

Data Sources

74 / 79

Sometimes..

  • Making maps in R is not always easy.
  • It is slow.
  • Map transformations can be a problem.
  • Adding annotations can be tedious.
75 / 79

Excersise - 20 minutes

78 / 79

Questions?

79 / 79

Hi, Logistics alert

GitHub

Slides

Read Me / Notes / Questions

We will start in 5 minutes.

2 / 79
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow