Working with Spatial and Raster Data in R

Part 0: Useful packages
Part 1: Spatial Data
Part 2: Raster Data
Part 4: Closing

Part 0: Useful packages

There are some packages that are very helpful for working with raster and spatial data. A few are listed here in decreasing order of usefulness (to me).

sp (provides the classes for spatial data; makes coordinate transformations simple)
raster (general-purpose raster functions; reads raster objects into R)
rgdal (reads shape files into R)
rgeos (a collection of excellent methods for working with spatial objects)
spatstat (nitty-gritty spatial calculations in compiled C code; model-fitting)
adehabitatHR (tools for home range estimation, such as 95% MCP)
maptools (conversion between spatstat and spatial objects)
adehabitatMA (functions that make it easier to work rasters)
mapdata (a collection of map databases, though you might find higher-resolution maps elsewhere)

Part 1: Spatial Data

We have movement data on a Finnish wolf named Niki. Our goal is to 1) calculate the distance from each of Niki’s positions to the closest primary road, and 2) calculate the density (in km / km^2) of primary road within Niki’s 100% MCP.

Step 1: Loading and Preparing Data

First, we need to load a shapefile of the road data. A shapefile is “a digital vector storage format for storing geometric location and associated attribute information.” Really, a shapefile is a collection of 6-7 individual files that are stored in the same directory and have a common name. To load shape files into R, we use the function readOGR from the package rgdal. It’s not a problem if readOGR throws a warning about “z dimension discarded,” since our data are 2-dimensional. While we’re at it, we load the wolf data, which has been stored as an R object.

setwd("~TimBarry/Box Sync/WolfDispersal/EliesClass")
require(sp)
require(rgdal)
load("niki.rda")
niki <- niki[!duplicated(niki[c("X","Y")]),] #eliminate duplicate points
kainuuUTM35 <- readOGR(dsn = "Road Shape File", layer = "DR_LINKKI")

## OGR data source with driver: ESRI Shapefile 
## Source: "Road Shape File", layer: "DR_LINKKI"
## with 116743 features
## It has 23 fields

The variable kainuuUTM35 is an object of the class SpatialLinesDataFrame, which is basically an ordered collection of line segments with associated attributes. We can explore the important features of this object.

head(kainuuUTM35@data)

##   LINK_ID LINK_MMLID KUNTAKOODI HALLINN_LK TOIMINN_LK TIENUMERO TIEOSANRO
## 0  587698 1110519748        105          3          6        NA        NA
## 1  587699 1110523754        105          3          5        NA        NA
## 2  587700  312275776        105          3          5        NA        NA
## 3  587701  597011941        105          3          5        NA        NA
## 4  587702  312275782        105          3          5        NA        NA
## 5  587703  598276179        105         99          7        NA        NA
##   LINKKITYYP SILTA_ALIK AJOSUUNTA TIENIMI_SU TIENIMI_RU TIENIMI_SA
## 0          3          0         2       <NA>       <NA>       <NA>
## 1          3          0         2    Kehä IV       <NA>       <NA>
## 2          3          0         2    Kehä IV       <NA>       <NA>
## 3          3          0         2    Kehä IV       <NA>       <NA>
## 4          3          0         2    Kehä IV       <NA>       <NA>
## 5         12          0         2       <NA>       <NA>       <NA>
##   ENS_TALO_O ENS_TALO_V VIIM_TAL_O VIIM_TAL_V          MUOKKAUSPV SIJ_TARK
## 0         NA         NA         NA         NA 08.01.2015 00:00:00     3000
## 1          2          1          2          3 08.01.2015 00:00:00     3000
## 2         NA         NA         NA         NA 08.01.2015 00:00:00     3000
## 3         NA         NA         NA         NA 08.01.2015 00:00:00     3000
## 4          4          5         10          5 08.01.2015 00:00:00     3000
## 5         NA         NA         NA         NA 02.07.2015 17:20:21     3000
##   KOR_TARK ALKU_PAALU LOPP_PAALU GEOM_FLIP
## 0   100001          0     57.476         0
## 1   100001          0    114.605         0
## 2   100001          0     17.342         0
## 3   100001          0      6.014         0
## 4   100001          0     70.150         0
## 5   100001          0    128.385         1

This data frame gives the attribute data for each line segment, with each row corresponding to a different segment. We are particularly interested in the attribute called “TOIMINN_LK,” which means “functional link” in Finnish. For our purposes, codes 1-5 represent primary roads, codes 6-7 represent forest roads, and code 8 represents paths.

Let’s determine the coordinate reference system of this road object.

kainuuUTM35@proj4string

## CRS arguments:
##  +proj=utm +zone=35 +ellps=GRS80 +units=m +no_defs

We can see that the reference system is UTM zone 35. Unfortunately, our wolf data are in a different coordinate reference system. The proj4string of our wolf data happens to be “+proj=tmerc +lat_0=0 +lon_0=27 +k=1 +x_0=3500000 +y_0=0 +ellps=intl +units=m +no_defs”. We must project our road data into the wolf’s coordinate reference system before we can do anything. We use the function spTransform from sp to do this. (Note that we could have transformed the wolf observations into UTM zone 35 instead; I don’t like doing this, since I like using the animal’s coordinate reference system as the “base” system.)

kainuuTrans <- spTransform(kainuuUTM35, CRS("+proj=tmerc +lat_0=0 +lon_0=27 +k=1 +x_0=3500000 +y_0=0 +ellps=intl +units=m +no_defs"))

It’s a good idea to save the road object at this stage using save or saveRDS. It’s a lot faster to load an R object into the workspace than it is to load a shape file!

Next, we will create a SpatialPoints object from the wolf data. A SpatialPoints object is nothing more than a collection of points with an associated coordinate reference system. We then calculate a 100% MCP of the homerange using mcp from the package adeHabitatHR.

require(adehabitatHR)
nikiSpatial <- SpatialPoints(niki[c("X","Y")], CRS("+proj=tmerc +lat_0=0 +lon_0=27 +k=1 +x_0=3500000 +y_0=0 +ellps=intl +units=m +no_defs"))
nikiMCP <- mcp(nikiSpatial, 100)

We will crop the road network according to Niki’s MCP using crop from the package raster. The crop function relies on the package rgeos.

require(raster)
require(rgeos)
roadsInMCP <- crop(kainuuTrans, extent(nikiMCP))

Let’s visualize our data using spplot from sp. We must specify the attribute we wish to display (in our case, “TOIMINN_LK”) as an argument to zcol. Note that spplot plots in lattice, so we will need the package latticeExtra to add Niki’s observations to the plot.

require(latticeExtra)
plot(spplot(roadsInMCP, zcol = "TOIMINN_LK") + 
     layer(panel.points(niki[c("X","Y")], pch = 19, cex = 0.5, col = "black")))

We can see that the observations are clustered around apparent den sites, which in turn are situated relatively far away from priary roads. Since we are interested in primary roads, let’s subset the roadsInMCP object so that “TOIMINN_LK” takes on values ranging from 1-5.

primaryRoads <- roadsInMCP[roadsInMCP@data$TOIMINN_LK %in% 1:5,]

Step 2: Road Distances

We will use the package spatstat to calculate the distance from each observation to the closest primary road. We need to convert the wolf observations and road object into data types that spatstat can understand. PSP is the spatstat analogue of a SpatialLinesDataFrame object, and PPP is the spatstat analogue of a SpatialPoints object (or, more accurately, a SpatialPointsDataFrame object). We use the package mapTools to convert the road object into PSP, and we convert the wolf observations into PPP on our own.

require(spatstat)
require(maptools)
primRoadPsp <- as.psp.SpatialLines(primaryRoads, marks = "primary")
nikiPpp <- with(niki, ppp(X,Y, owin(range(X), range(Y))))
plot(primRoadPsp, main = "SpatStat Plot")
plot(nikiPpp, add = T, pch = 19, cex = 0.5)

The spatstat function project2segment calculates the distance from each point to the closest line segment. It does so in compiled C code and is very fast.

ds <- project2segment(nikiPpp, primRoadPsp)
distances <- ds$d
head(distances)

## [1]  252.3153  164.4448 1096.3846 1308.4927 1395.8908 1391.9686

In the following plot, we see an illustration of how project2segment works on a sample of 20 points.

randExample <- sample(1:nrow(niki), 20, FALSE)
plot(primRoadPsp, main = "A few projected points")
plot(nikiPpp[randExample], add = T, pch = 19, cex = 0.5)
arrows(nikiPpp$x[randExample], nikiPpp$y[randExample], ds$Xproj$x[randExample], ds$Xproj$y[randExample], length = 0.05, col = "blue")

If we simply want to determine whether a point lies on the road, we can use buffers. For our purposes, a point lies on the road if it is within 45 m of the road. We will use the function gBuffer from rgeos to create a 45 m buffer around the primary roads. The function returns a SpatialPolygon object, which is a collection of polygons with an associated coordinate reference system.

buffer <- gBuffer(primaryRoads, width = 45)

Now, we use the incredibly useful function gIntersection from rgeos to find all points that lie inside of the SpatialPolygon buffer. By dividing the number of observations on the road by the total number of observations, we are able to compute the proportion of Niki’s observations that are on primary roads.

pointsOnRoad <- gIntersection(nikiSpatial, buffer)
plot(nikiSpatial, cex = 0.5, pch = 19, col = "grey90")
plot(buffer, border = "red", add = T)
plot(pointsOnRoad, pch = 19, cex = 0.5, add = T)

propOnRoad <- nrow(pointsOnRoad@coords)/nrow(nikiSpatial@coords)
propOnRoad

## [1] 0.007000206

We see that Niki spends roughly 0.007 of the time on primary roads. Notice that we could have arrived at this answer by determining the number of distances that are less than 45 m, then dividing by the total number of distances.

propOnRoadAlternate <- sum(distances <= 45)/length(distances)
propOnRoadAlternate

## [1] 0.007000206

Step 3: Road Density

Next, we want to compute the density of primary roads (in km / km^2) within the MCP.

plot(primaryRoads, col = "red")
plot(nikiMCP, add = T, border = "blue")
plot(nikiSpatial, add = T, pch = 19, cex = 0.5)

First, we intersect the road object with the MCP using gIntersection.

croppedRoads <- gIntersection(nikiMCP, primaryRoads)
plot(croppedRoads, col = "red")
plot(nikiMCP, border = "blue", add = T)
plot(nikiSpatial, add = T, pch = 19, cex = 0.5)

Then, we use functions from rgeos to calculate total road length (gLength) and MCP area (gArea). We arrive at density by dividing length by area.

totalLength <- gLength(croppedRoads)
McpArea <- gArea(nikiMCP)
density <- totalLength/McpArea * 1e3 # convert to units km / km^2
density

## [1] 0.2131982

Part 2: Raster Data

Again, we will work with the trajectory of Niki. Using raster data, we will 1) extract the habitat type at each of Niki’s positions, and 2) compute the closest house to each location.

Step 1: Loading and Preparing Data

First, we need to load the raster into the workspace. In storage, our raster is saved as a tiff file. We will use the function raster (from the package of the same name) to load the tiff file into the workspace and convert it into an R raster object.

FinlandRaster <- raster("FinlandRaster.tif")
plot(FinlandRaster)

In R, a raster object is basically a super-efficient array. We can index rasters like we can arrays. We can also query the number of rows and columns of a raster, just like we can an array.

FinlandRaster[10,40]

##    
## 38

FinlandRaster@ncols

## [1] 7520

FinlandRaster@nrows

## [1] 5554

The values of a raster correspond to different habitat types. In our case, the raster takes on the values 1-44, as well as 255. Roughly, codes 1-11 represent artificial surfaces, codes 12-22 represent agricultural areas, codes 23-34 represent forests, codes 35-44 represent water, and code 255 represents outside of Finland. In particular, codes 1-3 represent houses. This information is available in the file CLC2006_LandClasses.pdf.

We observe that the raster is in the correct coordinate system, so we don’t need to worry about any projection here. If, however, the raster were in the wrong coordinate system, we would use the function projectRaster from the raster package.

FinlandRaster@crs

## CRS arguments:
##  +proj=tmerc +lat_0=0 +lon_0=27 +k=1 +x_0=3500000 +y_0=0
## +ellps=intl +units=m +no_defs

We crop the raster according to Niki’s trajectory. To do so, we use the function crop from the package raster.

bitOfFinland <- crop(FinlandRaster, extent(nikiSpatial))
plot(bitOfFinland)

points(nikiSpatial, col = "black", pch = 19, cex = 0.5)

Step 2: Extracting Habitat Types

To extract the habitat type at each observed location, we use the function extract from the raster package. Note that we could extract habitat types by carefully indexing the raster object, but that would be much more difficult.

habTypes <- extract(bitOfFinland, niki[c("X","Y")])
table(habTypes)

## habTypes
##    2    4    7   14   15   17   20   21   23   24   25   28   29   30   33 
##    2    7    5    1    2   13 1404 1443  380  340    1   47  363  431    3 
##   38   43 
##  391   23

Niki seems to have spent a lot of time in agricultural zones (the pale yellow in the raster), and very little time in water and human-modified areas (blue and red in the raster, respectively).

Step 3: Closest House at Each Location

To compute the distance from each point to the closest house pixel, we first need to determine the coordinates of each house pixel. We change all raster values ranging from 1-3 (i.e., those corresponding to houses) to TRUE, and we change all others to FALSE. We obtain the indeces of the TRUE cells using the raster function Which. Finally, we obtain the actual coordinates of the TRUE cells using the raster function xyFromCell. This process leaves us with the coordinates of the houses. In the plot, wolf locations are black and houses are red.

 bitOfFinland[bitOfFinland %in% 1:3] <- TRUE
 bitOfFinland[bitOfFinland >= 4] <- FALSE
 idx <- Which(bitOfFinland, cells = TRUE)
 keyXY <- xyFromCell(bitOfFinland, idx, spatial = TRUE)
 
 plot(nikiSpatial, pch = 19, cex = 0.1, col = "black")
 plot(keyXY, pch = 19, cex = 0.1, col = "red", add = T)

We compute the distance between each observation and the closest house using a small chunk of code written by Elie. Converting coordinates into complex numbers is a useful trick for working with 2-dimensional spatial data.

 z.wolf <- nikiSpatial@coords[,1] + 1i*nikiSpatial@coords[,2]
 z.houses <- keyXY@coords[,1] + 1i * keyXY@coords[,2]
 getMinDistance <- function(x,y) {
   min(Mod(x-y))
 }
ds <- sapply(z.wolf, getMinDistance, y = z.houses)
names(ds) = NULL
head(ds)

## [1] 1610.666 1537.986 2040.159 2074.295 2159.306 2159.044

Part 4: Closing

Originally, I wasted a lot of time attempting to re-invent the wheel by writing many spatial functions on my own. This was time consuming, and my code was often inefficient or simply incorrect. Fortunately, R has a rich ecosystem of spatial functions, methods, and packages. Now, when I need a certain spatial function, I poke around the spatial packages before attempting to write something myself. Typically, I’m able to find a highly optimized function that works quite well.

Here are some additional functions that I find useful.

spsample from sp: randomly sample some number of points from a spatial polygon; useful for generating null sets of absences.
rasterToPolygons from raster: create a polygon from a raster object; useful for generating polygons that bound various raster features.
%dopar% from foreach and doParallel: perform computations in parallel; useful for completing multiple time-intensive spatial tasks simultaneously, such as loading multiple shape files at once.

Working with Spatial and Raster Data in R

T Barry

August 17, 2016

Part 0: Useful packages

Part 1: Spatial Data

Step 1: Loading and Preparing Data

Step 2: Road Distances

Step 3: Road Density

Part 2: Raster Data

Step 1: Loading and Preparing Data

Step 2: Extracting Habitat Types

Step 3: Closest House at Each Location

Part 4: Closing