Loading important packages for later:

pcks <- list("plyr", "lubridate", "rgdal", "sp", "raster", "magrittr")
sapply(pcks, require, char = TRUE)

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

1 Introduction

Let’s ask a few questions of the (strictly) movement data. For example: 1. How does activity vary across seasons / herds / individuals? 2. How do ranging aread vary across seasons / herds / individuals?

As we work on these questions, we’ll learn a few additional skills

First, load the caribou data:

load("./_data/caribou.rda")

2 Prepping the data

2.1 Daily means

For data these large, if you are not interested in sub-daily movements, it can simplify things considerable to reduce the data to a daily mean. This is done quite easily with the help of ddply and related tools.

First, we add columns of Year, Month, and DOY (day of year) to the caribou data using lubridate commands:

caribou <- mutate(caribou, Year = year(Time), DOY = yday(Time))

Next, we average locations for every combination of ID, Year and DOY:

caribou.daily <- ddply(caribou, c("ID", "Year", "DOY", "Herd", "nickname"), 
    summarize, Latitude = mean(Latitude), Longitude = mean(Longitude), X = mean(X), 
    Y = mean(Y))

Now we have a much smaller data frame:

dim(caribou)

## [1] 106895     18

dim(caribou.daily)

## [1] 32853    11

Also - the data will be more regular and we can make more straightforward comparisons across datasets.

The new data frame also has somewhat fewer columns:

head(caribou.daily)

##              ID Year DOY               Herd nickname Latitude Longitude
## 1 BWCA-WL-13-01 2015  91 Hay River Lowlands     HR.1 60.88161 -116.8018
## 2 BWCA-WL-13-01 2015  92 Hay River Lowlands     HR.1 60.88113 -116.7998
## 3 BWCA-WL-13-01 2015  93 Hay River Lowlands     HR.1 60.87949 -116.7991
## 4 BWCA-WL-13-01 2015  94 Hay River Lowlands     HR.1 60.87852 -116.7989
## 5 BWCA-WL-13-01 2015  95 Hay River Lowlands     HR.1 60.87812 -116.7981
## 6 BWCA-WL-13-01 2015  96 Hay River Lowlands     HR.1 60.87830 -116.7982
##          X       Y       Date Month
## 1 510759.7 6749617 2015-04-01     4
## 2 510870.2 6749563 2015-04-02     4
## 3 510908.8 6749382 2015-04-03     4
## 4 510916.4 6749274 2015-04-04     4
## 5 510959.9 6749229 2015-04-05     4
## 6 510954.4 6749248 2015-04-06     4

We’ve lost the date itself, but we can recreate that from the Year and DOY using the ddays command - and tack on the month of year too. This isn’t pretty, but it works:

caribou.daily <- mutate(caribou.daily, Date = ymd(paste(Year - 1, 12, 31)) + 
    ddays(DOY), Month = month(Date))

Note that, unfortunately, ddply strips the data frame of the projection attribute that we’ve been carrying around. If you want it there - you’ll have to recreate it:

attr(caribou.daily, "projection") <- attr(caribou, "projection")

2.2 Displacements and angles

One of the most basic movement statistics is simply displacement of each step (or step length). There are a few tools out there. For example if you convert your data to the native format of the adehabitatLT package (the ltraj) these are computed, and the move package has tools that compute them quickly. But there’s a “simple trick” to getting these variables very quickly, I refer you to this link for a detailed explanation.

In the meantime, here’s an example using PP.5:

pp5 <- subset(caribou.daily, nickname == "PP.5")
Z <- pp5$X + (0+1i) * pp5$Y
StepLength <- Mod(diff(Z))

Of course, step lengths will depend on time intervals, so best to control for that and estimate “Speed” (i.e. daily displacement):

dT <- as.numeric(diff(pp5$Date))
table(dT)

## dT
##   1   2   3   7 
## 715   2   1   1

Speed <- StepLength/dT

Here’s a summary of those daily displacements - in km/day:

hist(Speed/1000, breaks = 50, col = "grey")

summary(Speed/1000)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.01549  0.45929  1.11753  2.06176  2.35550 28.96717

Here’s a function that adds all those variables to the data frame:

computeDD <- function(data) {
    Z <- data$X + (0+1i) * data$Y
    StepLength <- c(NA, Mod(diff(Z)))
    dT <- c(NA, as.numeric(diff(data$Date)))
    Speed <- StepLength/dT
    return(data.frame(data, StepLength, dT, Speed))
}

And here we apply it to all of the caribou in the daily data set:

caribou.daily <- ddply(caribou.daily, "ID", computeDD)

3 Movement rates

Now we can, for example, quickly see how movement dates vary, e.g, by month using a simple box plot:

boxplot((Speed/1000) ~ Month, data = subset(caribou.daily, Speed > 0), log = "y", 
    ylim = c(0.001, 100), ylab = "km/day", pch = 19, cex = 0.5, col = rgb(0, 
        0, 0, 0.5))

We could try to visualize this even better against day of year across herds with ggplot. Note: faceting against herds allows us to compare herds nicely, and the stat_smooth() feature draws a visual guide for the shifts.

require(ggplot2)
ggplot(subset(caribou.daily, Speed > 1), aes(x = DOY, y = Speed/1000)) + facet_grid(~Herd) + 
    geom_point(alpha = 0.3) + stat_smooth() + scale_y_log10()

Ignoring Buffalo West, we can clearly see a consistent pattern across all the herds: Higher activity in winter, low right before and right after the spring migration, with a clear peak around day 100, a ramp up. This is totally exploratory, but we might use these results - roughly - as guides

4 Home ranges and utilization distributions

There is a an extensive and complex literarure on what constitites a “home range” (HR) or a “utilization distribution” (UD). Of relatively recent note are definitions provided by Brownian Bridges (see Kranstauber) and auto-correlated kernel density estimates (akde’s) (see Chris Fleming’s ctmm package).

4.1 Minimum convex polygon

The simplest definition - by far - is the minimum convex polygon (MCP), which simply draws a tight polygon around the observations. This is a problematic definition - mainly because it has a tendency to encompass space that is clearly not used by the animal. But it is redeemed somewhat by its incredible simplicity.

Here is how to compute the MCP’s usign the adehabtitatHR package, focussing on one elk:

require(adehabitatHR)
load("./_data/caribou.rda")
load("./_data/habitatdata.rda")

# extact one elk, convert 'xy' to Spatial Points object
mycaribou <- subset(caribou, nickname == "PP.5")
xy.spatial <- SpatialPoints(mycaribou[, c("X", "Y")])
myelevation <- crop(elevation, extent(xy.spatial))

# plot elevations and line
image(myelevation, asp = 1, col = terrain.colors(100))
with(mycaribou, points(X, Y, type = "o", pch = 19, cex = 0.5, col = rgb(0, 0, 
    0, 0.2)))

# the key command:
mycaribou.mcp <- mcp(xy.spatial, 100)
lines(mycaribou.mcp, col = "red", lwd = 2)

mycaribou.mcp

## Object of class "SpatialPolygonsDataFrame" (package sp):
## 
## Number of SpatialPolygons:  1
## 
## Variables measured:
##   id     area
## a  a 225232.6

Lets also get some random points from within the MCP. The key function: spsample:

random.points <- spsample(mycaribou.mcp, 1000, "random")
image(myelevation, col = terrain.colors(100), asp = 1)
plot(lines2010, add = TRUE, col = "lightblue")
points(random.points, col = rgb(1, 0, 0, alpha = 0.5), pch = 19, cex = 0.5)
lines(mycaribou.mcp, col = "red", lwd = 2)
points(mycaribou$X, mycaribou$Y, type = "o", cex = 0.5, col = rgb(0, 0, 0, 0.5), 
    pch = 19)

This will be useful later when we do Resource Selection Functions.

4.2 Kernel density estimates

… are also fun, and relatively simple to obtain. Note that autocorrelation in the movement data will bias your results a lot, so it it better to take a subsample of the data as much as possible. For example, for the MacKenzie herd:

MK.xy <- subset(caribou, Herd == "Mackenzie")[sample(1:36000, 1000), c("X", 
    "Y")]
plot(elevation)
points(MK.xy, col = rgb(0, 0, 0, 0.2), pch = 19)

Using the ks function in kde in ks and plotting the 50% and 95% intervals:

require(ks)
MK.kde <- kde(x = MK.xy, compute.cont = TRUE)
plot(elevation, xlim = range(MK.xy$X), ylim = range(MK.xy$Y))
points(MK.xy, col = rgb(0, 0, 0, 0.2), pch = 19, cex = 0.5)
plot(MK.kde, add = TRUE, col = "blue", cont = c(50, 95), lwd = 2)

This is useful for picking out a few key ranges (note - its on a locally elevated area). But it is a bit cryptic what the automatically selected bandwidth is. The answer is:

Hpi(MK.xy)

##         [,1]     [,2]
## [1,] 6617531  5126071
## [2,] 5126071 14194805

But it’s worth looking into the what the underlying assnmptions are (I think its an underestimate). But from this output, you can get a sense of the key ranges. Note, that they are on a locally elevated area. Also, with a bit more fussy code, you can also quantify the area of that range. In short - what this code does is: extract the contour lines from the 2D density, find the contour line that corresponds to 95% area covered, and computes the area using the polyarea function in pracma.

contour.95 <- with(MK.kde, contourLines(x = eval.points[[1]], y = eval.points[[2]], 
    z = estimate, levels = cont["5%"])[[1]])
library(pracma)
with(contour.95, abs(polyarea(x, y))) * 1e-06

## [1] 2172.152

So - approximately - an over 2000 square km total for the “core” range.

4.3 Kernel densities by season

As usual, you can run this analysis on any data set you choose to subsample. And, as usual, its best to write a little function that’ll just do it.

Here’s a function that combines the step above:

computeKDE <- function(xy, percent = 95) {
    require(pracma)
    mykde <- kde(x = xy, compute.cont = TRUE)
    mycontour <- with(mykde, contourLines(x = eval.points[[1]], y = eval.points[[2]], 
        z = estimate, levels = cont[percent]))
    
    myarea = sum(sapply(mycontour, function(c) abs(polyarea(c$x, c$y)))) * 1e-06
    return(list(contour = mycontour, area = myarea))
}

Maybe we can now compare the MacKenzie core ranges during the winter and summer season, defining those (somewhat arbitrarily) as Dec-Feb and June-Aug, respectively.

So … first assign seasons to the data:

caribou.daily$season <- factor("shoulder", levels = c("winter", "shoulder", 
    "summer"))
caribou.daily$season[caribou.daily$Month >= 6 & caribou.daily$Month <= 8] <- "summer"
caribou.daily$season[caribou.daily$Month == 12 | caribou.daily$Month <= 2] <- "winter"

Let’s see if there’s any separation at all:

MK <- subset(caribou.daily, Herd == "Mackenzie")
xy.spatial <- SpatialPoints(MK[, c("X", "Y")])
MK.elevation <- crop(elevation, extent(xy.spatial))

plot(MK.elevation)
with(MK, points(X, Y, col = alpha(c("blue", NA, "red")[season], 0.05), cex = 0.5, 
    asp = 1, pch = 19))

There doesn’t look like much separation, but the summer range is certainly smaller. Let’s quantify this with our function.

xy.summer <- subset(MK, season == "summer")[, c("X", "Y")]
xy.winter <- subset(MK, season == "winter")[, c("X", "Y")]

# subsample, for less autocorrelation
xy.summer <- xy.summer[sample(1:nrow(xy.summer), 400), ]
xy.winter <- xy.winter[sample(1:nrow(xy.winter), 400), ]

# compute!
MK.kde.summer <- computeKDE(xy.summer, percent = 95)
MK.kde.winter <- computeKDE(xy.winter, percent = 95)

# plot

plot(MK.elevation)
. <- sapply(MK.kde.winter$contour, polygon, col = alpha("blue", 0.2))
. <- sapply(MK.kde.summer$contour, polygon, col = alpha("red", 0.2))
legend("bottomleft", fill = alpha(c("blue", "red"), 0.5), legend = c("winter", 
    "summer"), title = "95% UD")

# compare the areas
MK.kde.winter$area

## [1] 2757.427

MK.kde.summer$area

## [1] 1202.307

So - considerable overlap but a much smaller summer range.

4.4 Seasonal overlap indices

To quantify the actual overlap, Fieberg and Kochanny (2005) suggest computing a so-called UDOI (UD overlap index): \[ UDOI = A_{1,2} \int_{-\infty}^\infty \int_{-\infty}^\infty UD_1(x,y) \, UD_2(x,y) dxdy\]. Where \(A_{1,2}\) is the “overlap” between the ranges. We’ll do something a bit simpler, the Bhattracharyya index (BA): \[ BA = \int_{-\infty}^\infty \int_{-\infty}^\infty \sqrt{UD_1(x,y)} \, \sqrt{UD_2(x,y)} dxdy\] which ranges from 0 (no overlap) to 1 (perfect overlap). This is actually easier to compute than the SDOI from our results, as long as we have the same basis for our kde.
So:

computeBA <- function(xy.summer, xy.winter) {
    xrange <- range(xy.winter[, 1], xy.summer[, 1])
    yrange <- range(xy.winter[, 2], xy.summer[, 2])
    dx <- diff(xrange)/150
    dy <- diff(yrange)/150
    
    kde.summer <- kde(x = xy.summer, xmin = c(xrange[1], yrange[1]), xmax = c(xrange[2], 
        yrange[2]), gridsize = c(150, 150))
    
    kde.winter <- kde(x = xy.winter, xmin = c(xrange[1], yrange[1]), xmax = c(xrange[2], 
        yrange[2]), gridsize = c(150, 150))
    
    BA <- sum(sqrt(kde.summer$estimate) * sqrt(kde.winter$estimate)) * dx * 
        dy
    return(BA)
}
computeBA(xy.summer, xy.winter)

## [1] 0.6858689

As suspected … a relatively high amount of overlap!

4.5 Analyzing all herds

Again, we can compact the code above and make some computations for all of the herds (except Buffalo West which has no summer observations). And, again, we start with writing a function:

computeSummerWinterRangeArea <- function(myherd, cols = c("blue", "red"), plotme = TRUE) {
    mydata <- subset(caribou.daily, Herd == myherd)
    xy.summer <- subset(mydata, season == "summer")[, c("X", "Y")]
    xy.winter <- subset(mydata, season == "winter")[, c("X", "Y")]
    xy.summer <- xy.summer[sample(1:nrow(xy.summer), 200), ]
    xy.winter <- xy.winter[sample(1:nrow(xy.winter), 200), ]
    my.kde.summer <- computeKDE(xy.summer, percent = 95)
    my.kde.winter <- computeKDE(xy.winter, percent = 95)
    if (plotme) {
        sapply(my.kde.winter$contour, polygon, col = alpha(cols[1], 0.5))
        sapply(my.kde.summer$contour, polygon, col = alpha(cols[2], 0.5))
    }
    BA <- computeBA(xy.summer, xy.winter)
    return(data.frame(Herd = myherd, winter.area = my.kde.winter$area, summer.area = my.kde.summer$area, 
        BA = BA))
}

UDstats <- data.frame(NULL)
winter.cols <- c("red", "darkblue", "black", "forestgreen")
summer.cols <- c("orange", "blue", "white", "green")

with(caribou.daily, plot(X, Y, asp = 1, type = "n"))
herds <- levels(caribou.daily$Herd)[-2]
for (i in 1:length(herds)) UDstats <- rbind(UDstats, computeSummerWinterRangeArea(herds[i], 
    cols = c(winter.cols[i], summer.cols[i]), plotme = TRUE))

legend("bottomleft", fill = winter.cols, legend = herds, title = "Winter", cex = 0.8, 
    bty = "n")
legend("bottomright", fill = summer.cols, legend = herds, title = "Summer", 
    cex = 0.8, bty = "n")

And, the table of computed stats:

UDstats

##                 Herd winter.area summer.area          BA
## 1       Buffalo Lake    730.2461    722.3034 0.001403826
## 2 Hay River Lowlands  10511.9142  10225.1750 0.858832165
## 3          Mackenzie   2912.3780   1284.8687 0.684740915
## 4         Pine Point   1012.1842    414.2093 0.502809438

One of these is NOT like the others. Maybe - interestingly?

5 Elevation across seasons

Let’s compare the elevation across seasons. This is - in fact - very straightforward. We need only obtain the elevation at each point - a process sometimes referred to as “annotating” a movement dataset. In R, this is done from a raster with the extract function. We can simply add a column to the caribou.daily object:

caribou.daily$elevation <- raster::extract(elevation, caribou.daily[, c("X", 
    "Y")])

ggplot(subset(caribou.daily, Herd != "Buffalo West" & season != "shoulder"), 
    aes(Herd, elevation, col = season)) + geom_boxplot() + facet_grid(Year ~ 
    .)

Wow - fairly inconsistent patterns!

5.1 Quicky mixed effects model

To test whether the herds are significantly different, we fit an interaction model, a linear additive, and Herd only main effects linear mixed effects model (with ID as a random effect and 1st order autocorrelation in the observations).

require(nlme)
cd <- subset(caribou.daily, Herd != "Buffalo West" & season != "shoulder")
lme.fit0 <- lme(elevation ~ Herd, random = ~1 | ID, correlation = corAR1(), 
    data = cd, method = "ML")
lme.fit1 <- lme(elevation ~ season, random = ~1 | ID, correlation = corAR1(), 
    data = cd, method = "ML")
lme.fit2 <- lme(elevation ~ (Herd + season), random = ~1 | ID, correlation = corAR1(), 
    data = cd, method = "ML")
lme.fit3 <- lme(elevation ~ (Herd * season), random = ~1 | ID, correlation = corAR1(), 
    data = cd, method = "ML")

fits <- list(lme.fit0, lme.fit1, lme.fit2, lme.fit3)
ldply(fits, function(l) data.frame(formula = paste(as.character(formula(l))[3], 
    collapse = " "), aic = AIC(l))) %>% mutate(dAIC = aic - min(aic))

##           formula      aic     dAIC
## 1            Herd 120842.2 812.8368
## 2          season 120832.9 803.5223
## 3 (Herd + season) 120816.5 787.1230
## 4 (Herd * season) 120029.4   0.0000

So - in short - the herds are all rather different, with two of the herds (Buffalo Lake and Pine Point) spending more time in higher elevations in the winter than in the summer, and MacKenzie (anyways, a low-elevation population) doing the opposite.

5.2 Before we forget ..

Let’s save caribou.daily. It’s a pretty useful little data frame!

save(caribou.daily, file = "./_data/caribou.daily.rda")

Lab 4: Exploratory Analysis of Movement Data

Elie Gurarie