Loading important packages for later:
pcks <- list("plyr", "lubridate", "rgdal", "sp", "raster", "magrittr")
sapply(pcks, require, char = TRUE)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
Let’s ask a few questions of the (strictly) movement data. For example: 1. How does activity vary across seasons / herds / individuals? 2. How do ranging aread vary across seasons / herds / individuals?
As we work on these questions, we’ll learn a few additional skills
First, load the caribou data:
load("./_data/caribou.rda")
For data these large, if you are not interested in sub-daily movements, it can simplify things considerable to reduce the data to a daily mean. This is done quite easily with the help of ddply and related tools.
First, we add columns of Year, Month, and DOY (day of year) to the caribou data using lubridate
commands:
caribou <- mutate(caribou, Year = year(Time), DOY = yday(Time))
Next, we average locations for every combination of ID, Year and DOY:
caribou.daily <- ddply(caribou, c("ID", "Year", "DOY", "Herd", "nickname"),
summarize, Latitude = mean(Latitude), Longitude = mean(Longitude), X = mean(X),
Y = mean(Y))
Now we have a much smaller data frame:
dim(caribou)
## [1] 106895 18
dim(caribou.daily)
## [1] 32853 11
Also - the data will be more regular and we can make more straightforward comparisons across datasets.
The new data frame also has somewhat fewer columns:
head(caribou.daily)
## ID Year DOY Herd nickname Latitude Longitude
## 1 BWCA-WL-13-01 2015 91 Hay River Lowlands HR.1 60.88161 -116.8018
## 2 BWCA-WL-13-01 2015 92 Hay River Lowlands HR.1 60.88113 -116.7998
## 3 BWCA-WL-13-01 2015 93 Hay River Lowlands HR.1 60.87949 -116.7991
## 4 BWCA-WL-13-01 2015 94 Hay River Lowlands HR.1 60.87852 -116.7989
## 5 BWCA-WL-13-01 2015 95 Hay River Lowlands HR.1 60.87812 -116.7981
## 6 BWCA-WL-13-01 2015 96 Hay River Lowlands HR.1 60.87830 -116.7982
## X Y Date Month
## 1 510759.7 6749617 2015-04-01 4
## 2 510870.2 6749563 2015-04-02 4
## 3 510908.8 6749382 2015-04-03 4
## 4 510916.4 6749274 2015-04-04 4
## 5 510959.9 6749229 2015-04-05 4
## 6 510954.4 6749248 2015-04-06 4
We’ve lost the date itself, but we can recreate that from the Year and DOY using the ddays
command - and tack on the month of year too. This isn’t pretty, but it works:
caribou.daily <- mutate(caribou.daily, Date = ymd(paste(Year - 1, 12, 31)) +
ddays(DOY), Month = month(Date))
Note that, unfortunately, ddply strips the data frame of the projection attribute that we’ve been carrying around. If you want it there - you’ll have to recreate it:
attr(caribou.daily, "projection") <- attr(caribou, "projection")
One of the most basic movement statistics is simply displacement of each step (or step length). There are a few tools out there. For example if you convert your data to the native format of the adehabitatLT
package (the ltraj
) these are computed, and the move
package has tools that compute them quickly. But there’s a “simple trick” to getting these variables very quickly, I refer you to this link for a detailed explanation.
In the meantime, here’s an example using PP.5:
pp5 <- subset(caribou.daily, nickname == "PP.5")
Z <- pp5$X + (0+1i) * pp5$Y
StepLength <- Mod(diff(Z))
Of course, step lengths will depend on time intervals, so best to control for that and estimate “Speed” (i.e. daily displacement):
dT <- as.numeric(diff(pp5$Date))
table(dT)
## dT
## 1 2 3 7
## 715 2 1 1
Speed <- StepLength/dT
Here’s a summary of those daily displacements - in km/day:
hist(Speed/1000, breaks = 50, col = "grey")
summary(Speed/1000)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01549 0.45929 1.11753 2.06176 2.35550 28.96717
Here’s a function that adds all those variables to the data frame:
computeDD <- function(data) {
Z <- data$X + (0+1i) * data$Y
StepLength <- c(NA, Mod(diff(Z)))
dT <- c(NA, as.numeric(diff(data$Date)))
Speed <- StepLength/dT
return(data.frame(data, StepLength, dT, Speed))
}
And here we apply it to all of the caribou in the daily data set:
caribou.daily <- ddply(caribou.daily, "ID", computeDD)
Now we can, for example, quickly see how movement dates vary, e.g, by month using a simple box plot:
boxplot((Speed/1000) ~ Month, data = subset(caribou.daily, Speed > 0), log = "y",
ylim = c(0.001, 100), ylab = "km/day", pch = 19, cex = 0.5, col = rgb(0,
0, 0, 0.5))
We could try to visualize this even better against day of year across herds with ggplot. Note: faceting against herds allows us to compare herds nicely, and the stat_smooth()
feature draws a visual guide for the shifts.
require(ggplot2)
ggplot(subset(caribou.daily, Speed > 1), aes(x = DOY, y = Speed/1000)) + facet_grid(~Herd) +
geom_point(alpha = 0.3) + stat_smooth() + scale_y_log10()
Ignoring Buffalo West, we can clearly see a consistent pattern across all the herds: Higher activity in winter, low right before and right after the spring migration, with a clear peak around day 100, a ramp up. This is totally exploratory, but we might use these results - roughly - as guides
There is a an extensive and complex literarure on what constitites a “home range” (HR) or a “utilization distribution” (UD). Of relatively recent note are definitions provided by Brownian Bridges (see Kranstauber) and auto-correlated kernel density estimates (akde’s) (see Chris Fleming’s ctmm
package).
The simplest definition - by far - is the minimum convex polygon (MCP), which simply draws a tight polygon around the observations. This is a problematic definition - mainly because it has a tendency to encompass space that is clearly not used by the animal. But it is redeemed somewhat by its incredible simplicity.
Here is how to compute the MCP’s usign the adehabtitatHR
package, focussing on one elk:
require(adehabitatHR)
load("./_data/caribou.rda")
load("./_data/habitatdata.rda")
# extact one elk, convert 'xy' to Spatial Points object
mycaribou <- subset(caribou, nickname == "PP.5")
xy.spatial <- SpatialPoints(mycaribou[, c("X", "Y")])
myelevation <- crop(elevation, extent(xy.spatial))
# plot elevations and line
image(myelevation, asp = 1, col = terrain.colors(100))
with(mycaribou, points(X, Y, type = "o", pch = 19, cex = 0.5, col = rgb(0, 0,
0, 0.2)))
# the key command:
mycaribou.mcp <- mcp(xy.spatial, 100)
lines(mycaribou.mcp, col = "red", lwd = 2)
mycaribou.mcp
## Object of class "SpatialPolygonsDataFrame" (package sp):
##
## Number of SpatialPolygons: 1
##
## Variables measured:
## id area
## a a 225232.6
Lets also get some random points from within the MCP. The key function: spsample
:
random.points <- spsample(mycaribou.mcp, 1000, "random")
image(myelevation, col = terrain.colors(100), asp = 1)
plot(lines2010, add = TRUE, col = "lightblue")
points(random.points, col = rgb(1, 0, 0, alpha = 0.5), pch = 19, cex = 0.5)
lines(mycaribou.mcp, col = "red", lwd = 2)
points(mycaribou$X, mycaribou$Y, type = "o", cex = 0.5, col = rgb(0, 0, 0, 0.5),
pch = 19)
This will be useful later when we do Resource Selection Functions.
… are also fun, and relatively simple to obtain. Note that autocorrelation in the movement data will bias your results a lot, so it it better to take a subsample of the data as much as possible. For example, for the MacKenzie herd:
MK.xy <- subset(caribou, Herd == "Mackenzie")[sample(1:36000, 1000), c("X",
"Y")]
plot(elevation)
points(MK.xy, col = rgb(0, 0, 0, 0.2), pch = 19)
Using the ks
function in kde
in ks
and plotting the 50% and 95% intervals:
require(ks)
MK.kde <- kde(x = MK.xy, compute.cont = TRUE)
plot(elevation, xlim = range(MK.xy$X), ylim = range(MK.xy$Y))
points(MK.xy, col = rgb(0, 0, 0, 0.2), pch = 19, cex = 0.5)
plot(MK.kde, add = TRUE, col = "blue", cont = c(50, 95), lwd = 2)
This is useful for picking out a few key ranges (note - its on a locally elevated area). But it is a bit cryptic what the automatically selected bandwidth is. The answer is:
Hpi(MK.xy)
## [,1] [,2]
## [1,] 6617531 5126071
## [2,] 5126071 14194805
But it’s worth looking into the what the underlying assnmptions are (I think its an underestimate). But from this output, you can get a sense of the key ranges. Note, that they are on a locally elevated area. Also, with a bit more fussy code, you can also quantify the area of that range. In short - what this code does is: extract the contour lines from the 2D density, find the contour line that corresponds to 95% area covered, and computes the area using the polyarea
function in pracma
.
contour.95 <- with(MK.kde, contourLines(x = eval.points[[1]], y = eval.points[[2]],
z = estimate, levels = cont["5%"])[[1]])
library(pracma)
with(contour.95, abs(polyarea(x, y))) * 1e-06
## [1] 2172.152
So - approximately - an over 2000 square km total for the “core” range.
As usual, you can run this analysis on any data set you choose to subsample. And, as usual, its best to write a little function that’ll just do it.
Here’s a function that combines the step above:
computeKDE <- function(xy, percent = 95) {
require(pracma)
mykde <- kde(x = xy, compute.cont = TRUE)
mycontour <- with(mykde, contourLines(x = eval.points[[1]], y = eval.points[[2]],
z = estimate, levels = cont[percent]))
myarea = sum(sapply(mycontour, function(c) abs(polyarea(c$x, c$y)))) * 1e-06
return(list(contour = mycontour, area = myarea))
}
Maybe we can now compare the MacKenzie core ranges during the winter and summer season, defining those (somewhat arbitrarily) as Dec-Feb and June-Aug, respectively.
So … first assign seasons to the data:
caribou.daily$season <- factor("shoulder", levels = c("winter", "shoulder",
"summer"))
caribou.daily$season[caribou.daily$Month >= 6 & caribou.daily$Month <= 8] <- "summer"
caribou.daily$season[caribou.daily$Month == 12 | caribou.daily$Month <= 2] <- "winter"
Let’s see if there’s any separation at all:
MK <- subset(caribou.daily, Herd == "Mackenzie")
xy.spatial <- SpatialPoints(MK[, c("X", "Y")])
MK.elevation <- crop(elevation, extent(xy.spatial))
plot(MK.elevation)
with(MK, points(X, Y, col = alpha(c("blue", NA, "red")[season], 0.05), cex = 0.5,
asp = 1, pch = 19))
There doesn’t look like much separation, but the summer range is certainly smaller. Let’s quantify this with our function.
xy.summer <- subset(MK, season == "summer")[, c("X", "Y")]
xy.winter <- subset(MK, season == "winter")[, c("X", "Y")]
# subsample, for less autocorrelation
xy.summer <- xy.summer[sample(1:nrow(xy.summer), 400), ]
xy.winter <- xy.winter[sample(1:nrow(xy.winter), 400), ]
# compute!
MK.kde.summer <- computeKDE(xy.summer, percent = 95)
MK.kde.winter <- computeKDE(xy.winter, percent = 95)
# plot
plot(MK.elevation)
. <- sapply(MK.kde.winter$contour, polygon, col = alpha("blue", 0.2))
. <- sapply(MK.kde.summer$contour, polygon, col = alpha("red", 0.2))
legend("bottomleft", fill = alpha(c("blue", "red"), 0.5), legend = c("winter",
"summer"), title = "95% UD")
# compare the areas
MK.kde.winter$area
## [1] 2757.427
MK.kde.summer$area
## [1] 1202.307
So - considerable overlap but a much smaller summer range.
To quantify the actual overlap, Fieberg and Kochanny (2005) suggest computing a so-called UDOI (UD overlap index): \[ UDOI = A_{1,2} \int_{-\infty}^\infty \int_{-\infty}^\infty UD_1(x,y) \, UD_2(x,y) dxdy\]. Where \(A_{1,2}\) is the “overlap” between the ranges. We’ll do something a bit simpler, the Bhattracharyya index (BA): \[ BA = \int_{-\infty}^\infty \int_{-\infty}^\infty \sqrt{UD_1(x,y)} \, \sqrt{UD_2(x,y)} dxdy\] which ranges from 0 (no overlap) to 1 (perfect overlap). This is actually easier to compute than the SDOI from our results, as long as we have the same basis for our kde.
So:
computeBA <- function(xy.summer, xy.winter) {
xrange <- range(xy.winter[, 1], xy.summer[, 1])
yrange <- range(xy.winter[, 2], xy.summer[, 2])
dx <- diff(xrange)/150
dy <- diff(yrange)/150
kde.summer <- kde(x = xy.summer, xmin = c(xrange[1], yrange[1]), xmax = c(xrange[2],
yrange[2]), gridsize = c(150, 150))
kde.winter <- kde(x = xy.winter, xmin = c(xrange[1], yrange[1]), xmax = c(xrange[2],
yrange[2]), gridsize = c(150, 150))
BA <- sum(sqrt(kde.summer$estimate) * sqrt(kde.winter$estimate)) * dx *
dy
return(BA)
}
computeBA(xy.summer, xy.winter)
## [1] 0.6858689
As suspected … a relatively high amount of overlap!
Again, we can compact the code above and make some computations for all of the herds (except Buffalo West which has no summer observations). And, again, we start with writing a function:
computeSummerWinterRangeArea <- function(myherd, cols = c("blue", "red"), plotme = TRUE) {
mydata <- subset(caribou.daily, Herd == myherd)
xy.summer <- subset(mydata, season == "summer")[, c("X", "Y")]
xy.winter <- subset(mydata, season == "winter")[, c("X", "Y")]
xy.summer <- xy.summer[sample(1:nrow(xy.summer), 200), ]
xy.winter <- xy.winter[sample(1:nrow(xy.winter), 200), ]
my.kde.summer <- computeKDE(xy.summer, percent = 95)
my.kde.winter <- computeKDE(xy.winter, percent = 95)
if (plotme) {
sapply(my.kde.winter$contour, polygon, col = alpha(cols[1], 0.5))
sapply(my.kde.summer$contour, polygon, col = alpha(cols[2], 0.5))
}
BA <- computeBA(xy.summer, xy.winter)
return(data.frame(Herd = myherd, winter.area = my.kde.winter$area, summer.area = my.kde.summer$area,
BA = BA))
}
UDstats <- data.frame(NULL)
winter.cols <- c("red", "darkblue", "black", "forestgreen")
summer.cols <- c("orange", "blue", "white", "green")
with(caribou.daily, plot(X, Y, asp = 1, type = "n"))
herds <- levels(caribou.daily$Herd)[-2]
for (i in 1:length(herds)) UDstats <- rbind(UDstats, computeSummerWinterRangeArea(herds[i],
cols = c(winter.cols[i], summer.cols[i]), plotme = TRUE))
legend("bottomleft", fill = winter.cols, legend = herds, title = "Winter", cex = 0.8,
bty = "n")
legend("bottomright", fill = summer.cols, legend = herds, title = "Summer",
cex = 0.8, bty = "n")
And, the table of computed stats:
UDstats
## Herd winter.area summer.area BA
## 1 Buffalo Lake 730.2461 722.3034 0.001403826
## 2 Hay River Lowlands 10511.9142 10225.1750 0.858832165
## 3 Mackenzie 2912.3780 1284.8687 0.684740915
## 4 Pine Point 1012.1842 414.2093 0.502809438
One of these is NOT like the others. Maybe - interestingly?
Let’s compare the elevation across seasons. This is - in fact - very straightforward. We need only obtain the elevation at each point - a process sometimes referred to as “annotating” a movement dataset. In R, this is done from a raster with the extract
function. We can simply add a column to the caribou.daily
object:
caribou.daily$elevation <- raster::extract(elevation, caribou.daily[, c("X",
"Y")])
ggplot(subset(caribou.daily, Herd != "Buffalo West" & season != "shoulder"),
aes(Herd, elevation, col = season)) + geom_boxplot() + facet_grid(Year ~
.)
Wow - fairly inconsistent patterns!
To test whether the herds are significantly different, we fit an interaction model, a linear additive, and Herd only main effects linear mixed effects model (with ID as a random effect and 1st order autocorrelation in the observations).
require(nlme)
cd <- subset(caribou.daily, Herd != "Buffalo West" & season != "shoulder")
lme.fit0 <- lme(elevation ~ Herd, random = ~1 | ID, correlation = corAR1(),
data = cd, method = "ML")
lme.fit1 <- lme(elevation ~ season, random = ~1 | ID, correlation = corAR1(),
data = cd, method = "ML")
lme.fit2 <- lme(elevation ~ (Herd + season), random = ~1 | ID, correlation = corAR1(),
data = cd, method = "ML")
lme.fit3 <- lme(elevation ~ (Herd * season), random = ~1 | ID, correlation = corAR1(),
data = cd, method = "ML")
fits <- list(lme.fit0, lme.fit1, lme.fit2, lme.fit3)
ldply(fits, function(l) data.frame(formula = paste(as.character(formula(l))[3],
collapse = " "), aic = AIC(l))) %>% mutate(dAIC = aic - min(aic))
## formula aic dAIC
## 1 Herd 120842.2 812.8368
## 2 season 120832.9 803.5223
## 3 (Herd + season) 120816.5 787.1230
## 4 (Herd * season) 120029.4 0.0000
So - in short - the herds are all rather different, with two of the herds (Buffalo Lake and Pine Point) spending more time in higher elevations in the winter than in the summer, and MacKenzie (anyways, a low-elevation population) doing the opposite.
Let’s save caribou.daily
. It’s a pretty useful little data frame!
save(caribou.daily, file = "./_data/caribou.daily.rda")