In this tutorial, we illustrate how to use R (specically, our in-house above
package) to query the Movebank EnvData database to obtain environmental covariates in space and time. There are a bunch of datasets available, including modeled climate / weather / wind, land cover and elevation models, MODIS land, ocean and snow, and more. A tidy table is here: https://www.movebank.org/node/7471. A complete (but undocumented) list is here: http://www.bioinfo.mpg.de/orn-gateway/services2.jsp. At that site, there are simple forms that allow you to request annotations for (correctly formatted) location data (http://www.bioinfo.mpg.de/orn-gateway/submit-track.jsp), as well as gridded data (http://www.bioinfo.mpg.de/orn-gateway/submit-grid.jsp).
In this example, we obtain the NDVI data for an area surrounding Washington DC every week for the year 2014, essentially replicating the “gridded” analysis. We do this using some functions in the above
package, which simplifies the code. Do explore the code within the functions to “look under the hood”.
First, load the latest version of the above
package (from GitHub):
## require(devtools)
## install_github("ABoVE-AotM/above")
require(above)
There are three steps (and - basically - only three functions) to this process:
.csv
file that contains the lat-long and timestamps to be annotated, using createEnvDataGrid()
.xml
file that contains the data type and interpolation to be requested. Use createEnvDataRequest()
uploadEnvDataRequest()
Ok, there are a few more steps:
Wait
Download the annotated data, using downloadEnvData()
Finally:
The data file has to be a text csv with several constraints on the formatting:
timestamp
,location-long
,location-lat
,height-above-ellipsoid
2007-04-20 0:00:00.000
. They will be interpreted as UTC,You will get errors if you use capital letters, spaces, quotations, or leave out the decimal seconds in the timestamps
This is more or less the format in which Movebank provides movement data - the annotation of which is the primary purpose of the EnvData machinery. We, however, would like a grid around the Washington DC area. This is obtained using the createEnvDataGrid()
function, which asks for a vector of latitudes and a vector of longitudes, a start and end time (which can be left blank) and temporal interval. In the example below we take a fairly tight box around Washington DC and generate a dataset with :
lats <- seq(38.8, 39.05, length = 40)
lons <- seq(-77.15, -76.85, length = 40)
start <- ymd("2014-01-01")
finish <- ymd("2014-12-31")
dt <- ddays(7)
DC.example <- createEnvDataGrid(lats, lons, start, finish, dt = dt, savefile = FALSE)
head(DC.example)
## timestamp location.long location.lat
## 1 2014-01-01 00:00:00.000 -77.15 38.80000
## 2 2014-01-01 00:00:00.000 -77.15 38.80641
## 3 2014-01-01 00:00:00.000 -77.15 38.81282
## 4 2014-01-01 00:00:00.000 -77.15 38.81923
## 5 2014-01-01 00:00:00.000 -77.15 38.82564
## 6 2014-01-01 00:00:00.000 -77.15 38.83205
## height.above.ellipsoid
## 1
## 2
## 3
## 4
## 5
## 6
Here’s where this grid is located (Potomac river running south, Chesapeake Bay is to the east):
require(maps);
require(mapdata)
maps::map('worldHires',
xlim = range(lons) + c(-.35,.35), ylim = range(lats) + c(-.25,.25),
fill = TRUE, col="grey", bor=NA); box(); axis(1); axis(2)
title("DC / Maryland / Virgina")
with(DC.example, points(location.long, location.lat, cex=0.2, col=rgb(0,0,.5,.1), pch = 19))
We need to create a file that saves this grid. This is done automatically by createEnvDataGrid
. If you don’t provide a file name, it will prompt you. Thus:
DC.example <- createEnvDataGrid(lats, lons, start, finish, dt = dt,
savefile = TRUE, fileout = "DC2014grid.csv")
When you manually enter your data request at the EnvData website, it generates a script in xml
which provides the instructions to the server regarding the annotation. You have to create this script in R and save it is a separate .xml
file on your computer.
The createEnvDataRequest()
function does just that. There are a few more variables to fill out here:
createEnvDataRequest(request_name = "DC2014request",
type_name = "modis-land/MOD13Q1.005",
variable_name = "250m 16 days NDVI",
interpolation_type = "inverse-distance-weighted",
dist_func_spatial = "geodetic",
email = "egurarie@umd.edu",
savefile = FALSE)
## <AnnotationRequest annotationType="track2" name="DC2014request" notificationEmail="egurarie@umd.edu">
## <Elements>
## <AnnotationRequestElement variableLabel="modis-land/MOD13Q1.005/250m 16 days NDVI">
## <Properties>
## <AnnotationRequestElementProperty name="interpolation-type" value="inverse-distance-weighted"/>
## <AnnotationRequestElementProperty name="type-name" value="modis-land/MOD13Q1.005"/>
## <AnnotationRequestElementProperty name="type" value="simple"/>
## <AnnotationRequestElementProperty name="distance-function-spatial" value="geodetic"/>
## <AnnotationRequestElementProperty name="variable-names" value="250m 16 days NDVI"/>
## </Properties>
## </AnnotationRequestElement>
## </Elements>
## </AnnotationRequest>
Comments:
request_name
is just a name you come up withtype_name
comes from the list of available services here: http://www.bioinfo.mpg.de/orn-gateway/services2.jsp. They must be entered exactly as seen with no spaces. Here we’re using the MODIS land 13Q1 products.variable_name
is the specific data we are interested in, inside of the relevant folder (here - in http://www.bioinfo.mpg.de/orn-gateway/variables2.jsp?typeName=modis-land/MOD13Q1.005 there is a 250m 16 days NDVI
. Again - must be entered preciselyinterpolation_type
- there’s a great page and visualization here: https://www.movebank.org/node/6400. Basically - if there are potential gaps (as in MODIS) the “inverse-distance-weighted” is your best bet. Other options are “bilinear” and “nearest-neighbor”.dist_func_spatial
is the function used for distance computation … the “geodetic” default is probably generally fine.email
is the email address that a notification will be sent to youAs before, we want to save this as a file, this time with an .xml
extention. The function will prompt you for a filename if you don’t provide it. Thus:
createEnvDataRequest(request_name = "DCrequest", type_name = "modis-land/MOD13Q1.005",
variable_name = "250m 16 days NDVI",
interpolation_type = "inverse-distance-weighted",
dist_func_spatial = "geodetic", email = "egurarie@umd.edu",
savefile = TRUE, fileout = "DC2014request.xml")
This is very straightforward: just provide the names of the csv and xml files you created above. Also you’ll need your login and password (which are NOT the same as the Movebank logins) - if not available, you will be prompted for them. Also, you can provide the name of a file to save
DC2014status <- uploadEnvDataRequest(csv.file = "DC2014grid.csv",
xml.file = "DC2014request.xml",
login = NULL, password = NULL,
link.file = "DCstatus.url")
Several things are going to happen - but be patient, this can be slow, depending on the data that has or has not been previously :
* Trying 130.183.17.84...
* Connected to www.bioinfo.mpg.de (130.183.17.84) port 80 (#0)
> POST /orn-gateway/request-annotation-xml-2.jsp HTTP/1.1
Host: www.bioinfo.mpg.de
Accept: */*
Content-Length: 6068309
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------a9ac772a8cefc47f
< HTTP/1.1 100 Continue
< HTTP/1.1 302 Found
< Date: Fri, 13 Oct 2017 03:44:34 GMT
< Server: Apache
< Set-Cookie: JSESSIONID=B67EC77297AB51412D8AAA601F143AE4; Path=/orn-gateway
< Location: http://www.bioinfo.mpg.de/orn-gateway/status.jsp?access-key=1384734324402597834
< Content-Type: text/html
< Content-Length: 0
* HTTP error before end of send, stop sending
<
* Closing connection 0
That’s it! Next comes waiting. And if you provided your email, you will get a message letting you know when the data product is ready.
If you click on the link created by the upload function - you will be prompted to download the annotated data. In our case: http://www.bioinfo.mpg.de/orn-gateway/download.jsp?access-key=4736001142410832953
If you’d like to automate this process in R, you can use the downloadEnvData()
function, which takes as an argument the output of uploadEnvDataRequest
:
## DC2014.ndvi <- downloadEnvData(DC2014status, filename = "DC2014.ndvi.csv")
Here’s the data head:
head(DC2014.ndvi)
## timestamp location.long location.lat
## 1 2014-01-01 00:00:00.000 -77.15 38.80000
## 2 2014-01-01 00:00:00.000 -77.15 38.80641
## 3 2014-01-01 00:00:00.000 -77.15 38.81282
## 4 2014-01-01 00:00:00.000 -77.15 38.81923
## 5 2014-01-01 00:00:00.000 -77.15 38.82564
## 6 2014-01-01 00:00:00.000 -77.15 38.83205
## height.above.ellipsoid modis.land.MOD13Q1.005.250m.16.days.NDVI
## 1 NA 0.4086726
## 2 NA 0.4393350
## 3 NA 0.4098175
## 4 NA 0.4281713
## 5 NA 0.5146234
## 6 NA 0.4373630
Now comes the fun part. The following code makes imageplots of the NDVI around Washington DC for each week of the year:
# simplify names and modify table
names(DC2014.ndvi) <- c("time", "lon", "lat", "height", "ndvi")
dc.ndvi <- mutate(DC2014.ndvi, lat = round(lat, 5), lon = round(lon, 5), day = as.Date(DC2014.ndvi$time))
# set range and breakes
ndvi.range <- range(dc.ndvi$ndvi)
breaks <- seq(-.2, 1, length=21)
times <- unique(dc.ndvi$time)
lats <- unique(dc.ndvi$lat) %>% sort
lons <- unique(dc.ndvi$lon) %>% sort
par(mar = c(0,0,0,0), mfrow = c(4,13), oma = c(2,2,2,2))
for(i in 1:length(times[-1])){ #for each distinct time
ndvi.today <- subset(dc.ndvi, time == times[i]) #ndvi.today = the NDVI for that time
M.ndvi.today <- t(matrix(ndvi.today$ndvi, nrow = length(lats)))
plot.new(); plot.window(xlim = range(lons), ylim = range(lats))
maps::map("worldHires", xlim = range(lons), ylim = range(lats), fill=TRUE, col="grey", add=TRUE)
image(lons, lats, M.ndvi.today, col=topo.colors(20, alpha = 0.6), add=TRUE, breaks = breaks)
mtext(side = 3, as.Date(times[i]), line = -1, cex=0.8)
}
You can very clearly see the almost perfect low-vegetation diagonal square of the city, surrounded by much greener surroundings.
And a time series of NDVI across the year (with quartiles):
q <- aggregate(ndvi~day, dc.ndvi, quantile, c(.25,.5,.75))
q <- data.frame(q[,1], q[,2])
names(q) <- c("day", "low", "med", "high")
with(q,{
plot(day, med, ylim = range(q[,2:4]), type="l", lwd=2, ylab = "NDVI", xlab="", main = "DMV - 2014")
lines(day, low, ylim = range(q[,2:4]), type="l")
lines(day, high, ylim = range(q[,2:4]), type="l")
})
I first moved out to Maryland at the end of January, 2014, so I remember well those amazing cold and snowy weeks, and that you could cross-country ski, basically, on any of the side streets. Sadly, no winter since has come close.