4. Extract data along a turtle track
Last updated
Last updated
This tutorial will teach you how to plot a loggerhead turtle track on a map.
That turtle was raised in captivity in Japan, then tagged and released on 05/04/2005 in the Central Pacific. It transmitted for over 3 years and went all the way to the Southern tip of Baja California!
The track data can be downloaded here: https://oceanwatch.pifsc.noaa.gov/files/25317_05.dat
Then we'll extract chlorophyll concentration and SST at each location along the track, and plot the data.
library(mapdata)
library(maptools)
library(date)
Let's load the track data:
T=read.csv('25317_05.dat',header=TRUE)
names(T)=c('lon','lat','year','month','day')
xlim=c(120,255)
ylim=c(15,50)
land <- maps::map('world2', fill=TRUE, xlim=xlim, ylim=ylim, plot=FALSE)
ids <- sapply(strsplit(land$names, ":"), function(x) x[1])
bPols <- map2SpatialPolygons(land, IDs=ids, proj4string=CRS("+proj=longlat +datum=WGS84"))
plot(bPols, col="grey", axes=FALSE,xlim=xlim,ylim=ylim,cex.axis=3, main='Turtle #25317')
x=seq(120,260,10)
axis(1,x,sapply(paste(-x+360,'ºW',sep=''), function(x) x[1]))
y=seq(10,50,10)
axis(2,las=1,y,sapply(paste(y,'ºN',sep=''), function(x) x[1]))
box()
lines(T$lon,T$lat,lwd=2)
points(T$lon[1],T$lat[1],pch=25,col=2,bg=2)
We are going to grab data from ERDDAP, so we need to set up the ERDDAP URL using the datasets ID and the name of the variable we are interested in. Note that we are requesting the data as .csv
MOD.d = "https://oceanwatch.pifsc.noaa.gov/erddap/griddap/aqua_chla_1d_2018_0.csv?chlor_a"
Ideally, we would work with daily data since we have one location per day. But chlorophyll data is severely affected by clouds (i.e. lots of missing data), so you might need to use weekly or even monthly data to get sufficient non-missing data. We will start with the monthly chl-a data since it contains fewer gaps. As a separate exercise, you can run all 3 of them, and plot a time-series of each to compare.
MOD.w = "https://oceanwatch.pifsc.noaa.gov/erddap/griddap/aqua_chla_8d_2018_0.csv?chlor_a"
MOD.m = "https://oceanwatch.pifsc.noaa.gov/erddap/griddap/aqua_chla_monthly_2018_0.csv?chlor_a"
lon = T$lon
lat = T$lat
We need to format the dates in a way that ERDDAP understands, i.e. 2010-12-15:
dates=mdy.date(T$month,T$day,T$year)
dates2 = format(as.Date(dates), "%Y-%m-%d")
For each date and location, we'll extract a value of monthly chl-a concentration. To do this, we need to pass those parameters (which dataset, which date, which lon, and which lat) to ERDDAP by building the URL in a loop for each point of the track.
NOTE: Because this is a very long track, running the loop on the entire track will take a while (about 15 mins). We are printing the index 'i' at each step so you can gauge progress. If it takes too long, just run the loop on the first 100 points of the track for example, by changing the 'for' statement:
for (i in 1:100) {
The full loop:
tot = rep(NA, 4)
for (i in 1:dim(T)[1]) {
print(c(i, dim(T)[1]))
#this is where the URL is built:
url = paste(MOD.m, "[(", dates2[i], "):1:(", dates2[i], ")][(", lat[i], "):1:(", lat[i], ")][(", lon[i], "):1:(", lon[i], ")]", sep = "")
new = read.csv(url, skip = 2, header = FALSE)
tot = rbind(tot, new)
}
If the process breaks in the middle of the loop (sometimes, there is a connection issue to the ERDDAP server, which will cause an error and interrupt the loop), get the value of 'i', which will tell you which index the loop stopped at, and restart the loop there. For example, if the loop stopped at i=163, restart the loop this way:
for (i in 163:dim(T)[1]) {
(Do not rerun the line above the 'for' statement. If you rerun this: tot = rep(NA, 4)
, it will erase all the chl-a values that were extracted until now).
Once the loop is done running, let's format our results:
tot = tot[-1, ]
names(tot) = c("date", "matched_lat", "matched_lon", "matched_chl.m")
result = data.frame(T, tot)
write.csv(result, 'turtle-track-chl.m.csv', row.names = FALSE)
We now have a value of monthly chlorophyll-a concentration for each location/date combination along the turtle track.
Exercise 1: Repeat the steps above with a different dataset. For example, extract sea surface temperature data using the following dataset: https://oceanwatch.pifsc.noaa.gov/erddap/griddap/CRW_sst_v1_0.html
Exercise 2: Go to an ERDDAP of your choice, find a dataset of interest, generate the URL, copy it and edit the script above to run a match up on that dataset. To find other ERDDAP servers, you can use this search engine:
Note! some ERDDAPs are slower than others, so this could take a lot longer. If it takes too long, adjust the for
loop to request data for only the first 100 days of our track.
Let's plot the track, color coded using values of monthly chlorophyll concentration. Most of the mapping code is the same, except we are using the layout function to split the graphics window since we'll be adding a color scale.
xlim=c(120,255)
ylim=c(15,50)
layout(matrix(c(1,2,3,0,4,0), nrow=1, ncol=2), widths=c(7,1), heights=4)
layout.show(2)
par(mar=c(3,3,3,1))
land <- maps::map('world2', fill=TRUE, xlim=xlim, ylim=ylim, plot=FALSE)
ids <- sapply(strsplit(land$names, ":"), function(x) x[1])
bPols <- map2SpatialPolygons(land, IDs=ids, proj4string=CRS("+proj=longlat +datum=WGS84"))
plot(bPols, col="grey", axes=FALSE,xlim=xlim,ylim=ylim,cex.axis=3, main='Turtle #25317')
x=seq(120,260,10)
axis(1,x,sapply(paste(-x+360,'ºW',sep=''), function(x) x[1]))
y=seq(10,50,10)
axis(2,las=1,y,sapply(paste(y,'ºN',sep=''), function(x) x[1]))
box()
Now we need to create a color code for each value of chlorophyll:
jet.colors <-colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan","#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))
Let's look at the range of log of monthly chlorophyll values:hist(log(result$matched_chl.m),30)
The range is -2.9 to 2.2 but most of the values are between -2.9 and 0. We use the log because the range of chlorophll values can be pretty big, with lots of very low values, and a few very high values. (close the histogram graphics window).
breaks=c(seq(-2.9,0,0.1),2.2)
n=length(breaks)-1
c=jet.colors(n)
for (i in 1:dim(result)[1]) {
I=which(breaks>log(result$matched_chl.m[i]))
points(result$lon[i],result$lat[i],pch=19,col=c[I[1]-1])
}
Plot the color scale using the scale.R file:
source('scale.R')
par(mar=c(3,1,3,3))
image.scale(result$matched_chl.m, col=c, breaks=breaks, horiz=FALSE, yaxt="n",xlab='',ylab='',main='log(chl)')
axis(4,las=1)
box()
Exercise 3: plot the track, color coded using values of monthly sea surface temperature. Note: you do not need to use a log scale for SST.
We are going to use the 'rxtracto' function to repeat this exercise. The rerddapXtracto package was written by Roy Mendelssohn (SWFSC/ERD), based on code originally developed by Dave Foley for Matlab. The package was specifically designed to work with ERDDAP data in R. It's a bit less customizable, but the code is much simpler.
library(rerddapXtracto)
library(plotdap)
library(date)
T=read.csv('25317_05.dat')
names(T)=c('lon','lat','year','month','day')
xcoord=T$lon
ycoord=T$lat
dates=mdy.date(T$month,T$day,T$year)
dates2 = format(as.Date(dates), "%Y-%m-%d")
tcoord=dates2
dataset <- 'aqua_chla_monthly_2018_0'
#Use rerddap to get dataset metadata
#if you encounter an error reading the nc file clear the rerrdap cache:
rerddap::cache_delete_all(force = TRUE)
dataInfo <- rerddap::info(dataset, url="https://oceanwatch.pifsc.noaa.gov/erddap/")
#Display the metadata
dataInfo
aqua_chla_monthly_2018_0
Base URL: https://oceanwatch.pifsc.noaa.gov/erddap/
Dimensions (range):
time: (2002-07-16T12:00:00Z, 2020-08-16T12:00:00Z)
latitude: (-89.97918, 89.97916)
longitude: (0.02083333, 359.9792)
Variables:
chlor_a:
Units: mg m^-3
Double check dataInfo to make sure the dataset covers the time, longitude, and latitude ranges in your XYT data.
Use the name of the chlorophyll parameter that was displayed above in dataInfo: parameter <- “chlorophyll”
Use the xcoord, ycoord, and tcoord vectors you extracted from the marlin tag file.
Some datasets have an altitude dimension. If so, then zcood must be included in the rxtracto call. The “aqua_chla_monthly_2018_0
” dataset does not include an altitude dimension.
Define the search “radius” for the gridded data. The rxtracto function allows you to set the size of the box used to collect data around the track points using the xlen and ylen arguments. The values for xlen and ylen are in degrees. For our example we will use 0.1 degrees for both arguments. Note: You can also submit vectors for xlen and ylen, as long as they are the same length as xcoord, ycoord, and tcoord
Run the rxtracto function to extract the data from ERDDAP.
parameter <- 'chlor_a'
xlen <- 0.1
ylen <- 0.1
Some datasets have an altitude dimension. If so, then zcoord must be included in the rxtracto call.
If the dataInfo shows an altitude dimension, uncomment "zcoord <- 0" and include zcoord=zcoord in the rxtracto call.
#zcoord <- 0.
chl <- rxtracto(dataInfo, parameter=parameter, xcoord=xcoord, ycoord=ycoord, tcoord=tcoord, xlen=xlen, ylen=ylen)
For the map, we want to plot the log of chl-a, because there are a lot of very low values and a few very high values. This can be done by making a custom function 'myFunc', and using it in the 'plotTrack' function.
myFunc=function(x) log(x)
plotTrack(chl, xcoord, ycoord, tcoord, plotColor = 'algae',myFunc=myFunc,name="log_chla",size=5)
The 'size' parameter controls the size of each point.
You can select different color palettes as argument to the 'plotColor' parameter. Available choices are from the cmocean project: https://matplotlib.org/cmocean/