# 3. Extract data within a shapefile using ERDDAP

This tutorial will teach you how to extract and display SST values for a particular time period or average SST over the whole time-series available within a shapefile. The shapefile for the NOAA Marine National Monument and sanctuaries boundaries can be downloaded here: http://sanctuaries.noaa.gov/library/imast_gis.html.
We are going to extract SST data for the Papahanaumokuakea Marine National Monument (PMNM) in Hawaii. However, because the Monument boundaries cross the dateline, the shapefile provided on the website is tricky to work with. We'll work with a cleaned up version, available here: https://oceanwatch.pifsc.noaa.gov/files/PMNM_bounds.csv
This tutorial is also available as a Jupyter notebook.

```import pandas as pd import numpy as np import urllib.request import xarray as xr import netCDF4 as nc from matplotlib import pyplot as plt from matplotlib.colors import LinearSegmentedColormap from shapely.geometry import Point, Polygon import geopandas as gpd np.warnings.filterwarnings('ignore')```

`df=pd.read_csv('PMNM_bounds.csv')`
Transform the boundary to a Polygon
```geometry = [Point(xy) for xy in zip(df.lon, df.lat)] poly = Polygon([(p.x, p.y) for p in geometry])```
`poly` ## Data extraction

The example below extracts monthly 5km CoralTemp SST data within the monument boundary.
• We are going to download data from ERDDAP for the smallest bounding box that contains our polygon
```xcoord1 = (np.min(df.lon), np.max(df.lon)) ycoord1 = (np.min(df.lat), np.max(df.lat))```
• let's select a date range:
`tcoord = ("2019-01-15", "2019-12-15")`
• and let's build our ERDDAP URL:
`url='``https://oceanwatch.pifsc.noaa.gov/erddap/griddap/CRW_sst_v1_0_monthly.nc?analysed_sst[('+`` tcoord +'):1:('+ tcoord +')][('+ str(ycoord1) +'):1:('+ str(ycoord1) +')][(' + str(xcoord1) +'):1:('+ str(xcoord1) +')]'`
`url`
`'https://oceanwatch.pifsc.noaa.gov/erddap/griddap/CRW_sst_v1_0_monthly.nc?analysed_sst[(2019-01-15):1:(2019-12-15)][(19.2345832):1:(31.79786423)][(177.84422):1:(198.9827)]'`
`urllib.request.urlretrieve(url, "sst.nc")`
• and load it as an xarray dataset:
`ds = xr.open_dataset('sst.nc',decode_cf=False)`
`ds.analysed_sst.shape`
`(12, 252, 424)`
We now have data for a box around our polygon, for 12 monthly time steps (= 1 year).

## Masking the data outside the Monument boundary

The `.within()` function from the `shapely`package checks if a point is within a polygon. We are using it to create a mask which will take the value 1 within the polygon boundary, and NaN outside.
(This takes about 1min or less to run).
```mask=np.empty((len(ds.latitude.values),len(ds.longitude.values))) mask[:]=np.NaN for i in range(len(ds.latitude.values)): for j in range(len(ds.longitude.values)): p=Point(ds.longitude.values[j],ds.latitude.values[i],) if int(p.within(poly))==1: mask[i,j]=int(p.within(poly))```
`plt.contourf(ds.longitude,ds.latitude,mask)` `SST=ds.analysed_sst*mask`
`np.min(SST),np.max(SST)`
`array(16.863333), array(28.78)`
```levs = np.arange(16, 29, 0.05) jet=["blue", "#007FFF", "cyan","#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"] cm = LinearSegmentedColormap.from_list('my_jet', jet, N=len(levs))```
`country = gpd.read_file("gz_2010_us_outline_20m.json")`
```country.plot(figsize=(12,8),color='black') plt.xlim(-183,-153) plt.ylim(18,32) cs=plt.contourf(ds.longitude-360,ds.latitude,SST[3,:,:],levs,cmap=cm) cbar=plt.colorbar(fraction=0.022) cbar.ax.tick_params(labelsize=12) cs.ax.tick_params(labelsize=12) plt.title('SST - April 2019', fontsize=20)``` 