---
title: "Fecal Indicator Bacteria"
csl: stylefile.csl
bibliography: fibrefs.bib
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Fecal Indicator Bacteria}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = F, warning = F, 
  fig.align = 'center', 
  cache = F
)

# libraries
library(tbeptools)
library(dplyr)

# spelling::spell_check_files("vignettes/fib.Rmd")
```

## Background

Fecal Indicator Bacteria (FIB) are used to track concentrations of pathogens in surface waters that may be detrimental to human health and the environment.  Exposure risk is commonly measured with select indicators that are present in the human gut and can enter the environment through wastewater discharges, stormwater, or other illicit sources.  Common indicators include concentrations of *E. coli*, *Enterococcus*, or Fecal Coliform as the number of colony forming units (CFU) per 100 mL of water.

Many monitoring programs routinely measure FIB concentrations at select locations. The tbeptools package has several functions for importing and reporting these data. Two workflows are available:

1) Functions that use data exclusively from the Environmental Protection Commission (EPC) of Hillsborough County
2) Functions that use data from several monitoring programs for baywide reporting focusing exclusively on *Enterococcus*

This vignette is organized around these two workflows.  For both, the assessments are meant to inform progress remediating fecal impairments or to support prioritization of areas for further investigation.  They are not meant to support beach monitoring efforts or closures for recreational uses - alternative reporting products are available for that purpose (see [FLDOH Healthy Beaches](https://www.floridahealth.gov/environmental-health/beach-water-quality/county-detail.html?County=Pinellas&Zip=33701-3109){target="_blank"}). 

## EPC reporting

The Environmental Protection Commission (EPC) of Hillsborough County has been tracking FIB indicators for several decades as part of their long-term monitoring.  Functions in tbeptools can be used to download EPC FIB data, analyze the results, and create summary maps or plots.  This sections describes use of these functions. Most of these functions are focused on reporting for the Hillsborough River fecal coliform impairment and the associated Basin Management Action Plan (BMAP). These tools can be used to track long-term changes in FIBs in this basin to assess progress in reducing fecal coliform levels.   

Data collected from the monitoring program are processed and maintained in a spreadsheet titled `RWMDataSpreadsheet_ThroughCurrentReportMonth.xlsx` available for direct download [here](https://epcbocc.sharepoint.com/:x:/s/Share/EYXZ5t16UlFGk1rzIU91VogBa8U37lh8z_Hftf2KJISSHg?e=8r1SUL&download=1) and viewable [here](https://epcbocc.sharepoint.com/:f:/s/Share/EiypSSYdsEFCi84Sv_6-t7kBUYaXiIqN0B1n2w57Z_V3kQ?e=NdZQcU). These data include observations at all stations and for all parameters throughout the period of record. FIB data are collected at most stations where additional water quality data are collected.  This is the same dataset used for reporting on water quality indicators in Tampa Bay (see the [water quality data](https://tbep-tech.github.io/tbeptools/articles/intro.html) vignette).  The functions in tbeptools can be used to import and analyze these data.

### Read

The main function for importing FIB data is `read_importfib()`.  This function downloads the latest file if one is not already available at the location specified by the `xlsx` input argument.  The function operates similarly as `read_importwq()` for importing water quality data.  Please refer to the [water quality data](https://tbep-tech.github.io/tbeptools/articles/intro.html) vignette for additional details on the import function.  

The FIB data can be downloaded as follows: 

```{r, eval = F}
fibdata <- read_importfib('vignettes/current_data.xlsx', download_latest = T)
```

A data object called `fibdata` is also provided with the package, although it may not contain the most current data available from EPC. View the [help file](https://tbep-tech.github.io/tbeptools/reference/fibdata.html) for the download date. 

```{r}
fibdata
```

The `fibdata` object includes monthly samples for FIB data at select stations in the Hillsborough River basin. Some stations include samples beginning in 1972. The default output for `read_importfib()` returns all stations with FIB data from EPC.  If `all = F` for `read_importfib()`, only stations with `AreaName` as Hillsborough River, Hillsborough River Tributary, Alafia River, Alafia River Tributary, Lake Thonotosassa, Lake Thonotosassa Tributary, and Lake Roberta are returned. Values are returned for *E. coli* (`ecoli`), *Enterococcus* (`entero`), Fecal Coliform (`fcolif`), and Total Coliform (`totcol`).  Units are # of colonies per 100 mL of water (`#/100mL`). Qualifier columns for each are also returned with the `_q` suffix.  Consult the source spreadsheet for interpretation of these codes. Concentrations noted with `<` (below detection) or `>` (above detection) in the raw data are reported as the detection limit. 

The `fibdata` object can be used for the remaining FIB functions. 

### Analyze

Several analysis functions are provided for working with the EPC data. These functions are used internally by the `show` functions described below, but are presented here for an explanation of how the data are processed.  

The `anlz_fibmap()` function assigns categories to each observation in `fibdata` for a selected month and year.  These results are then mapped using `anlz_fibmap()` (see below). The categories are specific to *E. coli* or *Enterococcus* and are assigned based on the station class as freshwater (`class` as 1 or 3F) or marine (`class` as 2 or 3M), respectively.  A station is categorized into one of four ranges defined by the thresholds as noted in the `cat` column of the output, with corresponding colors appropriate for each range as noted in the `col` column of the output.

```{r}
anlz_fibmap(fibdata)
```

The ranges (number of samples / 100 mL) are from EPC and are as follows for *E. coli* or *Enterococcus*.

```{r, echo = F}
fibdata %>%
  anlz_fibmap() %>% 
  select(ind, cat) %>% 
  unique() %>% 
  na.omit() %>%
  mutate(
    Color = case_when(
      cat %in% c("< 35", "< 126") ~ 'Green',
      cat %in% c("35 - 129", "126 - 409") ~ 'Yellow',
      cat %in% c("130 - 999", "410 - 999") ~ 'Orange',
      cat %in% c("> 999") ~ 'Red'
    ), 
    Color = factor(Color, levels = c('Green', 'Yellow', 'Orange', 'Red'))
  ) %>% 
  arrange(ind, Color) %>% 
  mutate(
    ind = ifelse(duplicated(ind), '', ind), 
    Color = case_when(
      Color == 'Green' ~ '<span style="color: #2DC938">Green</span>',
      Color == 'Yellow' ~ '<span style="color: #E9C318">Yellow</span>',
      Color == 'Orange' ~ '<span style="color: #EE7600">Orange</span>',
      Color == 'Red' ~ '<span style="color: #CC3231">Red</span>'
    )
  ) %>% 
  select(Indicator = ind, Color, Range = cat) %>% 
  knitr::kable()
```

The `yrsel` and `mosel` arguments can be used to filter results by year and month.  Not specifying these arguments will return results for the entire period of record. 

```{r}
anlz_fibmap(fibdata, yrsel = 2023, mosel = 7)
```

The `areasel` argument can indicate either `"Alafia"` or `"Hillsborough"` to select data for the corresponding river basins, where rows in `fibdata` are filtered based on the selection.  All stations are returned if this argument is `NULL` (default). The Alafia River basin includes values in the `area` column of `fibdata` as `"Alafia River"` and `"Alafia River Tributary"`.  The Hillsborough River basin includes values in the `area` column of `fibdata` as `"Hillsborough River"`, `"Hillsborough River Tributary"`, `"Lake Thonotosassa"`, `"Lake Thonotosassa Tributary"`, and `"Lake Roberta"`.  Not all areas may be present based on the selection for `yrsel` and `mosel`. All valid options for `areasel` include `"Alafia River"`, `"Hillsborough River"`, `"Big Bend"`, `"Cockroach Bay"`, `"East Lake Outfall"`, `"Hillsborough Bay"`, `"Little Manatee"`, `"Lower Tampa Bay"`, `"McKay Bay"`, `"Middle Tampa Bay"`, `"Old Tampa Bay"`, `"Palm River"`, `"Tampa Bypass Canal"`, or `"Valrico Lake"`.

```{r}
anlz_fibmap(fibdata, yrsel = 2023, mosel = 7, areasel = 'Hillsborough River')
```

The `anlz_fibmatrix()` function creates a summary of FIB categories by station and year as output for the `show_fibmatrix()` function described below.  The function assigns Microbial Water Quality Assessment (MWQA) letter categories for each station and year based on the likelihood that fecal coliform concentrations will exceed 400 CFU / 100 mL.  By default, the results for each year are based on a right-centered window that uses the previous two years and the current year to calculate probabilities from the monthly samples (`lagyr = 3`).  The columns for each station and year include the estimated geometric mean of fecal coliform concentrations (`gmean`) and a category indicating a letter outcome based on the likelihood of exceedences (`cat`).  The `indic` argument must be set explicitly as `'fcolif'` to indicate the indicator as fecal coliform for the EPC data.

```{r}
anlz_fibmatrix(fibdata, indic = 'fcolif')
```

### Show

The `show_fibmap()` function creates a map of FIB sites and thresholds based on output from `anlz_fibmap()`.  The same arguments that apply to `anlz_fibmap()` also apply to `show_fibmap()` such that freshwater and marine stations categorized by relevant thresholds are plotted by a selected year, month, and area.  Unlike `anlz_fibmap()`, the `yrsel` and `mosel` arguments are required.

```{r, out.width="100%"}
show_fibmap(fibdata, yrsel = 2023, mosel = 7, areasel = NULL)
```

Sites for the Hillsborough or Alafia river basins can be shown using the `areasel` argument.

```{r, out.width="100%"}
show_fibmap(fibdata, yrsel = 2023, mosel = 7, areasel = 'Hillsborough River')
show_fibmap(fibdata, yrsel = 2023, mosel = 7, areasel = 'Alafia River')
```

Additional information about a site can be seen by placing the cursor over a location.  A map inset can also be seen by clicking the arrow on the bottom left of the map.  

The `show_fibmatrix()` function creates a stoplight graphic of summarized FIB data at selected stations for each year of available data [@Morrison09].  The matrix colors are based on the likelihood that fecal indicator bacteria concentrations exceed 400 CFU / 100 mL (using Fecal Coliform, `fcolif` in `fibdata`).  The likelihoods are categorized as A, B, C, D, or E (Microbial Water Quality Assessment or MWQA categories) with corresponding colors, where the breakpoints for each category are <10%, 10-30%, 30-50%, 50-75%, and >75% (right-closed). Methods and rationale for this categorization scheme are provided by the Florida Department of Environmental Protection, Figure 8 in @pbsj08 and @Morrison09.

```{r, fig.height = 8, fig.width = 3}
show_fibmatrix(fibdata)
```

By default, the results for each year are based on a right-centered window that uses the previous two years and the current year to calculate probabilities from the monthly samples (`lagyr = 3`).  This example shows results using only the monthly observations in each year. 

```{r, fig.height = 8, fig.width = 3}
show_fibmatrix(fibdata, lagyr = 1)
```

The default stations are those used in TBEP report #05-13 [@tbep0513] for the Hillsborough River Basin Management Action Plan (BMAP) subbasins.  These include Blackwater Creek (WBID 1482, EPC stations 143, 108), Baker Creek (WBID 1522C, EPC station 107), Lake Thonotosassa (WBID 1522B, EPC stations 135, 118), Flint Creek (WBID 1522A, EPC station 148), and the Lower Hillsborough River (WBID 1443E, EPC stations 105, 152, 137).  Other stations in `fibdata` can be plotted using the `stas` argument.

```{r, fig.height = 8, fig.width = 1.5}
show_fibmatrix(fibdata, stas = c(115, 116))
```

The `yrrng` argument can also be used to select a year range, where the default is 1985 to the most current year of data in `fibdata`.  

```{r, fig.height = 8, fig.width = 3}
show_fibmatrix(fibdata, yrrng = c(1990, 2020))
```

If preferred, the matrix can also be returned in an HTML table that can be sorted and scrolled. Only the first ten rows are shown by default.  The default number of rows (10) can be changed with the `nrows` argument.  Use a sufficiently large number to show all rows.

```{r}
show_fibmatrix(fibdata, asreact = TRUE)
```

A plotly (interactive, dynamic plot) object can be returned by setting the `plotly` argument to `TRUE`. 

```{r, fig.height = 8, fig.width = 3}
show_fibmatrix(fibdata, plotly = TRUE)
```

## Baywide reporting

The second workflow uses a baywide approach to summarize FIB data.  Select stations were identified at downstream locations that drain into Tampa Bay and considered important watershed endpoints for FIB monitoring.  *Enterococcus* is the primary indicator because these stations are located at terminal downstream locations that are tidally influenced.  The functions are organized similarly as the EPC reporting workflow, with some unique functions for working with data from these locations and other functions repeated from the EPC workflow that differ in the output depending on the data inputs. 

### Read

The main function for importing *Enterococcus* data is `read_importentero()`. This function retrieves data from the USEPA water quality portal using their API.  The three arguments are `stas`, `startDate`, and `endDate`.  The `stas` argument can be left as `NULL` (default) to retrieve data from all stations based on those in the `catchprecip` data object, described below.  The `startDate` and `endDate` arguments specify the date ranges for retrieving data, where the input format for each is a character string as `'YYYY-MM-DD'`.  

```{r, eval = F}
read_importentero(startDate = '1995-01-01', endDate = '2023-12-31')
```

The data request can take some time and the `enterodata` data object is provided with the package for use with all downstream functions.  This dataset includes all data from the 53 selected stations from 1995-2023.  

```{r}
head(enterodata)
```

The downstream functions also require precipitation data obtained using the `read_importrain()` function.  This function downloads daily precipitation data from the Southwest Florida Water Management District (SWFWMD) rainfall [FTP website](ftp://ftp.swfwmd.state.fl.us/pub/radar_rainfall/Daily_Data/).  For each station, daily cumulative rainfall is summarized for each upstream catchment, where the catchments for each site are defined by pixel locations used to describe the SWFWMD rainfall data. This information is available in the `catchpixels` data object.

Rainfall data is downloaded by defining years and months of interest.

```{r, eval = F}
read_importrain(2021, catchpixels, mos = 1:12, quiet = F)
```

As for the `enterodata` data object, the `catchprecip` object is provided with the package for use with all downstream functions.  This dataset includes daily rainfall data (inches) for the 53 selected stations from 1995-2023.  The rainfall data is used to define *Enterococcus* samples as "wet" or "dry" based on default or user-defined thresholds described below. 

```{r}
head(catchprecip)
```

### Analyze  

Several analysis functions are provided for working with *Enterococcus* data.  These functions are used internally by the `show` functions described below, but are presented here for an explanation of how the data are processed.  

Each function uses *Enterococcus* and precipitation data provided by the `enterodata` and `catchprecip` data objects. The latter dataset is used to define "wet" or "dry" samples with the premise that *Enterococcus* concentrations are higher in wet weather and it may be useful to distinguish these samples to assess progress in achieving water quality restoration goals, i.e., rainfall may confound an assessment of management efforts to reduce fecal contamination. 

Each `anlz` function has optional arguments that define the `temporal_window` and `wet_threshold` for defining "wet" or "dry" samples, which are passed to the `anlz_fibwetdry()` function.  These arguments define a period of time preceding a sample date and cumulative rainfall threshold within the time period that must be met to define a sample as "wet".  These arguments default to two days and half an inch, such that samples are defined as "wet" if they have greater than half an inch of cumulative rainfall in the two days preceding and including the sample date.  The time and rainfall thresholds can be changed by the user.  Additionally, the `anlz` functions can also treat all samples equally by ignoring any rainfall data by setting `wetdry = FALSE`, which is the default behavior. 

The `anlz_fibwetdry()` function defines "wet" or "dry" samples as described above and returns the original input dataset with three additional columns describing the total rain (inches) on the day of sampling (`rain_sampleDay`), the total rain in the period defined by the `temporal_window` argument (`rain_total`), and whether the sample is "wet" or not as a logical value (`wet_sample`).

```{r}
anlz_fibwetdry(enterodata, catchprecip, temporal_window = 2, wet_threshold = 0.5)
```

The remaining `anlz` functions are `anlz_enteromap()` to prepare data for mapping and `anlz_fibmatrix()` to prepare data for a score card.  Both can optionally use `anlz_fibwetdry()` to plot "wet" or "dry" samples, described further in the `show` section. 

The `anlz_enteromap()` function is an *Enterococcus*-specific analogue to the `anlz_fibmap()` fecal coliform function described in the EPC section above. The function assigns categories to each observation in the *Enterococcus* data frame, which can be viewed for a given month and year using `show_enteromap()` (analagous to `show_fibmap()`). The categories are specific to *Enterococcus* in marine waters, and are noted in the `cat` column of the output. Corresponding colors are in the `col` column of the output.  

```{r}
anlz_enteromap(enterodata)
```

The ranges (number of samples / 100 mL) are from EPC and are as follows for *Enterococcus*:  

```{r, echo = F}
fibdata %>%
  anlz_fibmap() %>% 
  filter(ind == "Enterococcus") %>%
  select(cat) %>% 
  unique() %>% 
  na.omit() %>% 
  mutate(
    Color = case_when(
      cat == "< 35" ~ '<span style="color: #2DC938">Green</span>',
      cat == "35 - 129" ~ '<span style="color: #E9C318">Yellow</span>',
      cat == "130 - 999" ~ '<span style="color: #EE7600">Orange</span>',
      cat == "> 999" ~ '<span style="color: #CC3231">Red</span>'
    )
  ) %>% 
  select(Color, Range = cat) %>% 
  knitr::kable()
```

The `yrsel` and `mosel` arguments can be used to filter results by year and month.  Not specifying these arguments will return results for the entire period of record. 

```{r}
anlz_enteromap(enterodata, yrsel = 2020, mosel = 8)
```

The `wetdry` argument can be used to determine whether a sample was taken after a rain event (logical `wet_sample` column in output), based on user-specified thresholds and a provided precipitation data object (`catchprecip`). Below shows how to identify wet samples based on at least 0.5 inches of rain occurring two days prior to and including the sample date.

```{r}
anlz_enteromap(enterodata, wetdry = TRUE, precipdata = catchprecip,
               temporal_window = 2, wet_threshold = 0.5)
```

The `areasel` argument can indicate one or any of the major subwatersheds in Tampa Bay (excluding Terra Ceia Bay where no data exist). For example, use `Old Tampa Bay` for stations in the subwatershed of Old Tampa Bay, where rows in `enterodata` are filtered based on the selection.  All stations are returned if this argument is set as `NULL` (default). All valid options for `areasel` include `"Old Tampa Bay"`, `"Hillsborough Bay"`, `"Middle Tampa Bay"`, `"Lower Tampa Bay"`, `"Boca Ciega Bay"`, or `"Manatee River"`.

```{r}
anlz_enteromap(enterodata, yrsel = 2023, mosel = 7, areasel = 'Old Tampa Bay')
```

The `anlz_fibmatrix()` function is used with the `show_fibmatrix()` function and is used similarly as for the EPC workflow described above. The function assigns Microbial Water Quality Assessment (MWQA) letter categories for each station and year based on the likelihood that *Enterococcus* concentrations will exceed 130 CFU / 100 mL.  By default, the results for each year are based on a right-centered window that uses the previous two years and the current year to calculate probabilities from the monthly samples (`lagyr = 3`).  The columns for each station and year include the estimated geometric mean of fecal coliform concentrations (`gmean`) and a category indicating a letter outcome based on the likelihood of exceedences (`cat`).  The `indic` argument must be set explicitly as `'entero'` to select the indicator as *Enterococcus*.

```{r}
anlz_fibmatrix(enterodata, indic = 'entero')
```

### Show  

The `show_enteromap()` function creates a map of *Enterococcus* sites and thresholds based on output from `anlz_enteromap()`. The same arguments that apply to `anlz_enteromap()` also apply to `show_enteromap()`, including classification of samples as 'wet' or not depending on specified thresholds. Wet and dry samples are differentiated on the map by their shapes.  Unlike `anlz_enteromap()`, the `yrsel` and `mosel` arguments are required.

```{r, out.width="100%"}
show_enteromap(enterodata, yrsel = 2020, mosel = 9)
```

```{r, out.width="100%"}
show_enteromap(enterodata, yrsel = 2020, mosel = 9, wetdry = TRUE,
               temporal_window = 2, wet_threshold = 0.5)
```

Additional information about a site can be seen by placing the cursor over a location.  A map inset can also be seen by clicking the arrow on the bottom left of the map.

Sites for specific areas can be shown using the `areasel` argument. 

```{r, out.width="100%"}
show_enteromap(enterodata, yrsel = 2023, mosel = 7, areasel = 'Old Tampa Bay')
```

The `show_fibmatrix()` function creates a stoplight graphic of summarized FIB data at selected stations for each year of available data.  The function was primarily designed for fecal coliform data, but has been adapted to work with *Enterococcus* data.  The matrix color codes years and stations based on the likelihood of fecal indicator bacteria concentrations exceeding 130 CFU / 100 mL for *Enterococcus* (`entero` in both `fibdata` and `enterodata`). The likelihoods are categorized as A, B, C, D, or E (Microbial Water Quality Assessment or MWQA categories) with corresponding colors, where the breakpoints for each category are <10%, 10-30%, 30-50%, 50-75%, and >75% (right-closed). Methods and rationale for this categorization scheme are provided by the Florida Department of Environmental Protection, Figure 8 in @pbsj08 and @Morrison09.  All stations are shown by default.

```{r, fig.height = 8, fig.width = 9}
show_fibmatrix(enterodata, indic = 'entero')
```

By default, the results for each year are based on a right-centered window that uses the previous two years and the current year to calculate probabilities from the monthly samples (`lagyr = 3`).  This example shows results using only the monthly observations in each year.

```{r, fig.height = 8, fig.width = 9}
show_fibmatrix(enterodata, indic = 'entero', lagyr = 1)
```

Individual stations can be selectd using the `stas` argument.

```{r, fig.height = 8, fig.width = 1.5}
show_fibmatrix(enterodata,
               indic = 'entero',
               stas = c('21FLHILL_WQX-101', '21FLHILL_WQX-102', '21FLHILL_WQX-103'))
```

The `yrrng` argument can also be used to select a year range, where the default is the date range contained in the data.

```{r, fig.height = 5, fig.width = 9}
show_fibmatrix(enterodata, indic = 'entero', yrrng = c(2015, 2020))
```

Note that the `subset_wetdry` argument can be used with `show_fibmatrix()` to show only wet or dry samples based on the thresholds provided by `temporal_window` and `wet_threshold`.  However, this is not recommended because the scores are probability-based and comparisons between wet or dry samples may be misleading due to different sample sizes, and therefore, power to detect the likelihood of exceeding the threshold.  Specifically, there are far fewer wet samples than dry and these samples will generally receive higher grades due to lower power of the statistical tests.

As for the EPC data, an HTML table can be returned with `show_fibmatrix()` using `asreact = TRUE` and a plotly object can be returned using `plotly = TRUE`.  See the above section for examples of these outputs.

## Retrieving additional FIB data

The `read_importwqp()` function can be used to retrieve data from the USEPA Water Quality Portal using an organization identifier. The data can be retrieved as follows and will typically take less than one minute to download. 

```{r, eval=FALSE}
# get Manatee County data
mancodata <- read_importwqp(org = '21FLMANA_WQX', type = 'fib', trace = T)

# get Pinellas County data
pincodata <- read_importwqp(org = '21FLPDEM_WQX', type = 'fib', trace = T)
```

# References