---
title: "Quality control overview"
vignette: >
  %\VignetteIndexEntry{qcoverview}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: '#>'
---

```{r}
#| label: setup
#| include: false
#| message: false
#| warning: false
library(AquaSensR)
```

The quality control (QC) functions in AquaSensR can be used once the required data are successfully imported into R (see the [inputs vignette](inputs.html) for details). This vignette covers the primary functions of the QC workflow:

- `utilASRflag()`: Applies four independent QC checks to a selected parameter and returns a data frame of flag results.
- `anlzASRflag()`: Produces an interactive time-series plot of those flags for visual review.
- `editASRflag()`: Opens a Shiny application for reviewing the flags and removing observations interactively.

## Load the data

The examples throughout this vignette use the example files bundled with the package. Import both files before proceeding:

```{r}
contpth <- system.file("extdata/ExampleCont1.xlsx", package = "AquaSensR")
dqopth <- system.file("extdata/ExampleDQO.xlsx", package = "AquaSensR")

contdat <- readASRcont(contpth)
dqodat <- readASRdqo(dqopth)
```

`contdat` is a data frame with a `DateTime` column and one numeric column per parameter. `dqodat` contains the parameter-specific data quality objectives (DQOs) for each check. See the [inputs vignette](inputs.html) for more information on the required formats.

## `utilASRflag()` to flag continuous data

`utilASRflag()` is the primary QC function. It applies four independent checks to the chosen parameter in `contdat`.

### Arguments

The function requires three arguments:

| Argument | Description |
|----------|-------------|
| `cont` | `contdat` data frame returned by `readASRcont()` |
| `dqo` | `dqodat` data frame returned by `readASRdqo()` |
| `param` | Name of the parameter column to evaluate (must match a column in `contdat` and a `Parameter` entry in `dqodat`) |

### Basic usage

Pass the two data frames and the name of the parameter to evaluate:

```{r}
flagdat <- utilASRflag(contdat, dqodat, param = "Water_Temp_C")
head(flagdat)
```

### Output

`utilASRflag()` returns a data frame with the following columns:

| Column | Description |
|--------|-------------|
| `DateTime` | Observation timestamp |
| *`param`* | The evaluated parameter values |
| `gross_flag` | Flag from the gross range check |
| `spike_flag` | Flag from the spike check |
| `roc_flag` | Flag from the rate-of-change check |
| `flat_flag` | Flag from the flatline check |

Each flag column contains one of three values: `"pass"`, `"suspect"`, or `"fail"`. Checks are independent of each other such that a single observation can receive any combination of flags across the four columns.

If no row in `dqodat` matches the parameter, the function issues a warning, leaves all flags as `"pass"`, and continues.
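Because the four flag columns are independent, it is often useful to pull out every observation that failed to pass at least one check. The following is a minimal sketch on a small hand-made data frame (`flag_toy`, a hypothetical stand-in that only mimics the column layout described above); the same indexing should apply to the real `flagdat`.

```r
# Hypothetical toy data mimicking the output layout of utilASRflag()
flag_toy <- data.frame(
  DateTime = as.POSIXct("2024-06-01 00:00:00", tz = "UTC") + 900 * 0:3,
  Water_Temp_C = c(15.0, 15.1, 21.0, 15.2),
  gross_flag = c("pass", "pass", "pass", "pass"),
  spike_flag = c("pass", "pass", "fail", "fail"),
  roc_flag = c("pass", "pass", "suspect", "pass"),
  flat_flag = c("pass", "pass", "pass", "pass")
)

flag_cols <- c("gross_flag", "spike_flag", "roc_flag", "flat_flag")

# TRUE for rows where at least one check returned something other than "pass"
any_flag <- rowSums(flag_toy[flag_cols] != "pass") > 0

# Keep only the observations that need manual review
flag_toy[any_flag, ]
```

Here the third and fourth rows are returned, since each carries at least one `"suspect"` or `"fail"` value.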

## QC checks explained

AquaSensR implements four QC checks that reflect widely used sensor data quality standards. The underlying concepts and code borrow heavily from the [ContDataQC](https://leppott.github.io/ContDataQC) package. All threshold values are set in the data quality objectives file and can be customised per parameter.  These thresholds will likely need manual adjustment to avoid false positives and false negatives.  Importantly, the flags require manual verification and should not be used to automatically exclude data without review.

Any threshold value set to `NA` in the `dqodat` file is silently skipped such that the corresponding severity level is not applied and affected observations remain `"pass"` for that check. This applies to the `"Suspect"` and `"Fail"` rows independently, so individual checks or severity levels can be disabled selectively by leaving their threshold columns blank in the input file.
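As an illustration of disabling a single severity level, the sketch below builds a small hypothetical DQO table (`dqo_toy`) using only the column names mentioned in this vignette and sets the suspect-level spike threshold to `NA`. The threshold values are invented for the example, and the real `dqodat` returned by `readASRdqo()` may contain additional columns or a different column order.

```r
# Hypothetical DQO table; the values are illustrative, not recommended thresholds
dqo_toy <- data.frame(
  Parameter = c("Water_Temp_C", "Water_Temp_C"),
  Flag = c("Suspect", "Fail"),
  GrMin = c(0, -5),
  GrMax = c(25, 30),
  Spike = c(3, 5),
  RoCStDv = c(4, 6),
  RoCHours = c(5, 5),
  FlatN = c(10, 20),
  FlatDelta = c(0.01, 0.01)
)

# Disable only the suspect-level spike check for this parameter
is_susp <- dqo_toy$Parameter == "Water_Temp_C" & dqo_toy$Flag == "Suspect"
dqo_toy$Spike[is_susp] <- NA

dqo_toy[, c("Parameter", "Flag", "Spike")]
```

With this table, the fail-level spike threshold is still active while the suspect level is skipped.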

### 1. Gross range

**DQO columns:** `GrMin`, `GrMax` (thresholds differ by row: `Flag = "Fail"` vs `Flag = "Suspect"`)

**Flag column:** `gross_flag`

The gross range check tests whether each observation falls within absolute physical or sensor limits. It is the broadest of the four checks and is intended to catch values that are simply impossible or outside the expected operating range of the instrument.

Each observation is compared to the thresholds in the two data quality objectives rows for that parameter:

- Values below `GrMin` or above `GrMax` in the `"Fail"` row return `"fail"`
- Values below `GrMin` or above `GrMax` in the `"Suspect"` row but within the fail bounds return `"suspect"`

The fail thresholds define hard physical limits (e.g., water temperature cannot be below −5 °C for a freshwater deployment). The suspect bounds are narrower, sitting inside the fail bounds, so they flag readings that are unusual but not impossible.

Any threshold can be set to `NA` in the data quality objectives file to skip that particular flag.
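The gross range logic described above can be sketched in a few lines of base R. This is not the package's internal code: `gross_sketch()` is a hypothetical helper, the thresholds are illustrative, and leaving missing observations as `"pass"` is an assumption.

```r
# Hypothetical helper illustrating the gross range logic (not the package internals)
gross_sketch <- function(x, susp_min, susp_max, fail_min, fail_max) {
  flag <- rep("pass", length(x))
  ok <- !is.na(x)  # assumption: missing observations are left as "pass"
  # Apply suspect bounds first so that fail can overwrite suspect below
  if (!is.na(susp_min)) flag[ok & x < susp_min] <- "suspect"
  if (!is.na(susp_max)) flag[ok & x > susp_max] <- "suspect"
  if (!is.na(fail_min)) flag[ok & x < fail_min] <- "fail"
  if (!is.na(fail_max)) flag[ok & x > fail_max] <- "fail"
  flag
}

temp <- c(12.1, 26.5, -6.0, 31.0, 15.2)
gross_sketch(temp, susp_min = 0, susp_max = 25, fail_min = -5, fail_max = 30)
#> [1] "pass"    "suspect" "fail"    "fail"    "pass"
```

Passing `NA` for any bound skips that comparison, mirroring the behaviour described for `NA` thresholds.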

Quickly view how many flags of each type were generated by the gross range check:

```{r}
# Check which observations received a gross range flag
table(flagdat$gross_flag)
```

### 2. Spike

**DQO columns:** `Spike` (threshold differs by row: `Flag = "Fail"` vs `Flag = "Suspect"`)

**Flag column:** `spike_flag`

The spike check detects sudden, anomalous jumps (either up or down) between consecutive observations. It computes the absolute difference between each reading and the one immediately before it, then compares that difference to the thresholds in the two data quality objectives rows for that parameter:

- |diff| ≥ `Spike` in the `"Suspect"` row returns `"suspect"`
- |diff| ≥ `Spike` in the `"Fail"` row returns `"fail"`

The first observation in each series has no predecessor and is always left as `"pass"`. Because the flag is assigned to the observation that follows a large step, a single anomalous reading embedded in otherwise stable data generates two flagged observations: one for the step up (or down) to the outlier, and one for the step back to baseline.

The spike thresholds are absolute.  For example, a 5 °C step is flagged regardless of whether the surrounding series is calm or noisy. The rate-of-change check (below) evaluates potentially spurious changes when relative variability matters.
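A minimal sketch of the spike logic, assuming that `"fail"` takes precedence when a step meets both thresholds. `spike_sketch()` is a hypothetical helper and the thresholds are illustrative.

```r
# Hypothetical helper illustrating the spike logic (not the package internals)
spike_sketch <- function(x, susp, fail) {
  # Absolute lag-1 difference; the first observation has no predecessor
  d <- c(NA, abs(diff(x)))
  flag <- rep("pass", length(x))
  if (!is.na(susp)) flag[!is.na(d) & d >= susp] <- "suspect"
  if (!is.na(fail)) flag[!is.na(d) & d >= fail] <- "fail"  # assumed to override suspect
  flag
}

# One outlier (21.0) embedded in an otherwise stable series
temp <- c(15.0, 15.1, 21.0, 15.2, 15.3)
spike_sketch(temp, susp = 3, fail = 5)
#> [1] "pass" "pass" "fail" "fail" "pass"
```

The single outlier produces two flags: one on the outlier itself and one on the following observation, where the series steps back to baseline.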

Quickly view how many flags of each type were generated by the spike check:

```{r}
table(flagdat$spike_flag)
```

### 3. Rate of change

**DQO columns:** `RoCStDv`, `RoCHours` (thresholds differ by row: `Flag = "Fail"` vs `Flag = "Suspect"`)

**Flag column:** `roc_flag`

The rate-of-change (RoC) check is an adaptive counterpart to the spike check. Rather than comparing against a fixed step size, the check determines whether a step is large relative to the recent variability in the series.

For each observation the function:

1. Collects all values within a trailing `RoCHours`-hour window ending just before that timestamp (the current observation is excluded).
2. Computes the standard deviation (SD) of those preceding values.
3. Multiplies the SD by `RoCStDv` to produce a contextual threshold.
4. Flags the observation if the absolute lag-1 difference exceeds that threshold: `"suspect"` using the `"Suspect"` row thresholds and `"fail"` using the `"Fail"` row thresholds.

At least two values must fall within the window before a standard deviation can be computed; observations with fewer window values are not flagged. Each row is evaluated independently, so either or both severity levels can be active. Setting `RoCStDv` or `RoCHours` to `NA` for a row skips that severity level entirely.

The key advantage over the spike check is sensitivity scaling.  During a "calm" period, a small absolute change can exceed the threshold, while during a naturally variable period (e.g., diurnal temperature swings) the threshold rises accordingly.
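The four steps can be sketched for a single severity level as follows. `roc_sketch()` is a hypothetical helper, not the package's implementation; in particular, treating the window as closed at its start and open at the current timestamp is an assumption, as is dropping missing values before computing the SD.

```r
# Hypothetical helper illustrating the rate-of-change logic for one severity level
roc_sketch <- function(datetime, x, n_sd, hours) {
  flag <- rep(FALSE, length(x))
  for (i in seq_along(x)[-1]) {
    # Step 1: values in the trailing window, excluding the current observation
    in_win <- datetime >= datetime[i] - hours * 3600 & datetime < datetime[i]
    win <- x[in_win]
    win <- win[!is.na(win)]
    if (length(win) < 2) next  # SD needs at least two values
    # Steps 2 and 3: contextual threshold from the window SD
    thresh <- n_sd * sd(win)
    # Step 4: compare the absolute lag-1 difference to the threshold
    step <- abs(x[i] - x[i - 1])
    flag[i] <- !is.na(step) && step > thresh
  }
  flag
}

# Hourly series with one jump at the seventh observation
dt <- as.POSIXct("2024-06-01 00:00:00", tz = "UTC") + 3600 * 0:7
temp <- c(15.0, 15.1, 15.0, 15.1, 15.0, 15.1, 16.5, 15.1)
which(roc_sketch(dt, temp, n_sd = 6, hours = 5))
#> [1] 7
```

Only the jump to 16.5 is flagged. The step back to 15.1 is not, because the outlier is now inside the trailing window and inflates the SD, which raises the threshold.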

Quickly view how many flags of each type were generated by the rate of change check:

```{r}
table(flagdat$roc_flag)
```

### 4. Flatline

**DQO columns:** `FlatN`, `FlatDelta` (thresholds differ by row: `Flag = "Fail"` vs `Flag = "Suspect"`)

**Flag column:** `flat_flag`

The flatline check identifies periods where a sensor appears to be stuck at a constant value, which can occur from sensor fouling, burial, or loss of power. The check counts the length of "runs" of near-identical consecutive values and flags observations whose run length reaches a specified count.

A run is defined by a minimum length (`FlatN`) and tolerance (`FlatDelta`), each read from the appropriate data quality objectives row. An observation extends the current run only when the range (max minus min) of all values in the run so far, including the new observation, is strictly **less than** `FlatDelta`. A range equal to `FlatDelta` is not treated as flat and resets the run. When the condition fails, the run length resets to 1 starting from the current observation. Because the test uses the range of the whole run, neither a single large jump nor slow cumulative drift can accumulate run length.

- Run length ≥ `FlatN` (using `FlatDelta` tolerance) from the `"Suspect"` row returns `"suspect"`
- Run length ≥ `FlatN` (using `FlatDelta` tolerance) from the `"Fail"` row returns `"fail"`

The suspect and fail thresholds are evaluated independently using their respective delta tolerances, so the two run lengths may differ. Either row can have `NA` values to skip that level.
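The run-length calculation can be sketched as below for one severity level. `flat_run_sketch()` is a hypothetical helper that assumes a series without missing values; it is meant to illustrate the range-based rule rather than reproduce the package code.

```r
# Hypothetical helper: run length of near-identical values using a range-based rule
flat_run_sketch <- function(x, delta) {
  run <- integer(length(x))
  start <- 1L
  for (i in seq_along(x)) {
    # Range of the current run including the new observation;
    # reset the run when the range is not strictly less than delta
    if (diff(range(x[start:i])) >= delta) start <- i
    run[i] <- i - start + 1L
  }
  run
}

temp <- c(10.00, 10.01, 10.00, 10.01, 10.02, 10.50, 10.50, 10.50)
run <- flat_run_sketch(temp, delta = 0.05)
run
#> [1] 1 2 3 4 5 1 2 3

# Observations whose run length reaches an illustrative FlatN of 3
which(run >= 3)
#> [1] 3 4 5 8

# Slow drift: each step is below delta, but the cumulative range is not
flat_run_sketch(c(10.00, 10.03, 10.06), delta = 0.05)
#> [1] 1 2 1
```

The last call shows why the range of the whole run is used: consecutive differences of 0.03 would each look flat on their own, but the drift across the run reaches the tolerance and resets the count.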

Quickly view how many flags of each type were generated by the flatline check:

```{r}
table(flagdat$flat_flag)
```

## `anlzASRflag()` to visualise flag results

The flags generated by `utilASRflag()` can be viewed using the `anlzASRflag()` function.  This produces an interactive time-series plot:

```{r}
#| out-width: 100%
anlzASRflag(flagdat)
```

The plot shows all observations as a continuous line. Non-passing observations are overlaid as coloured markers, with colour encoding the check type and shape encoding the severity:

| Check | Colour |
|-------|--------|
| Gross range | Red |
| Spike | Orange |
| Rate of change | Purple |
| Flatline | Blue |

| Severity | Marker shape |
|----------|-------------|
| Suspect | Upward triangle |
| Fail | Cross (×) |

An observation flagged by more than one check appears with overlapping markers, one for each check, so that all potential issues remain visible. Hovering over a marker reveals the check name, severity, parameter value, and timestamp. Items in the legend can be clicked to toggle visibility of a check or severity level, which is useful for reviewing specific flags in a busy plot.  The plot can also be zoomed and panned to focus on specific periods.

A second parameter can be overlaid on the plot by passing a two-column data frame (with `DateTime` and the parameter of interest) to the `overlay` argument.  The overlay is drawn as a light blue line on a right-side y-axis, making it easy to see whether flagged observations in one parameter co-occur with changes in another.

```{r}
#| out-width: 100%
overlay_df <- contdat[, c("DateTime", "DO_pctsat")]
anlzASRflag(flagdat, overlay = overlay_df)
```

## `editASRflag()` to review and clean data interactively

`editASRflag()` opens a Shiny application that lets you inspect the flag plot for every parameter and selectively remove observations before exporting the cleaned data back to R.  The app uses `utilASRflag()` and `anlzASRflag()` under the hood to generate the flags and plots, but adds interactive selection tools to facilitate data cleaning.

The app can be opened by providing `contdat` and `dqodat` as arguments to `editASRflag()`.  The app lets you interactively evaluate your data until you click **Done / Close**, at which point the cleaned data are returned to your R session.

```r
cleaned <- editASRflag(contdat, dqodat)
```

### Interface overview

![The main editASRflag interface. The left sidebar contains parameter selection, overlay options, linked-removal controls, and the removed-points table. The flag plot for the selected parameter is in the center. The DQO Settings panel (right, not shown here) is accessed by clicking the toggle on the right edge of the plot area.](figures/editASRflag_main.png){fig-alt="Screenshot of the editASRflag Shiny app showing the flag plot and left sidebar." width="100%"}

### Selecting and removing points

Zoom and pan with the plot toolbar (visible when the pointer hovers over the plot, top-right corner) to focus on regions of interest before selecting.  Three removal methods are available:

- **Click**: remove a single point directly on the line or flag marker.
- **Box Select**: drag a rectangle over a region to remove multiple points at once.
- **Lasso Select**: draw a free-form outline around the points you want to remove.

After a box or lasso selection, double-click the plot background to clear the selection highlight before starting a new one.  Each removal action is logged in the **Removed Points** table in the left sidebar (scroll down to view it, or expand the sidebar by clicking and dragging its edge to the right).

### Sidebar controls

| Control | Action |
|---------|--------|
| **Parameter** | Switch between parameters.  Prev/Next buttons cycle through all parameters. |
| **Overlay** | Display a second parameter from `contdat` on a right-side axis. |
| **USGS Overlay** | Enter a USGS site number and select a parameter type, then click **Load** to fetch continuous data from NWIS and display it on the secondary axis.  Loading USGS data clears any contdat overlay. Selecting a contdat overlay clears the USGS data.  Site numbers can be found using the [NWIS Mapper](https://apps.usgs.gov/nwismapper). |
| **Linked Removal** | When checked, propagate every removal to all other parameters simultaneously.  Undo restores all parameters together in the same batch. |
| **Undo Last Removal** | Restore the most recently removed point or selection batch.  Linked parameters are restored together. |
| **Start Over** | Restore all removed points for every parameter and reset all DQO thresholds to their original values. |
| **Export Progress** | Save the current cleaned data and DQO thresholds as Excel files in a ZIP archive.  If any points have been removed, a removed-observations file is included. |
| **Done / Close** | Stop the app and return the cleaned data. |

The **USGS Overlay** feature uses `readASRusgs()` internally to pull unit-value (continuous) data from the [NWIS API](https://waterservices.usgs.gov) over the same date range as `contdat`.  Supported parameter types are streamflow (00060), gage height (00065), and precipitation (00045).  The fetched time series is displayed on the secondary y-axis in the same position as a contdat overlay but is retrieved live when Load is clicked.  Users without an internet connection or outside NWIS coverage can still use the contdat Overlay selector instead.

### DQO Settings panel

Clicking the toggle on the right edge of the plot area opens the **DQO Settings** panel. The panel shows the numeric QC thresholds from `dqodat` for the currently selected parameter across all four checks and both severity levels.

![The DQO Settings panel, showing editable Suspect and Fail threshold inputs for each of the four QC checks.](figures/editASRflag_dqo.png){fig-alt="Screenshot of the editASRflag DQO Settings panel with numeric inputs for gross range, spike, rate of change, and flatline thresholds." width="100%"}

Editing thresholds and clicking **Apply** re-computes flags for the current parameter while preserving any points already removed.  **Reset to original** reverts the inputs to the values from the original `dqo` file and re-evaluates all flags.  Threshold edits are per-parameter and independent.

### Return value

`editASRflag()` returns a named list with three elements:

| Element | Description |
|---------|-------------|
| `contdat` | The original data frame sorted by `DateTime`, with all removed observations replaced by `NA`. |
| `dqodat` | The DQO thresholds data frame reflecting any edits made in the DQO Settings panel.  If no edits were made the values are identical to the input. |
| `removed` | A stacked data frame of every removed observation, with columns `Parameter`, `DateTime`, and all four flag columns. |

```r
# View the cleaned continuous data
View(cleaned$contdat)

# Inspect the final DQO thresholds used
cleaned$dqodat

# View what was removed and its flags
View(cleaned$removed)
```

Removed observations in `contdat` are set to `NA` rather than dropped so the time series remains regular and aligned across all parameters.