AquaSensR inputs and checks

AquaSensR requires two input files to use the functions in the package:

  1. Continuous monitoring data: time series of sensor observations at a site, one column per parameter.
  2. Data Quality Objectives: parameter-specific data quality objectives used by the four QC checks (gross range, spike, rate of change, and flatline).

The DQO file is an Excel workbook (.xlsx). The continuous monitoring data file can be an Excel workbook (.xlsx), a CSV file (.csv), or a comma-delimited text file (.txt). This vignette describes how to import and check each input dataset. The input datasets must follow the specified format exactly, otherwise the import checks will fail. Example files with the correct format are included with the package and are used throughout.

Load the package

Load the package in an R session after installation:

library(AquaSensR)

File paths

First, specify the location of the two files by saving their paths to R variables. In practice you will supply paths to your own files, for example:

contpth <- "path/to/your/ContinuousData.xlsx"
dqopth <- "path/to/your/DQO.xlsx"

The examples below use the files included with the package:

contpth <- system.file("extdata/ExampleCont1.xlsx", package = "AquaSensR")
dqopth <- system.file("extdata/ExampleDQO.xlsx", package = "AquaSensR")
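Before importing, it can help to confirm that each path points to an existing file with a supported extension, because system.file() returns an empty string when the requested file is not found. The helper below is a minimal base R sketch; check_input_path() is a hypothetical function that is not part of AquaSensR, and the extensions it accepts are those listed above for the continuous data file.

```r
# Hypothetical helper (not part of AquaSensR): confirm that a path exists and
# has an expected extension before passing it to an import function.
check_input_path <- function(pth, exts = c("xlsx", "csv", "txt")) {
  if (!nzchar(pth) || !file.exists(pth)) {
    stop("File not found: ", pth, call. = FALSE)
  }
  ext <- tolower(tools::file_ext(pth))
  if (!ext %in% exts) {
    stop("Unsupported file extension: .", ext, call. = FALSE)
  }
  invisible(pth)
}

# system.file() returns "" when the requested file does not exist
missing_pth <- system.file("extdata/NoSuchFile.xlsx", package = "stats")
nzchar(missing_pth)

# A temporary file stands in for a real input file
tmp <- tempfile(fileext = ".csv")
file.create(tmp)
check_input_path(tmp)
```

For the DQO file, the same helper could be called with exts = "xlsx", since only Excel workbooks are accepted for that input.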

Continuous monitoring data

Use readASRcont() to import continuous monitoring data. The function reads the file, automatically runs a series of checks via checkASRcont(), and then formats the result for downstream use. The tz argument sets the time zone for the output DateTime column (see OlsonNames() for valid values). The default is Eastern Standard Time without daylight saving time (Etc/GMT+5, a fixed offset of UTC-5) and only needs to be changed if your data use a different time zone. For example, if your data are recorded in local time in a zone that observes daylight saving time, consider a time zone such as America/New_York, which adjusts for it automatically.
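The difference between a fixed-offset time zone and one that observes daylight saving time can be seen with base R alone. The sketch below interprets the same clock time under Etc/GMT+5 and under America/New_York; the timestamp is taken from the example data and the variable names are illustrative.

```r
# Same clock time interpreted in two time zones
x <- "2024-08-14 13:56:33"
fixed <- as.POSIXct(x, tz = "Etc/GMT+5")            # always UTC-5, no DST
dst_aware <- as.POSIXct(x, tz = "America/New_York") # UTC-4 in August (EDT)

# Show the UTC offset attached to each value
format(fixed, "%Y-%m-%d %H:%M:%S %z")
format(dst_aware, "%Y-%m-%d %H:%M:%S %z")

# As instants in time, the two values are one hour apart
difftime(fixed, dst_aware, units = "hours")

# Both names are valid entries in OlsonNames()
c("Etc/GMT+5", "America/New_York") %in% OlsonNames()
```

If your sensor was logging in local time with daylight saving time, the import call would then take the form readASRcont(contpth, tz = "America/New_York").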

AquaSensR accepts two input formats for the date and time information. The examples below demonstrate both.

Format 1 — separate Date and Time columns (ExampleCont1.xlsx):

contdat <- readASRcont(contpth)
#> Running checks on continuous data...
#>  Checking column names... OK
#>  Checking Date, Time are present... OK
#>  Checking at least one parameter column is present... OK
#>  Checking date format... OK
#>  Checking time format... OK
#>  Checking for missing values... OK
#>  Checking parameter columns for non-numeric values... OK
#> 
#> All checks passed!

Format 2 — combined DateTime column (ExampleCont2.xlsx):

contpth2 <- system.file("extdata/ExampleCont2.xlsx", package = "AquaSensR")
contdat2 <- readASRcont(contpth2)
#> Running checks on continuous data...
#>  Checking column names... OK
#>  Checking DateTime is present... OK
#>  Checking at least one parameter column is present... OK
#>  Checking DateTime format... OK
#>  Checking for missing values... OK
#>  Checking parameter columns for non-numeric values... OK
#> 
#> All checks passed!

Both calls return identically structured output (see Output format below).

Format requirements

The continuous monitoring data file must follow one of two accepted schemas. Additional unrecognised columns will trigger an error.

Format 1: separate Date and Time columns

Column                         Description
Date                           Observation date, parseable by lubridate::parse_date_time() in year-first (e.g., 2024-06-01), month-first (e.g., 06/01/2024), or day-first (e.g., 01/06/2024) formats
Time                           Observation time in 24-hour (e.g., 16:30:33), 12-hour AM/PM (e.g., 4:30:33 PM), or Excel-native format (e.g., 1899-12-31 16:30:33)
At least one parameter column  Column name must match a Parameter entry in paramsASR (e.g., Water_Temp_C)

Format 2: combined DateTime column

Column                         Description
DateTime                       Combined date and time with the date in year-first (e.g., 2024-06-01 16:30:33), month-first (e.g., 06/01/2024 16:30:33), or day-first format, combined with 24-hour or 12-hour AM/PM time (e.g., 2024-06-01 4:30:33 PM)
At least one parameter column  Column name must match a Parameter entry in paramsASR (e.g., Water_Temp_C)
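Month-first and day-first dates look alike whenever the day is 12 or less, so it is worth checking how ambiguous values in your file would be read. The sketch below uses base R as.Date() with explicit formats tried in the order year-first, month-first, day-first. It illustrates the ambiguity only and is not the package's lubridate-based parser; parse_flexible_date() is a hypothetical helper.

```r
# Hypothetical helper (base R, not the package's parser): try each date
# format in turn and keep the first one that parses.
parse_flexible_date <- function(x) {
  fmts <- c("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y")
  out <- rep(as.Date(NA), length(x))
  for (f in fmts) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

dates <- c("2024-06-01", "06/01/2024", "13/06/2024", "01/06/2024")
parsed <- parse_flexible_date(dates)

# "13/06/2024" can only be day-first, but "01/06/2024" matches the
# month-first format first and is therefore read as 6 January 2024.
data.frame(input = dates, parsed = format(parsed))
```

If your dates are day-first, a year-first format such as 2024-06-01 avoids this ambiguity.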

Currently, AquaSensR allows the following parameters. Note the inclusion of the units in the parameter name. Make sure the parameter name matches the units used in your data.

Description                   Required column name  Units
Air Temp (C)                  Air_Temp_C            deg C
Air Temp (F)                  Air_Temp_F            deg F
Air BP (psi)                  Air_BP_psi            psi
Air BP (mmHg)                 Air_BP_mmHg           mmHg
Chlorophyll-a (μg/l)          Chlorophylla_ug_l     ug/l
Chlorophyll-a (RFU)           Chlorophylla_RFU      RFU
Pheophytin (μg/l)             Pheophytin_ug_l       ug/l
Pheophytin (RFU)              Pheophytin_RFU        RFU
pCO2 (ppm)                    pCO2_ppm              ppm
Conductivity (μS/cm)          Conductivity_uS_cm    uS/cm
Salinity (ppt)                Salinity_ppt          ppt
Specific Conductance (μS/cm)  Sp_Conductance_uS_cm  uS/cm
Cyanobacteria (μg/l)          Cyanobacteria_ug_l    ug/l
Phycocyanin (μg/l)            Phycocyanin_ug_l      ug/l
Phycoerythrin (μg/l)          Phycoerythrin_ug_l    ug/l
DO (mg/l)                     DO_mg_l               mg/l
DO Adjusted (mg/l)            DO_adj_mg_l           mg/l
DO (% Sat)                    DO_pctsat             %
CDOM (mg/l)                   CDOM_mg_l             mg/l
FDOM (mg/l)                   FDOM_mg_l             mg/l
E. coli (#/100ml)             E_coli_#_100ml        #/100ml
E. coli (CFU/100ml)           E_coli_CFU_100ml      CFU/100ml
Discharge (cfs)               Discharge_cfs         cfs
Nitrate (μg/l)                Nitrate_ug_l          ug/l
PAR (μmol/m2/s)               PAR_umol_m2_s         umol/m2/s
pH                            pH_SU                 None
TDS (mg/l)                    TDS_mg_l              mg/l
TSS (mg/l)                    TSS_mg_l              mg/l
Turbidity (NTU)               Turbidity_NTU         NTU
Turbidity (FNU)               Turbidity_FNU         FNU
Gage Height (ft)              Gage_Height_ft        ft
Sensor Depth (ft)             Sensor_Depth_ft       ft
Water Pressure (psi)          Water_Pressure_psi    psi
Water Pressure (mmHg)         Water_Pressure_mmHg   mmHg
Water Temp (C)                Water_Temp_C          deg C
Water Temp (F)                Water_Temp_F          deg F

The list above can also be viewed in R with the paramsASR dataset, which is included in the package and used for the checks.

paramsASR
#> # A tibble: 36 × 6
#>    `Parameter Group` Parameter uom   Label `WQX Parameter` `WQX Unit of measure`
#>    <chr>             <chr>     <chr> <chr> <chr>           <chr>                
#>  1 Air Temp          Air_Temp… deg C Air … Temperature, a… deg C                
#>  2 Air Temp          Air_Temp… deg F Air … Temperature, a… deg F                
#>  3 Barometric Press… Air_BP_p… psi   Air … Barometric pre… psi                  
#>  4 Barometric Press… Air_BP_m… mmHg  Air … Barometric pre… mmHg                 
#>  5 Chlorophyll       Chloroph… ug/l  Chlo… Chlorophyll a … ug/l                 
#>  6 Chlorophyll       Chloroph… RFU   Chlo… Chlorophyll a … RFU                  
#>  7 Chlorophyll       Pheophyt… ug/l  Pheo… Pheophytin a    ug/l                 
#>  8 Chlorophyll       Pheophyt… RFU   Pheo… Pheophytin a    RFU                  
#>  9 CO2               pCO2_ppm  ppm   pCO2… Partial Pressu… ppm                  
#> 10 Conductivity      Conducti… uS/cm Cond… Conductivity    uS/cm                
#> # ℹ 26 more rows

Checks performed

The readASRcont() function imports the data and runs a series of checks using the checkASRcont() function. Most checks stop with an informative error if they fail. The exception is the check for missing values in parameter columns, which produces a warning because gaps are common in continuous data. The checks evaluate the following:

  1. Column names: all columns are either Date, Time, DateTime, or a recognised parameter from paramsASR.
  2. Required columns present: either Date and Time (Format 1) or DateTime (Format 2).
  3. At least one parameter column: at least one column matches an entry in paramsASR$Parameter.
  4. Date format (Format 1 only): all values in Date are parseable by lubridate::parse_date_time() in year-first, month-first, or day-first formats.
  5. Time format (Format 1 only): all values in Time are parseable by lubridate::parse_date_time() in 24-hour, 12-hour AM/PM, or Excel-native formats.
  6. DateTime format (Format 2 only): all values in DateTime are parseable by lubridate::parse_date_time() with year-first, month-first, or day-first date order combined with 24-hour or 12-hour AM/PM time.
  7. Missing values: NA values in parameter columns produce a warning listing the affected columns and row numbers. Missing values in DateTime, Date, or Time columns remain an error.
  8. Numeric parameter columns: all parameter columns contain numeric values.
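The logic of several of these checks can be sketched in base R. The example below is a simplified illustration of checks 1, 3, 7, and 8, not the package's implementation: sketch_check_cont() is a hypothetical helper, allowed_params is a small made-up subset of paramsASR$Parameter, and the data frame is invented for the example.

```r
# Illustrative subset of allowed parameter names (the full list is in paramsASR)
allowed_params <- c("Water_Temp_C", "DO_mg_l", "pH_SU")
time_cols <- c("Date", "Time", "DateTime")

# Hypothetical helper: a simplified version of some of the checks listed above
sketch_check_cont <- function(dat) {
  # 1. all column names must be recognised
  bad <- setdiff(names(dat), c(time_cols, allowed_params))
  if (length(bad) > 0) {
    stop("Please correct the column names or remove: ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  # 3. at least one parameter column must be present
  params <- intersect(names(dat), allowed_params)
  if (length(params) == 0) {
    stop("No parameter columns found", call. = FALSE)
  }
  # 8. parameter columns must be numeric
  nonnum <- params[!vapply(dat[params], is.numeric, logical(1))]
  if (length(nonnum) > 0) {
    stop("Non-numeric parameter columns: ",
         paste(nonnum, collapse = ", "), call. = FALSE)
  }
  # 7. missing parameter values give a warning, not an error
  has_na <- params[vapply(dat[params], anyNA, logical(1))]
  if (length(has_na) > 0) {
    warning("Missing values in: ", paste(has_na, collapse = ", "),
            call. = FALSE)
  }
  invisible(dat)
}

dat <- data.frame(
  DateTime = c("2024-08-14 13:56:33", "2024-08-14 13:56:43"),
  Water_Temp_C = c(24.2, NA)
)

# Missing parameter value: warning only, the data are still returned
sketch_check_cont(dat)

# Unrecognised column: error (wrapped in try() so the script continues)
try(sketch_check_cont(cbind(dat, BadColumn = 1)))
```

Running the column-name check first means a misspelled parameter name is reported before any of the format checks are attempted.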

Example: triggering an error

Adding an unrecognised column causes checkASRcont() to stop immediately. The following example demonstrates this using the Format 1 file; the same error occurs with Format 2.

nms <- names(readxl::read_excel(contpth, n_max = 0))
col_types <- ifelse(nms %in% c("Date", "Time", "DateTime"), "text", "guess")
contdat_raw <- suppressWarnings(
  readxl::read_excel(
    contpth,
    col_types = col_types,
    na = c("NA", "na", ""),
    guess_max = Inf
  )
)

contdat_raw$BadColumn <- 1

checkASRcont(contdat_raw)
#> Running checks on continuous data...
#> Error:
#> !    Checking column names...
#>  Please correct the column names or remove: BadColumn

Output format

After passing all checks, readASRcont() returns a data frame with the same structure regardless of input format:

  • DateTime: time-zone-aware POSIXct column
  • One numeric column per parameter present in the input file

head(contdat)
#> # A tibble: 6 × 8
#>   DateTime            Water_Temp_C DO_pctsat DO_mg_l Conductivity_uS_cm TDS_mg_l
#>   <dttm>                     <dbl>     <dbl>   <dbl>              <dbl>    <dbl>
#> 1 2024-08-14 13:56:33         24.2      76.9    6.44               410.      266
#> 2 2024-08-14 13:56:43         24.2      76.7    6.43               410.      266
#> 3 2024-08-14 13:56:53         24.2      76.6    6.42               410.      266
#> 4 2024-08-14 13:57:03         24.2      76.5    6.41               410.      266
#> 5 2024-08-14 13:57:13         24.2      76.3    6.4                409       266
#> 6 2024-08-14 13:57:23         24.2      76.3    6.39               409.      266
#> # ℹ 2 more variables: Salinity_ppt <dbl>, pH_SU <dbl>

head(contdat2)
#> # A tibble: 6 × 8
#>   DateTime            Water_Temp_C DO_pctsat DO_mg_l Conductivity_uS_cm TDS_mg_l
#>   <dttm>                     <dbl>     <dbl>   <dbl>              <dbl>    <dbl>
#> 1 2024-08-14 13:56:33         24.2      76.9    6.44               410.      266
#> 2 2024-08-14 13:56:43         24.2      76.7    6.43               410.      266
#> 3 2024-08-14 13:56:53         24.2      76.6    6.42               410.      266
#> 4 2024-08-14 13:57:03         24.2      76.5    6.41               410.      266
#> 5 2024-08-14 13:57:13         24.2      76.3    6.4                409       266
#> 6 2024-08-14 13:57:23         24.2      76.3    6.39               409.      266
#> # ℹ 2 more variables: Salinity_ppt <dbl>, pH_SU <dbl>

Data quality objectives

The data quality objectives file contains the thresholds used by the quality control checks for each parameter (see the quality control vignette for details). Use readASRdqo() to import the data quality objectives. The function reads the workbook, runs checks via checkASRdqo(), and returns a formatted data frame.

dqodat <- readASRdqo(dqopth)
#> Running checks on data quality objectives...
#>  Checking column names... OK
#>  Checking all columns present... OK
#>  Checking at least one parameter is present... OK
#>  Checking parameter format... OK
#>  Checking Flag column... OK
#>  Checking columns for non-numeric values... OK
#> 
#> All checks passed!

Format requirements

The workbook must contain exactly the following columns. All columns are required, but any threshold you do not want to apply can be left blank (NA).

Column     Description
Parameter  Parameter name matching paramsASR$Parameter
Flag       Flag level for the thresholds in the row, either “Fail” or “Suspect”
GrMin      Gross range, lower threshold
GrMax      Gross range, upper threshold
Spike      Spike, absolute step size for a flag
FlatN      Flatline, run length at which a flag is triggered
FlatDelta  Flatline, the run range (max minus min) must be strictly less than this value to continue the run; a change equal to or greater than FlatDelta resets the run
RoCStDv    Rate of change, multiplier applied to the rolling SD (flag if |diff| > SD × RoCStDv)
RoCHours   Rate of change, look-back window length in hours
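To make the GrMin, GrMax, and Spike columns concrete, the sketch below applies them to a short made-up temperature series, using the Water_Temp_C thresholds from the example DQO file. It is a simplified base R illustration under stated assumptions: comparisons are strict, a spike is the absolute difference from the previous observation, Fail takes precedence over Suspect, and unflagged values are labelled "Pass". The package's actual logic is described in the quality control vignette and may differ; flag_level() is a hypothetical helper.

```r
# Hypothetical helper: combine Suspect and Fail conditions into one label,
# assuming Fail overrides Suspect and unflagged values are "Pass".
flag_level <- function(suspect, fail) {
  out <- rep("Pass", length(suspect))
  out[suspect] <- "Suspect"
  out[fail] <- "Fail"
  out
}

temp <- c(24.2, 24.3, 26.0, 26.1, 29.0, 31.0)  # made-up values, deg C

# Gross range: Suspect outside [-0.5, 28], Fail outside [-1, 30]
gross <- flag_level(
  suspect = temp < -0.5 | temp > 28,
  fail = temp < -1 | temp > 30
)

# Spike: absolute step from the previous observation (first step set to 0);
# Suspect above 1.5, Fail above 2
step <- c(0, abs(diff(temp)))
spike <- flag_level(suspect = step > 1.5, fail = step > 2)

data.frame(temp, gross, step, spike)
```

Each parameter typically has two rows in the DQO file, one per flag level, which is why the sketch evaluates a Suspect and a Fail threshold for every check.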

Checks performed

The readASRdqo() function imports the data quality objectives and runs a series of checks using the checkASRdqo() function. Each check stops with an informative error if it fails. The checks evaluate the following:

  1. Column names: only Parameter, Flag, GrMin, GrMax, Spike, FlatN, FlatDelta, RoCStDv, and RoCHours are included.
  2. All columns present: all columns from the previous check are present.
  3. At least one parameter is present: at least one entry in the Parameter column matches the Parameter column in paramsASR.
  4. Parameter format: all entries in the Parameter column match those in the Parameter column in paramsASR.
  5. Flag column: the Flag column contains only “Fail” or “Suspect” entries.
  6. Numeric columns: all columns except Parameter and Flag contain numeric values.
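The structure checks can be tried on a small hand-built table before preparing a workbook. The sketch below is a simplified base R illustration of checks 1, 2, 5, and 6, not the package's implementation; sketch_check_dqo() is a hypothetical helper, and treating an entirely blank threshold column as acceptable is an assumption. The threshold values are the Water_Temp_C rows from the example DQO file.

```r
required <- c("Parameter", "Flag", "GrMin", "GrMax", "Spike",
              "FlatN", "FlatDelta", "RoCStDv", "RoCHours")

# Hypothetical helper: simplified versions of checks 1, 2, 5, and 6
sketch_check_dqo <- function(dqo) {
  extra <- setdiff(names(dqo), required)
  if (length(extra) > 0) {
    stop("Unexpected columns: ", paste(extra, collapse = ", "), call. = FALSE)
  }
  missing_cols <- setdiff(required, names(dqo))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  bad_flag <- setdiff(unique(dqo$Flag), c("Fail", "Suspect"))
  if (length(bad_flag) > 0) {
    stop("Flag must be Fail or Suspect, found: ",
         paste(bad_flag, collapse = ", "), call. = FALSE)
  }
  # Assumption: a threshold column that is entirely blank (all NA) is allowed
  num_cols <- setdiff(required, c("Parameter", "Flag"))
  is_ok <- vapply(dqo[num_cols],
                  function(x) is.numeric(x) || all(is.na(x)), logical(1))
  if (any(!is_ok)) {
    stop("Non-numeric values in: ", paste(num_cols[!is_ok], collapse = ", "),
         call. = FALSE)
  }
  invisible(dqo)
}

dqo <- data.frame(
  Parameter = c("Water_Temp_C", "Water_Temp_C"),
  Flag = c("Suspect", "Fail"),
  GrMin = c(-0.5, -1), GrMax = c(28, 30), Spike = c(1.5, 2),
  FlatN = c(60, 100), FlatDelta = c(0.01, 0.01),
  RoCStDv = c(6, 8), RoCHours = c(25, 25)
)
sketch_check_dqo(dqo)  # passes silently

# Flag entries are case-sensitive in this sketch: "fail" is rejected
dqo$Flag[2] <- "fail"
try(sketch_check_dqo(dqo))
```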

Example: triggering an error

Supplying an unrecognised parameter name fails the parameter format check:

# import the data for the example
dqodat_raw <- suppressWarnings(
  readxl::read_excel(dqopth, na = c("NA", "na", ""), guess_max = Inf)
)

# introduce a typo in the Parameter column
dqodat_raw$Parameter[1] <- "WaterTemp"

checkASRdqo(dqodat_raw)
#> Running checks on data quality objectives...
#>  Checking column names... OK
#>  Checking all columns present... OK
#>  Checking at least one parameter is present... OK
#> Error:
#> !    Checking parameter format...
#>  Incorrect parameter format: WaterTemp

Output format

After passing all checks, readASRdqo() returns a data frame with the columns listed in the format requirements table above, with all threshold columns coerced to numeric.

head(dqodat)
#> # A tibble: 6 × 9
#>   Parameter    Flag    GrMin GrMax Spike FlatN FlatDelta RoCStDv RoCHours
#>   <chr>        <chr>   <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>    <dbl>
#> 1 Water_Temp_C Suspect  -0.5    28   1.5    60      0.01       6       25
#> 2 Water_Temp_C Fail     -1      30   2     100      0.01       8       25
#> 3 DO_pctsat    Suspect   0     100  10      30      0.01       6       25
#> 4 DO_pctsat    Fail     -1     120  25      60      0.01      NA       NA
#> 5 DO_mg_l      Suspect   2      16   2      30      0.01       6       25
#> 6 DO_mg_l      Fail      1      18   4      60      0.01      NA       NA

Once the continuous data and data quality objectives files have been imported successfully, the remaining functions in AquaSensR can be used.