Title: | Quality Control and Analysis of Massachusetts Water Quality Data |
---|---|
Description: | Methods for quality control and exploratory analysis of surface water quality data collected in Massachusetts, USA. Functions are developed to facilitate data formatting for the Water Quality Exchange Network <https://www.epa.gov/waterdata/water-quality-data-upload-wqx> and reporting of data quality objectives to state agencies. Quality control methods are from Massachusetts Department of Environmental Protection (2020) <https://www.mass.gov/orgs/massachusetts-department-of-environmental-protection>. |
Authors: | Marcus Beck [aut, cre] , Jill Carr [aut], Ben Wetherill [aut] |
Maintainer: | Marcus Beck <[email protected]> |
License: | CC0 |
Version: | 2.1.5 |
Built: | 2024-11-22 17:26:05 UTC |
Source: | https://github.com/massbays-tech/MassWateR |
Analyze trends by date in results file
anlzMWRdate( res = NULL, param, acc = NULL, sit = NULL, fset = NULL, thresh, group = c("site", "locgroup", "all"), threshlab = NULL, threshcol = "tan", site = NULL, resultatt = NULL, locgroup = NULL, dtrng = NULL, ptsize = 2, repel = FALSE, labsize = 3, expand = c(0.05, 0.1), confint = FALSE, palcol = "Set2", yscl = "auto", sumfun = yscl, colleg = FALSE, ttlsize = 1.2, bssize = 11, runchk = TRUE, warn = TRUE )
res |
character string of path to the results file or data frame of results returned by readMWRresults |
param |
character string of the parameter to plot, must conform to entries in the |
acc |
character string of path to the data quality objectives file for accuracy or data frame returned by readMWRacc |
sit |
optional character string of path to the site metadata file or data frame returned by readMWRsites |
fset |
optional list of inputs with elements named |
thresh |
character indicating if relevant freshwater or marine threshold lines are included, one of |
group |
character indicating whether the results are grouped by site (default), combined across location groups, or combined across sites, see details |
threshlab |
optional character string indicating legend label for the threshold, required only if thresh is numeric |
threshcol |
character indicating color of threshold lines if available |
site |
character string of sites to include, default all |
resultatt |
character string of result attributes to plot, default all |
locgroup |
character string of location groups to plot from the |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD, default all |
ptsize |
numeric indicating size of the points |
repel |
logical indicating if overlapping site labels are offset, default FALSE |
labsize |
numeric indicating font size for the site labels, only if |
expand |
numeric of length two indicating expansion proportions on the x-axis to include labels outside of the plot range if |
confint |
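logical indicating if confidence intervals are shown, only applies if data are summarized using group as "locgroup" or "all" |
palcol |
character string indicating the color palette for points and lines from RColorBrewer, see details |
yscl |
character indicating one of "auto" (default), "log", or "linear", see details |
sumfun |
character indicating one of "auto", "mean", "geomean", "median", "min", or "max", see details |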
colleg |
logical indicating if a color legend for sites or location groups is included if |
ttlsize |
numeric value indicating font size of the title relative to other text in the plot |
bssize |
numeric for overall plot text scaling, passed to |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Results are shown for the selected parameter as continuous line plots over time. Specifying group = "site" will plot a separate line for each site. Specifying group = "locgroup" will summarize results across sites in the location groups passed to the locgroup argument, based on the value passed to sumfun, or yscl if no value is passed to sumfun. The site metadata file must be passed to the `sit` argument to use this option. Specifying group = "all" will summarize results across sites for each date based on the value passed to sumfun, or yscl if no value is passed to sumfun. Summarized results will include confidence intervals if confint = TRUE and they can be calculated, i.e., more than one point is used in the summary and data are summarized using group as "locgroup" or "all".
Threshold lines applicable to marine or freshwater environments can be included in the plot by using the thresh argument. These thresholds are specific to each parameter and can be found in the thresholdMWR file. Threshold lines are plotted only for those parameters with entries in thresholdMWR and only if the value in `Result Unit` matches those in thresholdMWR. The threshold lines can be suppressed by setting thresh = 'none'. A user-supplied numeric value can also be used for the thresh argument to override the default values. An appropriate label must also be supplied to threshlab if thresh is numeric.
Any acceptable color palette from RColorBrewer can be used for the points and lines with palcol, which is passed to the palette argument in scale_color_brewer. These could include any of the qualitative color palettes, e.g., "Set1", "Set2", etc. The sequential and diverging palettes will also work, but may return color scales for points and lines that are difficult to distinguish. The palcol argument does not apply if group = "all".
The y-axis scaling as arithmetic (linear) or logarithmic can be set with the yscl argument. If yscl = "auto" (default), the scaling is determined automatically from the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are plotted on log10-scale, otherwise arithmetic. Setting yscl = "linear" or yscl = "log" will set the axis as linear or log10-scale, respectively, regardless of the information in the data quality objective file for accuracy.
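The automatic choice described above can be pictured with a small base R sketch. The column names, the entries, and the helper functions below are hypothetical and are not taken from the package internals; the sketch only assumes that "auto" resolves to a log scale when any accuracy entry for the parameter contains the text "log".

```r
# hypothetical accuracy entries; column names and values are illustrative only
acc <- data.frame(
  Parameter = c("DO", "E.coli"),
  `Value Range` = c("all", "all"),
  `Field Duplicate` = c("<= 10%", "<= 30% log"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# TRUE if any column for the parameter contains the text "log"
uses_log <- function(acc, param) {
  cols <- setdiff(names(acc), "Parameter")
  rows <- acc[acc$Parameter == param, cols, drop = FALSE]
  any(grepl("log", unlist(rows), fixed = TRUE))
}

# resolve "auto" to "log" or "linear"; explicit choices are returned unchanged
resolve_scale <- function(yscl, acc, param) {
  if (yscl != "auto") return(yscl)
  if (uses_log(acc, param)) "log" else "linear"
}

resolve_scale("auto", acc, "E.coli")
resolve_scale("auto", acc, "DO")
resolve_scale("linear", acc, "E.coli")
```

An explicit "linear" or "log" bypasses the lookup entirely, which mirrors the override behavior described for yscl.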
Similarly, the data will be summarized appropriately for group (only applies if group is not "site") based on the value passed to sumfun. The default if no value is provided to sumfun is to use the appropriate summary based on the value provided to yscl. If yscl = "auto" (default), then sumfun = "auto", and the mean or geometric mean is used for the summary based on information in the data quality objective file for accuracy. Using yscl = "linear" or yscl = "log" will default to the mean or geometric mean summary, respectively, if no value is provided to sumfun. Any other appropriate value passed to sumfun will override the value passed to yscl. Valid summary functions for sumfun include "auto", "mean", "geomean", "median", "min", or "max".
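The difference between the two default summaries can be seen in a short base R sketch. The geomean helper and the values are hypothetical and are not the package's internal code; the sketch assumes the usual definition of the geometric mean as the exponentiated mean of the log-transformed values.

```r
# hypothetical result values for one date across several sites
vals <- c(1, 10, 100)

# geometric mean as the exponentiated mean of the logged values (assumed definition)
geomean <- function(x) exp(mean(log(x)))

# arithmetic mean is pulled toward the largest value
mean(vals)

# geometric mean sits at the center of the values on a log scale
geomean(vals)
```

For values spanning several orders of magnitude, the geometric mean is usually the more representative summary, which is why it pairs naturally with a log10 axis.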
Any entries in resdat in the "Result Value" column as "BDL" or "AQL" are replaced with appropriate values in the "Quantitation Limit" column, if present, otherwise the "MDL" or "UQL" columns from the data quality objectives file for accuracy are used. Values as "BDL" use one half of the appropriate limit.
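The replacement rule can be sketched in base R as follows. The column names, limits, and the fill_limits helper are hypothetical simplifications, not the package's code. Using the full limit for "AQL" entries is an assumption, since the text above only states the one-half rule for "BDL".

```r
# hypothetical results; columns are simplified stand-ins for
# "Result Value" and "Quantitation Limit"
res <- data.frame(
  value = c("4.2", "BDL", "AQL", "BDL"),
  quant_limit = c(NA, 0.5, 20, NA),
  stringsAsFactors = FALSE
)

# hypothetical MDL and UQL from the accuracy file, used when no
# quantitation limit is present
mdl <- 0.2
uql <- 25

fill_limits <- function(value, quant_limit, mdl, uql) {
  # non-numeric entries become NA here and are filled below
  out <- suppressWarnings(as.numeric(value))
  bdl <- value == "BDL"
  aql <- value == "AQL"
  # BDL: one half of the quantitation limit, or of the MDL if it is missing
  out[bdl] <- ifelse(is.na(quant_limit[bdl]), mdl, quant_limit[bdl]) / 2
  # AQL: the quantitation limit, or the UQL if it is missing (assumed rule)
  out[aql] <- ifelse(is.na(quant_limit[aql]), uql, quant_limit[aql])
  out
}

res$numeric_value <- fill_limits(res$value, res$quant_limit, mdl, uql)
res
```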
A ggplot object that can be further modified.
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# site data path
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

# site data
sitdat <- readMWRsites(sitpth)

# select sites
anlzMWRdate(res = resdat, param = 'DO', acc = accdat, group = 'site', thresh = 'fresh', site = c("ABT-026", "ABT-077"))
Analyze results with maps
anlzMWRmap( res = NULL, param, acc = NULL, sit = NULL, fset = NULL, site = NULL, resultatt = NULL, locgroup = NULL, dtrng = NULL, ptsize = 4, repel = TRUE, labsize = 3, palcol = "Greens", palcolrev = FALSE, sumfun = "auto", crs = 4326, zoom = 11, addwater = "medium", watercol = "lightblue", maptype = NULL, buffdist = 2, scaledist = "km", northloc = "tl", scaleloc = "br", latlon = TRUE, ttlsize = 1.2, bssize = 11, runchk = TRUE, warn = TRUE )
res |
character string of path to the results file or data frame of results returned by readMWRresults |
param |
character string of the parameter to plot, must conform to entries in the |
acc |
character string of path to the data quality objectives file for accuracy or data frame returned by readMWRacc |
sit |
character string of path to the site metadata file or data frame returned by readMWRsites |
fset |
optional list of inputs with elements named |
site |
character string of sites to include, default all |
resultatt |
character string of result attributes to plot, default all |
locgroup |
character string of location groups to plot from the |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD, default all |
ptsize |
numeric for size of the points, use a negative value to omit the points |
repel |
logical indicating if overlapping site labels are offset |
labsize |
numeric for size of the site labels |
palcol |
character string indicating the color palette to be used from RColorBrewer, see details |
palcolrev |
logical indicating if color palette in palcol is reversed |
sumfun |
character indicating one of "auto", "mean", "geomean", "median", "min", or "max", see details |
crs |
numeric as a four-digit EPSG number for the coordinate reference system, see details |
zoom |
numeric indicating resolution of the base map, see details |
addwater |
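character string as "low", "medium" (default), or "high" for the level of detail of water features, or NULL to omit them, see details |
watercol |
character string of color for water objects if addwater is not NULL |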
maptype |
character string indicating the basemap type, see details |
buffdist |
numeric for buffer around the bounding box for the selected sites in kilometers, see details |
scaledist |
character string indicating distance unit for the scale bar, |
northloc |
character string indicating location of the north arrow, see details |
scaleloc |
character string indicating location of the scale bar, see details |
latlon |
logical to include latitude and longitude labels on the plot, default TRUE |
ttlsize |
numeric value indicating font size of the title relative to other text in the plot |
bssize |
numeric for overall plot text scaling, passed to |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
This function creates a map of summarized results for a selected parameter at each monitoring site. By default, all dates for the parameter are averaged. Options to filter by site, date range, and result attribute are provided. Only sites with spatial information in the site metadata file are plotted and a warning is returned for those that do not have this information. The site labels are also plotted next to each point. The labels can be suppressed by setting labsize = NULL.
Any acceptable color palette from RColorBrewer can be used for palcol, which is passed to the palette argument in scale_fill_distiller. These could include any of the sequential color palettes, e.g., "Greens", "Blues", etc. The diverging and qualitative palettes will also work, but may return uninterpretable color scales. The palette can be reversed by setting palcolrev = TRUE.
The default value for crs is EPSG 4326, the WGS 84 geographic coordinate system in decimal degrees. The crs argument is passed to st_as_sf and any acceptable CRS appropriate for the data can be used.
The results shown on the map represent the parameter summary for each site within the date range provided by dtrng. If sumfun = "auto" (default), the type of mean is determined automatically from the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are summarized with the geometric mean, otherwise the arithmetic mean. Any other valid summary function passed to sumfun ("mean", "geomean", "median", "min", "max") will be applied regardless of the information in the data quality objective file for accuracy.
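A per-site summary of this kind can be sketched in base R. The data, the summ_fun lookup, and the site_summary helper are hypothetical and only illustrate how a summary name could be mapped to a function and applied by site; they are not the package's implementation.

```r
# hypothetical results for two sites
dat <- data.frame(
  site = c("A", "A", "A", "B", "B"),
  value = c(2, 8, 32, 5, 7)
)

# map a summary name to a function; geomean is defined here for illustration
summ_fun <- function(sumfun) {
  switch(sumfun,
    mean = mean,
    geomean = function(x) exp(mean(log(x))),
    median = median,
    min = min,
    max = max,
    stop("unknown summary function: ", sumfun)
  )
}

# one summarized value per site
site_summary <- function(dat, sumfun) {
  tapply(dat$value, dat$site, summ_fun(sumfun))
}

site_summary(dat, "mean")
site_summary(dat, "geomean")
```

Each site ends up with a single value, which is what a fill color on the map would then encode.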
Using addwater = "medium" (default) will include lines and polygons of natural water bodies defined using the National Hydrography Dataset (NHD). The level of detail can be changed to low or high using addwater = "low" or addwater = "high", respectively. Use addwater = NULL to not show any water features.
A base map can be plotted using the maptype argument. The zoom value specifies the resolution of the map. Use higher values to download map tiles with greater resolution, although this increases the download time. The maptype argument describes the type of base map to download. Acceptable options include "OpenStreetMap", "OpenStreetMap.DE", "OpenStreetMap.France", "OpenStreetMap.HOT", "OpenTopoMap", "Esri.WorldStreetMap", "Esri.DeLorme", "Esri.WorldTopoMap", "Esri.WorldImagery", "Esri.WorldTerrain", "Esri.WorldShadedRelief", "Esri.OceanBasemap", "Esri.NatGeoWorldMap", "Esri.WorldGrayCanvas", "CartoDB.Positron", "CartoDB.PositronNoLabels", "CartoDB.PositronOnlyLabels", "CartoDB.DarkMatter", "CartoDB.DarkMatterNoLabels", "CartoDB.DarkMatterOnlyLabels", "CartoDB.Voyager", "CartoDB.VoyagerNoLabels", or "CartoDB.VoyagerOnlyLabels". Use maptype = NULL to suppress the base map.
The area around the summarized points can be increased or decreased using the buffdist
argument. This creates a buffered area around the bounding box for the points, where the units are kilometers.
A north arrow and scale bar are also placed on the map as defined by the northloc and scaleloc arguments. The placement for both can be chosen as "tl", "tr", "bl", or "br" for top-left, top-right, bottom-left, or bottom-right, respectively. Setting either of the arguments to NULL will suppress the placement on the map.
A ggplot object that can be further modified.
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# site data path
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

# site data
sitdat <- readMWRsites(sitpth)

# map with NHD water bodies
anlzMWRmap(res = resdat, param = 'DO', acc = accdat, sit = sitdat, addwater = 'medium')
Analyze outliers in results file
anlzMWRoutlier( res = NULL, param, acc = NULL, fset = NULL, type = c("box", "jitterbox", "jitter"), group, dtrng = NULL, repel = TRUE, outliers = FALSE, labsize = 3, fill = "lightgrey", alpha = 0.8, width = 0.8, yscl = "auto", ttlsize = 1.2, bssize = 11, runchk = TRUE, warn = TRUE )
res |
character string of path to the results file or data frame of results returned by readMWRresults |
param |
character string of the parameter to plot, must conform to entries in the |
acc |
character string of path to the data quality objectives file for accuracy or data frame returned by readMWRacc |
fset |
optional list of inputs with elements named |
type |
character indicating the plot type, one of "box" (default), "jitterbox", or "jitter", see details |
group |
character indicating whether the summaries are grouped by month, site, or week of year |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD, optional |
repel |
logical indicating if overlapping outlier labels are offset |
outliers |
logical indicating if outliers are returned to the console instead of plotting |
labsize |
numeric indicating font size for the outlier labels |
fill |
character string indicating fill color for boxplots |
alpha |
numeric from 0 to 1 indicating transparency of fill color |
width |
numeric for width of boxplots |
yscl |
character indicating one of "auto" (default), "log", or "linear", see details |
ttlsize |
numeric value indicating font size of the title relative to other text in the plot |
bssize |
numeric for overall plot text scaling, passed to |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Outliers are defined following the standard ggplot definition, i.e., observations that fall more than 1.5 times the inter-quartile range beyond the first or third quartile of each boxplot. The data frame returned if outliers = TRUE may vary based on the boxplot groupings defined by group.
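The 1.5 times inter-quartile range rule can be sketched in base R for a single group. The values and the flag_outliers helper are hypothetical; the sketch assumes quartiles from the default quantile() method and is not the package's own code.

```r
# hypothetical values for a single boxplot group
x <- c(1, 2, 3, 4, 5, 100)

# TRUE for observations more than 1.5 * IQR beyond the first or third quartile
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# values that would be drawn as outlier points
x[flag_outliers(x)]
```

Because the quartiles are computed within each group, the same observation can be flagged under one grouping and not under another.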
Specifying type = "box" (default) will produce standard boxplots. Specifying type = "jitterbox" will produce boxplots with non-outlier observations jittered on top. Specifying type = "jitter" will suppress the boxplots and show only the jittered points and the outliers.
Specifying group = "week" will group the samples by week of year using an integer specifying the week. Note that the month and day that start a given week can differ between years, so an integer week number is the only way to compare summaries if the results data span multiple years.
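An integer week of year can be derived from dates in base R as sketched below. The week_of_year helper is hypothetical and assumes weeks are counted as seven-day blocks starting on January 1. The exact convention used by the package is not stated here and may differ.

```r
# hypothetical sample dates spanning two years
dts <- as.Date(c("2021-05-03", "2022-05-03", "2022-01-01", "2022-12-31"))

# week of year as an integer, counting seven-day blocks from January 1
# (assumed convention); yday is zero-based
week_of_year <- function(x) as.POSIXlt(x)$yday %/% 7 + 1

week_of_year(dts)
```

Samples from the same part of the season in different years share a week number, so they fall into the same box even though their calendar dates differ.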
The y-axis scaling as arithmetic (linear) or logarithmic can be set with the yscl argument. If yscl = "auto" (default), the scaling is determined automatically from the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are plotted on log10-scale, otherwise arithmetic. Setting yscl = "linear" or yscl = "log" will set the axis as linear or log10-scale, respectively, regardless of the information in the data quality objective file for accuracy.
Any entries in resdat in the "Result Value" column as "BDL" or "AQL" are replaced with appropriate values in the "Quantitation Limit" column, if present, otherwise the "MDL" or "UQL" columns from the data quality objectives file for accuracy are used. Values as "BDL" use one half of the appropriate limit.
A ggplot object that can be further modified if outliers = FALSE, otherwise a data frame of outliers is returned.
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# outliers by month
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month')

# outliers by site
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'site')

# outliers by site, May through July 2022 only
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'site', dtrng = c('2022-05-01', '2022-07-31'))

# outliers by month, type as jitterbox
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month', type = 'jitterbox')

# outliers by month, type as jitter
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month', type = 'jitter')

# data frame output
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month', outliers = TRUE)
Analyze outliers in results file for all parameters
anlzMWRoutlierall( res = NULL, acc = NULL, fset = NULL, fig_height = 4, fig_width = 8, format = c("word", "png", "zip"), output_dir, output_file = NULL, type = c("box", "jitterbox", "jitter"), group, dtrng = NULL, repel = TRUE, outliers = FALSE, labsize = 3, fill = "lightgrey", alpha = 0.8, width = 0.8, yscl = "auto", ttlsize = 1.2, bssize = 11, runchk = TRUE, warn = TRUE )
res |
character string of path to the results file or data frame of results returned by readMWRresults |
acc |
character string of path to the data quality objectives file for accuracy or data frame returned by readMWRacc |
fset |
optional list of inputs with elements named |
fig_height |
numeric for plot heights in inches |
fig_width |
numeric for plot width in inches |
format |
character string indicating if results are placed in a word file, as separate png files, or as a zipped file of separate png files in output_dir |
output_dir |
character string of the output directory for the results |
output_file |
optional character string for the file name if format = "word" |
type |
character indicating the plot type, one of "box" (default), "jitterbox", or "jitter" |
group |
character indicating whether the summaries are grouped by month, site, or week of year |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD, optional |
repel |
logical indicating if overlapping outlier labels are offset |
outliers |
logical indicating if outliers are returned to the console instead of plotting |
labsize |
numeric indicating font size for the outlier labels |
fill |
character string indicating fill color for boxplots |
alpha |
numeric from 0 to 1 indicating transparency of fill color |
width |
numeric for width of boxplots |
yscl |
character indicating one of "auto" (default), "log", or "linear" |
ttlsize |
numeric value indicating font size of the title relative to other text in the plot |
bssize |
numeric for overall plot text scaling, passed to |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
This function is a wrapper to anlzMWRoutlier to create plots for all parameters with appropriate data in the water quality monitoring results.
A word document named outlierall.docx (or the name passed to output_file) if format = "word", separate png files for each parameter if format = "png", or a zipped file of the separate png files if format = "zip" will be saved in the directory specified by output_dir.
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# create word output
anlzMWRoutlierall(resdat, accdat, group = 'month', format = 'word', output_dir = tempdir())

# create png output
anlzMWRoutlierall(resdat, accdat, group = 'month', format = 'png', output_dir = tempdir())

# create zipped png output
anlzMWRoutlierall(resdat, accdat, group = 'month', format = 'zip', output_dir = tempdir())
Analyze seasonal trends in results file
anlzMWRseason( res = NULL, param, acc = NULL, sit = NULL, fset = NULL, thresh, group = c("month", "week"), type = c("box", "jitterbox", "bar", "jitterbar", "jitter"), threshlab = NULL, threshcol = "tan", site = NULL, resultatt = NULL, locgroup = NULL, dtrng = NULL, confint = FALSE, fill = "lightblue", alpha = 0.8, width = 0.8, yscl = "auto", sumfun = yscl, ttlsize = 1.2, bssize = 11, runchk = TRUE, warn = TRUE )
res |
character string of path to the results file or data frame of results returned by readMWRresults |
param |
character string of the parameter to plot, must conform to entries in the |
acc |
character string of path to the data quality objectives file for accuracy or data frame returned by readMWRacc |
sit |
optional character string of path to the site metadata file or data frame returned by readMWRsites |
fset |
optional list of inputs with elements named |
thresh |
character indicating if relevant freshwater or marine threshold lines are included, one of |
group |
character indicating whether the summaries are grouped by month (default) or week of year |
type |
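character indicating the plot type, one of "box" (default), "jitterbox", "bar", "jitterbar", or "jitter", see details |
threshlab |
optional character string indicating legend label for the threshold, required only if thresh is numeric |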
threshcol |
character indicating color of threshold lines if available |
site |
character string of sites to include, default all |
resultatt |
character string of result attributes to plot, default all |
locgroup |
character string of location groups to plot from the |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD, default all |
confint |
logical indicating if confidence intervals are shown, only applies if type is "bar" or "jitterbar" |
fill |
character string indicating fill color for boxplots or barplots |
alpha |
numeric from 0 to 1 indicating transparency of fill color |
width |
numeric for width of boxplots or barplots |
yscl |
character indicating one of "auto" (default), "log", or "linear", see details |
sumfun |
character indicating one of "auto", "mean", "geomean", "median", "min", or "max", see details |
ttlsize |
numeric value indicating font size of the title relative to other text in the plot |
bssize |
numeric for overall plot text scaling, passed to |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Summaries of a parameter are shown as boxplots if type = "box" or as barplots if type = "bar". Points can be jittered over the boxplots by setting type = "jitterbox" or jittered over the barplots by setting type = "jitterbar". Setting type = "jitter" will show only the jittered points. For type = "bar" or type = "jitterbar", 95% confidence intervals can also be shown if confint = TRUE and they can be estimated (i.e., more than one result value per bar and sumfun is "auto", "mean", or "geomean").
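Why an interval needs more than one value per bar can be seen in a small base R sketch. The ci95 helper and the values are hypothetical, and the t-based interval is an assumption chosen for illustration; the method used by the package is not stated here.

```r
# hypothetical result values behind a single bar
vals <- c(4, 6, 8)

# 95% confidence interval for the mean using the t distribution
# (assumed method); returns NA limits when fewer than two values exist
ci95 <- function(x) {
  n <- length(x)
  if (n < 2) return(c(lower = NA_real_, upper = NA_real_))
  half <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half, upper = mean(x) + half)
}

# interval around the mean of 6
ci95(vals)

# a single value has no estimate of spread, so no interval
ci95(7)
```

For a geometric mean, the same calculation could be applied to the log-transformed values and the limits exponentiated afterward.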
Specifying group = "week" will group the samples by week of year using an integer specifying the week. Note that the month and day that start a given week can differ between years, so an integer week number is the only way to compare summaries if the results data span multiple years.
Threshold lines applicable to marine or freshwater environments can be included in the plot by using the thresh argument. These thresholds are specific to each parameter and can be found in the thresholdMWR file. Threshold lines are plotted only for those parameters with entries in thresholdMWR and only if the value in `Result Unit` matches those in thresholdMWR. The threshold lines can be suppressed by setting thresh = 'none'. A user-supplied numeric value can also be used for the thresh argument to override the default values. An appropriate label must also be supplied to threshlab if thresh is numeric.
The y-axis scaling as arithmetic (linear) or logarithmic can be set with the yscl argument. If yscl = "auto" (default), the scaling is determined automatically from the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are plotted on log10-scale, otherwise arithmetic. Setting yscl = "linear" or yscl = "log" will set the axis as linear or log10-scale, respectively, regardless of the information in the data quality objective file for accuracy.
Similarly, the data will be summarized if type is "bar" or "jitterbar" based on the value passed to sumfun. The default if no value is provided to sumfun is to use the appropriate summary based on the value provided to yscl. If yscl = "auto" (default), then sumfun = "auto", and the mean or geometric mean is used for the summary based on information in the data quality objective file for accuracy. Using yscl = "linear" or yscl = "log" will default to the mean or geometric mean summary, respectively, if no value is provided to sumfun. Any other appropriate value passed to sumfun will override the value passed to yscl. Valid summary functions for sumfun include "auto", "mean", "geomean", "median", "min", or "max".
Any entries in resdat in the "Result Value" column as "BDL" or "AQL" are replaced with appropriate values in the "Quantitation Limit" column, if present, otherwise the "MDL" or "UQL" columns from the data quality objectives file for accuracy are used. Values as "BDL" use one half of the appropriate limit.
A ggplot object that can be further modified.
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# site data path
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

# site data
sitdat <- readMWRsites(sitpth)

# seasonal trends by month, boxplot
anlzMWRseason(res = resdat, param = 'DO', acc = accdat, thresh = 'fresh', group = 'month', type = 'box')

# seasonal trends by week, boxplot
anlzMWRseason(res = resdat, param = 'DO', acc = accdat, thresh = 'fresh', group = 'week', type = 'box')

# seasonal trends by month, May to July only
anlzMWRseason(res = resdat, param = 'DO', acc = accdat, thresh = 'fresh', group = 'month', type = 'bar', dtrng = c('2022-05-01', '2022-07-31'))

# seasonal trends by month, barplot
anlzMWRseason(res = resdat, param = 'DO', acc = accdat, thresh = 'fresh', group = 'month', type = 'bar')

# seasonal trends by week, barplot
anlzMWRseason(res = resdat, param = 'DO', acc = accdat, thresh = 'fresh', group = 'week', type = 'bar')

# seasonal trends by location group, requires sitdat
anlzMWRseason(res = resdat, param = 'DO', acc = accdat, sit = sitdat, thresh = 'fresh', group = 'month', type = 'box', locgroup = 'Assabet')
Analyze data by sites in results file
anlzMWRsite(
  res = NULL,
  param,
  acc = NULL,
  sit = NULL,
  fset = NULL,
  type = c("box", "jitterbox", "bar", "jitterbar", "jitter"),
  thresh,
  threshlab = NULL,
  threshcol = "tan",
  site = NULL,
  resultatt = NULL,
  locgroup = NULL,
  dtrng = NULL,
  confint = FALSE,
  fill = "lightgreen",
  alpha = 0.8,
  width = 0.8,
  yscl = "auto",
  sumfun = yscl,
  byresultatt = FALSE,
  ttlsize = 1.2,
  bssize = 11,
  runchk = TRUE,
  warn = TRUE
)
res |
character string of path to the results file or |
param |
character string of the parameter to plot, must conform to entries in the |
acc |
character string of path to the data quality objectives file for accuracy or |
sit |
optional character string of path to the site metadata file or |
fset |
optional list of inputs with elements named |
type |
character indicating |
thresh |
character indicating if relevant freshwater or marine threshold lines are included, one of |
threshlab |
optional character string indicating legend label for the threshold, required only if |
threshcol |
character indicating color of threshold lines if available |
site |
character string of sites to include, default all |
resultatt |
character string of result attributes to plot, default all |
locgroup |
character string of location groups to plot from the |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD, default all |
confint |
logical indicating if confidence intervals are shown, only applies if |
fill |
character indicating fill color for boxplots or barplots |
alpha |
numeric from 0 to 1 indicating transparency of fill color |
width |
numeric for width of boxplots or barplots |
yscl |
character indicating one of |
sumfun |
character indicating one of |
byresultatt |
logical indicating if the plot has sites grouped separately by result attributes, see details |
ttlsize |
numeric value indicating font size of the title relative to other text in the plot |
bssize |
numeric for overall plot text scaling, passed to |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Summaries of a parameter for each site are shown as boxplots if type = "box" or as barplots if type = "bar". Points can be jittered over the boxplots by setting type = "jitterbox" or jittered over the barplots by setting type = "jitterbar". Setting type = "jitter" will show only the jittered points. For type = "bar" or type = "jitterbar", 95% confidence intervals can also be shown if confint = TRUE and they can be estimated (i.e., more than one result value per bar and sumfun is "auto", "mean", or "geomean").
Threshold lines applicable to marine or freshwater environments can be included in the plot by using the thresh argument. These thresholds are specific to each parameter and can be found in the thresholdMWR file. Threshold lines are plotted only for those parameters with entries in thresholdMWR and only if the value in `Result Unit` matches those in thresholdMWR. The threshold lines can be suppressed by setting thresh = 'none'. A user-supplied numeric value can also be used for the thresh argument to override the default values. An appropriate label must also be supplied to threshlab if thresh is numeric.
The y-axis scaling as arithmetic (linear) or logarithmic can be set with the yscl argument. If yscl = "auto" (default), the scaling is determined automatically from the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are plotted on log10-scale, otherwise arithmetic. Setting yscl = "linear" or yscl = "log" will set the axis as linear or log10-scale, respectively, regardless of the information in the data quality objective file for accuracy.
Similarly, the data will be summarized if type is "bar" or "jitterbar" based on the value passed to sumfun. The default if no value is provided to sumfun is to use the appropriate summary based on the value provided to yscl. If yscl = "auto" (default), then sumfun = "auto", and the mean or geometric mean is used for the summary based on information in the data quality objective file for accuracy. Using yscl = "linear" or yscl = "log" will default to the mean or geometric mean summary, respectively, if no value is provided to sumfun. Any other appropriate value passed to sumfun will override the value passed to yscl. Valid summary functions for sumfun include "auto", "mean", "geomean", "median", "min", or "max".
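The difference between the summary options can be sketched in base R. This is a minimal illustration only and not the package's internal code: the helper summarise_vals and the result values are hypothetical, and the geometric mean is assumed to be the exponent of the mean of the log-transformed values.

```r
# hypothetical helper, not part of MassWateR
summarise_vals <- function(x, sumfun = c("mean", "geomean", "median", "min", "max")) {
  sumfun <- match.arg(sumfun)
  switch(sumfun,
    mean = mean(x),
    geomean = exp(mean(log(x))), # assumes all values are positive
    median = median(x),
    min = min(x),
    max = max(x)
  )
}

# made-up result values spanning several orders of magnitude
vals <- c(10, 100, 1000)

summarise_vals(vals, "mean")    # arithmetic mean, pulled up by the largest value
summarise_vals(vals, "geomean") # geometric mean, suited to log-scaled parameters
```

For skewed parameters plotted on a log10 axis, the geometric mean sits closer to the bulk of the values than the arithmetic mean, which is likely why the two defaults are paired with the axis scaling.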
Any entries in resdat in the "Result Value" column as "BDL" or "AQL" are replaced with appropriate values in the "Quantitation Limit" column, if present, otherwise the "MDL" or "UQL" columns from the data quality objectives file for accuracy are used. Values as "BDL" use one half of the appropriate limit.
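The substitution rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are made up, mdl and uql are hypothetical scalars standing in for the "MDL" and "UQL" columns of the accuracy file, and "AQL" entries are assumed to take the full value of the limit.

```r
# made-up results; column names follow the results file
res <- data.frame(
  `Result Value` = c("5.2", "BDL", "AQL", "BDL"),
  `Quantitation Limit` = c(NA, 0.5, 200, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# hypothetical fallback limits standing in for MDL and UQL in the accuracy file
mdl <- 0.1
uql <- 500

val <- res$`Result Value`
ql <- res$`Quantitation Limit`

# use the quantitation limit if present, otherwise MDL for BDL or UQL for AQL
lim <- ifelse(!is.na(ql), ql, ifelse(val == "BDL", mdl, ifelse(val == "AQL", uql, NA)))

# BDL uses one half of the limit, AQL is assumed to use the limit itself
num <- suppressWarnings(as.numeric(val))
num[val == "BDL"] <- lim[val == "BDL"] / 2
num[val == "AQL"] <- lim[val == "AQL"]
num
```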
The byresultatt argument can be used to group sites separately by result attributes. For example, sites with E. coli samples can be grouped by "Dry" or "Wet" conditions if present in the "Result Attribute" column. Filtering by sites first using the site argument is advised to reduce the amount of data that are plotted. The grouping can be filtered further by passing appropriate values in the "Result Attribute" column to the resultatt argument. Note that specifying result attributes with resultatt and setting byresultatt = FALSE will filter the plot data by the result attributes but will not plot the results separately.
A ggplot object that can be further modified.
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# site data path
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

# site data
sitdat <- readMWRsites(sitpth)

# site trends, boxplot
anlzMWRsite(res = resdat, param = 'DO', acc = accdat, type = 'box', thresh = 'fresh')

# site trends, barplot
anlzMWRsite(res = resdat, param = 'DO', acc = accdat, type = 'bar', thresh = 'fresh')

# site trends, May to July only
anlzMWRsite(res = resdat, param = 'DO', acc = accdat, type = 'box', thresh = 'fresh',
  dtrng = c('2022-05-01', '2022-07-31'))

# grouping by result attribute
anlzMWRsite(res = resdat, param = 'DO', acc = accdat, type = 'box', thresh = 'fresh',
  site = c('ABT-062', 'ABT-077'), byresultatt = TRUE)

# site trends by location group, requires sitdat
anlzMWRsite(res = resdat, param = 'DO', acc = accdat, sit = sitdat, type = 'box',
  thresh = 'fresh', locgroup = 'Assabet')
Check data quality objective accuracy data
checkMWRacc(accdat, warn = TRUE)
accdat |
input data frame |
warn |
logical to return warnings to the console (default) |
This function is used internally within readMWRacc to run several checks on the input data for completeness and conformance to WQX requirements.
The following checks are made:
Column name spelling: Should be the following: Parameter, uom, MDL, UQL, Value Range, Field Duplicate, Lab Duplicate, Field Blank, Lab Blank, Spike/Check Accuracy
Columns present: All columns from the previous check should be present
Column types: All columns should be characters/text, except for MDL and UQL
Value Range column na check: The character string "na" should not be in the Value Range column, "all" should be used if the entire range applies
Unrecognized characters: Fields describing accuracy checks should not include symbols or text other than <=, ≤, <, >=, ≥, >, ±, "%", "BDL", "AQL", "log", or "all"
Overlap in Value Range column: Entries in Value Range should not overlap for a parameter (excludes ascending ranges)
Gap in Value Range column: Entries in Value Range should not include a gap for a parameter, warning only
Parameter: Should match parameter names in the Simple Parameter or WQX Parameter columns of the paramsMWR data
Units: No missing entries in units (uom), except pH which can be blank
Single unit: Each unique Parameter should have only one type for the units (uom)
Correct units: Each unique Parameter should have an entry in the units (uom) that matches one of the acceptable values in the Units of measure column of the paramsMWR data
Empty columns: Columns with all missing or NA values will return a warning
accdat is returned as is if no errors are found, otherwise an informative error message is returned prompting the user to make the required correction to the raw data before proceeding.
# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data with no checks
accdat <- readxl::read_excel(accpth, na = c('NA', ''), col_types = 'text')
accdat <- dplyr::mutate(accdat, dplyr::across(-c(`Value Range`), ~ dplyr::na_if(.x, 'na')))

checkMWRacc(accdat)
Check data quality objective frequency and completeness data
checkMWRfrecom(frecomdat, warn = TRUE)
frecomdat |
input data frame |
warn |
logical to return warnings to the console (default) |
This function is used internally within readMWRfrecom to run several checks on the input data for frequency and completeness and conformance to WQX requirements.
The following checks are made:
Column name spelling: Should be the following: Parameter, Field Duplicate, Lab Duplicate, Field Blank, Lab Blank, Spike/Check Accuracy, % Completeness
Columns present: All columns from the previous check should be present
Non-numeric values: Values entered in columns other than the first should be numeric
Values outside of 0 - 100: Values entered in columns other than the first should be between 0 and 100
Parameter: Should match parameter names in the Simple Parameter or WQX Parameter columns of the paramsMWR data
Empty columns: Columns with all missing or NA values will return a warning
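Two of the checks above, non-numeric values and values outside of 0 to 100, can be sketched in base R. This is only an illustration of the logic on made-up data with two deliberate errors. The object names chkcols, nonnum, and outrng are hypothetical and the package's own checks may be implemented differently.

```r
# made-up frequency and completeness table with two deliberate problems
frecomdat <- data.frame(
  Parameter = c("DO", "pH", "TP"),
  `Field Duplicate` = c("10", "ten", "5"),
  `% Completeness` = c("90", "95", "120"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# all columns other than the first
chkcols <- setdiff(names(frecomdat), "Parameter")

# non-numeric values: entries that are not missing but fail numeric conversion
nonnum <- sapply(frecomdat[chkcols], function(x) {
  any(!is.na(x) & is.na(suppressWarnings(as.numeric(x))))
})

# values outside of 0 - 100
outrng <- sapply(frecomdat[chkcols], function(x) {
  x <- suppressWarnings(as.numeric(x))
  any(x < 0 | x > 100, na.rm = TRUE)
})

names(nonnum)[nonnum] # columns with non-numeric entries
names(outrng)[outrng] # columns with out-of-range entries
```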
frecomdat is returned as is if no errors are found, otherwise an informative error message is returned prompting the user to make the required correction to the raw data before proceeding.
library(dplyr)

frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

frecomdat <- suppressMessages(readxl::read_excel(frecompth,
  skip = 1, na = c('NA', 'na', ''),
  col_types = c('text', 'numeric', 'numeric', 'numeric', 'numeric', 'numeric', 'numeric')
  )) %>%
  rename(`% Completeness` = `...7`)

checkMWRfrecom(frecomdat)
Check water quality monitoring results
checkMWRresults(resdat, warn = TRUE)
resdat |
input data frame for results |
warn |
logical to return warnings to the console (default) |
This function is used internally within readMWRresults to run several checks on the input data for completeness and conformance to WQX requirements.
The following checks are made:
Column name spelling: Should be the following: Monitoring Location ID, Activity Type, Activity Start Date, Activity Start Time, Activity Depth/Height Measure, Activity Depth/Height Unit, Activity Relative Depth Name, Characteristic Name, Result Value, Result Unit, Quantitation Limit, QC Reference Value, Result Measure Qualifier, Result Attribute, Sample Collection Method ID, Project ID, Local Record ID, Result Comment
Columns present: All columns from the previous check should be present
Activity Type: Should be one of Field Msr/Obs, Sample-Routine, Quality Control Sample-Field Blank, Quality Control Sample-Lab Blank, Quality Control Sample-Lab Duplicate, Quality Control Sample-Lab Spike, Quality Control-Calibration Check, Quality Control-Meter Lab Duplicate, Quality Control-Meter Lab Blank
Date formats: Should be mm/dd/yyyy and parsed correctly on import
Depth data present: Depth data should be included in Activity Depth/Height Measure or Activity Relative Depth Name for all rows where Activity Type is Field Msr/Obs or Sample-Routine
Non-numeric Activity Depth/Height Measure: All depth values should be numbers, excluding missing values
Activity Depth/Height Unit: All entries should be ft, m, or blank
Activity Relative Depth Name: Should be either Surface, Bottom, Midwater, Near Bottom, or blank (warning only)
Activity Depth/Height Measure out of range: All depth values should be less than or equal to 1 meter / 3.3 feet or entered as Surface in the Activity Relative Depth Name column (warning only)
Characteristic Name: Should match parameter names in the Simple Parameter or WQX Parameter columns of the paramsMWR data (warning only)
Result Value: Should be a numeric value or a text value as AQL or BDL
Non-numeric Quantitation Limit: All values should be numbers, excluding missing values
QC Reference Value: Should be a numeric value or a text value as AQL or BDL
Result Unit: No missing entries in Result Unit, except pH which can be blank
Single Result Unit: Each unique parameter in Characteristic Name should have only one entry in Result Unit (excludes entries for lab spikes reported as % or % recovery)
Correct Result Unit: Each unique parameter in Characteristic Name should have an entry in Result Unit that matches one of the acceptable values in the Units of measure column of the paramsMWR data (excludes entries for lab spikes reported as % or % recovery)
resdat is returned as is if no errors are found, otherwise an informative error message is returned prompting the user to make the required correction to the raw data before proceeding. Checks with warnings can be fixed at the discretion of the user before proceeding.
library(dplyr)

respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

resdat <- suppressWarnings(readxl::read_excel(respth, na = c('NA', 'na', ''), guess_max = Inf)) %>%
  dplyr::mutate_if(function(x) !lubridate::is.POSIXct(x), as.character)

checkMWRresults(resdat)
Check site metadata file
checkMWRsites(sitdat)
sitdat |
input data frame |
This function is used internally within readMWRsites to run several checks on the input data for completeness and conformance to WQX requirements.
The following checks are made:
Column name spelling: Should be the following: Monitoring Location ID, Monitoring Location Name, Monitoring Location Latitude, Monitoring Location Longitude, Location Group
Columns present: All columns from the previous check should be present
Missing longitude or latitude: No missing entries in Monitoring Location Latitude or Monitoring Location Longitude
Non-numeric latitude values: Values entered in Monitoring Location Latitude must be numeric
Non-numeric longitude values: Values entered in Monitoring Location Longitude must be numeric
Positive longitude values: Values in Monitoring Location Longitude must be negative
Missing Location ID: No missing entries for Monitoring Location ID
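The coordinate and identifier checks above can be sketched in base R on a made-up site table. This is a hedged illustration of the logic only: the list chk and its element names are hypothetical, and the coordinate values are invented.

```r
# made-up site metadata with deliberate problems in rows 2 and 3
sitdat <- data.frame(
  `Monitoring Location ID` = c("ABT-062", "ABT-077", NA),
  `Monitoring Location Latitude` = c(42.4, NA, 42.5),
  `Monitoring Location Longitude` = c(-71.4, -71.5, 71.6),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

lat <- sitdat$`Monitoring Location Latitude`
lon <- sitdat$`Monitoring Location Longitude`

# row numbers failing each check
chk <- list(
  missing_latlon = which(is.na(lat) | is.na(lon)),
  positive_lon = which(lon > 0),
  missing_id = which(is.na(sitdat$`Monitoring Location ID`))
)
chk
```

Longitudes must be negative because Massachusetts lies in the western hemisphere, so a positive value usually indicates a dropped minus sign.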
sitdat is returned as is if no errors are found, otherwise an informative error message is returned prompting the user to make the required correction to the raw data before proceeding.
library(dplyr)

sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

sitdat <- readxl::read_excel(sitpth, na = c('NA', 'na', ''))

checkMWRsites(sitdat)
Check water quality exchange (wqx) metadata input
checkMWRwqx(wqxdat, warn = TRUE)
wqxdat |
input data frame |
warn |
logical to return warnings to the console (default) |
This function is used internally within readMWRwqx to run several checks on the input data for conformance with downstream functions.
The following checks are made:
Column name spelling: Should be the following: Parameter, Sampling Method Context, Method Speciation, Result Sample Fraction, Analytical Method, Analytical Method Context
Columns present: All columns from the previous check should be present
Unique parameters: Values in Parameter should be unique (no duplicates)
Parameter: Should match parameter names in the Simple Parameter or WQX Parameter columns of the paramsMWR data (warning only)
wqxdat is returned as is if no errors are found, otherwise an informative error message is returned prompting the user to make the required correction to the raw data before proceeding. Checks with warnings can be fixed at the discretion of the user before proceeding.
library(dplyr)

wqxpth <- system.file('extdata/ExampleWQX.xlsx', package = 'MassWateR')

wqxdat <- readxl::read_excel(wqxpth, na = c('NA', 'na', ''), col_types = 'text')

checkMWRwqx(wqxdat)
Format data quality objective accuracy data
formMWRacc(accdat)
accdat |
input data frame |
This function is used internally within readMWRacc to format the input data for downstream analysis. The formatting includes:
Minor formatting for units: For conformance to WQX, e.g., ppt is changed to ppth, s.u. is changed to NA in uom
Convert Parameter: All parameters are converted to Simple Parameter in paramsMWR as needed
Remove unicode: Remove or replace unicode characters with those that can be used in logical expressions in qcMWRacc, e.g., replace ≥ with >=
Convert limits to numeric: Convert MDL and UQL columns to numeric
A formatted data frame of the data quality objectives file for accuracy
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

accdat <- readxl::read_excel(accpth, na = c('NA', ''))
accdat <- dplyr::mutate(accdat, dplyr::across(-c(`Value Range`), ~ dplyr::na_if(.x, 'na')))

formMWRacc(accdat)
Format data quality objective frequency and completeness data
formMWRfrecom(frecomdat)
frecomdat |
input data frame |
This function is used internally within readMWRfrecom to format the input data for downstream analysis. The formatting includes:
Convert Parameter: All parameters are converted to Simple Parameter in paramsMWR as needed
A formatted data frame of the data quality objectives file for frequency and completeness
library(dplyr)

frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

frecomdat <- suppressMessages(readxl::read_excel(frecompth,
  skip = 1, na = c('NA', 'na', ''),
  col_types = c('text', 'numeric', 'numeric', 'numeric', 'numeric', 'numeric', 'numeric')
  )) %>%
  rename(`% Completeness` = `...7`)

formMWRfrecom(frecomdat)
Format water quality monitoring results
formMWRresults(resdat, tzone = "America/Jamaica")
resdat |
input data frame for results |
tzone |
character string for time zone |
This function is used internally within readMWRresults to format the input data for downstream analysis. The formatting includes:
Fix date and time inputs: Activity Start Date is converted to YYYY-MM-DD as a date object, Activity Start Time is converted to HH:MM as a character to fix artifacts from Excel import
Minor formatting for Result Unit: For conformance to WQX, e.g., ppt is changed to ppth, s.u. is changed to NA
Convert characteristic names: All parameters in Characteristic Name are converted to Simple Parameter in paramsMWR as needed
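The date and unit formatting steps above can be sketched in base R. This is a minimal illustration on made-up values, assuming dates arrive as mm/dd/yyyy text. The object names dts, dtsfmt, and unt are hypothetical and the package's own formatting code may differ.

```r
# made-up inputs
dts <- c("05/15/2022", "07/04/2022")
unt <- c("ppt", "s.u.", "mg/l")

# Activity Start Date converted to a date object, printed as YYYY-MM-DD
dtsfmt <- as.Date(dts, format = "%m/%d/%Y")
dtsfmt

# minor unit formatting: ppt is changed to ppth, s.u. is changed to NA
unt[unt == "ppt"] <- "ppth"
unt[unt == "s.u."] <- NA
unt
```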
A formatted data frame of the water quality monitoring results file
library(dplyr)

respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

resdat <- suppressWarnings(readxl::read_excel(respth, na = c('NA', 'na', ''), guess_max = Inf)) %>%
  dplyr::mutate_if(function(x) !lubridate::is.POSIXct(x), as.character)

formMWRresults(resdat)
Format WQX metadata input
formMWRwqx(wqxdat)
wqxdat |
input data frame for wqx metadata |
This function is used internally within readMWRwqx to format the input data for downstream analysis. The formatting includes:
Convert characteristic names: All parameters in Characteristic Name are converted to Simple Parameter in paramsMWR as needed
A formatted data frame of the WQX metadata file
library(dplyr)

wqxpth <- system.file('extdata/ExampleWQX.xlsx', package = 'MassWateR')

wqxdat <- suppressWarnings(readxl::read_excel(wqxpth, na = c('NA', 'na', ''), col_types = 'text'))

formMWRwqx(wqxdat)
Master parameter list and units for Characteristic Name column in results data
paramsMWR
A data.frame
This information is used to verify the correct format of input data and for formatting output data for upload to WQX. A column showing the corresponding WQX names is also included.
paramsMWR
Run quality control accuracy checks for water quality monitoring results
qcMWRacc(
  res = NULL,
  acc = NULL,
  frecom = NULL,
  fset = NULL,
  runchk = TRUE,
  warn = TRUE,
  accchk = c("Field Blanks", "Lab Blanks", "Field Duplicates", "Lab Duplicates",
    "Lab Spikes / Instrument Checks"),
  suffix = "%"
)
res |
character string of path to the results file or |
acc |
character string of path to the data quality objectives file for accuracy or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
fset |
optional list of inputs with elements named |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
accchk |
character string indicating which accuracy check to return, one to any of |
suffix |
character string indicating suffix to append to percentage values |
The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults and readMWRacc. For the former, the full suite of data checks can be evaluated with runchk = TRUE (default) or suppressed with runchk = FALSE. If the checks are suppressed, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset argument instead. See the help file for utilMWRinput.
Note that accuracy is only evaluated on parameters in the Parameter column in the data quality objectives accuracy file. A warning is returned if there are parameters in Parameter in the accuracy file that are not in Characteristic Name in the results file.
Similarly, parameters in the results file in the Characteristic Name column that are not found in the data quality objectives accuracy file are not evaluated. A warning is returned if there are parameters in Characteristic Name in the results file that are not in Parameter in the accuracy file.
The data quality objectives file for frequency and completeness is used to screen parameters in the results file for inclusion in the accuracy tables. Parameters with empty values in the frequency and completeness table are not returned.
The output shows the accuracy checks from the input files returned as a list, with each element of the list corresponding to a specific accuracy check specified with accchk.
##
# using file paths

# results path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# frequency and completeness path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

qcMWRacc(res = respth, acc = accpth, frecom = frecompth)
Run quality control completeness checks for water quality monitoring results
qcMWRcom(res = NULL, frecom = NULL, fset = NULL, runchk = TRUE, warn = TRUE)
res |
character string of path to the results file or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
fset |
optional list of inputs with elements named |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults and readMWRfrecom. For the former, the full suite of data checks can be evaluated with runchk = TRUE (default) or suppressed with runchk = FALSE. If the checks are suppressed, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset argument instead. See the help file for utilMWRinput.
Note that completeness is only evaluated on parameters in the Parameter column in the data quality objectives frequency and completeness file. A warning is returned if there are parameters in Parameter in the frequency and completeness file that are not in Characteristic Name in the results file.
Similarly, parameters in the results file in the Characteristic Name column that are not found in the data quality objectives frequency and completeness file are not evaluated. A warning is returned if there are parameters in Characteristic Name in the results file that are not in Parameter in the frequency and completeness file.
The output shows the completeness checks from the combined files. Each row applies to a completeness check for a parameter. The datarec and qualrec columns show the number of data records and qualified records, respectively. The datarec column counts only records that are not for quality control, i.e., duplicates, blanks, and spikes are excluded from the count. The standard column shows the percentage required for the quality control check from the quality control objectives file, the complete column shows the completeness calculated from the input data, and the met column shows whether the standard was met, i.e., whether complete is greater than or equal to standard.
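The relationship between these columns can be sketched in base R on made-up counts. The formula for complete is an assumption made for illustration (the percentage of data records that are not qualified), since the text above does not state how the package calculates it. Only the comparison used for met is taken from the description.

```r
# made-up counts; column names follow the output described above
com <- data.frame(
  Parameter = c("DO", "TP"),
  datarec = c(50, 40),
  qualrec = c(2, 8),
  standard = c(90, 90)
)

# assumed formula: percentage of data records that are not qualified
com$complete <- 100 * (com$datarec - com$qualrec) / com$datarec

# the standard is met if complete is greater than or equal to standard
com$met <- com$complete >= com$standard
com
```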
##
# using file paths

# results path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# frequency and completeness path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

qcMWRcom(res = respth, frecom = frecompth)

##
# using data frames

# results data
resdat <- readMWRresults(respth)

# frequency and completeness data
frecomdat <- readMWRfrecom(frecompth)

qcMWRcom(res = resdat, frecom = frecomdat)
Run quality control frequency checks for water quality monitoring results
qcMWRfre(
  res = NULL,
  acc = NULL,
  frecom = NULL,
  fset = NULL,
  runchk = TRUE,
  warn = TRUE
)
res |
character string of path to the results file or |
acc |
character string of path to the data quality objectives file for accuracy or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
fset |
optional list of inputs with elements named |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults, readMWRacc, and readMWRfrecom. For the former, the full suite of data checks can be evaluated with runchk = TRUE (default) or suppressed with runchk = FALSE. If the checks are suppressed, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset argument instead. See the help file for utilMWRinput.
Note that frequency is only evaluated on parameters in the Parameter column in the data quality objectives frequency and completeness file. A warning is returned if there are parameters in Parameter in the frequency and completeness file that are not in Characteristic Name in the results file.
Similarly, parameters in the results file in the Characteristic Name column that are not found in the data quality objectives frequency and completeness file are not evaluated. A warning is returned if there are parameters in Characteristic Name in the results file that are not in Parameter in the frequency and completeness file.
The output shows the frequency checks from the input files. Each row applies to a frequency check for a parameter. The Parameter
column shows the parameter, the obs
column shows the total records that apply to regular activity types, the check
column shows the relevant activity type for each frequency check, the count
column shows the number of records that apply to a check, the standard
column shows the relevant percentage required for the quality control check from the quality control objectives file, and the met
column shows if the standard was met by comparing if percent
is greater than or equal to standard
.
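The comparison behind the met column can be illustrated with a small base R sketch. The data values are made up, and the calculation of percent as the count of check records relative to obs is an assumption for illustration, not necessarily how the package computes it.

```r
# toy frequency-check table; all values are invented for illustration
frechk <- data.frame(
  Parameter = c('DO', 'DO', 'TP'),
  obs = c(50, 50, 20),       # regular records for the parameter
  check = c('Field Duplicate', 'Lab Blank', 'Field Duplicate'),
  count = c(6, 2, 2),        # records that apply to the check
  standard = c(10, 5, 10),   # required percentage from the objectives file
  stringsAsFactors = FALSE
)

# assumption: percent is the share of check records relative to regular records
frechk$percent <- 100 * frechk$count / frechk$obs

# met is TRUE when percent is greater than or equal to standard
frechk$met <- frechk$percent >= frechk$standard

frechk
```

In this toy table the second row fails because 4 percent is below the 5 percent standard, while the other two rows meet or exceed their standards.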
##
# using file paths

# results path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# dqo accuracy data path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# frequency and completeness path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

qcMWRfre(res = respth, acc = accpth, frecom = frecompth)

##
# using data frames

# results data
resdat <- readMWRresults(respth)

# accuracy data
accdat <- readMWRacc(accpth)

# frequency and completeness data
frecomdat <- readMWRfrecom(frecompth)

qcMWRfre(res = resdat, acc = accdat, frecom = frecomdat)
Create the quality control review report
qcMWRreview( res = NULL, acc = NULL, frecom = NULL, fset = NULL, output_dir, output_file = NULL, rawdata = TRUE, dqofontsize = 7.5, tabfontsize = 9, padding = 0, warn = TRUE, runchk = TRUE )
res |
character string of path to the results file or |
acc |
character string of path to the data quality objectives file for accuracy or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
fset |
optional list of inputs with elements named |
output_dir |
character string of the output directory for the rendered file |
output_file |
optional character string for the file name |
rawdata |
logical to include quality control accuracy summaries for raw data, e.g., field blanks, etc. |
dqofontsize |
numeric for font size in the data quality objective tables in the first page of the review |
tabfontsize |
numeric for font size in the review tables |
padding |
numeric for row padding for table output |
warn |
logical indicating if warnings from the table functions are included in the file output |
runchk |
logical to run data checks with |
The function compiles a review report as a Word document for all quality control checks included in the MassWateR package. The report shows several tables, including the data quality objectives files for accuracy, frequency, and completeness, summary results for all accuracy checks, summary results for all frequency checks, summary results for all completeness checks, and individual results for all accuracy checks. The report uses the individual table functions (which can be used separately) to return the results, which include tabMWRacc
, tabMWRfre
, and tabMWRcom
. The help files for each of these functions can be consulted for a more detailed explanation of the quality control checks.
The workflow for using this function is to import the required data (results and data quality objective files) and to fix any errors noted on import prior to creating the review report. Additional warnings that may be of interest as returned by the individual table functions can be returned in the console by setting warn = TRUE
.
Optional arguments that can be changed as needed include specifying the file name with output_file
, suppressing the raw data summaries at the end of the report with rawdata = FALSE
, and changing the table font sizes (dqofontsize
for the data quality objectives on the first page, tabfontsize
for the remainder).
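A hedged sketch of the optional arguments described above follows. The file name myreview.docx and the font sizes are arbitrary illustrative choices, and the call is only attempted if MassWateR is installed; rendering the report may also require the tools the package uses to build Word documents.

```r
# paths to the example files shipped with MassWateR
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

# hypothetical output name, chosen only for this example
outfile <- 'myreview.docx'

if (requireNamespace('MassWateR', quietly = TRUE)) {
  MassWateR::qcMWRreview(
    res = respth, acc = accpth, frecom = frecompth,
    output_dir = tempdir(),
    output_file = outfile,   # custom file name
    rawdata = FALSE,         # drop the raw data summaries at the end
    dqofontsize = 7,         # smaller font for the objectives tables
    tabfontsize = 8          # smaller font for the remaining tables
  )
}
```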
The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults
, readMWRacc
, and readMWRfrecom
. For the former, the full suite of data checks can be evaluated with runchk = TRUE
(default) or suppressed with runchk = FALSE
, as explained in the relevant help files. In the latter case, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset
argument instead. See the help file for utilMWRinput
.
A compiled review report named qcreview.docx
(or name passed to output_file
) will be saved in the directory specified by output_dir
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# dqo accuracy data path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# dqo completeness data path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy data
accdat <- readMWRacc(accpth)

# frequency and completeness data
frecomdat <- readMWRfrecom(frecompth)

# create report
qcMWRreview(res = resdat, acc = accdat, frecom = frecomdat, output_dir = tempdir())
Read data quality objectives for accuracy from an external file
readMWRacc(accpth, runchk = TRUE, warn = TRUE)
accpth |
character string of path to the data quality objectives file for accuracy |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Data are imported with read_excel
and checked with checkMWRacc
.
A formatted data frame of data quality objectives for accuracy that can be used for downstream analysis
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

accdat <- readMWRacc(accpth)
head(accdat)
Read data quality objectives for frequency and completeness from an external file
readMWRfrecom(frecompth, runchk = TRUE, warn = TRUE)
frecompth |
character string of path to the data quality objectives file for frequency and completeness |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Data are imported with read_excel
and checked with checkMWRfrecom
.
A formatted data frame of data quality objectives for frequency and completeness that can be used for downstream analysis
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

frecomdat <- readMWRfrecom(frecompth)
head(frecomdat)
Read water quality monitoring results from an external file
readMWRresults(respth, runchk = TRUE, warn = TRUE, tzone = "America/Jamaica")
respth |
character string of path to the results file |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
tzone |
character string for time zone, passed to |
Data are imported with read_excel
, checked with checkMWRresults
, and formatted with formMWRresults
.
A formatted water quality monitoring results data frame that can be used for downstream analysis
readMWRresultsview
for troubleshooting import checks
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

resdat <- readMWRresults(respth)
head(resdat)
Create summary spreadsheet of unique values for each column in the water quality results file to check for data mistakes prior to running the readMWRresults
function
readMWRresultsview( respth, columns = NULL, output_dir, output_file = NULL, maxlen = 8 )
respth |
character string of path to the results file |
columns |
character string indicating which columns to view, defaults to all |
output_dir |
character string of the output directory for the rendered file |
output_file |
optional character string for the name of the .csv file output, must include the file extension |
maxlen |
numeric to truncate numeric values to the specified length |
Acceptable options for the columns
argument include any of the column names in the results file. The default setting (NULL
) will show every column in the results file.
The output of this function can be useful to troubleshoot the checks when importing the water quality monitoring result file with readMWRresults
(see https://massbays-tech.github.io/MassWateR/articles/MassWateR.html#data-import-and-checks).
Creates a spreadsheet at the location specified by output_dir
. Each column shows the unique values.
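The idea of listing unique values per column can be sketched in base R on a toy data frame. This only illustrates why such a summary helps to spot entry mistakes; it is not the package's implementation, and the values are invented.

```r
# toy results data; column names follow the results file, values are made up
toy <- data.frame(
  `Characteristic Name` = c('DO', 'DO', 'TP', 'TP'),
  `Result Unit` = c('mg/l', 'mg/L', 'ug/l', 'ug/l'),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# unique entries in each column, one list element per column
uniq <- lapply(toy, unique)

uniq
```

Here the Result Unit column shows both mg/l and mg/L, the kind of inconsistency that is easier to catch in a list of unique values than in the full data.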
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# all columns
readMWRresultsview(respth, output_dir = tempdir())

# parameters and units
readMWRresultsview(respth, columns = c('Characteristic Name', 'Result Unit'), output_dir = tempdir())
Read site metadata from an external file
readMWRsites(sitpth, runchk = TRUE)
sitpth |
character string of path to the site metadata file |
runchk |
logical to run data checks with |
Data are imported with read_excel
and checked with checkMWRsites
.
A formatted data frame of site metadata that can be used for downstream analysis
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

sitdat <- readMWRsites(sitpth)
head(sitdat)
Read water quality exchange (wqx) metadata input from an external file
readMWRwqx(wqxpth, runchk = TRUE, warn = TRUE)
wqxpth |
character string of path to the wqx metadata file |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Data are imported with read_excel
and checked with checkMWRwqx
.
A formatted data frame that can be used for downstream analysis
wqxpth <- system.file('extdata/ExampleWQX.xlsx', package = 'MassWateR')

wqxdat <- readMWRwqx(wqxpth)
head(wqxdat)
Create a formatted table of quality control accuracy checks
tabMWRacc( res = NULL, acc = NULL, frecom = NULL, fset = NULL, runchk = TRUE, warn = TRUE, accchk = c("Field Blanks", "Lab Blanks", "Field Duplicates", "Lab Duplicates", "Lab Spikes / Instrument Checks"), type = c("individual", "summary", "percent"), pass_col = "#57C4AD", fail_col = "#DB4325", suffix = "%", caption = TRUE )
res |
character string of path to the results file or |
acc |
character string of path to the data quality objectives file for accuracy or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
fset |
optional list of inputs with elements named |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
accchk |
character string indicating which accuracy check to return, one to any of |
type |
character string indicating |
pass_col |
character string (as hex code) for the cell color of checks that pass, applies only if |
fail_col |
character string (as hex code) for the cell color of checks that fail, applies only if |
suffix |
character string indicating suffix to append to percentage values |
caption |
logical to include a caption from |
The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults
and readMWRacc
. For the former, the full suite of data checks can be evaluated with runchk = TRUE
(default) or suppressed with runchk = FALSE
, as explained in the relevant help files. In the latter case, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset
argument instead. See the help file for utilMWRinput
.
Also note that accuracy is only evaluated on parameters that are shared between the results file and data quality objectives file for accuracy. A warning is returned for parameters that do not match between the files. This warning can be suppressed by setting warn = FALSE
.
The function can return three types of tables as specified with the type
argument: "individual"
, "summary"
, or "percent"
. The individual tables are specific to each type of accuracy check for each parameter (e.g., field blanks, lab blanks, etc.). The summary table summarizes all accuracy checks by the number of checks and the number of hits and misses returned for each across all parameters. The percent table is similar to the summary table, but shows only percentages with color-coding for hits and misses. The data quality objectives file for frequency and completeness is required if type = "summary"
or type = "percent"
.
For type = "individual"
, the quality control tables for accuracy are retrieved by specifying the check with the accchk
argument. The accchk
argument can be used to specify one of the following values to retrieve the relevant tables: "Field Blanks"
, "Lab Blanks"
, "Field Duplicates"
, "Lab Duplicates"
, or "Lab Spikes / Instrument Checks"
.
For type = "summary"
, the function summarizes all accuracy checks by counting the number of quality control checks, number of misses, and percent acceptance for each parameter. All accuracy checks are used and the accchk
argument does not apply.
For type = "percent"
, the function returns a table similar to the summary option, except that only the percentage of checks that pass for each parameter is shown, in wide format. Cells are color-coded based on the percentage of checks that have passed, using the percent thresholds from the % Completeness
column of the data quality objectives file for frequency and completeness. Parameters without an entry for % Completeness
are not color-coded and an appropriate warning is returned. All accuracy checks are used and the accchk
argument does not apply.
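As a brief sketch of the summary and percent options described above, the same inputs can be passed with a different type argument. The calls follow the usage shown for this function and are only run if MassWateR is installed; the object names tabsum and tabper are arbitrary.

```r
# paths to the example files shipped with MassWateR
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

tabsum <- NULL
tabper <- NULL

if (requireNamespace('MassWateR', quietly = TRUE)) {
  # summary of all accuracy checks, accchk does not apply
  tabsum <- MassWateR::tabMWRacc(res = respth, acc = accpth, frecom = frecompth,
                                 type = 'summary')

  # color-coded percentages, needs the frequency and completeness file
  tabper <- MassWateR::tabMWRacc(res = respth, acc = accpth, frecom = frecompth,
                                 type = 'percent')
}
```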
Inputs for the results and data quality objectives for accuracy are processed internally with qcMWRacc
and the same arguments are accepted for this function, in addition to others listed above.
A flextable
object with formatted results.
##
# using file paths

# results path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# frequency and completeness path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

# table as individual
tabMWRacc(res = respth, acc = accpth, frecom = frecompth, type = 'individual', accchk = 'Field Blanks')
Create a formatted table of quality control completeness checks
tabMWRcom( res = NULL, frecom = NULL, fset = NULL, runchk = TRUE, warn = TRUE, pass_col = "#57C4AD", fail_col = "#DB4325", digits = 0, suffix = "%", parameterwd = 1.15, noteswd = 3 )
res |
character string of path to the results file or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
fset |
optional list of inputs with elements named |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
pass_col |
character string (as hex code) for the cell color of checks that pass |
fail_col |
character string (as hex code) for the cell color of checks that fail |
digits |
numeric indicating number of significant digits to report for percentages |
suffix |
character string indicating suffix to append to percentage values |
parameterwd |
numeric indicating width of the parameter column |
noteswd |
numeric indicating width of notes column |
The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults
and readMWRfrecom
. For the former, the full suite of data checks can be evaluated with runchk = TRUE
(default) or suppressed with runchk = FALSE
, as explained in the relevant help files. In the latter case, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset
argument instead. See the help file for utilMWRinput
.
Also note that completeness is only evaluated on parameters that are shared between the results file and data quality objectives file for frequency and completeness. A warning is returned for parameters that do not match between the files. This warning can be suppressed by setting warn = FALSE
.
A summary table showing the number of data records, number of qualified records, and percent completeness is created. The % Completeness
column colors cells green if the required percentage of observations for completeness specified in the data quality objectives file is met and red if it is not. The Hit/ Miss
column shows similar information but in text format, i.e., MISS
is shown if the quality control standard for completeness is not met.
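The logic of the summary table can be sketched in base R with invented numbers. The column names datarec, qualrec, standard, complete, and hitmiss are hypothetical, and both the definition of percent completeness as the share of records that are not qualified and the use of a greater-than-or-equal comparison are assumptions for illustration.

```r
# toy completeness table; all values are invented for illustration
comchk <- data.frame(
  Parameter = c('DO', 'TP'),
  datarec = c(100, 40),    # number of data records
  qualrec = c(4, 8),       # number of qualified records
  standard = c(90, 90),    # required percent completeness
  stringsAsFactors = FALSE
)

# assumption: completeness is the share of records that are not qualified
comchk$complete <- 100 * (comchk$datarec - comchk$qualrec) / comchk$datarec

# text flag, MISS when the standard is not met
comchk$hitmiss <- ifelse(comchk$complete >= comchk$standard, '', 'MISS')

comchk
```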
Inputs for the results and data quality objectives for frequency and completeness are processed internally with qcMWRcom
and the same arguments are accepted for this function, in addition to others listed above.
A flextable
object with formatted results showing summary counts for all completeness checks for each parameter.
##
# using file paths

# results path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# frequency and completeness path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

tabMWRcom(res = respth, frecom = frecompth)

##
# using data frames

# results data
resdat <- readMWRresults(respth)

# frequency and completeness data
frecomdat <- readMWRfrecom(frecompth)

tabMWRcom(res = resdat, frecom = frecomdat)
Create a formatted table of quality control frequency checks
tabMWRfre( res = NULL, acc = NULL, frecom = NULL, fset = NULL, runchk = TRUE, warn = TRUE, type = c("summary", "percent"), pass_col = "#57C4AD", fail_col = "#DB4325", digits = 0, suffix = "%" )
res |
character string of path to the results file or |
acc |
character string of path to the data quality objectives file for accuracy or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
fset |
optional list of inputs with elements named |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
type |
character string indicating |
pass_col |
character string (as hex code) for the cell color of checks that pass, applies only if |
fail_col |
character string (as hex code) for the cell color of checks that fail, applies only if |
digits |
numeric indicating number of significant digits to report for percentages |
suffix |
character string indicating suffix to append to percentage values |
The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults
, readMWRacc
, and readMWRfrecom
. For the former, the full suite of data checks can be evaluated with runchk = TRUE
(default) or suppressed with runchk = FALSE
, as explained in the relevant help files. In the latter case, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset
argument instead. See the help file for utilMWRinput
.
Also note that frequency is only evaluated on parameters that are shared between the results file and data quality objectives file for frequency and completeness. A warning is returned for parameters that do not match between the files. This warning can be suppressed by setting warn = FALSE
.
The quality control tables for frequency show the number of records that apply to a given check (e.g., Lab Blank, Field Blank, etc.) relative to the number of "regular" data records (e.g., field samples or measures) for each parameter. A summary of all frequency checks for each parameter is provided if type = "summary"
or a color-coded table showing similar information as percentages for each parameter is provided if type = "percent"
.
Inputs for the results and data quality objectives for accuracy and frequency and completeness are processed internally with qcMWRfre
and the same arguments are accepted for this function, in addition to others listed above.
A flextable
object with formatted results.
##
# using file paths

# results path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# dqo accuracy data path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# frequency and completeness path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

# table as summary
tabMWRfre(res = respth, acc = accpth, frecom = frecompth, type = 'summary')

# table as percent
tabMWRfre(res = respth, acc = accpth, frecom = frecompth, type = 'percent')

##
# using data frames

# results data
resdat <- readMWRresults(respth)

# accuracy data
accdat <- readMWRacc(accpth)

# frequency and completeness data
frecomdat <- readMWRfrecom(frecompth)

# table as summary
tabMWRfre(res = resdat, acc = accdat, frecom = frecomdat, type = 'summary')

# table as percent
tabMWRfre(res = resdat, acc = accdat, frecom = frecomdat, type = 'percent')
Create and save tables in a single workbook for WQX upload
tabMWRwqx( res = NULL, acc = NULL, sit = NULL, wqx = NULL, fset = NULL, output_dir, output_file = NULL, warn = TRUE, runchk = TRUE )
res |
character string of path to the results file or |
acc |
character string of path to the data quality objectives file for accuracy or |
sit |
character string of path to the site metadata file or |
wqx |
character string of path to the wqx metadata file or |
fset |
optional list of inputs with elements named |
output_dir |
character string of the output directory for the results |
output_file |
optional character string for the file name, must include .xlsx suffix |
warn |
logical to return warnings to the console (default) |
runchk |
logical to run data checks with |
This function will export a single Excel workbook with three sheets, named "Project", "Locations", and "Results". The output is populated with as much content as possible based on information in the input files. The remainder of the information not included in the output will need to be manually entered before uploading the data to WQX. All required columns are present, but individual rows will need to be verified for completeness. It is the responsibility of the user to verify this information is complete and correct before uploading the data.
The workflow for using this function is to import the required data (results, data quality objectives file for accuracy, site metadata, and wqx metadata) and to fix any errors noted on import prior to creating the output. The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults
, readMWRacc
, readMWRsites
, and readMWRwqx
. For the former, the full suite of data checks can be evaluated with runchk = TRUE
(default) or suppressed with runchk = FALSE
, as explained in the relevant help files. In the latter case, downstream analyses may not work if data are formatted incorrectly. For convenience, a named list with the input arguments as paths or data frames can be passed to the fset
argument instead. See the help file for utilMWRinput
.
The name of the output file can also be changed using the output_file
argument, the default being wqxtab.xlsx
. Warnings can also be turned off or on (default) using the warn
argument. This returns any warnings when data are imported and only applies if the file inputs are paths.
An Excel workbook named wqxtab.xlsx
(or name passed to output_file
) will be saved in the directory specified by output_dir
. The workbook will include three sheets named "Projects", "Locations", and "Results".
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# dqo accuracy data path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# site data path
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

# wqx data path
wqxpth <- system.file('extdata/ExampleWQX.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy data
accdat <- readMWRacc(accpth)

# site data
sitdat <- readMWRsites(sitpth)

# wqx data
wqxdat <- readMWRwqx(wqxpth)

# create workbook
tabMWRwqx(res = resdat, acc = accdat, sit = sitdat, wqx = wqxdat, output_dir = tempdir())
Master thresholds list for analysis of results data
thresholdMWR
A data.frame
of 28 rows and 10 columns
This file includes appropriate threshold values of water quality parameters for marine and freshwater environments based on state standards or typical ranges in Massachusetts.
thresholdMWR
Filter results data by parameter, date range, site, result attributes, and/or location group
utilMWRfilter( resdat, sitdat = NULL, param, dtrng = NULL, site = NULL, resultatt = NULL, locgroup = NULL, alllocgroup = FALSE, allresultatt = FALSE )
resdat |
results data as returned by |
sitdat |
site metadata file as returned by |
param |
character string to filter results by a parameter in |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD |
site |
character string of sites to include, default all |
resultatt |
character string of result attributes to include, default all |
locgroup |
character string of location groups to include from the |
alllocgroup |
logical indicating if results data are filtered by all location groups in |
allresultatt |
logical indicating if results data are filtered by all result attributes if |
resdat
filtered by param
, dtrng
, site
, resultatt
, and/or locgroup
, otherwise resdat
filtered only by param
if other arguments are NULL
# results file path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# site data path
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

# site data
sitdat <- readMWRsites(sitpth)

# filter by parameter, date range
utilMWRfilter(resdat, param = 'DO', dtrng = c('2022-06-01', '2022-06-30'))

# filter by parameter, site
utilMWRfilter(resdat, param = 'DO', site = c('ABT-026', 'ABT-062', 'ABT-077'))

# filter by parameter, result attribute
utilMWRfilter(resdat, param = 'DO', resultatt = 'DRY')

# filter by parameter, location group, date range
utilMWRfilter(resdat, param = 'DO', sitdat = sitdat, locgroup = 'Assabet', dtrng = c('2022-06-01', '2022-06-30'))
Filter results data to surface measurements
utilMWRfiltersurface(resdat)
resdat |
results data as returned by |
This function is used internally by all analysis functions.
resdat filtered by Activity Depth/Height Measure less than or equal to 1 meter or 3.3 feet or Activity Relative Depth Name as "Surface"
# results file path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# filter surface data
utilMWRfiltersurface(resdat)
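The surface rule described above can be sketched in base R on a toy data frame. This is only an illustration under stated assumptions: the short column names (depth, unit, relative) and all values are hypothetical stand-ins for the results file columns, and the code is not the package's implementation.

```r
# toy results data; column names and values are assumptions for illustration
toy <- data.frame(
  depth = c(0.5, 2, 3, 5, NA),
  unit = c("m", "m", "ft", "ft", NA),
  relative = c(NA, NA, NA, NA, "Surface"),
  stringsAsFactors = FALSE
)

# depth at most 1 meter or 3.3 feet, or a relative depth recorded as "Surface"
shallow_m <- !is.na(toy$depth) & toy$unit %in% "m" & toy$depth <= 1
shallow_ft <- !is.na(toy$depth) & toy$unit %in% "ft" & toy$depth <= 3.3
labelled <- toy$relative %in% "Surface"

# keep a row if any of the three conditions holds
surface <- toy[shallow_m | shallow_ft | labelled, ]
surface
```

Using %in% rather than == keeps missing units or labels from propagating NA into the row filter.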
Prep results data for frequency checks
utilMWRfre(resdat, param, accdat, warn = TRUE)
resdat |
results data as returned by |
param |
character string to filter results and check if a parameter in the |
accdat |
|
warn |
logical to return warnings to the console (default) |
This function is similar to utilMWRlimits with some additional processing appropriate for creating the frequency table in tabMWRfre. The param argument is used to identify the appropriate "MDL" or "UQL" values in the data quality objectives file for accuracy. A warning is returned to the console if the accuracy file does not contain the appropriate information for the parameter. Results will be filtered by param regardless of any warning.
resdat filtered by param with any entries in "Result Value" as "BDL" or "AQL" replaced with appropriate values in the "Quantitation Limit" column, if present, otherwise the "MDL" or "UQL" columns from the data quality objectives file for accuracy are used. Values as "BDL" use one half of the appropriate limit. Values not in the "Value Range" column of the accuracy file are removed from the output.
# results file path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# apply to total phosphorus
utilMWRfre(resdat, accdat, param = 'TP')

# apply to E.coli
utilMWRfre(resdat, accdat, param = 'E.coli')
Load external file from remote source, fail gracefully
utilMWRhttpgrace(remote_file)
remote_file |
URL of the external file |
The external file as an RData object
# fails gracefully
utilMWRhttpgrace('http://httpbin.org/status/404')

# imports data or fails gracefully
fl <- 'https://github.com/massbays-tech/MassWateRdata/raw/main/data/streamsMWR.RData'
utilMWRhttpgrace(fl)
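The idea of failing gracefully can be sketched with tryCatch in base R. The helper name load_gracefully is hypothetical, and a local file path stands in for the remote URL so that the sketch runs without a network connection. It is not the package's implementation.

```r
# hypothetical helper: return the first object in an RData file, or NULL on failure
load_gracefully <- function(path) {
  tryCatch(
    suppressWarnings({
      env <- new.env()
      nms <- load(path, envir = env)
      get(nms[1], envir = env)
    }),
    error = function(e) {
      # report the problem as a message instead of stopping
      message("Could not load file: ", conditionMessage(e))
      NULL
    }
  )
}

# a file that exists
tmp <- tempfile(fileext = ".RData")
streams <- data.frame(id = 1:3)
save(streams, file = tmp)
ok <- load_gracefully(tmp)

# a path that does not exist returns NULL with a message
bad <- load_gracefully(file.path(tempdir(), "missing.RData"))
```

Returning NULL with a message lets calling code decide how to proceed when the resource is unavailable.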
Utility function to import data as paths or data frames
utilMWRinput( res = NULL, acc = NULL, frecom = NULL, sit = NULL, wqx = NULL, fset = NULL, runchk = TRUE, warn = TRUE )
res |
character string of path to the results file or |
acc |
character string of path to the data quality objectives file for accuracy or |
frecom |
character string of path to the data quality objectives file for frequency and completeness or |
sit |
character string of path to the site metadata file or |
wqx |
character string of path to the wqx metadata file or |
fset |
optional list of inputs with elements named |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
The function is used internally by others to import data from paths to the relevant files or as data frames returned by readMWRresults, readMWRacc, readMWRfrecom, readMWRsites, or readMWRwqx. For the former, the full suite of data checks can be evaluated with runchk = TRUE (default) or suppressed with runchk = FALSE.
The fset argument can be used in place of the preceding arguments. The argument accepts a list with named elements as res, acc, frecom, sit, or wqx, where the elements are either character strings of the paths to, or data frames of, the corresponding inputs. Missing elements will be interpreted as NULL values. This argument is provided as a convenience to apply a single list as input versus separate inputs for each argument.
Any of the arguments for the data files can be NULL, used as a convenience for downstream functions that do not require all.
A five element list with the imported results, data quality objective files, site metadata, and wqx metadata, named "resdat", "accdat", "frecomdat", "sitdat", and "wqxdat", respectively.
##
# using file paths

# results path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# frequency and completeness path
frecompth <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

# site path
sitpth <- system.file('extdata/ExampleSites.xlsx', package = 'MassWateR')

# wqx path
wqxpth <- system.file('extdata/ExampleWQX.xlsx', package = 'MassWateR')

inp <- utilMWRinput(res = respth, acc = accpth, frecom = frecompth, sit = sitpth, wqx = wqxpth)
inp$resdat
inp$accdat
inp$frecomdat
inp$sitdat
inp$wqxdat

##
# using data frames

# results data
resdat <- readMWRresults(respth)

# accuracy data
accdat <- readMWRacc(accpth)

# frequency and completeness data
frecomdat <- readMWRfrecom(frecompth)

# site data
sitdat <- readMWRsites(sitpth)

# wqx data
wqxdat <- readMWRwqx(wqxpth)

inp <- utilMWRinput(res = resdat, acc = accdat, frecom = frecomdat, sit = sitdat, wqx = wqxdat)
inp$resdat
inp$accdat
inp$frecomdat
inp$sitdat
inp$wqxdat

##
# using fset as list input

# input with paths to files
fset <- list(
  res = respth,
  acc = accpth,
  frecom = frecompth,
  sit = sitpth,
  wqx = wqxpth
)
utilMWRinput(fset = fset)
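How a partially filled fset list maps onto the five inputs can be sketched in base R. The helper resolve_inputs and the file names are hypothetical; the sketch only shows that missing elements resolve to NULL and does not reproduce the package's import or checking logic.

```r
# hypothetical helper: pull the five named inputs from a list,
# leaving NULL for any element that is not supplied
resolve_inputs <- function(fset) {
  nms <- c("res", "acc", "frecom", "sit", "wqx")
  out <- lapply(nms, function(nm) fset[[nm]])
  names(out) <- nms
  out
}

# only results and sites are supplied (file names are placeholders)
inp <- resolve_inputs(list(res = "results.xlsx", sit = "sites.xlsx"))

# TRUE for the inputs that were not supplied
sapply(inp, is.null)
```

A downstream function that needs only the results and site files could accept such a list without the caller spelling out the unused arguments.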
Check if required inputs are present for a function
utilMWRinputcheck(inputs)
inputs |
list of arguments passed from the parent function |
NULL if all inputs are present, otherwise an error message indicating which inputs are missing
inputchk <- formals(tabMWRcom)
inputchk$res <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')
inputchk$frecom <- system.file('extdata/ExampleDQOFrequencyCompleteness.xlsx', package = 'MassWateR')

utilMWRinputcheck(inputchk)
Fill results data as BDL or AQL with appropriate values
utilMWRlimits(resdat, param, accdat, warn = TRUE)
resdat |
results data as returned by |
param |
character string to filter results and check if a parameter in the |
accdat |
|
warn |
logical to return warnings to the console (default) |
The param argument is used to identify the appropriate "MDL" or "UQL" values in the data quality objectives file for accuracy. A warning is returned to the console if the accuracy file does not contain the appropriate information for the parameter. Results will be filtered by param regardless of any warning.
resdat filtered by param with any entries in "Result Value" as "BDL" or "AQL" replaced with appropriate values in the "Quantitation Limit" column, if present, otherwise the "MDL" or "UQL" columns from the data quality objectives file for accuracy are used. Values as "BDL" use one half of the appropriate limit. Output only includes rows with the activity type as "Field Msr/Obs" or "Sample-Routine".
# results file path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# apply to total phosphorus
utilMWRlimits(resdat, accdat, param = 'TP')

# apply to E.coli
utilMWRlimits(resdat, accdat, param = 'E.coli')
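The replacement rule described above (half of the lower limit for "BDL", the upper limit for "AQL") can be sketched in base R on a toy vector. The limit values below are assumptions chosen for illustration, and the sketch skips the lookup of "Quantitation Limit", "MDL", or "UQL" that the function performs.

```r
# toy result values as text; limits are assumed for illustration
result <- c("0.12", "BDL", "AQL", "0.40")
mdl <- 0.05  # assumed lower (detection) limit
uql <- 2.00  # assumed upper (quantitation) limit

filled <- result
filled[result == "BDL"] <- mdl / 2  # one half of the lower limit
filled[result == "AQL"] <- uql      # the upper limit itself
filled <- as.numeric(filled)
filled
```

After the substitution the column is fully numeric, which is what the summary and plotting steps need.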
Identify outliers in a numeric vector
utilMWRoutlier(x, logscl)
x |
numeric vector of any length |
logscl |
logical to indicate if vector should be log10-transformed first |
Outliers are identified as values more than 1.5 times the interquartile range below the first quartile or above the third quartile.
A logical vector equal in length to x indicating TRUE for outliers or FALSE for within normal range
x <- rnorm(20)
utilMWRoutlier(x, logscl = FALSE)
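A minimal sketch of an interquartile range rule in base R follows, assuming the conventional fences at 1.5 times the interquartile range beyond the first and third quartiles. The function name flag_outlier is hypothetical, and the package's own code may differ in detail.

```r
# hypothetical helper: flag values beyond 1.5 * IQR from the quartiles,
# optionally after a log10 transformation
flag_outlier <- function(x, logscl = FALSE) {
  if (logscl) x <- log10(x)
  qnt <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (qnt[2] - qnt[1])
  x < qnt[1] - fence | x > qnt[2] + fence
}

x <- c(1, 2, 3, 4, 5, 100)
flag_outlier(x)
flag_outlier(x, logscl = TRUE)
```

With this toy vector only the value 100 is flagged on either scale; for skewed data the log10 option will usually flag fewer high values.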
Verify summary function
utilMWRsumfun(accdat, param, sumfun = "auto")
accdat |
|
param |
character string for the parameter to evaluate as provided in the |
sumfun |
character indicating one of |
This function verifies appropriate summary functions are passed from sumfun. The mean or geometric mean output is used for sumfun = "auto" based on information in the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are summarized with the geometric mean, otherwise with the arithmetic mean. Using "mean" or "geomean" for sumfun will apply the appropriate function regardless of information in the data quality objective file for accuracy.
Character indicating the appropriate summary function based on the value passed to sumfun.
# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# geomean auto
utilMWRsumfun(accdat, param = 'E.coli')

# mean force
utilMWRsumfun(accdat, param = 'E.coli', sumfun = 'mean')

# mean auto
utilMWRsumfun(accdat, param = 'DO')

# geomean force
utilMWRsumfun(accdat, param = 'DO', sumfun = 'geomean')
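The "auto" rule described above can be sketched in base R. The toy accuracy table, its column names, its entries, and the helper names pick_sumfun and geomean are all hypothetical; the sketch only illustrates choosing the geometric mean when "log" appears in a parameter's rows.

```r
# toy accuracy rows; column names and entries are assumptions for illustration
acc <- data.frame(
  Parameter = c("E.coli", "DO"),
  `Field Duplicate` = c("<= 30% logRPD", "<= 20%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# hypothetical helper: resolve "auto" to "geomean" or "mean"
pick_sumfun <- function(acc, param, sumfun = "auto") {
  if (sumfun != "auto") return(sumfun)
  rows <- acc[acc$Parameter == param, , drop = FALSE]
  haslog <- any(grepl("log", unlist(rows), fixed = TRUE))
  if (haslog) "geomean" else "mean"
}

# geometric mean as the exponentiated mean of the logs
geomean <- function(x) exp(mean(log(x)))

pick_sumfun(acc, "E.coli")
pick_sumfun(acc, "DO")
pick_sumfun(acc, "E.coli", sumfun = "mean")
geomean(c(1, 100))
```

The geometric mean of 1 and 100 is 10, compared with an arithmetic mean of 50.5, which shows why it is preferred for parameters such as bacteria counts that span orders of magnitude.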
Summarize a results data frame by a grouping variable
utilMWRsummary(dat, accdat, param, sumfun = "auto", confint)
dat |
input data frame |
accdat |
|
param |
character string for the parameter to evaluate as provided in the |
sumfun |
character indicating one of |
confint |
logical if user expects a confidence interval to be returned with the summary |
This function summarizes a results data frame by an existing grouping variable using the function supplied to sumfun. The mean or geometric mean is used for sumfun = "auto" based on information in the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are summarized with the geometric mean, otherwise with the arithmetic mean. Using "mean" or "geomean" for sumfun will apply the appropriate function regardless of information in the data quality objective file for accuracy.
A summarized data frame. A warning will be returned if the confidence interval cannot be estimated and confint = TRUE.
library(dplyr)

# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# fill BDL, AQL
resdat <- utilMWRlimits(resdat = resdat, accdat = accdat, param = "DO")

dat <- resdat %>%
  group_by(`Monitoring Location ID`)

# summarize sites by mean
utilMWRsummary(dat, accdat, param = 'DO', sumfun = 'auto', confint = TRUE)

# summarize sites by minimum
utilMWRsummary(dat, accdat, param = 'DO', sumfun = 'min', confint = FALSE)
Get threshold lines from thresholdMWR
utilMWRthresh(resdat, param, thresh, threshlab = NULL)
resdat |
results data as returned by |
param |
character string to first filter results by a parameter in |
thresh |
character indicating if relevant freshwater or marine threshold lines are included, one of |
threshlab |
optional character string indicating legend label for the threshold, required only if |
If thresh is not numeric and thresholds are available for param, a data.frame of relevant marine or freshwater thresholds, otherwise NULL. If thresh is numeric, a data.frame of the threshold with the appropriate label from threshlab.
# results file path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')

# results data
resdat <- readMWRresults(respth)

# get threshold lines
utilMWRthresh(resdat = resdat, param = 'E.coli', thresh = 'fresh')

# user-defined numeric threshold line
utilMWRthresh(resdat = resdat, param = 'TP', thresh = 5, threshlab = 'My threshold')
Format the title for analyze functions
utilMWRtitle( param, accdat = NULL, sumfun = NULL, site = NULL, dtrng = NULL, resultatt = NULL, locgroup = NULL )
param |
character string of the parameter to plot |
accdat |
optional |
sumfun |
optional character indicating one of |
site |
character string of sites to include |
dtrng |
character string of length two for the date ranges as YYYY-MM-DD |
resultatt |
character string of result attributes to plot |
locgroup |
character string of location groups to plot from the |
All arguments are optional except param. For each optional argument that is supplied, an appropriate text string is appended to the param argument to indicate the level of filtering used in the plot and, if appropriate, the data summary.
A formatted character string used for the title in analysis plots
# no filters
utilMWRtitle(param = 'DO')

# filter by date only
utilMWRtitle(param = 'DO', dtrng = c('2021-05-01', '2021-07-31'))

# filter by all
utilMWRtitle(param = 'DO', site = 'test', dtrng = c('2021-05-01', '2021-07-31'), resultatt = 'test', locgroup = 'test')

# title using summary
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')
accdat <- readMWRacc(accpth, runchk = FALSE)
utilMWRtitle(param = 'DO', accdat = accdat, sumfun = 'auto', site = 'test', dtrng = c('2021-05-01', '2021-07-31'), resultatt = 'test', locgroup = 'test')
Check if incomplete range in Value Range column
utilMWRvaluerange(accdat)
accdat |
|
The function evaluates if an incomplete or overlapping range is present in the Value Range column of the data quality objectives file for accuracy.
A named vector of "gap", "nogap", or "overlap" indicating if a gap is present, no gap is present, or an overlap is present in the ranges provided by the value range for each parameter. The names correspond to the parameters.
# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data with no checks
accdat <- readxl::read_excel(accpth, na = c('NA', ''), col_types = 'text')
accdat <- dplyr::mutate(accdat, dplyr::across(-c(`Value Range`), ~ dplyr::na_if(.x, 'na')))

utilMWRvaluerange(accdat)
Get logical value for y axis scaling
utilMWRyscale(accdat, param, yscl = "auto")
accdat |
|
param |
character string for the parameter to evaluate as provided in the |
yscl |
character indicating one of |
A logical value indicating TRUE for log10-scale, FALSE for arithmetic (linear)
# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx', package = 'MassWateR')

# accuracy data
accdat <- readMWRacc(accpth)

# log auto
utilMWRyscale(accdat, param = 'E.coli')

# linear force
utilMWRyscale(accdat, param = 'E.coli', yscl = 'linear')

# linear auto
utilMWRyscale(accdat, param = 'DO')

# log force
utilMWRyscale(accdat, param = 'DO', yscl = 'log')