Functions for Positive Least Squares (PLS) fitting of (re)SPECIATE profiles

rsp_pls_profile builds PLS models for supplied profile(s) using the nls function, the 'port' algorithm and a lower limit of zero for all model outputs to enforce positive fits. The modeled profiles are typically from an external source, e.g. a measurement campaign, and are fit as a linear additive series of reference profiles, here typically from (re)SPECIATE. This provides a measure of source apportionment, based on the assumption that the profiles in the reference set are representative of the mix that makes up the modeled sample. The pls_ functions work with rsp_pls_profile outputs, and are intended to be used when refining and analyzing these PLS models. See also pls_plots for PLS model plots.

rsp_pls_profile(rsp, ref, power = 1, ...)

pls_report(pls)

pls_test(pls)

pls_fit_species(
  pls,
  species,
  power = 1,
  refit.profile = TRUE,
  as.marker = FALSE,
  drop.missing = FALSE,
  ...
)

pls_refit_species(
  pls,
  species,
  power = 1,
  refit.profile = TRUE,
  as.marker = FALSE,
  drop.missing = FALSE,
  ...
)

pls_rebuild(
  pls,
  species,
  power = 1,
  refit.profile = TRUE,
  as.marker = FALSE,
  drop.missing = FALSE,
  ...
)

Arguments

rsp

A respeciate object, a data.frame of profiles in standard long form, intended for PLS modelling.

ref

A respeciate object, a data.frame of profiles also in standard long form, used as the set of candidate source profiles when fitting rsp.

power

A numeric, an exponent applied to the weightings when fitting the PLS model. This is applied in the form weight^power, so increasing it increases the relative weighting of the more heavily weighted measurements. Values in the range 1 to 2.5 are sometimes helpful.

...

additional arguments, typically ignored or passed on to nls.

pls

An rsp_pls_profile output, intended for use with pls_ functions.

species

for pls_fit_species, a data.frame of measurements of an additional species to be fitted to an existing PLS model, or for pls_refit_species, a character vector of the names of species already included in the model to be refit. Both are multiple-species wrappers for pls_rebuild, a general-purpose PLS fitter that only handles a single species.

refit.profile

(for pls_fit_species, pls_refit_species and pls_rebuild) logical. When fitting a new species (or refitting an existing species), all other species in the reference profiles are held 'as is' and the added species is fit to the source contribution time-series of the previous PLS model. By default, the full PLS model is then refit using the revised ref source profiles to generate a PLS model based on them (i.e., ref + new species or ref + refit species). However, this second step can be omitted using refit.profile=FALSE if you want to use the supplied species as an indicator rather than a standard member of the apportionment model.

as.marker

for pls_rebuild, pls_fit_species and pls_refit_species, logical, default FALSE, when fitting (or refitting) a species, treat it as a source marker.

drop.missing

for pls_rebuild, pls_fit_species and pls_refit_species, logical, default FALSE, when building or rebuilding a PLS model, discard cases where species is missing.
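The first step described under refit.profile, fitting one species to the source contribution time-series of an existing model, can be pictured with a small base R sketch. This is only an illustration under stated assumptions: the contribution series n1 and n2, the species series y, and the coefficients m1 and m2 are simulated and hypothetical, and the code is not the package's internal implementation.

```r
set.seed(2)

# Hypothetical source contribution time-series from a previous PLS model:
# 20 samples, 2 sources (names n1 and n2 are assumptions for this sketch).
n_samples <- 20
contrib <- data.frame(
  n1 = runif(n_samples, 0.5, 5),
  n2 = runif(n_samples, 0.5, 5)
)

# Simulated measurements of the species being added; m1 and m2 are the
# (unknown in practice) amounts of that species in each source profile.
true_m <- c(m1 = 0.4, m2 = 0.1)
contrib$y <- true_m[["m1"]] * contrib$n1 +
  true_m[["m2"]] * contrib$n2 +
  rnorm(n_samples, sd = 0.02)

# Fit the species to the contributions, holding the contributions fixed and
# constraining the profile entries to be zero or more.
fit_sp <- nls(
  y ~ m1 * n1 + m2 * n2,
  data = contrib,
  start = list(m1 = 0.1, m2 = 0.1),
  algorithm = "port",
  lower = 0
)

est_m <- coef(fit_sp)
print(round(est_m, 3))
```

The estimated m1 and m2 play the role of the new species' entries in the revised reference profiles. With refit.profile=TRUE the documented behaviour is to then refit the full model using those revised profiles; that second step is not shown here.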

Value

rsp_pls_profile returns a list of nls models, one per profile/measurement set in rsp. The pls_ functions work with these outputs. pls_report generates a data.frame of model outputs, and is used by several of the other pls_ functions. pls_fit_species, pls_refit_species and pls_rebuild return the supplied rsp_pls_profile output, updated on the basis of the pls_ function action. The pls_plots functions (documented separately) produce various plots commonly used in source apportionment studies.

Note

This implementation of PLS applies the following modeling constraints:

1. It generates a model of rsp that is a positively constrained linear combination of the profiles in ref, so outputs can only be zero or more. Although the model is generated using nls, which fits Nonlinear Least Squares (NLS) models, the fitting term applied in this case is linear.

2. The model is fit in the form:

\(X_{i,j} = \sum\limits_{k=1}^{K}{N_{i,k} * M_{k,j}} + e_{i,j}\)

Where X is the data set of measurements, rsp, M is the data set of reference profiles, ref, and N is the data set of source contributions, the source apportionment solution, which is solved by minimising e, the error terms.

3. The number of species in rsp must be more than the number of profiles in ref to reduce the likelihood of over-fitting.
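As a rough illustration of the constraints above, the following base R sketch fits one simulated sample as a non-negative linear combination of two reference profiles, using nls with the 'port' algorithm and a lower bound of zero. The profile values, the names src1, src2, n1 and n2, and the noise level are all made up for this example. It is not the package's internal code, and it omits the weighting controlled by power.

```r
set.seed(1)

# Hypothetical reference profiles M: K = 2 sources (rows) by J = 6 species
# (columns), so there are more species than profiles (constraint 3).
M <- rbind(
  src1 = c(0.50, 0.20, 0.10, 0.10, 0.05, 0.05),
  src2 = c(0.05, 0.05, 0.10, 0.20, 0.30, 0.30)
)

# Simulate one measured sample x_j = sum_k n_k * M[k, j] + e_j.
# A little noise is added because nls is not intended for zero-residual data.
true_n <- c(n1 = 3, n2 = 1)
x <- as.vector(true_n %*% M) + rnorm(ncol(M), sd = 0.01)

dat <- data.frame(x = x, src1 = M["src1", ], src2 = M["src2", ])

# Linear fitting term, solved with nls; lower = 0 keeps every source
# contribution at zero or more (constraint 1).
fit <- nls(
  x ~ n1 * src1 + n2 * src2,
  data = dat,
  start = list(n1 = 1, n2 = 1),
  algorithm = "port",
  lower = 0
)

est <- coef(fit)
print(round(est, 2))
```

Here n1 and n2 correspond to one row of N in the equation above. Repeating the fit for each sample in the measurement data would build up the full set of source contributions.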