Positive Least Squares models — rsp.pls • respeciate

Functions for Positive Least Squares (PLS) fitting of respeciate profiles

rsp_pls_x builds PLS models for supplied profile(s) using the nls function, the 'port' algorithm and a lower limit of zero for all model outputs to enforce the positive fits. The modeled profiles are typically from an external source, e.g. a measurement campaign, and are fit as a linear additive series of reference profiles, here typically from respeciate. This provides a measure of source apportionment, based on the assumption that the profiles in the reference set are representative of the mix that makes up the modeled sample. The pls_ functions work with rsp_pls_x outputs, and are intended to be used when refining and analyzing these PLS models. See also pls_plots for PLS model plots.
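
As a rough illustration of the fitting approach described above, the sketch below uses base R only: nls with algorithm = "port" and a lower bound of zero. The profiles m1 and m2, the synthetic sample x and the coefficient names n1 and n2 are all made up for the example. It is not the rsp_pls_x code itself, only a minimal instance of the same kind of positively constrained linear fit.

```r
# Hypothetical reference profiles (m1, m2) across six species.
m1 <- c(0.40, 0.25, 0.15, 0.10, 0.07, 0.03)
m2 <- c(0.05, 0.10, 0.20, 0.30, 0.20, 0.15)

# Synthetic "measured" profile: 2 parts m1 plus 3 parts m2, with small noise.
set.seed(1)
x <- 2 * m1 + 3 * m2 + rnorm(6, sd = 0.01)
dat <- data.frame(x = x, m1 = m1, m2 = m2)

# Linear additive model fitted with nls. The 'port' algorithm accepts
# bounds, and lower = 0 keeps both source contributions non-negative.
fit <- nls(
  x ~ n1 * m1 + n2 * m2,
  data = dat,
  start = list(n1 = 1, n2 = 1),
  algorithm = "port",
  lower = 0
)

coef(fit)  # estimated contributions of m1 and m2 to x
```

The recovered n1 and n2 should sit close to the 2 and 3 used to build x. If a reference profile did not contribute to the sample, its coefficient would be held at the zero bound rather than going negative.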

rsp_pls_x(x, m, power = 1, ...)

pls_report(pls)

pls_test(pls)

pls_fit_species(
  pls,
  species,
  power = 1,
  refit.profile = TRUE,
  as.marker = FALSE,
  drop.missing = FALSE,
  ...
)

pls_refit_species(
  pls,
  species,
  power = 1,
  refit.profile = TRUE,
  as.marker = FALSE,
  drop.missing = FALSE,
  ...
)

pls_rebuild(
  pls,
  species,
  power = 1,
  refit.profile = TRUE,
  as.marker = FALSE,
  drop.missing = FALSE,
  ...
)

Arguments

x: A respeciate object, a data.frame of profiles in standard long form, intended for PLS modelling.
m: A respeciate object, a data.frame of profiles also in standard long form, used as the set of candidate source profiles when fitting x.
power: A numeric, an additional factor applied to the weightings when fitting the PLS model. It is applied in the form weight^power, so increasing it increases the relative weighting of the more heavily weighted measurements. Values in the range 1 - 2.5 are sometimes helpful.
...: additional arguments, typically ignored or passed on to nls.
pls: A rsp_pls_x output, intended for use with pls_ functions.
species: for pls_fit_species, a data.frame of measurements of an additional species to be fitted to an existing PLS model, or for pls_refit_species, a character vector of the names of species already included in the model that are to be refit. Both are multiple-species wrappers for pls_rebuild, a general-purpose PLS fitter that only handles a single species.
refit.profile: (for pls_fit_species, pls_refit_species and pls_rebuild) logical. When fitting a new species (or refitting an existing species), all other species in the reference profiles are held 'as is' and the added species is fit to the source contribution time-series of the previous PLS model. By default, the full PLS model is then refit using the revised m source profiles (i.e., m + new species or m + refit species). However, this second step can be omitted using refit.profile=FALSE if you want to use the supplied species as an indicator rather than a standard member of the apportionment model.
as.marker: for pls_rebuild, pls_fit_species and pls_refit_species, logical, default FALSE, when fitting (or refitting) a species, treat it as a source marker.
drop.missing: for pls_rebuild, pls_fit_species and pls_refit_species, logical, default FALSE, when building or rebuilding a PLS model, discard cases where species is missing.
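
The effect of the power argument described above (weight^power) can be seen with a few made-up numbers. The weights in w are hypothetical, and the normalisation in the helper rel() is used here only to make the comparison easy to read; it is not taken from the package.

```r
# Hypothetical weights for three measurements.
w <- c(1, 2, 4)

# Relative share of each weight after raising to 'power'.
# (Normalising to sum to 1 is an illustration device only.)
rel <- function(w, power) w^power / sum(w^power)

rel(w, 1)  # 1/7, 2/7, 4/7
rel(w, 2)  # 1/21, 4/21, 16/21
```

Moving from power = 1 to power = 2 raises the share of the largest weight from 4/7 to 16/21, which is the sense in which a higher power favours the more heavily weighted measurements.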
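
The first step described under refit.profile, fitting an added species to the source contribution time-series of an earlier model, might look something like the base R sketch below. Everything here is assumed for illustration: the contribution matrix N, the reference profile matrix M, the synthetic series new_sp and the coefficient names b1 and b2. It is not the internal code of pls_fit_species or pls_rebuild.

```r
set.seed(42)

# Hypothetical source contributions from an earlier fit:
# 20 samples (rows) by 2 sources (columns).
N <- cbind(src1 = runif(20, 0, 5), src2 = runif(20, 0, 5))

# Hypothetical reference profiles: 2 sources by 3 species.
M <- matrix(c(0.5, 0.3, 0.2,
              0.1, 0.4, 0.5),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("src1", "src2"), c("sp_a", "sp_b", "sp_c")))

# Synthetic measurements of the added species.
new_sp <- 0.3 * N[, "src1"] + 0.1 * N[, "src2"] + rnorm(20, sd = 0.02)
dat <- data.frame(y = new_sp, src1 = N[, "src1"], src2 = N[, "src2"])

# Step 1: hold the contributions fixed and estimate how much of the
# new species each source profile carries, constrained to be >= 0.
step1 <- nls(
  y ~ b1 * src1 + b2 * src2,
  data = dat,
  start = list(b1 = 0.1, b2 = 0.1),
  algorithm = "port",
  lower = 0
)

# Revised reference profiles: the original species plus the new one.
M_revised <- cbind(M, new_sp = unname(coef(step1)))
M_revised
```

With refit.profile = TRUE, the full model would then be refit against the revised profiles; with refit.profile = FALSE, the process would stop after the first step.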

Value

rsp_pls_x returns a list of nls models, one per profile/measurement set in x. The pls_ functions work with these outputs. pls_report generates a data.frame of model outputs, and is used by several of the other pls_ functions. pls_fit_species, pls_refit_species and pls_rebuild return the supplied rsp_pls_x output, updated on the basis of the pls_ function action. The pls_plots functions (documented separately) produce various plots commonly used in source apportionment studies.

Note

This implementation of PLS applies the following modeling constraints:

1. It generates a model of x that is a positively constrained linear combination of the profiles in m, so outputs can only be zero or more. Although the model is generated using nls, which fits Nonlinear Least Squares (NLS) models, the fitting term applied in this case is linear.

2. The model is fit in the form:

\(X_{i,j} = \sum\limits_{k=1}^{K}{N_{i,k} * M_{k,j}} + e_{i,j}\)

Where X is the data set of measurements (input x in rsp_pls_x), M (input m) is the data set of reference profiles, and N is the data set of source contributions, the source apportionment solution, which is solved by minimising e, the error terms.

3. The number of species in x must be more than the number of profiles in m to reduce the likelihood of over-fitting.
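
The model form and constraints above can be checked on a small made-up example in base R. The matrices N and M and their sizes (3 samples, 2 sources, 4 species) are hypothetical, and the data are noise-free so that the error terms are zero. Because nls is a least squares fitter, the sum of squared error terms is used as the quantity being minimised; any weighting is left out.

```r
# Hypothetical reference profiles M: K = 2 sources by J = 4 species.
M <- matrix(c(0.5, 0.3, 0.1, 0.1,
              0.1, 0.2, 0.3, 0.4),
            nrow = 2, byrow = TRUE)

# Hypothetical non-negative contributions N: I = 3 samples by K = 2 sources.
N <- matrix(c(1, 2,
              0, 3,
              4, 0),
            nrow = 3, byrow = TRUE)

# Noise-free measurements: X[i, j] = sum over k of N[i, k] * M[k, j].
X <- N %*% M

# Element-wise check of the summation for sample i = 1, species j = 2.
sum(N[1, ] * M[, 2])  # 1 * 0.3 + 2 * 0.2 = 0.7
X[1, 2]               # same value via the matrix product

# Constraint 3: more species (columns of X) than profiles (rows of M).
stopifnot(ncol(X) > nrow(M))

# Error terms for a candidate solution, and their sum of squares.
e <- X - N %*% M
sum(e^2)  # zero here, because N is the exact solution
```

With real measurements e is not zero, and the fit looks for the non-negative N that makes the squared error as small as possible.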