rsp_match_profile compares a supplied respeciate
profile (or similar data set) with a reference set of profiles
and attempts to identify the nearest matches on the
basis of similarity.
rsp_match_profile(
rsp,
ref,
matches = 10,
rescale = 5,
min.n = NULL,
method = "sid * srd",
self.test = FALSE,
...,
output = "summary"
)
rsp: A respeciate object or similar data.frame containing
a species profile to be compared with profiles in ref. If rsp
contains more than one profile, these are averaged (using
rsp_average_profile), and the average is compared.
ref: A respeciate object, or a data.frame containing
multiple species profiles, to be used as the reference library when
identifying nearest matches for rsp.
matches: Numeric (default 10), the maximum number of profile matches to report.
rescale: Numeric (default 5), the data scaling method to apply before
comparing rsp and profiles in ref: options 0 to 5 are handled by
rsp_rescale.
min.n: Numeric (or NULL), the minimum number of paired species
measurements required for a match to be assessed. The larger min.n,
the greater the required overlap between the rsp and ref profiles, so the
better the matching confidence for paired cases, but also the more likely
that a sparse but relevant ref profile is missed. The default,
NULL, sets min.n to 65% of the number of species in rsp, or 6,
whichever is larger.
method: Character (default 'sid * srd'), the metric used to
rank profile matches. The function calculates several matching metrics:
'pd', the Pearson distance (1 - Pearson's correlation coefficient);
'srd', like 'pd' but using Spearman's rank correlation coefficient;
and 'sid', the Standardized Identity Distance (see References). All the
metrics tend to zero for better matches, and method can be
any character string that can be evaluated using these metrics, e.g.,
'pd', 'srd', 'sid', or a combination such as 'sid * srd'.
self.test: Logical (default FALSE). The match process self-tests by adding
rsp to ref, which should generate an ideal (nearness = 0) score.
Setting self.test to TRUE retains this as an extra record.
...: Additional arguments, typically ignored but sometimes used for
function development. Currently used to test an rm.reps (logical) option that
removes what appear to be replicate profile matches from the result set. This
is based on the assumption that identical 'pd' and 'sid' scores indicate
identical ref profiles (or identical overlaps with rsp), but this is
not validated, so handle with care.
output: Character, output options, including: 'summary' (the
default), a data.frame of the requested best matches, ranked
according to the method used; 'data', the full data set used
to make plots; 'plot', the associated output from
rsp_plot_match; or a combination of these.
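The min.n and method arguments described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the profiles are made-up named vectors, score_one is a hypothetical helper, the min.n rule is read as "whichever is larger" with no rounding applied, and 'sid' is left out because its formula is not given in this text (see References). The sketch ranks on 'pd * srd' purely to show how a method string can be evaluated against the calculated metrics.

```r
# Hypothetical example profiles: named numeric vectors of species weights.
# Values are illustrative only and are not taken from any real profile.
x <- c(OC = 30, EC = 10, SO4 = 20, NO3 = 15, NH4 = 10, Fe = 5, Ca = 6, K = 4)
refs <- list(
  ref_a = c(OC = 28, EC = 12, SO4 = 19, NO3 = 16, NH4 = 9, Fe = 6, Ca = 5, K = 5),
  ref_b = c(OC = 5, EC = 40, SO4 = 4, NO3 = 10, NH4 = 30, Fe = 2, Ca = 8, K = 1),
  ref_c = c(OC = 31, EC = 9, SO4 = 21)  # sparse: only 3 species shared with x
)

# Assumed reading of the default min.n rule: 65% of the species in x,
# or 6 if that is larger. Any rounding the package applies is unknown.
min_n <- max(6, 0.65 * length(x))

# Hypothetical helper: score one reference profile against x.
# Returns NULL when fewer than min_n species are paired.
score_one <- function(id, x, y, min_n) {
  shared <- intersect(names(x), names(y))
  if (length(shared) < min_n) return(NULL)
  xs <- x[shared]
  ys <- y[shared]
  data.frame(
    ref = id,
    n = length(shared),
    pd = 1 - cor(xs, ys, method = "pearson"),    # Pearson distance
    srd = 1 - cor(xs, ys, method = "spearman"),  # Spearman rank distance
    stringsAsFactors = FALSE
  )
}

# Score every reference, dropping those with too little overlap.
scored <- lapply(names(refs), function(id) score_one(id, x, refs[[id]], min_n))
scores <- do.call(rbind, Filter(Negate(is.null), scored))

# Evaluate the ranking expression using the metric columns, then sort so the
# nearest match (smallest value) comes first.
method <- "pd * srd"
scores$nearness <- eval(parse(text = method), envir = scores)
scores <- scores[order(scores$nearness), ]
print(scores)
```

Here ref_c is never scored because it shares only three species with x, which is the trade-off described for min.n: a stricter overlap requirement can exclude a sparse reference profile however similar it looks.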
By default rsp_match_profile returns a fit report summary: a
data.frame of up to matches fit reports for the nearest
matches to rsp from the reference profile data set, ref (see
also output above for other options). If several options are requested,
earlier options are reported (e.g. using print or plot) and only
the final option is returned.
Distance metrics are based on recommendations by Belis et al. (2015), as implemented in Mooibroek et al. (2022):
Belis, C.A., Pernigotti, D., Karagulian, F., Pirovano, G., Larsen, B.R., Gerboles, M., Hopke, P.K., 2015. A new methodology to assess the performance and uncertainty of source apportionment models in intercomparison exercises. Atmospheric Environment, 119, 35–44. https://doi.org/10.1016/j.atmosenv.2015.08.002.
Mooibroek, D., Sofowote, U.M., Hopke, P.K., 2022. Source apportionment of ambient PM10 collected at three sites in an urban-industrial area with multi-time resolution factor analyses. Science of The Total Environment, 850, 157981. http://dx.doi.org/10.1016/j.scitotenv.2022.157981.