rsp_match_profile compares a supplied respeciate profile (or similar data set) with a reference set of supplied profiles and attempts to identify the nearest matches on the basis of similarity.
rsp_match_profile(
  rsp,
  ref,
  matches = 10,
  rescale = 5,
  min.n = NULL,
  method = "sid * srd",
  self.test = FALSE,
  ...,
  output = "summary"
)
A respeciate object or similar data.frame containing a species profile to be compared with profiles in ref. If rsp contains more than one profile, these are averaged (using rsp_average_profile), and the average is compared.
A respeciate object, or a similar data.frame, containing multiple species profiles, to be used as the reference library when identifying nearest matches for rsp.
Numeric (default 10), the maximum number of profile matches to report.
Numeric (default 5), the data scaling method to apply before comparing rsp and profiles in ref: options 0 to 5, handled by rsp_rescale.
Numeric (or NULL), the minimum number of paired species measurements required for a match to be assessed. The larger min.n, the greater the required rsp and ref profile overlap, so the better the matching confidence for paired cases, but also the more likely that a sparse but relevant ref profile may be missed. The default option, NULL, is 65% of the number of species in rsp, or 6 if that is larger.
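The min.n check described above can be sketched in base R. This is only an illustration of the stated rule, not the package's own code: the species names are hypothetical, and whether the 65% value is rounded before use is not stated here, so no rounding is applied.

```r
# Hypothetical species lists for a query profile and one reference profile.
rsp_species <- c("Al", "Ca", "Fe", "K", "Na", "Pb", "Si", "Ti", "Zn", "EC")
ref_species <- c("Al", "Ca", "Fe", "K", "Si", "Zn", "OC")

# Default rule as described: 65% of the species in rsp, or 6 if that is larger.
# (Assumption: no rounding is applied; the package may differ.)
min_n <- max(0.65 * length(rsp_species), 6)

# Count the species measured in both profiles (the paired measurements).
n_paired <- length(intersect(rsp_species, ref_species))

# A reference profile is only assessed if it has enough paired species.
assessed <- n_paired >= min_n
```

Here the reference profile shares six species with the query, which falls short of the threshold of 6.5, so under this sketch it would not be assessed.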
Character (default 'sid * srd'), the metric used to rank profile matches. The function calculates several matching metrics: 'pd', the Pearson distance (1 - Pearson's correlation coefficient); 'srd', like 'pd' but using the Spearman rank correlation coefficient; and 'sid', the Standardized Identity Distance (see References). All the metrics tend to zero for better matches, and the method can be any character string that can be evaluated using these, e.g., 'pd', 'srd', 'sid', and combinations thereof.
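As a rough illustration of two of these metrics, the base R sketch below computes 'pd' and 'srd' for a pair of made-up profiles and then evaluates a ranking expression supplied as a character string. The data values and the use of eval(parse()) are assumptions made for the example, not a description of the package internals, and 'sid' is left out (see References for its definition).

```r
# Hypothetical paired measurements for five species (illustrative values only).
x <- c(0.40, 0.25, 0.20, 0.10, 0.05)
y <- c(0.30, 0.35, 0.15, 0.12, 0.08)

# 'pd': one minus the Pearson correlation coefficient.
pd <- 1 - cor(x, y, method = "pearson")

# 'srd': the same idea using the Spearman rank correlation coefficient.
srd <- 1 - cor(x, y, method = "spearman")

# Evaluate a ranking expression given as a character string,
# looking up the metric names in a list.
metrics <- list(pd = pd, srd = srd)
method <- "pd * srd"
score <- eval(parse(text = method), envir = metrics)
```

Both metrics are zero for perfectly correlated profiles, so a product such as 'pd * srd' is also zero for an ideal match and grows as either metric worsens.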
Logical (default FALSE). The match process self-tests by adding rsp to ref, which should generate an ideal (nearness = 0) score. Setting self.test to TRUE retains this as an extra record.
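The self-test idea can be sketched in base R as follows. The profiles, the reference names, and the use of the Pearson distance alone are assumptions for illustration; the function itself ranks by the chosen method.

```r
# Hypothetical query profile and a small reference library (illustrative values).
rsp_profile <- c(Al = 0.40, Ca = 0.25, Fe = 0.20, K = 0.10, Si = 0.05)
ref_library <- list(
  ref_a = c(Al = 0.30, Ca = 0.35, Fe = 0.15, K = 0.12, Si = 0.08),
  ref_b = c(Al = 0.05, Ca = 0.10, Fe = 0.20, K = 0.25, Si = 0.40)
)

# Self-test: add the query profile to the reference library.
ref_library$self <- rsp_profile

# Pearson distance between the query and each reference profile.
pd <- sapply(ref_library, function(ref) 1 - cor(rsp_profile, ref))

# Rank from nearest to farthest; the self record should come first.
ranked <- sort(pd)

# Keep at most `matches` records, as the matches argument does for real output.
matches <- 2
top_matches <- head(ranked, matches)
```

If the self record did not rank first with a distance of (numerically) zero, that would suggest a problem in the matching step, which is the point of the check.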
Additional arguments, typically ignored but sometimes used for function development. Currently, an rm.reps (logical) option is being tested to remove what appear to be replicate profile matches from the result set. This is based on the assumption that identical 'pd' and 'sid' scores indicate identical ref profiles (or identical overlaps with rsp), but it is not validated, so handle with care.
Character, output options, including: 'summary' (the default), a data.frame of the requested best matches, ranked according to the method used; 'data', the full data set used to make plots; 'plot', the associated output from rsp_plot_match; or a combination of these.
By default, rsp_match_profile returns a fit report summary: a data.frame of up to matches fit reports for the nearest matches to rsp among the profiles in the reference data set, ref. (See also output above for other options.) If several options are requested, earlier options are reported (e.g. using print or plot) and only the final option is returned.
Distance metrics are based on recommendations by Belis et al. (2015), as implemented in Mooibroek et al. (2022):
Belis, C.A., Pernigotti, D., Karagulian, F., Pirovano, G., Larsen, B.R., Gerboles, M., Hopke, P.K., 2015. A new methodology to assess the performance and uncertainty of source apportionment models in intercomparison exercises. Atmospheric Environment, 119, 35–44. https://doi.org/10.1016/j.atmosenv.2015.08.002.
Mooibroek, D., Sofowote, U.M., Hopke, P.K., 2022. Source apportionment of ambient PM10 collected at three sites in an urban-industrial area with multi-time resolution factor analyses. Science of The Total Environment, 850, 157981. http://dx.doi.org/10.1016/j.scitotenv.2022.157981.