Find nearest matches from reference set of profiles

rsp_match_profile compares a supplied respeciate profile (or similar data set) with a reference set of profiles and attempts to identify the nearest matches on the basis of similarity.

rsp_match_profile(
  rsp,
  ref,
  matches = 10,
  rescale = 5,
  min.n = NULL,
  method = "sid * srd",
  self.test = FALSE,
  ...,
  output = "summary"
)

Arguments

rsp: A respeciate object or similar data.frame containing a species profile to be compared with profiles in ref. If rsp contains more than one profile, these are averaged (using rsp_average_profile), and the average compared.
ref: A respeciate object or similar data.frame containing multiple species profiles, to be used as the reference library when identifying nearest matches for rsp.
matches: Numeric (default 10), the maximum number of profile matches to report.
rescale: Numeric (default 5), the data scaling method to apply before comparing rsp and profiles in ref: options 0 to 5 handled by rsp_rescale.
min.n: Numeric (or NULL), the minimum number of paired species measurements required for a match to be assessed. The larger min.n, the greater the required overlap between the rsp and ref profiles, so the better the matching confidence for paired cases, but also the more likely that a sparse but relevant ref profile is missed. The default option, NULL, is 65% of the number of species in rsp, or 6 if that is larger.
method: Character (default 'sid * srd'), the metric used to rank profile matches. The function calculates several matching metrics: 'pd', the Pearson's Distance (1 - Pearson's correlation coefficient); 'srd', like pd but using the Spearman rank correlation coefficient; and 'sid', the Standardized Identity Distance (see References). All the metrics tend to zero for better matches, and method can be any character string that can be evaluated from these, e.g., 'pd', 'srd', 'sid', and combinations thereof.
self.test: Logical (default FALSE). The match process self-tests by adding rsp to ref, which should generate an ideal (nearness = 0) score. Setting self.test to TRUE retains this as an extra record.
...: Additional arguments, typically ignored but sometimes used for function development. Currently testing an rm.reps (logical) option to remove what appear to be replicate profile matches from the result set. This is based on the assumption that identical 'pd' and 'sid' scores indicate identical ref profiles (or identical overlaps with rsp), but it is not validated, so handle with care.
output: Character, output options, including: 'summary' (the default) a data.frame of the requested best matches, ranked according to the method used; 'data' the full data set used to make plots; 'plot' the associated output from rsp_plot_match; or, a combination of these.
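
The min.n and method descriptions above can be illustrated with a minimal base R sketch. It is not the package's internal code: the helper names (default_min_n, match_scores), the example values, and the unrounded reading of the min.n rule are assumptions made for illustration, and the 'sid' formula follows the Standardized Identity Distance as we understand it from Belis et al. (2015), so the package implementation may differ in detail.

```r
# Two hypothetical profiles as named vectors of species weight percentages
# (illustrative values only, not taken from any reference data set).
x <- c(Al = 2.0, Ca = 5.0, Fe = 10.0, K = 1.0, Mg = 3.0,
       Na = 4.0, Si = 20.0, Zn = NA)
y <- c(Al = 2.5, Ca = 4.0, Fe = 12.0, K = 0.5, Mg = 2.0,
       Na = NA, Si = 18.0, Zn = 0.3)

# Assumed reading of the default min.n rule: 65% of the number of
# species measured in rsp, or 6 if that is larger (no rounding applied).
default_min_n <- function(rsp_values) {
  max(6, 0.65 * sum(!is.na(rsp_values)))
}

# Hypothetical helper: compute the three metrics on paired species only.
match_scores <- function(x, y) {
  paired <- !is.na(x) & !is.na(y)   # species measured in both profiles
  xp <- x[paired]
  yp <- y[paired]
  data.frame(
    n   = sum(paired),
    pd  = 1 - cor(xp, yp, method = "pearson"),    # Pearson's Distance
    srd = 1 - cor(xp, yp, method = "spearman"),   # rank-based equivalent
    # Assumed SID form: (sqrt(2) / m) * sum(|x - y| / (x + y))
    sid = (sqrt(2) / sum(paired)) * sum(abs(xp - yp) / (xp + yp))
  )
}

scores <- match_scores(x, y)
min_n <- default_min_n(x)

# A pair is only assessed if it has at least min.n paired species.
assessed <- scores$n >= min_n

# A method string such as "sid * srd" is an expression evaluated
# against the metric columns; smaller values mean nearer matches.
nearness <- eval(parse(text = "sid * srd"), envir = scores)
```

Comparing a profile with itself, e.g. match_scores(x, x), gives zero for all three metrics, which is the ideal score that the self.test option relies on.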

Value

By default, rsp_match_profile returns a fit report summary: a data.frame of up to matches fit reports for the nearest matches to profiles from the reference profile data set, ref. (See also output above for other options.) If several options are requested, earlier options are reported (e.g. using print or plot) and only the final option is returned.

References

Distance metrics are based on recommendations by Belis et al. (2015), as implemented in Mooibroek et al. (2022):

Belis, C.A., Pernigotti, D., Karagulian, F., Pirovano, G., Larsen, B.R., Gerboles, M., Hopke, P.K., 2015. A new methodology to assess the performance and uncertainty of source apportionment models in intercomparison exercises. Atmospheric Environment, 119, 35–44. https://doi.org/10.1016/j.atmosenv.2015.08.002.

Mooibroek, D., Sofowote, U.M. and Hopke, P.K., 2022. Source apportionment of ambient PM10 collected at three sites in an urban-industrial area with multi-time resolution factor analyses. Science of The Total Environment, 850, p.157981. http://dx.doi.org/10.1016/j.scitotenv.2022.157981.