Relative efficiency of 15 discordancy tests with 33 variants applied to geochemical data processing
Keywords: reference material, outlier-based methods, discordancy tests, Dixon tests, Grubbs test, skewness, kurtosis, critical values, significance tests
Abstract: Discordancy tests are a statistical tool that is useful in many areas of science and engineering, including the Earth Sciences. They offer a rigorous methodology for detecting and eliminating discordant outliers in statistically contaminated normal samples, leaving data free of statistical contamination that can then be used to estimate the central tendency (mean) and dispersion (standard deviation) parameters. For the empirical evaluation of 15 discordancy tests with 33 variants, an extensive database was established of 35 reference materials (RM) from four countries (Canada, U.S.A., Japan, and South Africa), comprising 2220 applicable cases with 41,821 individual geochemical data. Nine single-outlier tests with 13 variants and seven multiple-outlier tests with 20 variants (test N4 belongs to both types), along with the new, most precise and accurate critical values, were employed for this evaluation. Two statistical parameters quantified the efficiency of the discordancy tests: (1) the relative efficiency criterion (REC), known from previous work; and (2) the relative outlier criterion (ROC), proposed in this work. Additionally, a methodology combining linear regression analysis with Fisher F and Student t significance tests was used. Among the single-outlier discordancy tests, the kurtosis test (N15) showed the greatest efficiency, followed by the Grubbs-type tests (N1 and N4) and the skewness test (N14), whereas among the multiple-outlier tests, the Grubbs test N4 in its three variants showed the greatest efficiency values. The Dixon tests, although much more popular than the Grubbs tests, in general showed the lowest efficiencies. One important implication of these results is that tests N15, N1, N4, and N14 should be preferred when applying this outlier-based methodology to geochemical data handling.
The quantitative interpretation using the combined methodology of linear regressions and significance tests confirms the results of the REC and ROC parameters. Finally, it is inferred that, independently of the analytical methods used to determine the geochemical composition of reference materials, upper discordant outliers are much more common than lower ones, and samples with symmetrical statistical contamination on both sides are relatively scarce. Robust estimates, such as the median or the Gastwirth mean, are likely to be biased for such geochemical data. Applying discordancy tests before estimating the mean and standard deviation values is a basic requirement.
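As an illustration of the single-outlier statistics that the abstract ranks as most efficient, the following minimal Python sketch computes the Grubbs-type statistic N1 for an upper outlier, the skewness statistic N14, and the kurtosis statistic N15. The formulas are the conventional textbook forms and are an assumption here, since the abstract does not state them; the function names and the sample data are hypothetical. No critical values are included: deciding whether an observation is discordant requires comparing each statistic with a tabulated critical value for the given sample size and significance level.

```python
import math


def _central_moment(x, k):
    """k-th central moment of x, using the 1/n convention."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** k for v in x) / n


def grubbs_n1_upper(x):
    """Assumed form of N1 for an upper outlier: (x_max - mean) / s,
    where s is the sample standard deviation (n - 1 denominator)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return (max(x) - mean) / s


def skewness_n14(x):
    """Assumed form of N14: m3 / m2**1.5 (sample skewness)."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def kurtosis_n15(x):
    """Assumed form of N15: m4 / m2**2 (sample kurtosis, not excess)."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2


if __name__ == "__main__":
    # Hypothetical data with one suspiciously high value.
    sample = [1.0, 2.0, 3.0, 4.0, 10.0]
    print("N1 :", grubbs_n1_upper(sample))
    print("N14:", skewness_n14(sample))
    print("N15:", kurtosis_n15(sample))
```

All three statistics grow when a single high value pulls away from the rest of the sample, which is consistent with the observation that upper discordant outliers dominate in such data.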