Detection of a Single Outlier|Statistical Analysis|Quantitative Data

A problem often encountered when making replicate measurements of a physical or chemical quantity is deciding whether an outlying result is far enough from the rest of the data to justify discarding it. There is always a temptation to eliminate such results, and to exclude them from any further calculations, on the basis of "good judgment" and "common sense".
While "good judgment" and "common sense" are valuable tools for interpreting results in quantitative analysis, the rejection of possible outlying data must be based on objective criteria, that is, on the statistical treatment of the data.
 
Are there any simple statistical tests for rejecting outliers in quantitative data?    

There are several simple tests that can be used to examine suspect values and to identify them as outliers at a chosen confidence level, such as:

  • Dixon's Q-test (for a single outlier, data normally distributed, small data sets)
  • Grubbs' test (for a single outlier, small data sets, data nearly normally distributed)
  • Tietjen-Moore test (generalization of Grubbs' test to the case of more than one outlier;
    the number of outliers must be specified exactly)
  • Huber's method (for multiple outliers, data roughly normally distributed)
 All these tests have strong points and limitations and must therefore be used judiciously.



A commonly used statistical test for identifying a single outlier in a small data set is Dixon’s Q-test. The Q-test compares the difference between the suspected outlier and its nearest numerical neighbor to the range of the entire data set.
 
How is the Q-test applied?

The test is very simple and is applied as follows:


  • Order the N data values comprising the set of observations under examination in increasing order:

x1 < x2 < x3 < ... < xN


  • Calculate the experimental Q (Qexp). Qexp is defined as follows:

 Qexp = |(suspect value – nearest neighbor) / (largest value – smallest value)|     (1)


  • Compare Qexp with the critical value Qcritical found in tables. The critical value must correspond to the confidence level at which we have decided to run the test (usually 95%).

 If Qexp > Qcritical, the suspect value is identified as an outlier and can be rejected.
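The steps above can be sketched in Python as follows. This is a minimal illustration of equation (1), not a reference implementation: the function names `dixon_q` and `is_outlier`, the `suspect` argument and the sample values are assumptions introduced here, and the critical value has to be supplied by the caller from a table.

```python
def dixon_q(values, suspect="high"):
    """Return Qexp for the largest ("high") or smallest ("low") value."""
    xs = sorted(values)                      # step 1: increasing order
    if len(xs) < 3:
        raise ValueError("need at least 3 values")
    data_range = xs[-1] - xs[0]              # largest value - smallest value
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    if suspect == "high":
        gap = xs[-1] - xs[-2]                # suspect value - nearest neighbor
    elif suspect == "low":
        gap = xs[1] - xs[0]
    else:
        raise ValueError("suspect must be 'high' or 'low'")
    return abs(gap / data_range)             # step 2: equation (1)


def is_outlier(values, q_critical, suspect="high"):
    """Step 3: the suspect value is flagged only if Qexp > Qcritical."""
    return dixon_q(values, suspect) > q_critical


# Hypothetical values, for illustration only
sample = [1.0, 2.0, 3.0, 10.0]
q_high = dixon_q(sample, "high")             # (10 - 3) / (10 - 1)
q_low = dixon_q(sample, "low")               # (2 - 1) / (10 - 1)
print(f"Q(high) = {q_high:.3f}, Q(low) = {q_low:.3f}")
```

Sorting inside the function means the caller does not have to order the data first, and the suspect value is always one of the two extremes.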

A table containing Qcritical for different confidence levels (90%, 95%, 96%, 98%, 99%) and numbers of observations N (3 to 10) is given below:
 
Table I.1: Critical values of Q-test [1]

 N     Qcritical   Qcritical   Qcritical   Qcritical   Qcritical
       (90%)**     (95%)**     (96%)**     (98%)**     (99%)**
 3     0.941       0.970       0.976       0.988       0.994
 4     0.765       0.829       0.846       0.889       0.926
 5     0.642       0.710       0.729       0.780       0.821
 6     0.560       0.625       0.644       0.698       0.740
 7     0.507       0.568       0.586       0.637       0.680
 8     0.468       0.526       0.543       0.590       0.634
 9     0.437       0.493       0.510       0.555       0.598
 10    0.412       0.466       0.483       0.527       0.568

**  The percentage expresses the confidence level
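For use in a script, Table I.1 can be stored as a small lookup structure. The sketch below copies the tabulated values as given; the names `Q_TABLE`, `CONFIDENCE_LEVELS` and `q_critical` are hypothetical, and the function deliberately refuses any N or confidence level that is not in the table rather than interpolating.

```python
# Confidence levels (%) in the column order of Table I.1
CONFIDENCE_LEVELS = (90, 95, 96, 98, 99)

# Qcritical values from Table I.1, keyed by the number of observations N
Q_TABLE = {
    3:  (0.941, 0.970, 0.976, 0.988, 0.994),
    4:  (0.765, 0.829, 0.846, 0.889, 0.926),
    5:  (0.642, 0.710, 0.729, 0.780, 0.821),
    6:  (0.560, 0.625, 0.644, 0.698, 0.740),
    7:  (0.507, 0.568, 0.586, 0.637, 0.680),
    8:  (0.468, 0.526, 0.543, 0.590, 0.634),
    9:  (0.437, 0.493, 0.510, 0.555, 0.598),
    10: (0.412, 0.466, 0.483, 0.527, 0.568),
}


def q_critical(n, confidence=95):
    """Look up Qcritical for n observations at the given confidence level (%)."""
    if n not in Q_TABLE:
        raise ValueError("Table I.1 only covers N = 3 to 10")
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError("confidence must be one of 90, 95, 96, 98, 99")
    return Q_TABLE[n][CONFIDENCE_LEVELS.index(confidence)]


print(q_critical(6, 95))
```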

Are there any limitations to Dixon’s Q-test?

1. The data, excluding the possible outlier, must be normally distributed.
2. The Q-test is valid for the detection of a single outlier (it cannot be applied a second time to the same data set). Other forms of Dixon’s Q-test can be applied to the detection of multiple outliers [2].
3. The Q-test should be applied with caution, as should all statistical tests used for rejecting data, since there is a probability equal to the significance level α (α = 0.05 at the 95% confidence level) that a value belonging to the same population as the rest of the data is wrongly identified as an outlier.
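The third limitation can be illustrated with a small simulation. The sketch below draws many sets of N = 6 values from a single normal distribution, so no set contains a true outlier, and counts how often the Q-test would still reject an extreme value at the 95% level. It assumes that the critical values in Table I.1 are two-sided, i.e. that the suspect value may lie at either end of the data; the function names, the number of trials and the seed are arbitrary choices made here.

```python
import random

Q_CRIT_95_N6 = 0.625  # Table I.1, N = 6, 95% confidence level


def max_q(sample):
    """Largest Qexp over the two extreme values (two-sided use of the test)."""
    xs = sorted(sample)
    data_range = xs[-1] - xs[0]
    q_low = (xs[1] - xs[0]) / data_range
    q_high = (xs[-1] - xs[-2]) / data_range
    return max(q_low, q_high)


def false_rejection_rate(trials=20000, n=6, seed=1):
    """Fraction of outlier-free normal samples in which a value is rejected."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if max_q(sample) > Q_CRIT_95_N6:
            rejected += 1
    return rejected / trials


rate = false_rejection_rate()
print(f"false rejection rate: {rate:.3f}")
```

Under these assumptions the printed rate should come out close to α = 0.05, which is the risk of discarding a perfectly valid measurement.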

A typical example with a possible outlier was given in a previous post entitled “Calibration and Outliers - Statistical Analysis”.

Can we reject the value 0.6400 as an outlier at the 95% confidence level using Dixon’s Q-test (please see Table I.1 in “Calibration and Outliers - Statistical Analysis”)?

Following the above procedure, we get:

1.      The data excluding the possible outlier are almost normally distributed, as shown in Fig. 1b in “Calibration and Outliers - Statistical Analysis”.
2.      Arrange the data under examination in increasing order:

0.5980  0.5993  0.5995  0.5997  0.6010  0.6400

3.      Calculate Qexp using equation (1):

Qexp = |(suspect value – nearest neighbor) / (largest value – smallest value)|
     = |(0.6400 – 0.6010) / (0.6400 – 0.5980)| = 0.039 / 0.042 = 0.9286

4.      Compare Qexp with the critical value Qcritical found in Table I.1 at the 95% confidence level for N = 6 observations: Qcritical = 0.625.

 Qexp = 0.9286 > Qcritical = 0.625, and therefore we can reject 0.6400 at the 95% confidence level, accepting a probability of at most α = 0.05 that this decision is wrong.
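The same calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic above: the six values are the ones listed in the example, the critical value is the Table I.1 entry for N = 6 at the 95% confidence level, and the variable names are chosen here for illustration.

```python
# Replicate results from the example, including the suspect value 0.6400
data = [0.5980, 0.5993, 0.5995, 0.5997, 0.6010, 0.6400]

Q_CRITICAL = 0.625  # Table I.1: N = 6, 95% confidence level

xs = sorted(data)
suspect, neighbor = xs[-1], xs[-2]       # largest value and its nearest neighbor
data_range = xs[-1] - xs[0]              # largest value - smallest value

q_exp = abs((suspect - neighbor) / data_range)   # equation (1)
reject = q_exp > Q_CRITICAL

print(f"Qexp = {q_exp:.4f}, Qcritical = {Q_CRITICAL}, reject = {reject}")
```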

An applet for doing Q-test calculations is available on the website of the University of Athens Department of Chemistry.


 References

1.      D. Harvey, “Modern Analytical Chemistry”, McGraw-Hill Companies Inc., 2000
2.      D. B. Rorabacher, Anal. Chem., 63, 139–146 (1991)
3.      R. D. Brown, “Introduction to Chemical Analysis”, McGraw-Hill Companies Inc., 1982

