statistics

This website contains other cold fusion items.
Click to see the list of links

276) Bars of errors

Ludwik Kowalski; 2/3/2006
Department of Mathematical Sciences
Montclair State University, Upper Montclair, NJ, 07043

About three days ago I posted the following message on the restricted list for CMNS researchers:

When I read it again, I decided to add a short tutorial. How can a retired teacher resist an opportunity to share what he learned recently? First something well known, then something that I was not aware of. I will use a Geiger counter as an illustration.

1) Suppose that the number of counts from a single experiment is N = 100. We then say that the expected distribution of N (if the experiment could be repeated many times) is approximately Gaussian, with a standard deviation of sqrt(100) = 10. We can say, at the 68% level of confidence, that the true value is between 90 and 110 (mean plus or minus one sigma). At the 95% level of confidence, the true value is between 80 and 120 (mean plus or minus two sigmas). A 90% interval corresponds to plus or minus 1.645 sigma, roughly 84 to 116.
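The rule in item 1 can be checked with a short Python sketch. It assumes that N is large enough for the Gaussian approximation, and the helper names `gaussian_interval` and `coverage` are hypothetical. The confidence column uses the standard result that a Gaussian variable lies within k standard deviations of its mean with probability erf(k/sqrt(2)):

```python
import math

def gaussian_interval(n, k):
    """Return (low, high) = n -/+ k*sqrt(n), the Gaussian error bar for n counts."""
    sigma = math.sqrt(n)
    return n - k * sigma, n + k * sigma

def coverage(k):
    """Probability that a Gaussian variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

N = 100
for k in (1.0, 1.645, 2.0):
    low, high = gaussian_interval(N, k)
    print(f"k = {k:5.3f}: {low:6.2f} to {high:6.2f}  (confidence {coverage(k):.1%})")
```

The k = 2 line shows that the 80 to 120 bar corresponds to about 95.4% confidence.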

2) But this approach is valid only when N is not too small. How can we establish the range of values (the bar of expected errors) corresponding to the 90% level of confidence when N is 10 or smaller? By using the following table:

 N    90% confidence interval for the true value
 0    0.00-2.44
 1    0.11-4.36
 2    0.53-5.91
 3    1.10-7.42
 4    1.47-8.00
 5    1.84-9.99
 6    2.21-11.47
 7    3.56-12.53
 8    3.96-13.99
 9    4.36-15.30
10    5.50-16.50

If N=4, we can say, "at the confidence level of 90%, the true value is between 1.47 and 8.00." Likewise, if N=9, we can say, "at the confidence level of 90%, the true value is between 4.36 and 15.30." That is not exactly what we would say if the error bar were +/- 2*sqrt(N). For N=4, for example, that range would be 0 to 8 rather than 1.47 to 8.00; for N=9 it would be 3 to 15 rather than 4.36 to 15.30. (Note that +/- 2*sqrt(N) is the Gaussian bar for about 95%, not 90%, confidence.) The differences become less and less significant as N becomes larger. I did not study the mathematical derivation, but I accept the rule because I know that the Poisson distribution of random counts becomes practically indistinguishable from the symmetric Gaussian distribution for large N. A similar table exists for the 95% level of confidence, as shown below.
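The comparison in the previous paragraph can be reproduced in a few lines of Python. This is only a sketch. The two rows are copied, as printed, from the 90% table above, and `gaussian_bar` is a hypothetical helper that clips the lower end at zero because a mean count cannot be negative. The ±1.645√N bar is included as well, since that is the Gaussian counterpart of a 90% interval:

```python
import math

# ASSUMPTION: rows copied as printed from the 90% table above; N -> (lower, upper)
TABLE_90 = {4: (1.47, 8.00), 9: (4.36, 15.30)}

def gaussian_bar(n, k):
    """Symmetric bar n -/+ k*sqrt(n), with the lower end clipped at zero."""
    half = k * math.sqrt(n)
    return max(0.0, n - half), n + half

for n, (low, high) in TABLE_90.items():
    two_sigma = gaussian_bar(n, 2.0)   # the +/- 2*sqrt(N) rule discussed in the text
    z90 = gaussian_bar(n, 1.645)       # Gaussian bar for 90% confidence
    print(f"N={n}: table {low:.2f}-{high:.2f}, "
          f"+/-2sqrt(N) {two_sigma[0]:.2f}-{two_sigma[1]:.2f}, "
          f"+/-1.645sqrt(N) {z90[0]:.2f}-{z90[1]:.2f}")
```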

 N     95% confidence interval for the true value
 0    0.00-3.09
 1    0.05-5.14
 2    0.36-6.72
 3    0.82-8.25
 4    1.37-9.76
 5    1.84-11.26
 6    2.21-12.75
 7    2.58-13.81
 8    2.94-15.29
 9    4.36-16.77
10    4.75-17.82
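A sketch of how the two tables might be turned into asymmetric error bars for a plot, assuming the printed values are used as they stand. `INTERVALS` and `error_bar` are hypothetical names. For every tabulated N, the bar drawn above the point is longer than the bar drawn below it:

```python
# ASSUMPTION: values copied as printed in the two tables above.
# For each confidence level, row N holds (lower, upper) for the true value.
INTERVALS = {
    0.90: [(0.00, 2.44), (0.11, 4.36), (0.53, 5.91), (1.10, 7.42), (1.47, 8.00),
           (1.84, 9.99), (2.21, 11.47), (3.56, 12.53), (3.96, 13.99), (4.36, 15.30),
           (5.50, 16.50)],
    0.95: [(0.00, 3.09), (0.05, 5.14), (0.36, 6.72), (0.82, 8.25), (1.37, 9.76),
           (1.84, 11.26), (2.21, 12.75), (2.58, 13.81), (2.94, 15.29), (4.36, 16.77),
           (4.75, 17.82)],
}

def error_bar(n, cl):
    """Return (below, above): lengths of the bar drawn below and above a point at n counts."""
    if cl not in INTERVALS:
        raise ValueError("only 0.90 and 0.95 are tabulated")
    rows = INTERVALS[cl]
    if not 0 <= n < len(rows):
        raise ValueError("the tables cover N = 0 to 10 only")
    low, high = rows[n]
    return n - low, high - n

for n in (1, 4, 9):
    below, above = error_bar(n, 0.95)
    print(f"N={n}: -{below:.2f} / +{above:.2f}")
```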

I think that my illustration of how to use these two tables is correct, but I am not sure; please correct me if necessary. The document with the mathematical derivations, from which these two tables were taken, begins as follows: "Revised April 1998 by F. James (CERN); February 2000 by R. Cousins (UCLA); October 2001 and October 2003 by G. Cowan (RHUL)." The footnote on the first page states: "CITATION: S. Eidelman et al., Physics Letters B592, 1 (2004) available on the PDG WWW pages (URL http://pdg.lbl.gov/) June 17, 2004, 10:56"

3) It should be obvious to most of you that I am addressing the issue of precision (random errors) and not the issue of accuracy (systematic errors). The issue of systematic errors has to do with the concept of reproducibility.

For small N, the error bar extends farther above the plotted point than below it. What about establishing error bars when the background is subtracted? The estimated net result, R, is always calculated as

R = A - B

where A is the apparent value and B is the background value. For A=100 and B=36, for example, R=64. We want to say that "at the confidence level of 95%, the true value is between x1 and x2." How do we calculate x1 and x2? The well-known rule, when A and B are not too small, tells us that x1 and x2 should differ from R by two standard deviations. In this illustration, x1 = 64 - 2*11.66 = 40.68 and x2 = 64 + 2*11.66 = 87.32. That is because the standard deviation of R is the square root of A+B = 136 (the variances of the two independent counts add).
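A minimal Python sketch of this background-subtraction rule, assuming A and B are independent counts large enough for the Gaussian approximation (`net_counts` is a hypothetical helper):

```python
import math

def net_counts(a, b, k=2.0):
    """Net result R = A - B and its Gaussian bar R -/+ k*sqrt(A + B).

    Valid only when A and B are large enough for the Gaussian approximation.
    """
    r = a - b
    sigma = math.sqrt(a + b)  # variances of independent Poisson counts add
    return r, sigma, r - k * sigma, r + k * sigma

r, sigma, x1, x2 = net_counts(100, 36)
print(f"R = {r}, sigma = {sigma:.2f}, x1 = {x1:.2f}, x2 = {x2:.2f}")
# prints: R = 64, sigma = 11.66, x1 = 40.68, x2 = 87.32
```

Calling `net_counts(9, 4)` gives R = 5 with x1 close to -2.2, a negative lower limit for a quantity that cannot be negative. This is one symptom of the small-number problem raised in the next paragraph.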

But how should x1 and x2 be determined, for a chosen level of confidence, when A and B are very small numbers, for example, 9 and 4? I know that distributions of A and B, for small mean values, are Poissonian rather than Gaussian. But I do not know how to turn this into a practical rule for calculating x1 and x2.
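One possible answer is offered here only as a hedged sketch. The "unified" (Feldman-Cousins) construction, which is discussed in the PDG statistics review cited above, builds confidence intervals for a Poisson signal mean on top of a known background mean b. Its zero-background intervals appear to coincide with the two tables above. The Python below assumes that b is known exactly (here b = 4, taken from the measured B), so the statistical uncertainty of B itself is ignored; a fuller treatment would have to include it. The names `poisson_pmf`, `in_acceptance_region` and `unified_interval`, the grid step and the scan range are hypothetical choices, and the last digit can differ from published tables:

```python
import math

def poisson_pmf(n, lam):
    """P(n | lam) for a Poisson distribution, computed in log space; handles lam == 0."""
    if lam == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def in_acceptance_region(n0, mu, b, cl):
    """True if count n0 belongs to the Feldman-Cousins acceptance region for signal mean mu.

    Counts n are ranked by R = P(n | mu + b) / P(n | mu_best + b), where
    mu_best = max(0, n - b) is the best physically allowed signal mean, and
    are added in decreasing order of R until their total probability reaches cl.
    """
    n_max = int(mu + b + 10.0 * math.sqrt(mu + b + 1.0) + 20)
    ranked = []
    for n in range(n_max + 1):
        p = poisson_pmf(n, mu + b)
        mu_best = max(0.0, n - b)
        ranked.append((p / poisson_pmf(n, mu_best + b), n, p))
    ranked.sort(reverse=True)
    total = 0.0
    for _, n, p in ranked:
        if n == n0:
            return True      # n0 is added before the region is complete
        total += p
        if total >= cl:
            return False     # region complete without n0
    return False

def unified_interval(n0, b=0.0, cl=0.90, step=0.005):
    """Scan mu on a grid; return the lowest and highest mu whose region contains n0."""
    mu_max = n0 + 5.0 * math.sqrt(n0 + 1.0) + 5.0   # generous upper end of the scan
    inside = [i * step for i in range(int(mu_max / step) + 1)
              if in_acceptance_region(n0, i * step, b, cl)]
    return min(inside), max(inside)

# ASSUMPTION: A = 9 observed counts, background mean known exactly and equal to B = 4
low, high = unified_interval(9, b=4.0, cl=0.95)
print(f"95% interval for the net signal: {low:.2f} to {high:.2f}")
```

With b = 0 the same function can be used to spot-check rows of the two tables, for example `unified_interval(0)` and `unified_interval(1)` at 90%.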
