
257) Questioning a statistical protocol

Ludwik Kowalski (9/13/05)
Department of Mathematical Sciences
Montclair State University, Upper Montclair, NJ, 07043



Units 240 and 241 were devoted to a paper by Robert Bass. Two days ago he sent me a new version of that paper; that version will appear in the ICCF10 proceedings. I see that a second author, Michael McKubre, has been added. I also see my name mentioned at the end; I am glad that my critical comments were useful. That encourages me to revisit this long-forgotten topic and to comment on the new version. Here is the message I sent last night to Robert Bass:

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

1) Thanks for sending me your ICCF10 paper, and for mentioning my name. I read the protocol and the description was clear to me. In the example of 5 data points (plus a blank that is always (0,0)) you found that rho was between 1/3 and 1/2. On that basis the 95.4% level of confidence was assigned to this experiment (to this set of 5 data points).

I am familiar with the concept of a level of confidence when samples of limited size are drawn from large populations. For example, suppose that the heights of 100 randomly selected students, at a large university, were measured and the average was 167 cm. A statistician might say that, at the 68% level of confidence, the true average height for the entire population is between 163 and 171 cm. Or she might say that, at the 95% level of confidence, the true average height is between 159 and 175 cm. To validate the protocol I could, at least in principle, collect 1000 random samples (from the same population) and count how many samples are consistent with the prediction. For the 68% prediction to be valid, about 680 random samples must yield means between 163 and 171 cm. The protocol (or the assumed randomness of selections) would be shown to be wrong if, for example, only 250 mean values were in the predicted range.
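In a short simulation this check looks as follows (a minimal sketch, assuming a normal population with mean 167 cm and a standard deviation of 40 cm, the value implied by the 4 cm standard error of a 100-student sample; all numbers are illustrative):

import random

POP_MEAN, POP_SIGMA = 167.0, 40.0   # sigma implied by the 4 cm standard error
SAMPLE_SIZE, TRIALS = 100, 1000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SIGMA) for _ in range(SAMPLE_SIZE)]
    mean = sum(sample) / SAMPLE_SIZE
    if 163.0 <= mean <= 171.0:      # the 68% prediction
        hits += 1

print(hits, "of", TRIALS, "sample means fell between 163 and 171 cm")
# about 680 hits would confirm the 68% claim; a count near 250 would refute it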

But your 95.4% level of confidence applies to the statement that "the relation is linear." It does not apply to a number that is confined to a specified range. Suppose the experiment is repeated 1000 times (producing 1000 sets of five slightly different data points). What should happen to confirm your statement? What should happen to contradict it? In other words, what does the 95.4% level of confidence refer to?

P.S.
You can DEFINE a rectangle whose sides are 2 cm and 3 cm. Then you can say that the area is 6 cm^2, with 100% confidence. Or, after MEASURING the two sides, you can say that the area does not differ from 6 cm^2 by more than 1%, at the 90% level of confidence. But saying that the MEASURED area is 6 cm^2, at the 90% confidence level, is meaningless. Why is it so? Because it cannot be verified by performing a finite number of experiments. Suppose the mean value, after 1000 measurements, turns out to be 6.09. That would contradict the 1% range. But no range was specified in the last statement. Therefore results like 6.09 or 5.9999 neither confirm nor contradict it. As you know, statements that are not falsifiable are not acceptable in experimental science. That is why I am questioning the statement about linearity in your modified ICCF10 paper.
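The contrast between the two statements can be made explicit in a few lines (a sketch, assuming arbitrarily that each side is measured with an independent Gaussian error of 0.2%):

import random

TRIALS = 1000
areas = []
for _ in range(TRIALS):
    a = random.gauss(2.0, 0.004)    # measured 2 cm side (0.2% error, assumed)
    b = random.gauss(3.0, 0.006)    # measured 3 cm side (0.2% error, assumed)
    areas.append(a * b)

mean_area = sum(areas) / TRIALS
print("mean area after", TRIALS, "measurements:", round(mean_area, 4), "cm^2")

# The first statement names a range and is therefore testable:
print("within 1% of 6 cm^2:", abs(mean_area - 6.0) <= 0.06)
# The second statement ("the measured area is 6 cm^2 at 90% confidence")
# names no range, so no result, 6.09 or 5.9999, can confirm or refute it.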

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

2) The reply will be appended here after it arrives. For the time being I will assume that a range of values, to which the 95.4% confidence level applies, will be specified. This will give me a way to test the protocol by simulating experimental data with a Monte Carlo code.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

3) Bass and McKubre wrote:
“It is universally accepted, even by nonscientists, that if the measured output from a physical system is double, triple, or quadruple that obtained when the measured stimulus/input is doubled, tripled, or quadrupled then there is a “cause and effect relationship” between the input and output (e.g. total energy input versus excess energy [or nuclear ash] output in a cold fusion experiment).”

Deciding what is the stimulus/input for a given output is not always easy. Two quantities measured in a given experiment may be proportional to each other without being part of a “cause and effect relation.” Suppose that the temperature and pressure of atmospheric air are measured for several hours. A correlation between p and T, even a linear one, is discovered. Which variable is the input and which is the output? It is not at all obvious that one variable affects the other; both might be influenced by something else. The same is true in the case of burning fuel. Is the energy released caused by the ash, or is the accumulated ash caused by the released energy? Both are byproducts of a complex process. The same can be said about the weight and height of a growing child. I think that the reference to “cause and effect” is totally unnecessary; the main purpose of the paper is to offer a tool for testing linearity between two experimentally measured quantities.

4) The simplest, well-known approach consists of plotting the data points with error bars. Then one can see how well a straight line fits the data. Also well known is the linear regression analysis of a scatter plot. It offers the “best possible” straight line and the correlation coefficient. That coefficient gives a general idea of how the actual data points are scattered on both sides of the best line. The authors of the paper, however, want to go beyond this. They want the “level of confidence.” I already wrote that this is not possible unless a region in which the straight line is expected to be located is also specified. Without this their rho parameter is just another useful indicator of the “goodness of fit.”
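For reference, here is what the scatter-plot approach computes; a minimal least-squares fit and correlation coefficient from the standard textbook formulas (this is not the Bass-McKubre rho, which is defined only in their paper):

def linear_fit(xs, ys):
    """Least-squares line and correlation coefficient (textbook formulas)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 3, 4, 6, 11.9, 13.8]       # the six data points used in the test below
slope, intercept, r = linear_fit(xs, ys)
print("y = %.3f x + %.3f, r = %.4f" % (slope, intercept, r))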

5) But can experimental data in the area of CMNS be processed in the same way as in an area in which experimental data are reproducible? In my opinion the CMNS field is not ready for the “level of confidence” analysis. But I will ignore this and try to evaluate the protocol presented by Bass and McKubre. That protocol is straightforward. Given N-1 data points, and an additional (0,0) point, called the “blank,” one calculates a single parameter, rho (a Greek symbol in the paper), and uses it to determine the level of confidence. More specifically, the level of confidence is 68.3%, 95.4% and 99.7% when rho is less than 1, less than 1/2 and less than 1/3, respectively. It would be useful to have a table in which the range of levels of confidence was broader (for example, 50% to 100%) and the steps in rho were smaller (for example, 0.05). What surprises me is that the outcome does not depend on the error bars assigned to individual data points. To test the protocol I will use the following set of six data points:

(0,0), (1,3), (2,4), (3,6), (4,11.9) and (5,13.8).

For these points, according to Figure 1, rho is between 1/3 and 1/2, and the level of confidence is supposed to be 95.4%. I will take this statement for granted and use the Monte Carlo method to test its validity. In doing this I will assume that the first variable, in each pair, has a negligible random error. The second variable in each data point, however, will be assigned an error bar, as explained below. The confidence level of 95.4% implies that out of 10,000 Monte Carlo trials (based on the above data points) approximately 9,540 should produce regression lines confined to the specified region. Agreements with expectations will be labeled as YES. For example, if YES is 3,500 then only 35% of the simulated experiments produce regression lines in the specified region.
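A single Monte Carlo trial then amounts to smearing the y values while keeping the x values fixed. A minimal sketch, under the assumptions just stated (x error negligible, each y smeared with a Gaussian whose sigma is a fixed fraction of y):

import random

X = [0, 1, 2, 3, 4, 5]
Y = [0, 3, 4, 6, 11.9, 13.8]

def simulate_dataset(frac_sigma):
    """One simulated set of six points; only the y values are smeared."""
    return [(x, random.gauss(y, frac_sigma * y)) for x, y in zip(X, Y)]

print(simulate_dataset(0.05))       # one trial with sigma = 5% (Case 2 below)
# The blank stays exactly at (0,0) because its sigma, 5% of 0, is 0.
# Deciding YES or NO for each such set still requires the region of
# confinement, which has not yet been specified.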

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . How many YES out of 10,000?
Case 1: vertical error bars (sigma = 0.5%) . . . . >
Case 2: vertical error bars (sigma = 5%) . . . . . >
Case 3: vertical error bars (sigma = 10%) . . . . >
Case 4: vertical error bars (sigma = 20%) . . . . >

Added on 9/17/05:
6) My Monte Carlo code is ready but Robert has not replied. Perhaps he is away, perhaps his computer is not working. I hope he is not sick. I suspect that the value of rho is linked, somehow, with the “width” of the straight line; smaller values of rho refer to wider lines. I cannot test his protocol without knowing how to distinguish YES from NO. But I can do the following (just to illustrate the working of my code). Let me arbitrarily assume that the calculated rho defines the width of the region as 5% of the mean expected value. The expected values are those found from the linear regression fit. If the regression line, based on a set of six simulated data points, is confined to the expected region, then YES is incremented. Otherwise it is not.
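In outline, the test can be coded as follows. This is a sketch in which the band's half-width is taken as 5% of the mean expected value (one possible reading of “width”), with the y values smeared as before, so its counts need not match the table below exactly:

import random

X = [0, 1, 2, 3, 4, 5]
Y = [0, 3, 4, 6, 11.9, 13.8]

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def count_yes(frac_sigma, trials=10000, band_frac=0.05):
    slope0, icept0 = fit_line(X, Y)                  # reference regression line
    expected = [slope0 * x + icept0 for x in X]      # expected values
    half_width = band_frac * (sum(expected) / len(expected))
    yes = 0
    for _ in range(trials):
        ys = [random.gauss(y, frac_sigma * y) for y in Y]   # smear y only
        s, c = fit_line(X, ys)
        # YES when the refitted line stays inside the band at every x
        if all(abs(s * x + c - e) <= half_width for x, e in zip(X, expected)):
            yes += 1
    return yes

for sigma in (0.005, 0.05, 0.10, 0.20):              # Cases 1 to 4
    print("sigma = %4.1f%%: YES = %d of 10,000" % (sigma * 100, count_yes(sigma)))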


With such an artificial choice the results were as follows:

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . How many YES out of 10,000?
Case 1: vertical error bars (sigma = 0.5%) . . . . > 10,000
Case 2: vertical error bars (sigma = 5%) . . . . . > 2,958
Case 3: vertical error bars (sigma = 10%) . . . . > 335
Case 4: vertical error bars (sigma = 20%) . . . . > 28

The fact that the vertical error bars have a decisive influence on the number of YESes is not surprising. This is likely to happen no matter what criterion is used to identify YESes. The validity of the protocol will always depend on the vertical “width” of the regression lines (the region of confinement) and on the vertical error bars, as intuitively expected. The weakness of the protocol is that it does not take error bars into account and that it does not define what is, and what is not, a linear relation between two experimentally measured quantities.
