Analytical System Administration
Adding up Independent Causes
Suppose we want to measure the value of a quantity $v$ whose value has been altered by a series of independent random changes or perturbations $\Delta g_1, \Delta g_2, \ldots$. By how much does that series of perturbations alter the value of $v$? Our first instinct might be to add up the perturbations to get the total:

$$\text{Actual deviation} = \Delta g_1 + \Delta g_2 + \cdots$$
This estimate is not useful, however, because we do not usually know the exact values of the $\Delta g_i$; we can only guess them. In other words, we are working with a set of guesses $\Delta g_i$ whose signs we do not know, so we cannot tell whether the perturbations add or cancel each other out. In short, we are not in a position to know the actual value of the deviation from the true value. Instead, we have to estimate the limits of the possible deviation from the true value $v$. To do this, we add the perturbations together as though they were independent vectors, using Pythagoras' theorem.

This is easy to understand geometrically. If we think of each change as being independent, then one perturbation $\Delta g_1$ cannot affect the value of another perturbation $\Delta g_2$. But the only way that it is possible to have two changes which do not have any effect on one another is if they are movements at right angles to one another, i.e. they are orthogonal. Another way of saying this is that the independent changes are like the coordinates $x, y, z, \ldots$ of a point which is at a distance $d$ from the origin in some set of coordinate axes. The total distance of the point from the origin is, by Pythagoras' theorem,

$$d = \sqrt{x^2 + y^2 + z^2}$$
The formula we are looking for, for any number of independent changes, is just the $N$-dimensional generalization of this, usually written $\sigma$:

$$\sigma = \sqrt{\sum_{i=1}^{N} \Delta g_i^2}$$
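As a rough illustration of the rule above, the following Python sketch adds a few perturbation magnitudes in quadrature and compares the result with the naive sum of their magnitudes. The function name `combine_in_quadrature` and the sample magnitudes are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```python
import math


def combine_in_quadrature(perturbations):
    """Return the square root of the sum of squares of the perturbations.

    This treats each perturbation as an independent (orthogonal) component,
    so the signs of the individual values do not matter.
    """
    return math.sqrt(sum(dg * dg for dg in perturbations))


# Hypothetical guesses for three independent perturbations (illustrative only).
delta_g = [3.0, 4.0, 12.0]

sigma = combine_in_quadrature(delta_g)

# Worst case: every perturbation pushes the value in the same direction.
naive_sum = sum(abs(dg) for dg in delta_g)

print(sigma)      # 13.0
print(naive_sum)  # 19.0
```

With these illustrative numbers, the quadrature estimate (13.0) is smaller than the worst case in which every perturbation pushes in the same direction (19.0).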
This formula gives the distance $d$, or $\sigma$, by which we can expect the value we are trying to measure to have changed. It does not tell us the sign of the change, so all we can now say is that the true value could be in the range $v \pm \sigma$. To summarize, independent changes in a quantity are like Cartesian coordinates for a vector in an $N$-dimensional space.

11.8.3 The Mean and Standard Deviation
In the theory of errors, we use the ideas above to define two quantities for a set of data: the mean and the standard deviation. Now the situation is reversed: we have made a number of observations of values $v_1, v_2, \ldots, v_N$ which have a certain scatter, and we are trying to find out the actual value $v$. Assuming that there are no systematic errors, i.e. assuming that all of the deviations have independent random causes, we define the value $v$ to be the arithmetic mean of the data:

$$v = \frac{v_1 + v_2 + \cdots + v_N}{N} = \frac{1}{N} \sum_{i=1}^{N} v_i$$
Next we treat the deviations of the actual measurements as our guesses for the error in the measurements:
$$\Delta g_1 = v - v_1, \quad \Delta g_2 = v - v_2, \quad \ldots, \quad \Delta g_N = v - v_N$$
and define the standard deviation of the data by
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \Delta g_i^2}$$
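The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `mean_and_std` and the sample values are hypothetical, and the code assumes the root-mean-square form of the standard deviation, dividing by the number of observations N.

```python
import math


def mean_and_std(values):
    """Return (mean, standard deviation) using the RMS (divide-by-N) form."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")

    # Arithmetic mean of the observations, taken as the estimate of v.
    v = sum(values) / n

    # Deviations of each measurement from the mean: our guesses for the errors.
    deviations = [v - v_i for v_i in values]

    # Root mean square of the assumed errors.
    sigma = math.sqrt(sum(dg * dg for dg in deviations) / n)
    return v, sigma


# Hypothetical measurements of some fluctuating metric (illustrative only).
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

v, sigma = mean_and_std(samples)
print(len(samples), v, sigma)  # 8 5.0 2.0
```

The divide-by-N form used here corresponds to what Python's `statistics.pstdev` computes; `statistics.stdev` divides by N - 1 instead and gives a slightly larger value for the same data.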
The standard deviation is clearly a measure of the scatter in the data due to random influences: $\sigma$ is the root mean square (RMS) of the assumed errors. These definitions are a way of interpreting measurements, based on the assumption that one really is measuring the true value, affected by random interference. An example of the use of standard deviation can be seen in the error bars of the figures in this chapter. Whenever one quotes an average value, the number of data points and the standard deviation should also be quoted in order to give meaning to the value. In system administration, one is interested in the average values of any system metric which fluctuates with time.

11.8.4 The Normal Error Distribution
It has been stated that 'Everyone believes in the exponential law of errors; the experimenters because they think it can be proved by mathematics; and the mathematicians because they believe it has been established by observation' [277]. Some observational data in science closely satisfy the normal law of error, but this is by no means universally true. The main purpose of the normal error law is to provide an adequate idealization of error treatment which is simple to deal with, and which becomes increasingly accurate with the size of the data sample. The normal distribution was first derived by De Moivre in 1733, while dealing with problems involving the tossing of coins; the law of errors was deduced theoretically in 1783 by Laplace. He started with the assumption that the total error in an observation was the sum of a large number of independent deviations, which could be either positive or negative with equal probability, and could therefore be added according to the rule explained in the previous sections. Subsequently, Gauss gave a proof of the error law based on the postulate that the most probable value of any number of equally good
