Analytical System Administration
Figure 11.8 The weekly average of maximal CPU percentage does appear to show the daily rhythm, and it shows how the huge variances at certain times near the beginning and end of the graph have probably obscured the signal in the daily graph. This illustrates how long-term measurements must be combined with averages to detect the true behaviour, and thus to detect anomalies.
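One way to picture the averaging described in the caption is to fold a long series of hourly measurements onto a single week, so that every sample taken at the same hour of the week contributes to the same average. The following is only a minimal sketch under stated assumptions: the function name weekly_profile and the input layout (pairs of hour index and measured value) are hypothetical, and nothing here is taken from the tool that produced the figure.

```python
from collections import defaultdict
from statistics import mean, pstdev

HOURS_PER_WEEK = 7 * 24


def weekly_profile(samples):
    """Fold hourly samples onto one week.

    samples: iterable of (hour_index, value) pairs, where hour_index counts
    hours from an arbitrary start (an assumed input format).
    Returns {hour_of_week: (mean, standard_deviation)}.
    """
    buckets = defaultdict(list)
    for hour_index, value in samples:
        # Samples one week apart land in the same bucket.
        buckets[hour_index % HOURS_PER_WEEK].append(value)
    return {
        slot: (mean(values), pstdev(values))
        for slot, values in sorted(buckets.items())
    }
```

A large standard deviation in a given slot suggests that the average for that hour of the week is not yet a reliable baseline, which is the situation the caption describes near the beginning and end of the graph.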
Disk usage rise per session per user per hour: the average increase in disk space per user per session indicates the way in which the system is becoming loaded. This can be used to diagnose problems caused by a single user downloading a huge amount of data from the network. During normal behaviour, if users have an even productivity, this might be periodic.
Weekly period   Daily period   Average type   Expected entropy
Undetermined    Undetermined   Continuous     Low
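As an illustration of how this metric might be computed, the sketch below derives a rate of disk usage increase for each session and averages it per user. The Session record and its fields are hypothetical; a real system would have to obtain session lengths and disk usage from its own accounting data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical record of one login session (all fields are assumptions)."""
    user: str
    hours: float        # session length in hours
    bytes_before: int   # user's disk usage at session start
    bytes_after: int    # user's disk usage at session end


def rise_per_hour(session):
    """Disk usage increase per hour for one session."""
    return (session.bytes_after - session.bytes_before) / session.hours


def average_rise_by_user(sessions):
    """Average the per-session rates for each user, skipping empty sessions."""
    rates = defaultdict(list)
    for session in sessions:
        if session.hours > 0:
            rates[session.user].append(rise_per_hour(session))
    return {user: sum(values) / len(values) for user, values in rates.items()}
```

A user whose average is far above that of the others would be a candidate for the single large download mentioned above.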
Latency of services: the latency is the amount of time we wait for an answer to a specific request. This value only becomes significant when the system passes a certain threshold (a kind of phase transition). Once latency begins to restrict the practices of users, we can expect it to feed back and exacerbate latencies. Thus the periodicity of latencies would only be expected in a phase of the system in which user activity was in competition with the cause of the latency itself.
Weekly period            Daily period             Average type   Expected entropy
Strong above threshold   Strong above threshold   Continuous     Undetermined
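A minimal sketch of measuring latency and asking how often it exceeds a threshold might look as follows. The function names and the idea of a single fixed threshold are assumptions made for illustration; the text does not say how the threshold should be chosen.

```python
import time


def measure_latency(request):
    """Return the seconds spent waiting for request(), any callable, to answer."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def fraction_above(latencies, threshold):
    """Fraction of measured latencies that exceed the (assumed) threshold."""
    if not latencies:
        return 0.0
    return sum(1 for value in latencies if value > threshold) / len(latencies)
```

Tracking this fraction per hour of the day or week would be one way to look for the periodicity that is expected only once the threshold has been passed.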
Part of what one wishes to identify in looking at such variables is patterns of change. These are classifiable but not usually quantifiable. They can be relevant to policy decisions as well as in fine tuning of the parameters of an automatic response. Patterns of behaviour include
• Social patterns of the users.
• Systematic patterns caused by software systems.
Identifying such patterns in the variation of the metrics listed above is not an easy task, but it is the closest one can expect to come to a measurable effect in a system administration context. In addition to measurable quantities, humans have the ability to form value judgements in a way that formal statistical analyses cannot. Human judgement is based on compounded experience and associative thinking, and while it lacks scientific rigour, it can be intuitively correct in a way that is difficult to quantify. The down-side of human perception is that prejudice is also a factor which is difficult to eliminate. Also, not everyone is in a position to offer useful evidence in every judgement:
• User satisfaction: software, system-availability, personal freedom.
• Sysadmin satisfaction: time-saving, accuracy, simplifying, power, ease of use, utility of tools, security, adaptability.
Other heuristic impressions include the amount of dependency of a software component on other software systems, hosts or processes; also, the dependency of a software system on the presence of a human being. Kubicki [157] discusses metrics for measuring customer satisfaction. These involve validated questionnaires, system availability, system response time, availability of tools, failure analysis, and time before reboot measurements.
Deterministic and Stochastic Behaviour
In this section we turn to a more abstract view of a computer system: to think of it as a generalized dynamical system, i.e. a mathematical model which develops in time, according to certain rules. Abstraction is one of the most valuable assets of the human mind: it enables us to build simple models of complex phenomena, eliminating details which are only of peripheral or dubious importance. But abstraction is a double-edged sword: on the one hand, abstracting a problem can show us how that problem is really the same as a lot of other problems which we know more about; on the other hand, unless done with a certain clarity, it can merely plant a veil of fog over our senses, obscuring rather than assisting the truth. Our aim in this section is to think of computers as abstract dynamical systems, such as those which are routinely analysed in physics and statistical analysis. Although this will not be to every working system admin-
