The Chain Is Only as Strong as its Weakest Link
A telephony system is a complex collection of components. Each of these components has a failure rate associated with it. When the system needs every component to be working, the reliability of the whole system can never be higher than the lowest reliability of any individual component. Furthermore, we can multiply the reliability metrics of the components together to get a rough estimate of the reliability of the whole system.
For example, consider the hypothetical system at ACME Anvil Company shown in the table below with associated reliability metrics:
Component | Estimated Reliability
Phone Lines from Phone Company | 99.97%
Office PBX System | 99.98%
Power Supply to Office and Phone System | 99.6%
Voice Mail Server with Uninterruptible Power Supply (UPS) | 99.3%
If any one component fails, the whole system becomes non-functional. Therefore, ACME Anvil's system cannot achieve anything higher than 99.3% reliability because the voice mail server only works 99.3% of the time. For the system as a whole, the estimated reliability is the product of the reliability of the four components: 99.97% x 99.98% x 99.6% x 99.3% = 98.85%! Surprised? This multiplicative drop in reliability is the result of combining multiple components that each have their own failure rate.
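The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming component failures are independent of one another; the dictionary keys and the function name `system_reliability` are illustrative and not part of any product.

```python
from math import prod

# Estimated reliabilities from the ACME Anvil table, written as fractions.
components = {
    "Phone lines from phone company": 0.9997,
    "Office PBX system": 0.9998,
    "Power supply to office and phone system": 0.996,
    "Voice mail server with UPS": 0.993,
}


def system_reliability(reliabilities):
    """Estimate whole-system reliability when every component must work.

    Assumes failures are independent, so the individual rates multiply.
    """
    return prod(reliabilities)


overall = system_reliability(components.values())
weakest = min(components.values())

print(f"Weakest component: {weakest:.2%}")  # 99.30%
print(f"Whole system:      {overall:.2%}")  # 98.85%
```

The product always comes out below the weakest single component, which is why adding components to the chain can only lower the estimate.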
Developing Realistic Targets for the Whole System
Some applications include Four or Five Nines reliability (99.99% or 99.999%) in their requirements. For example, it may be safe to assume that a nuclear reactor requires 99.99% or better reliability, regardless of cost. Since Active Call Center is not licensed for control of nuclear reactors and similar applications, these applications are not discussed further here.
Most applications have Two to Three Nines reliability requirements. The usual objective in these cases is to achieve the reliability target by maximizing reliability within the available spending budget.
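To make the "Nines" more concrete, the sketch below converts a reliability figure into expected outage hours per year. It assumes that reliability means the fraction of time the system is working, and it uses a 365-day year; the function name `annual_downtime_hours` is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(reliability):
    """Expected outage hours per year, treating reliability as uptime fraction."""
    return (1.0 - reliability) * HOURS_PER_YEAR


levels = [
    ("Two Nines", 0.99),
    ("Three Nines", 0.999),
    ("Four Nines", 0.9999),
    ("Five Nines", 0.99999),
]

for label, reliability in levels:
    hours = annual_downtime_hours(reliability)
    print(f"{label:<12} {reliability:.3%}  about {hours:.2f} hours down per year")
```

Each additional Nine cuts the expected downtime by a factor of ten, which is a large part of why the higher targets get expensive so quickly.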
In formulating reliability targets, it helps to compile a table of each of the system's components and its associated estimated reliability metric. This information makes it possible to analyze reliability for the whole system quickly, as demonstrated below.
A table of reliability rates for each component quickly reveals reliability levels above which it will be prohibitively expensive to implement a system. Recall that in the earlier ACME Anvil example, the phone lines from the phone company were only 99.97% reliable. This means that to get anything better than 99.97% reliability, special service will be required from the phone company. Presumably these services would be very expensive, and so we might inform ACME Anvil's management that anything more than 99.97% is out of their budget.
The table of reliability metrics for ACME Anvil's system also helps identify where budget dollars can best be spent to gain the largest increase in overall system reliability. Remember that the two weakest components of ACME's system are the power supply to the phone system and the voice mail server. It's very easy to boost the reliability of the power supply by adding a UPS battery backup with several hours of battery capacity. Let's assume we could get such a battery unit for $500 and that it would boost the power supply's reliability to 99.98%. The new reliability metric for the whole system would then be 99.97% x 99.98% x 99.98% x 99.3% = 99.2%. That's a significant increase, up from about 98.85%, for only $500!
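The effect of the UPS upgrade can be recomputed as a quick "what if". This is a sketch under the same independence assumption as the product estimate; the $500 price and the 99.98% figure are the assumed values from the paragraph above, and `with_upgrade` is a hypothetical helper name.

```python
from math import prod

# Baseline reliabilities from the ACME Anvil table, as fractions.
baseline = {
    "phone_lines": 0.9997,
    "pbx": 0.9998,
    "power_supply": 0.996,
    "voice_mail_server": 0.993,
}


def with_upgrade(components, name, new_reliability):
    """Return a copy of the table with one component's reliability replaced."""
    upgraded = dict(components)
    upgraded[name] = new_reliability
    return upgraded


before = prod(baseline.values())
after = prod(with_upgrade(baseline, "power_supply", 0.9998).values())
gain_points = (after - before) * 100  # gain in percentage points
ups_cost = 500  # dollars, assumed in the text

print(f"Before UPS: {before:.2%}")  # 98.85%
print(f"After UPS:  {after:.2%}")   # 99.23%
print(f"Gain: {gain_points:.2f} percentage points for ${ups_cost}")
```

Running the same helper against each component in turn is one way to compare candidate upgrades before committing any budget.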
Let's suppose that ACME Anvil's CEO has informed us that they would like to have 99.5% reliability. The simple analysis performed to this point has revealed several important findings: (a) The system can easily be brought up to 99.2% reliability by adding a battery backup UPS, (b) With three of the four components after the upgrade at reliability rates of 99.97% or higher, the only other place to improve reliability for the system is at the voice mail server. Hmmm... sounds like a job for Active Call Center!
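To see how far the voice mail server has to improve, the product can be turned around: divide the target by the combined reliability of the other components. The sketch below assumes the UPS upgrade has already been made and that failures are independent; the variable names are illustrative.

```python
from math import prod

target = 0.995  # the CEO's 99.5% goal for the whole system

# Phone lines, PBX, and power supply after the UPS upgrade.
other_components = [0.9997, 0.9998, 0.9998]
current_server = 0.993

# The server must make up whatever the other components leave over.
required_server = target / prod(other_components)

print(f"Other components combined: {prod(other_components):.2%}")  # 99.93%
print(f"Server today:              {current_server:.2%}")          # 99.30%
print(f"Server needed for target:  {required_server:.2%}")         # 99.57%
```

Under these assumptions the server would need to reach roughly 99.57%, slightly above the 99.5% target itself, because the other three components are not perfect either.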
Apply the Analysis to an Appropriate Level of Detail
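The basic strategy presented above is to tabulate each component's estimated reliability, multiply the figures to estimate the reliability of the whole system, and then direct spending at the weakest components.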
This same general analysis can be applied to each component at very fine levels of detail. The ACME Anvil example above did not include much detail, but a more detailed analysis could easily be done. For example, we might analyze reliability of the voice mail server and find:
Server Component | Estimated Reliability
Processor | 99.999%
RAM | 99.999%
Hard Drive | 99.8%
Software | 99.5%
Overall (product of component metrics) | 99.3%
Based on this analysis, management might decide to install a RAID hard drive array to increase the hard drive reliability to 99.99%, which would raise overall server reliability from 99.3% to about 99.49% (99.999% x 99.999% x 99.99% x 99.5%). That is still just short of 99.5%, so it's also apparent that the software's reliability is going to have to improve to reach management's 99.5% target.
In this example, the extra detail has helped narrow down the focus of the reliability improvement task to the voice mail server's software.
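The server-level figures above can be reproduced the same way. This sketch assumes independent failures and, like the passage, treats 99.5% as the goal for the server on its own; a 99.5% goal for the whole system would call for a slightly higher server figure. All names are illustrative.

```python
from math import prod

# Reliabilities from the voice mail server table, as fractions.
server = {
    "processor": 0.99999,
    "ram": 0.99999,
    "hard_drive": 0.998,
    "software": 0.995,
}

# Same server with the hard drive replaced by a RAID array (assumed 99.99%).
server_with_raid = dict(server, hard_drive=0.9999)

before = prod(server.values())
after = prod(server_with_raid.values())

# Software reliability needed for the server to reach 99.5% with RAID installed.
hardware_only = prod(v for k, v in server_with_raid.items() if k != "software")
required_software = 0.995 / hardware_only

print(f"Server today:      {before:.2%}")             # 99.30%
print(f"Server with RAID:  {after:.2%}")              # 99.49%
print(f"Software required: {required_software:.2%}")  # 99.51%
```

With the hardware at these levels, nearly all of the remaining shortfall sits in the software figure.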