Maintenance has changed...
The world of equipment maintenance changed dramatically during the second half of the 20th century and it continues to do so today.
Several major influences have been responsible for driving these changes:
We now have a much better understanding of how equipment behaves, from installation to the point at which it fails.
When engineers were forced to respond to this wave of change, it became clear that traditional maintenance methods were no longer adequate - a new approach to equipment maintenance was required.
The commercial aviation industry was the first to realise that change was necessary and committed significant resources to developing a solution in the 1960s and 1970s.
The results entered the public domain in 1978 under the name "Reliability-centred Maintenance" or "RCM".
Increased expectations...
Looking back to the 1930s, we can divide up the years since then into three “generations”. We can then examine the expectations placed on the maintenance function in each of the three generations as follows:
In industry generally, there is now extreme pressure on the maintenance function to deliver maximum performance at minimum cost.
We can also look back at what was generally understood about the way in which equipment behaved and failed over the same three generations:
This new understanding of equipment failure made the civil aviation industry realise that their existing maintenance regime was flawed and even caused failures, some with catastrophic consequences.
A new approach to equipment maintenance was essential and subsequently developed. The end result was what we now call Reliability-centred Maintenance - RCM.
Having examined how the changing world of maintenance drove the development of RCM, we now describe RCM itself in some detail.
The developers of RCM took the unusual view (at the time) that the objective of equipment maintenance should be to keep the equipment doing whatever its users want it to do, rather than to prevent failures for the sake of preventing failures.
With this emphasis on preserving what the user wants, Moubray defines RCM as:
A process used to determine what must be done to ensure that any physical asset continues to do what its users want it to do in its present operating context.
It is, therefore, no surprise that determining the operating context and what the user wants the equipment to do is the starting point for the RCM process, which is applied by asking and answering the following seven questions:
Each of these questions is considered in more detail in this and the following sections.
1. What are the functions and associated performance standards of the asset in its present operating context?
Operating Context
In order to answer question 1, it is important to have a clear understanding of the operating context of the equipment being studied. This is because the operating context can influence what should be defined as failure and, therefore, whether a maintenance task is worthwhile.
For example: consider a small diesel engine used to power trains. This engine could be the only engine in a two-car train or it could be one of eight in a much longer train. These are very different operating contexts which will result in two very different views of what constitutes failure.
If a cooling water pump fails, the engine will eventually overheat and its protection will shut it down. On the two-car train this will result in very serious operational consequences because the train will come to a halt mid-journey. On the eight-car train it will result in a loss of about 12.5% of its traction power (one engine in eight). The train will continue to its destination with only a minor delay.
Any maintenance task for the cooling water pump that is considered in the RCM analysis will be much more likely to be evaluated as worthwhile on the two-car train than on the eight-car train.
A similar logic will apply to many of the engine’s other failure modes, resulting in two quite different maintenance schedules for the very same engine.
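The difference between the two operating contexts can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that every engine contributes an equal share of traction power; the function name is hypothetical and not part of any RCM standard.

```python
def traction_loss_fraction(engines_total: int, engines_failed: int = 1) -> float:
    """Fraction of traction power lost when some engines shut down.

    Assumption: every engine contributes an equal share of traction power.
    """
    if engines_total <= 0:
        raise ValueError("engines_total must be positive")
    if not 0 <= engines_failed <= engines_total:
        raise ValueError("engines_failed must be between 0 and engines_total")
    return engines_failed / engines_total


# One engine shutting down: sole engine versus one of eight.
for engines in (1, 8):
    loss = traction_loss_fraction(engines)
    print(f"{engines} engine(s): {loss:.1%} of traction power lost")
```

With a single engine the loss is total, whereas with eight engines it is one-eighth (12.5%). This is why the same failure mode can justify a maintenance task in one context and not in the other.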
Functions
A function is a statement of what the user of the equipment wants it to do and to what standard of performance. A complete set of functions for a piece of equipment represents the objectives of its maintenance schedule. The user will be very happy if the maintenance schedule keeps the equipment performing its functions!
The function list is the foundation for the remainder of the RCM analysis and the RCM analysis group will use it to deduce exactly what is meant by failure. From this ‘failed state’, the RCM analysis group can list the failure modes that could cause each failed state.
A system’s “primary function” is usually obvious, easy to determine and normally states why the system was purchased in the first place.
However, most systems are expected to perform other “secondary functions” which represent the user’s requirements for environmental and safety integrity, protection, control, economy, appearance, etc.
Functional Failures
2. In what ways does it fail to fulfil its functions?
A system or piece of equipment is said to have ‘failed’ if it is unable to perform its intended function(s) to the desired standard of performance. This includes partial failure (as well as complete failure) where the equipment still functions, but not to an acceptable standard (e.g. it may be operating too slowly or producing poor quality).
By documenting functional failures, the RCM analysis group defines the “failed state” (i.e. exactly what is meant by “failed” and “partially failed”) for the equipment’s operating context.
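The distinction between functioning, partially failed and failed can be expressed as a comparison against the performance standard. The sketch below is illustrative only: it assumes a single "higher is better" performance measure (for example a flow rate), and the names `State` and `assess_state` are hypothetical.

```python
from enum import Enum


class State(Enum):
    FUNCTIONING = "functioning"
    PARTIALLY_FAILED = "partially failed"
    FAILED = "failed"


def assess_state(measured: float, required: float) -> State:
    """Compare measured performance with the standard the user requires.

    Assumption: larger values mean better performance and zero (or less)
    means that the function is not being performed at all.
    """
    if measured <= 0:
        return State.FAILED
    if measured < required:
        # Still working, but not to an acceptable standard.
        return State.PARTIALLY_FAILED
    return State.FUNCTIONING


# Hypothetical standard: the user requires at least 800 units of output.
for reading in (0, 650, 820):
    print(reading, assess_state(reading, required=800).value)
```

In a real analysis the performance standard comes from the function statement agreed by the RCM analysis group, and a function may have several standards rather than one number.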
Failure Modes
3. What causes each functional failure?
A failure mode is any event which is reasonably likely to cause a functional failure. “Any event” is not limited to equipment failures caused by wear and tear or deterioration (sudden or slow), but also includes human error, poor procedures and design issues.
"Reasonably likely” (i.e. credible) failure modes fall into the following broad categories:
However, any unlikely failure modes that have extremely severe consequences would also be considered.
When writing failure modes, it is important to identify the cause of the failure in sufficient detail so that the RCM analysis group can identify appropriate maintenance later in the RCM process (using the RCM maintenance task selection logic).
Insufficient detail may well mean that appropriate maintenance tasks are missed, rendering the analysis ‘superficial’ (and possibly dangerous). On the other hand, if failure modes are identified in too much detail the RCM analysis group could end up wasting time unnecessarily.
Failure Effects
4. What happens when failure occurs?
The RCM analysis group needs to have sufficient information so that they can make robust decisions about how to manage each failure mode.
In particular, the effects of each failure (i.e. what would happen when the failure occurs if nothing was done to prevent it) are required. This information allows the RCM analysis group to answer the questions posed in the RCM decision logic.
The failure effects record the problems (e.g. any undesirable/costly events) that the RCM-derived maintenance schedule is intended to manage (i.e. predict or prevent).
The failure effects should, therefore, contain the following information:
The first 4 questions of the RCM process make up the information gathering phase. The answers to these questions document what the equipment or system should do (functions), how it could fail (functional failures), what causes it to fail (failure modes) and what problems result (failure effects) when it does fail.
This information becomes an excellent equipment reference which can subsequently be used to support a safety case, act as an audit trail, produce a comprehensive fault-finding guide and be the starting point for determining spare parts provisioning and how to work-around problems that arise when failures occur.
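One way to picture the output of the information gathering phase is as a nested record: functions contain functional failures, which contain failure modes and their effects. The following is a minimal sketch of such a record, not a prescribed RCM worksheet format; all class names, field names and the example wording are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """Question 3 (the cause) together with question 4 (the effects)."""
    cause: str
    effects: str


@dataclass
class FunctionalFailure:
    """Question 2: a way in which the function is not fulfilled."""
    description: str
    failure_modes: list[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    """Question 1: what the user wants, and to what standard."""
    statement: str
    functional_failures: list[FunctionalFailure] = field(default_factory=list)


def count_failure_modes(functions: list[Function]) -> int:
    """Total number of failure modes recorded across all functions."""
    return sum(
        len(ff.failure_modes)
        for fn in functions
        for ff in fn.functional_failures
    )


# Illustrative entry loosely based on the diesel engine example;
# the wording of each statement is invented.
worksheet = [
    Function(
        statement="To provide traction power to the train",
        functional_failures=[
            FunctionalFailure(
                description="Provides no traction power",
                failure_modes=[
                    FailureMode(
                        cause="Cooling water pump fails",
                        effects="Engine overheats and its protection shuts it down",
                    )
                ],
            )
        ],
    )
]

print(count_failure_modes(worksheet))
```

Keeping the four answers linked in this way is what allows the record to double as an audit trail or fault-finding guide.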
5. In what way does each failure matter?
RCM recognises that maintenance is actually far more about preventing or mitigating the consequences of failure than about preventing the failures themselves. In this way RCM focuses maintenance spend where it will do the most good.
Some failures matter a great deal (i.e. they have severe consequences) when they occur and some failures hardly matter at all (i.e. they have insignificant or trivial consequences).
It is usually worth putting effort into predicting or preventing high-consequence failures, even if they occur infrequently. On the other hand, failures that matter very little are often tolerated, even if they happen relatively frequently.
This can be illustrated by considering a simple maintenance task: listening to a bearing for any signs of rumbling. The onset of any unusual rumbling noise tells us that the bearing has already started to fail and that it must be replaced in the near future (if we wish to avoid the failure occurring). By checking the bearing for unusual noise, we are not doing the task to prevent the bearing failure; we are doing it in order to avoid the consequences of failure (which might be expensive if, say, the engine is destroyed).
RCM, therefore, categorises each failure according to the consequences of failure as follows:
Hidden: failures which are not evident to the operating crew because, on their own, they have no direct effects
Safety or Environmental: these are evident failures which either affect safety (because they could injure or kill someone) or could lead to a breach of an environmental standard or regulation that applies to the asset.
Operational: these evident failures do not affect safety or the environment, but they do have an adverse effect on production or operations
Non-operational: this category includes failures which are evident to the operating crew and which incur only the direct cost of repairing them because they do not affect safety, the environment, production or operations.
Hidden failures are usually associated with equipment or systems that provide some sort of protection (e.g. a boiler pressure relief valve). Hidden failures on their own do not have any direct consequences, but they leave the protected equipment or system without the protection it should have. For example, if a pressure relief valve fails closed, the boiler may explode if a second failure causes it to over-pressurise.
There are many ways in which a failure with Operational consequences can incur costs; these include lost production, increased operating costs, degradation in product quality, poor customer service, etc.
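The four consequence categories can be read as a short sequence of yes/no questions. The sketch below follows the definitions given above; the function and parameter names are hypothetical, and a real RCM decision diagram asks these questions in more carefully worded form.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    SAFETY_ENVIRONMENTAL = "safety or environmental"
    OPERATIONAL = "operational"
    NON_OPERATIONAL = "non-operational"


def categorise(
    evident: bool,
    affects_safety_or_environment: bool,
    affects_operations: bool,
) -> Consequence:
    """Assign a failure to one of the four consequence categories."""
    if not evident:
        # Not evident to the operating crew on its own.
        return Consequence.HIDDEN
    if affects_safety_or_environment:
        return Consequence.SAFETY_ENVIRONMENTAL
    if affects_operations:
        return Consequence.OPERATIONAL
    # Evident, with only the direct cost of repair.
    return Consequence.NON_OPERATIONAL


# A pressure relief valve that has failed closed is not evident on its own.
print(categorise(evident=False,
                 affects_safety_or_environment=False,
                 affects_operations=False).value)
```

Each failure lands in exactly one category because the questions are asked in a fixed order, with evidence to the operating crew checked first.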
Proactive Tasks
6. What can be done to predict or prevent each failure?
Once each failure mode has been categorised according to the consequences of failure, a structured decision logic is used to select maintenance tasks. The RCM decision logic first looks to see if it is appropriate to perform a scheduled task to predict when the failure mode is going to occur.
If such a task is not appropriate, RCM then considers whether the failure should be prevented by regularly restoring the item’s original resistance to failure before it fails and if not, whether a scheduled replacement of the item (before it fails) is appropriate.
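The order in which the decision logic considers proactive tasks can be sketched as follows. This is a simplified illustration: whether each task type is "appropriate" is judged by the RCM analysis group, so it is passed in here as a plain flag, and the function name is hypothetical.

```python
from typing import Optional


def select_proactive_task(
    on_condition_appropriate: bool,
    restoration_appropriate: bool,
    discard_appropriate: bool,
) -> Optional[str]:
    """Return the first appropriate proactive task, or None if there is none.

    The order follows the text: predict the failure first, then restore
    the item, then replace it.
    """
    if on_condition_appropriate:
        return "on-condition task"
    if restoration_appropriate:
        return "scheduled restoration"
    if discard_appropriate:
        return "scheduled discard"
    # No suitable proactive task: question 7 (default actions) applies.
    return None


print(select_proactive_task(False, True, True))
```

Returning `None` rather than a task marks the point at which the analysis moves on to default actions.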
Predicting Failure - On-Condition Maintenance
This entails monitoring the equipment in order to identify a detrimental change (i.e. a warning) that indicates that the failure is in the process of happening (early enough so that action can be taken before the failure actually occurs). This is known as Condition-based Maintenance or Condition Monitoring.
How often the equipment needs to be monitored is governed by the time it would take from when the warning can be identified to the point at which full failure occurs. This is illustrated in the diagram below: the warning is shown at point P (Potential failure) and the full failure occurs at point F (Functional failure).
On-condition maintenance
The monitoring task should be carried out at an interval which is less than the time between P and F (known as the P-F interval). If it is practical to monitor for point P, and the P-F interval is long enough for action to be taken to reduce, avoid or eliminate the consequences of failure, then it may be possible to do the condition monitoring task.
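The two conditions above can be checked numerically. The sketch below is a simplified illustration: it assumes that, in the worst case, the warning appears just after a check, so the time left to act is the P-F interval minus the checking interval. The function name and the example figures are hypothetical.

```python
def on_condition_task_feasible(
    pf_interval: float,
    check_interval: float,
    response_time: float,
) -> bool:
    """Check whether a monitoring interval leaves enough time to act.

    All arguments must use the same time unit (e.g. days).
    """
    if pf_interval <= 0 or check_interval <= 0 or response_time < 0:
        raise ValueError("intervals must be positive and response_time non-negative")
    if check_interval >= pf_interval:
        # The warning could appear and the failure occur between two checks.
        return False
    # Worst case: point P occurs immediately after a check.
    worst_case_warning = pf_interval - check_interval
    return worst_case_warning >= response_time


# Hypothetical figures: 30-day P-F interval, 14 days needed to plan a repair.
print(on_condition_task_feasible(pf_interval=30, check_interval=10, response_time=14))
print(on_condition_task_feasible(pf_interval=30, check_interval=20, response_time=14))
```

The second call fails the check because a 20-day checking interval could leave only 10 days of warning, which is less than the 14 days needed to respond.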
Preventing Failure - Scheduled Restoration or Discard Maintenance
If failure cannot be predicted as it begins to occur, then RCM looks to see if it can be prevented from occurring. This would mean performing some sort of intervention before a failure even begins.
In the RCM task selection logic, the available choices are:
Scheduled restoration and scheduled discard tasks are carried out before the wear-out zone (i.e. towards the end of “life”, which is the age at which the item’s conditional probability of failure begins to rise rapidly).
Sometimes these tasks are carried out earlier (i.e. more often) if the consequences of failure are very severe. This increases the frequency of the scheduled task and provides a “safety factor”.
Scheduled Restoration and Discard Maintenance
Default Actions
7. What should be done if a suitable proactive task cannot be found?
The RCM task selection logic ensures that proactive tasks are identified only for those failure modes that need them. When a suitable proactive task cannot be found there still remains the question of what else could be done in order to manage the failure mode.
In addressing this question, RCM takes special note of the consequences of failure. For example, where the consequences are purely economic, RCM permits No Scheduled Maintenance (or Run-to-Failure) as a valid default action; however, doing nothing is not an option if the failure mode has safety or environmental consequences.
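The rule for when No Scheduled Maintenance is acceptable can be stated very compactly. The sketch below covers only the two cases described above and assumes that it is consulted after the search for a proactive task has failed; the function name and the category labels are hypothetical.

```python
def run_to_failure_permitted(consequence: str) -> bool:
    """Say whether No Scheduled Maintenance is a valid default action.

    Only the two cases described in the text are handled:
    "economic" and "safety_or_environmental".
    """
    if consequence == "economic":
        return True
    if consequence == "safety_or_environmental":
        # Doing nothing is not an option; some other action is required.
        return False
    raise ValueError(f"consequence category not covered by this sketch: {consequence!r}")


print(run_to_failure_permitted("economic"))
print(run_to_failure_permitted("safety_or_environmental"))
```

Other consequence categories are deliberately rejected rather than guessed at, since the appropriate default action depends on the full RCM decision logic.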
RCM Default Actions
The possible default actions are:
It is not possible for one person to answer all the questions that RCM asks. The solution is to bring together a group of people (the “RCM analysis group”) who have technical knowledge about the equipment, knowledge of its operation (within its current operating context) and a basic understanding of RCM itself (through suitable training).
A sound understanding of the RCM process is also required in order to guide the RCM analysis group through the RCM process and achieve consensus in answering the questions. This role is fulfilled by an RCM facilitator.
RCM analysis group members are drawn from equipment maintainers, operators, possibly manufacturers/suppliers and occasionally specialists. The most important factor is that they know and understand the equipment being analysed using the RCM process.
The aim is to reduce the size of the “black hole” in knowledge (i.e. the black area in the box representing “all there is to know about the equipment” in the diagram). Inevitably, there will be some gaps in the group’s combined knowledge, but at the end of the RCM analysis each group member will usually have acquired useful knowledge about the equipment from other members of the group.
Applying RCM
Under the guidance of the RCM facilitator, the group follows the RCM process. The outputs of the analysis are:
When the RCM analysis is complete, the output should be audited by whoever has overall responsibility for the equipment or system. This is so they can satisfy themselves that the analysis has been carried out correctly and that it is both sensible and defensible.
The final step is to implement the results of the RCM analysis when the audit is complete.
RCM has been applied in a wide range of industries in most countries throughout the world. Correctly applied, RCM produces a maintenance schedule that is optimised for the equipment in its operating context; the aim is to achieve inherent levels of equipment reliability and availability. The RCM-derived maintenance and the process itself bring about the following benefits:
Safety - Greater safety and environmental protection:
Performance - Improved operating performance:
Cost Effectiveness - Greater cost effectiveness:
Life-Cycle Cost - Reduced life-cycle costs by optimising the maintenance workloads and providing a clearer view of spares and staffing requirements
Equipment Life - Longer useful life of expensive items due to an increased use of on-condition maintenance techniques
Maintenance Data - A comprehensive maintenance database which:
provides a better understanding of the equipment in its operating context
leads to more accurate drawings and manuals
allows maintenance schedules to be more adaptable to changing circumstances
documents the knowledge held by individuals on each piece of equipment
Motivation - Greater motivation of individuals, particularly those involved in the review process. This gives improved understanding of the equipment in its operating context and wider "ownership" of the resulting maintenance schedules
Teamwork - Better teamwork brought about by the highly-structured group approach to analysing and addressing maintenance problems.
Conclusion
RCM yields results very quickly; most organisations can complete an RCM review on existing equipment and achieve substantial benefits in a matter of months.
It is also an ideal approach for determining the maintenance requirements of new equipment of all kinds. When applied correctly, it transforms both the maintenance requirements themselves and the way in which the maintenance function as a whole is perceived.
Our role is to impart an understanding of RCM to clients and provide support and guidance in its application; our goal is for clients to become competent to apply RCM themselves.
This is achieved via a combination of: