Reliability-centred Maintenance (RCM)
-  Background
Maintenance has changed...
The world of equipment maintenance changed dramatically during the second half of the 20th century and it continues to do so today. Several major influences have been responsible for driving these changes:
- An enormous increase in the number of physical assets (such as buildings, factories, public and personal transport) that require maintenance
- Equipment has become extremely complex - for example, it is now rare to find anything that does not contain a computer or some electronics
- Industries (such as manufacturing and mass transport) now put a much greater emphasis on safety and environmental integrity
- Significant developments in the technology available to maintainers for predicting failure
We now have a much better understanding of how equipment behaves, from installation to the point at which it fails. When engineers were forced to respond to this wave of change, it became clear that traditional maintenance methods were no longer adequate - a new approach to equipment maintenance was required. The commercial aviation industry was the first to realise that change was necessary and committed significant resources to developing a solution in the 1960s and 1970s. The results entered the public domain in 1978 under the name "Reliability-centred Maintenance" or "RCM".

Increased expectations...
Looking back to the 1930s, we can divide the years since then into three “generations”. We can then examine the expectations placed on the maintenance function in each of the three generations as follows:
- First Generation: prior to the Second World War, equipment was relatively simple and over-designed, so it tended to be reasonably reliable. The failures that did occur didn’t matter too much and were quick and easy to repair. There was little need for the planned maintenance systems that are commonplace today.
- Second Generation: the Second World War quickly led to increased demand for many types of manufactured goods and severely limited the supply of skilled labour to industry. In response, factory equipment became more mechanised and more complex. Failures (and their downtime) began to matter more so “preventive” maintenance systems were developed in an attempt to prevent them - usually these were fixed interval overhauls.
- Third Generation: the last 30-40 years have seen an enormous increase in demand for manufactured goods and mass transportation. Industry responded with ever more automation and complexity in order to reduce the manpower needed to meet this demand; this in turn greatly increased costs of ownership and maintenance costs.
 In industry generally, there is now extreme pressure on the maintenance function to deliver maximum performance at minimum cost. 
-  How Equipment Fails
We can also look back at what was generally understood about the way in which equipment behaved and failed over the same three generations:
- First Generation: it was widely believed that new equipment had a very low probability of failure and that this remained the case for a long period of time. After a certain age, the equipment would "wear out" and, therefore, become more likely to fail.
- Second Generation: an understanding of the concept of "infant mortality" led to the notion of an initial high probability of failure (which quickly settled down), followed by a long period of low failure probability before wear-out resulted in equipment becoming more likely to fail. Plotting conditional probability of failure against time on a graph produces the classic "bathtub curve". Equipment maintenance consisted of nursing the equipment through the "bedding in" phase and then overhauling (or replacing) it before it reached the wear-out phase.
- Third Generation: in the 1960s and 1970s the civil aviation industry undertook an extensive research project into the ways in which equipment behaves and, in particular, how it fails. This research showed that only 4% of civil aviation equipment failures actually fitted the classic bathtub failure pattern and that there were, in fact, an additional five failure patterns - most failures in the aviation industry conform to the sixth pattern.
 This new understanding of equipment failure made the civil aviation industry realise that their existing maintenance regime was flawed and even caused failures, some with catastrophic consequences. A new approach to equipment maintenance was essential and subsequently developed. The end result was what we now call Reliability-centred Maintenance - RCM. 
-  What Is RCM?
Having examined how the changing world of maintenance drove the development of RCM, this section describes what it is in some detail. The developers of RCM took the unusual view (at the time) that the objective of equipment maintenance should be to keep the equipment doing whatever its users want it to do, rather than to prevent failures for the sake of preventing failures. With this emphasis on preserving what the user wants, Moubray defines RCM as:

"A process used to determine what must be done to ensure that any physical asset continues to do what its users want it to do in its present operating context"

It is, therefore, no surprise that determining the operating context and what the user wants the equipment to do is the starting point for the RCM process, which is applied by asking and answering the following seven questions:
- What are the functions and associated performance standards of the asset in its present operating context?
- In what ways does it fail to fulfil its functions?
- What causes each functional failure?
- What happens when failure occurs?
- In what way does each failure matter?
- What can be done to predict or prevent each failure?
- What should be done if a suitable proactive task cannot be found?
Each of these questions is considered in more detail in the following sections.
-  Gathering the Information
1. What are the functions and associated performance standards of the asset in its present operating context?

Operating Context
In order to answer question 1, it is important to have a clear understanding of the operating context of the equipment being studied. This is because the operating context can influence what should be defined as failure and, therefore, whether a maintenance task is worthwhile.
For example: consider a small diesel engine used to power trains. This engine could be the only engine in a two-car train or it could be one of eight in a much longer train. These are very different operating contexts which will result in two very different views of what constitutes failure. If a cooling water pump fails, the engine will eventually overheat and its protection will shut it down. On the two-car train this will result in very serious operational consequences because the train will come to a halt mid-journey. On the eight-car train it will result in the loss of one eighth (about 12.5%) of its traction power, so the train will continue to its destination with only a minor delay. Any maintenance task for the cooling water pump that is considered in the RCM analysis will be much more likely to be evaluated as worthwhile on the two-car train than on the eight-car train. A similar logic will apply to many of the engine’s other failure modes, resulting in two quite different maintenance schedules for the very same engine.

Functions
A function is a statement of what the user of the equipment wants it to do and to what standard of performance. A complete set of functions for a piece of equipment represents the objectives of its maintenance schedule. The user will be very happy if the maintenance schedule keeps the equipment performing its functions! The function list is the foundation for the remainder of the RCM analysis, and the RCM analysis group will use it to deduce exactly what is meant by failure.
From this ‘failed state’, the RCM analysis group can list the failure modes that could cause each failed state. A system’s “primary function” is usually obvious, easy to determine and normally states why the system was purchased in the first place. However, most systems are expected to perform other “secondary functions” which represent the user’s requirements for environmental and safety integrity, protection, control, economy, appearance, etc.

Functional Failures
2. In what ways does it fail to fulfil its functions?
A system or piece of equipment is said to have ‘failed’ if it is unable to perform its intended function(s) to the desired standard of performance. This includes partial failure (as well as complete failure) where the equipment still functions, but not to an acceptable standard (e.g. it may be operating too slowly or producing poor quality). By documenting functional failures, the RCM analysis group defines the “failed state” (i.e. exactly what is meant by “failed” and “partially failed”) for the equipment’s operating context.
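The way operating context changes the "failed state" can be sketched in a few lines of Python. This is only an illustration of the train example above: the class `TractionContext`, the 80% performance standard and the assumption that every engine contributes equally to traction power are hypothetical, not part of the RCM process itself.

```python
from dataclasses import dataclass


@dataclass
class TractionContext:
    """Hypothetical operating context for the train example."""
    engines: int                    # number of identical engines on the train
    required_power_fraction: float  # assumed performance standard (0..1)


def remaining_power_fraction(ctx: TractionContext, engines_failed: int = 1) -> float:
    """Share of traction power left, assuming all engines contribute equally."""
    if ctx.engines <= 0 or not 0 <= engines_failed <= ctx.engines:
        raise ValueError("need engines > 0 and 0 <= engines_failed <= engines")
    return (ctx.engines - engines_failed) / ctx.engines


def failure_state(ctx: TractionContext, engines_failed: int = 1) -> str:
    """Compare what is left with the standard the user requires."""
    remaining = remaining_power_fraction(ctx, engines_failed)
    if remaining == 0:
        return "complete failure"
    if remaining < ctx.required_power_fraction:
        return "partial failure"
    return "function still met"


# Same engine, same failure mode, two operating contexts.
single_engine_train = TractionContext(engines=1, required_power_fraction=0.8)
eight_engine_train = TractionContext(engines=8, required_power_fraction=0.8)

print(failure_state(single_engine_train))  # train halts mid-journey
print(failure_state(eight_engine_train))   # 87.5% of power remains
```

With the assumed 80% standard, the loss of one engine is a complete failure in the first context and no functional failure at all in the second, which is why the same maintenance task can be worthwhile on one train and not on the other.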
-  Failure Mode and Effects
Failure Modes
3. What causes each functional failure?
A failure mode is any event which is reasonably likely to cause a functional failure. “Any event” is not limited to equipment failures caused by wear and tear or deterioration (sudden or slow), but also includes human error, poor procedures and design issues. “Reasonably likely” (i.e. credible) failure modes fall into the following broad categories:
- those that have occurred before on the same or similar equipment
- those that are successfully being prevented by the existing maintenance tasks
- those that do not fit the first two categories, but which are considered to be real possibilities in the future for the equipment’s operating context.
However, any unlikely failure modes that have extremely severe consequences would also be considered. When writing failure modes, it is important to identify the cause of the failure in sufficient detail so that the RCM analysis group can identify appropriate maintenance later in the RCM process (using the RCM maintenance task selection logic). Insufficient detail may well mean that appropriate maintenance tasks are missed, rendering the analysis ‘superficial’ (and possibly dangerous). On the other hand, if failure modes are identified in too much detail, the RCM analysis group could end up wasting time.

Failure Effects
4. What happens when failure occurs?
The RCM analysis group needs to have sufficient information so that they can make robust decisions about how to manage each failure mode. In particular, the effects of each failure (i.e. what would happen when the failure occurs if nothing were done to prevent it) are required. This information allows the RCM analysis group to answer the questions posed in the RCM decision logic. The failure effects record the problems (e.g. any undesirable/costly events) that the RCM-derived maintenance schedule is intended to manage (i.e. predict or prevent). The failure effects should, therefore, contain the following information:
- how (if at all) the operating crew or organisation will know that the failure has occurred
- whether the failure affects safety or the environment and, if so, in what way
- the effects on equipment operation/output/customer service, if any
- what secondary damage (if any) occurs on other equipment/components as a result of the failure
- what action must be taken and what spare parts are required in order to repair the failure
- what contingency action would be taken (if any) by the organisation to manage the failure until the equipment is repaired and returned to service. 
The first four questions of the RCM process make up the information gathering phase. The answers to these questions document what the equipment or system should do (functions), how it could fail (functional failures), what causes it to fail (failure modes) and what problems result (failure effects) when it does fail. This information becomes an excellent equipment reference which can subsequently be used to support a safety case, act as an audit trail, produce a comprehensive fault-finding guide and be the starting point for determining spare parts provisioning and how to work around problems that arise when failures occur.
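As a rough illustration of how the answers to the first four questions hang together, the sketch below records them as nested Python dataclasses. The class and field names are hypothetical, and the example entries paraphrase the cooling water pump example used earlier; a real RCM worksheet would carry considerably more detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureMode:
    cause: str    # question 3: what causes the functional failure
    effects: str  # question 4: what happens when the failure occurs


@dataclass
class FunctionalFailure:
    failed_state: str  # question 2: how the asset fails to fulfil the function
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class AssetFunction:
    statement: str  # question 1: function and performance standard
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def count_failure_modes(functions: List[AssetFunction]) -> int:
    """Total number of failure modes recorded across all functions."""
    return sum(
        len(ff.failure_modes)
        for fn in functions
        for ff in fn.functional_failures
    )


# Illustrative entries only, based on the diesel engine example.
pump_failure = FailureMode(
    cause="Cooling water pump fails",
    effects="Engine overheats and its protection shuts it down",
)
no_power = FunctionalFailure("Engine provides no traction power", [pump_failure])
traction = AssetFunction("To provide traction power to the train", [no_power])
worksheet = [traction]

print(count_failure_modes(worksheet))
```

Keeping each failure mode attached to the functional failure it causes, and each functional failure attached to its function, is what lets the record double as an audit trail and fault-finding guide.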
-  Evaluating the Consequences of Failure
5. In what way does each failure matter?
RCM recognises that maintenance is actually far more about preventing or mitigating the consequences of failure than about preventing the failures themselves. In this way RCM focuses maintenance spend where it will do the most good.
Some failures matter a great deal (i.e. they have severe consequences) when they occur and some failures hardly matter at all (i.e. they have insignificant or trivial consequences). It is usually worth putting effort into predicting or preventing high-consequence failures, even if they occur infrequently. On the other hand, failures that matter very little are often tolerated, even if they happen relatively frequently.
This can be illustrated by considering a simple maintenance task: listening to a bearing for any signs of rumbling. The onset of any unusual rumbling noise tells us that the bearing has already started to fail and that it must be replaced in the near future (if we wish to avoid the failure occurring). By checking the bearing for unusual noise, we are not doing the task to prevent the bearing failure; we are doing it in order to avoid the consequences of failure (which might be expensive if, say, the engine is destroyed).
RCM, therefore, categorises each failure according to the consequences of failure as follows:
- Hidden: failures which are not evident to the operating crew because, on their own, they have no direct effects
- Safety or Environmental: these are evident failures which either affect safety (because they could injure or kill someone) or could lead to a breach of an environmental standard or regulation that applies to the asset
- Operational: these evident failures do not affect safety or the environment, but they do have an adverse effect on production or operations
- Non-operational: this category includes failures which are evident to the operating crew and which incur only the direct cost of repairing them because they do not affect safety, the environment, production or operations.
Hidden failures are usually associated with equipment or systems that provide some sort of protection (e.g. a boiler pressure relief valve). Hidden failures on their own do not have any direct consequences but they leave the protected equipment or system without the protection that it should have - in the case of a pressure relief valve failing closed, the boiler may explode if a second failure causes the boiler to over-pressurise. There are many ways in which a failure with Operational consequences can incur costs; these include lost production, increased operating costs, degradation in product quality, poor customer service, etc.
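The four consequence categories above are tested in a fixed order, which can be sketched as a small Python function. The function name and its three yes/no inputs are hypothetical simplifications; in practice the RCM analysis group answers these questions from the recorded failure effects.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "Hidden"
    SAFETY_ENVIRONMENTAL = "Safety or Environmental"
    OPERATIONAL = "Operational"
    NON_OPERATIONAL = "Non-operational"


def classify_consequence(
    evident_to_crew: bool,
    affects_safety_or_environment: bool,
    affects_operations: bool,
) -> Consequence:
    """Assign a failure to one of the four RCM consequence categories."""
    if not evident_to_crew:
        # Not evident on its own, e.g. a failed protective device.
        return Consequence.HIDDEN
    if affects_safety_or_environment:
        return Consequence.SAFETY_ENVIRONMENTAL
    if affects_operations:
        return Consequence.OPERATIONAL
    # Evident, and only the direct cost of repair is incurred.
    return Consequence.NON_OPERATIONAL


# A pressure relief valve that has failed closed shows no symptoms on its own.
print(classify_consequence(False, False, False).value)
# A cooling water pump failure that halts a single-engine train.
print(classify_consequence(True, False, True).value)
```

The order of the checks matters: a failure is only considered for the Operational or Non-operational categories once it is known to be evident and to have no safety or environmental impact.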
-  Selecting the Maintenance Tasks
Proactive Tasks
6. What can be done to predict or prevent each failure?
Once each failure mode has been categorised according to the consequences of failure, a structured decision logic is used to select maintenance tasks. The RCM decision logic first looks to see if it is appropriate to perform a scheduled task to predict when the failure mode is going to occur. If such a task is not appropriate, RCM then considers whether the failure should be prevented by regularly restoring the item’s original resistance to failure before it fails and, if not, whether a scheduled replacement of the item (before it fails) is appropriate.

Predicting Failure - On-Condition Maintenance
This entails monitoring the equipment in order to identify a detrimental change (i.e. a warning) that indicates that the failure is in the process of happening (early enough so that action can be taken before the failure actually occurs). This is known as Condition-based Maintenance or Condition Monitoring. How often the equipment needs to be monitored is governed by the time it would take from when the warning can be identified to the point at which full failure occurs. This is illustrated in the diagram below: the warning is shown at point P (Potential failure) and the full failure occurs at point F (Functional failure).
[Figure: On-condition maintenance]
The monitoring task should be carried out at an interval which is less than the time between P and F (known as the P-F interval). If it is practical to monitor for point P and the P-F interval is long enough for action to be taken to reduce, avoid or eliminate the consequences of failure, then a condition monitoring task may be feasible.

Preventing Failure - Scheduled Restoration or Discard Maintenance
If failure cannot be predicted as it begins to occur, then RCM looks to see if it can be prevented from occurring. This would mean performing some sort of intervention before a failure even begins.
In the RCM task selection logic, the available choices are:
- Scheduled Restoration: this is where equipment is overhauled at a fixed interval regardless of its condition at the time
- Scheduled Discard: this is where a component is removed, discarded and replaced (with a new one) at a fixed interval regardless of its condition at the time.
Scheduled restoration and scheduled discard tasks are carried out before the wear-out zone (i.e. towards the end of “life”, which is the age at which the item’s conditional probability of failure begins to rise rapidly). Sometimes these tasks are carried out earlier (i.e. the task is carried out more often) if the consequences of failure are very severe. This increases the frequency of the scheduled task and provides a “safety factor”.
[Figure: Scheduled Restoration and Discard Maintenance]

Default Actions
7. What should be done if a suitable proactive task cannot be found?
The RCM task selection logic ensures that proactive tasks are identified only for those failure modes that need them. When a suitable proactive task cannot be found there still remains the question of what else could be done in order to manage the failure mode. In addressing this question, RCM takes special note of the consequences of failure. For example, where the consequences are purely economic, RCM permits No Scheduled Maintenance (or Run-to-Failure) as a valid default action; however, doing nothing is not an option if the failure mode has safety or environmental consequences.
[Figure: RCM Default Actions]
The possible default actions are:
- No Scheduled Maintenance: these failure modes are allowed to happen and are then repaired. RCM permits this default action only when the consequences of failure are economic (i.e. Operational and Non-operational consequences)
- Failure-Finding: this applies only to failures which have Hidden consequences. The protective device or system is tested at regular intervals to check whether or not it is still working (and is repaired if it is found to have failed)
- Redesign: RCM recognises that sometimes maintenance cannot satisfactorily manage a failure mode and that a one-off change may be necessary (to the equipment, the way it is used or to the people who use it). Redesign is compulsory if a proactive task cannot be found for failure modes that have Safety or Environmental consequences, whereas redesign is optional for failure modes that have only economic consequences
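The order in which the task selection logic considers proactive tasks and default actions can be sketched as follows. This is a simplified illustration, not the full RCM decision diagram: the dataclass, its field names and the use of days as the time unit are hypothetical, and the judgement of whether a restoration or discard task is suitable is reduced to a yes/no input. The check that enough of the P-F interval remains after a worst-case inspection is an assumption added for the sketch.

```python
from dataclasses import dataclass

CONSEQUENCES = {"hidden", "safety_environmental", "operational", "non_operational"}


@dataclass
class FailureModeFacts:
    """Hypothetical summary of what the analysis group knows about a failure mode."""
    consequence: str                 # one of CONSEQUENCES
    warning_detectable: bool         # is it practical to monitor for point P?
    pf_interval_days: float          # time from P to F
    monitoring_interval_days: float  # proposed interval between checks
    days_needed_to_act: float        # time needed to respond to a warning
    restoration_suitable: bool
    discard_suitable: bool


def on_condition_feasible(f: FailureModeFacts) -> bool:
    """Checks must be more frequent than the P-F interval and leave time to act."""
    if not f.warning_detectable:
        return False
    if f.monitoring_interval_days >= f.pf_interval_days:
        return False
    # Assumption: in the worst case P occurs just after a check, so only
    # (P-F interval - monitoring interval) remains once the warning is found.
    return f.pf_interval_days - f.monitoring_interval_days >= f.days_needed_to_act


def select_task(f: FailureModeFacts) -> str:
    if f.consequence not in CONSEQUENCES:
        raise ValueError(f"unknown consequence category: {f.consequence}")
    # Proactive tasks, in the order the decision logic considers them.
    if on_condition_feasible(f):
        return "On-condition task"
    if f.restoration_suitable:
        return "Scheduled restoration"
    if f.discard_suitable:
        return "Scheduled discard"
    # Default actions, which depend on the consequences of failure.
    if f.consequence == "hidden":
        return "Failure-finding"
    if f.consequence == "safety_environmental":
        return "Redesign (compulsory)"
    return "No scheduled maintenance (redesign optional)"


bearing = FailureModeFacts("operational", True, 30, 7, 5, False, False)
relief_valve = FailureModeFacts("hidden", False, 0, 0, 0, False, False)
print(select_task(bearing))
print(select_task(relief_valve))
```

The figures for the bearing (a 30-day P-F interval checked weekly) are invented for the example; the point is that the monitoring interval is derived from the P-F interval, and that the fallback when no proactive task fits depends entirely on the consequence category.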
 
-  Applying RCM
It is not possible for one person to answer all the questions that RCM asks. The solution is to bring together a group of people (the “RCM analysis group”) who have technical knowledge about the equipment, knowledge of its operation (within its current operating context) and a basic understanding of RCM itself (through suitable training). A sound understanding of the RCM process is also required in order to guide the RCM analysis group through the RCM process and achieve consensus in answering the questions. This role is fulfilled by an RCM facilitator.
RCM analysis group members are drawn from equipment maintainers, operators, possibly manufacturers/suppliers and occasionally specialists. The most important factor is that they know and understand the equipment being analysed using the RCM process. The aim is to reduce the size of the “black hole” in knowledge (i.e. the black area in the box representing “all there is to know about the equipment” in the diagram). Inevitably, there will be some gaps in the group’s combined knowledge, but at the end of the RCM analysis each group member will usually have acquired useful knowledge about the equipment from other members of the group.
[Figure: Applying RCM]
Under the guidance of the RCM facilitator, the group follows the RCM process. The outputs of the analysis are:
- a list of maintenance tasks to be performed by maintenance personnel at specified intervals
- a list of tasks to be performed by operating personnel at specified intervals
- a list of redesigns to be considered for implementation.
 When the RCM analysis is complete, the output should be audited by whoever has overall responsibility for the equipment or system. This is so they can satisfy themselves that the analysis has been carried out correctly and that it is both sensible and defensible. The final step is to implement the results of the RCM analysis when the audit is complete. 
-  What RCM Achieves
RCM has been applied in a wide range of industries in most countries throughout the world. Correctly applied, RCM produces a maintenance schedule that is optimised for the equipment in its operating context; the aim is to achieve inherent levels of equipment reliability and availability. The RCM-derived maintenance and the process itself bring about the following benefits:
Safety - Greater safety and environmental protection:
- improved maintenance of existing protective devices
- the systematic review of safety implications of every failure
- the application of clear strategies for preventing failure modes which can affect safety or impinge upon environmental regulations
- fewer failures caused by unnecessary maintenance
Performance - Improved operating performance:
- an emphasis on the maintenance requirements of critical equipment elements
- the extension or elimination of overhaul intervals
- shorter and more focused maintenance tasks resulting in less extensive and costly shutdowns
- fewer "burn in" problems after maintenance (by eliminating unnecessary maintenance actions)
- the identification of unreliable components
Cost Effectiveness - Greater cost effectiveness:
- less unnecessary routine maintenance
- the prevention or elimination of expensive failures
- clearer operating policies
- clearer guidelines for acquiring new maintenance technology
Quality - Improved quality due to:
- a better understanding of equipment capacity and capability
- the clarification of equipment set-up specification and requirements
- the confirmation or redefinition of equipment operating procedures
- a clearer definition of maintenance tasks and objectives
Life-Cycle Cost - Reduced life-cycle costs by optimising the maintenance workloads and providing a clearer view of spares and staffing requirements
Equipment Life - Longer useful life of expensive items due to an increased use of on-condition maintenance techniques
Maintenance Data - A comprehensive maintenance database which:
- provides a better understanding of the equipment in its operating context
- leads to more accurate drawings and manuals
- allows maintenance schedules to be more adaptable to changing circumstances
- documents the knowledge held by individuals on each piece of equipment
Motivation - Greater motivation of individuals, particularly those involved in the review process. This gives improved understanding of the equipment in its operating context and wider "ownership" of the resulting maintenance schedules
Teamwork - Better teamwork brought about by the highly-structured group approach to analysing and addressing maintenance problems.

Conclusion
RCM yields results very quickly; most organisations can complete an RCM review on existing equipment and achieve substantial benefits in a matter of months. It is also an ideal approach for determining the maintenance requirements of new equipment of all kinds. When applied correctly, it transforms both the maintenance requirements themselves and the way in which the maintenance function as a whole is perceived.
-  Mutual Consultants' Role
Our role is to impart an understanding of RCM to clients and provide support and guidance in its application; our goal is for clients to become competent to apply RCM themselves. This is achieved via a combination of:
- Highly-developed RCM training courses (for both analysis group members and facilitators)
- Contract facilitation (where appropriate)
- On-site technical support
- Supply of dedicated RCM software.
 
