Robert Hess is a maintenance management professional with a keen understanding of key performance indicators (KPIs). His explanation of three of the industry KPIs is a very readable document. Robert has graciously allowed KelRoy Solutions to reprint the document. Robert's frustration with non-maintenance senior management is apparent in the writing.

 

Basic Maintenance Management

 

The Goal of Maintenance Management

The basic goal of equipment maintenance management can be expressed most simply as: PREVENT FAILURES! That’s because not only do equipment failures lead to subsequent emergency repair costs, but the time required to repair the equipment during its scheduled operating time increases operational costs due to the resulting lack of production (or the cost of supplying extra temporary replacement equipment in order to maintain production).

 

Equipment costs can be sub-divided into 3 specific cost groups in order to examine ways to reduce them:

 

Maintenance costs (repair and preventive maintenance parts and labour costs)

Minimizing maintenance costs (repair and preventive maintenance parts and labour costs) is done by purchasing the optimum equipment, maintaining it in the most cost-effective manner to keep it running properly, and repairing or replacing it in the most cost-effective manner when it fails.

 

Operational costs (downtime costs due to equipment failure)

Minimizing operational costs (downtime costs due to equipment failure) is done by preventing equipment failures with proper equipment specifications and operator training, and by repairing any failures which do occur as quickly as possible by having on-site spare parts, trained technicians, backup systems, and replacement equipment available to deploy quickly.

 

Equipment replacement costs (capital costs for new equipment)

Minimizing equipment capital costs (lowering depreciation costs and extending replacement intervals) is done by maintaining equipment at its peak condition over its service life to maximize re-sale or trade-in value when equipment is finally replaced (depreciation), and by extending the replacement intervals to lower average capital costs for replacement.

 

Why Measure Equipment Performance Using Standard Indicators

Without a way to measure equipment performance in the context of failures there can be no way to monitor, compare, and improve it… and thus reduce failure costs, ie “what gets measured gets improved”. For that reason industry-standard basic equipment performance measurements (indicators) have been developed by professional maintenance managers to manage equipment (just as standard basic performance indicators were developed to allow accountants and business managers to manage finances). These standard performance indicators are the basis of the systematic maintenance management of all types of equipment. As well, the use of industry standard indicators allows maintenance professionals to quickly and accurately bench-mark and compare equipment performance industry-wide without special training.

 

Unfortunately, because senior managers are usually promoted from legal and financial disciplines rather than from maintenance or engineering, in many cases they have never had any formal maintenance management training, and so have never heard of standard performance indicators. Despite their lack of maintenance training they soon recognize a need to measure equipment performance, but often invent complicated financially-based performance indicators of their own, or make frequent requests for complicated custom reports which neither they nor their staff fully understand or trust.

 

Basic Performance Indicators

 

Availability

The average % of scheduled operating time the equipment was available for use (ie not out of service from equipment failure).

 

% Availability = (Scheduled Operating Time [minutes] - Time Out of Service [minutes]) / Scheduled Operating Time [minutes] X 100
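
As a minimal sketch of the availability formula above, the calculation could be written in Python as follows. The function name, the argument names, and the sample figures are assumptions made for illustration only and do not come from the original document.

```python
WEEK_MINUTES = 60 * 24 * 7  # 10,080 minutes in a 24 hours/day week


def availability_percent(scheduled_minutes, out_of_service_minutes):
    """Percent of scheduled operating time the equipment was available."""
    if scheduled_minutes <= 0:
        raise ValueError("scheduled operating time must be positive")
    if not 0 <= out_of_service_minutes <= scheduled_minutes:
        raise ValueError("time out of service must lie within scheduled time")
    # Multiply before dividing so round figures stay exact in floating point.
    return 100.0 * (scheduled_minutes - out_of_service_minutes) / scheduled_minutes


# Hypothetical week: 1,008 minutes out of service during 10,080 scheduled minutes.
print(availability_percent(WEEK_MINUTES, 1008))
```

With these sample figures the equipment was out of service for one tenth of the week, so the indicator falls well below the 99% level.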

 

A good availability indicator is 99%. Specific equipment or equipment classes or high priority equipment which have lower than acceptable availability should have their PM program re-evaluated and upgraded, and/or the equipment specification rewritten before new equipment is purchased, with an emphasis on revising specifications to improve or strengthen the design of the components that have failed in the past.

 

Example of Weekly Equipment Availability Indicators graphed over 4 weeks

 

 

Availability was close to 100% the first week; however the curve trended lower every week after that, which means the equipment is becoming less available as down-time increases due to either multiple small failures or one or more serious failures which took a long time to repair. This equipment needs major overhaul or replacement quickly to get availability back up over 99%.

 

Reliability (MTBF: Mean Time between Failures)

The average time between failures. Note: I have modified the standard formula used to calculate reliability by subtracting the time out of service from the scheduled operating time, so as to account for the fact that while the equipment is out of service it cannot be considered operational.

 

Reliability [minutes] = (Scheduled Operating Time [minutes] - Time Out of Service [minutes]) / Number of Failures
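
The reliability formula above could be sketched in Python as shown here. The function name is hypothetical, and returning infinity when there are no failures is an illustrative choice for handling the division by zero, not part of the formula itself.

```python
import math


def reliability_minutes(scheduled_minutes, out_of_service_minutes, failures):
    """Average operating minutes between failures (modified MTBF)."""
    if failures < 0:
        raise ValueError("number of failures cannot be negative")
    if failures == 0:
        # Assumption: report "no failures" as infinite minutes between failures.
        return math.inf
    return (scheduled_minutes - out_of_service_minutes) / failures


# Hypothetical week: 2 failures and 1,080 minutes out of service.
print(reliability_minutes(10080, 1080, 2))
```

A reporting program would still need to decide how to draw an infinite value on a graph, for example by carrying the minutes forward into the next period.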

 

A perfect reliability indicator is infinite minutes: with 0 failures the formula divides by zero and returns an error. In order to get enough data for a coherent line graph, use a period long enough to show at least one failure for each piece of equipment being analyzed, if possible (ie 1 month, 1 quarter, etc). If there have been no failures in the most recent period, then reliability can be displayed in a columnar graph showing the total minutes between each of the past failures, with the most recent column marked “ongoing - to date”. Specific equipment or equipment classes or higher priority equipment which have lower than acceptable reliability should have their PM program re-evaluated and upgraded, and/or the equipment specification rewritten before new equipment is purchased, with an emphasis on revising specifications to improve or strengthen the design of the components that have failed in the past.

 

Reliability is often erroneously reported as a percentage by senior managers unfamiliar with standard equipment performance indicators who do not understand the difference between availability (% available during scheduled operating time) and reliability (minutes between failures).

 

Example of Weekly Equipment Reliability Indicators graphed over 4 weeks

 

 

The Scheduled Operating Time for this equipment is 24 hours a day, or 10,080 minutes a week, and 10,080 (less any time out of service) is the number generated by the formula when there is 1 failure. If there were no failures in a week, 10,080 would be added to the next week. In this case multiple failures caused reliability to be 4,500 minutes between failures during the first week. As the number of failures decreased over the next 3 weeks, reliability increased to close to 10,000 minutes, and the curve stayed flat until trending slightly lower in week 4.

 

Maintainability

The average time to repair equipment after a failure.

 

Maintainability [minutes] = Time Out of Service [minutes] / Number of Failures

 

A very good maintainability indicator is 30 minutes, which means that in most cases the technicians were able to repair the equipment in about half an hour. If there are no failures during the period being analyzed for a specific piece of equipment there automatically will be no time out of service either, so the formula gives 0/0, which is reported as a maintainability indicator of 0. In order to get enough data for coherent graphing, use a period long enough to show at least one failure for each piece of equipment being analyzed, if possible (ie 1 month, 1 quarter, etc). Specific equipment, equipment classes, or high priority equipment which has higher than acceptable maintainability should have its specification rewritten before new equipment is purchased, with an emphasis on revising specifications to allow for easier access to components, better access to service manuals, more comprehensive and informative service manuals, improved access to parts and supplies, improved technician (and in some cases operator) training, as well as improving or strengthening the design of the components that have failed in the past to improve availability and reliability.
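
A minimal Python sketch of the maintainability calculation follows. The function name is an assumption, and a period with no failures is reported as 0 minutes, following the convention described above.

```python
def maintainability_minutes(out_of_service_minutes, failures):
    """Average minutes out of service per failure."""
    if failures < 0 or out_of_service_minutes < 0:
        raise ValueError("inputs cannot be negative")
    if failures == 0:
        # No failures means no time out of service; report 0 by convention.
        return 0.0
    return out_of_service_minutes / failures


# Hypothetical week: 3 failures totalling 90 minutes out of service.
print(maintainability_minutes(90, 3))
```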

 

Example of Weekly Equipment Maintainability Indicators graphed over 4 weeks

 

 

 

In the first week the repair jobs done to this equipment took an average of 100 minutes. The jobs done in week 2 took an average of 500 minutes, but in week 3 the average job time dropped to under 100 minutes. In week 4 job time rose to an average of 300 minutes. The repair jobs need to be analyzed individually to determine why there is such a variation in average job time, in order to make sure there are no delays caused by lack of staff, delays in getting parts, etc.

 

Calculating the Basic Performance Indicators From the Data

The key indicators are all derived from calculations which use standard basic failure data captured for each individual failure - for each separate piece of equipment. The indicators are further refined for analysis by grouping, sorting, averaging, and totaling them by the different factors which are pertinent to the type of equipment and its use (ie equipment class, priority, manufacturer, model, season, day, maintenance crew, location, etc, etc).
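
The grouping and totaling described above can be sketched in Python with nothing more than a dictionary. The record layout, the equipment numbers, and the function name are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical failure records: (equipment id, equipment class, minutes out of service)
failures = [
    ("P-101", "pump", 120),
    ("P-101", "pump", 60),
    ("F-201", "fan", 30),
]


def summarize_by(records, key_index):
    """Total the failure count and downtime for each value of one record field."""
    totals = defaultdict(lambda: {"failures": 0, "downtime": 0})
    for record in records:
        group = totals[record[key_index]]
        group["failures"] += 1
        group["downtime"] += record[2]
    return dict(totals)


by_class = summarize_by(failures, 1)      # grouped by equipment class
by_equipment = summarize_by(failures, 0)  # grouped by equipment id
print(by_class)
```

The totals for each group are the inputs the three indicator formulas need: number of failures and time out of service.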

 

Collecting the Data Necessary to Calculate the Basic Performance Indicators

The maintenance data necessary to calculate the basic performance indicators can be captured in paper log books, separate paper work orders, or input directly into special maintenance management computer software as a work order data record. If the data is initially captured in a paper log book or paper work order it will later have to be re-entered into an electronic calculator or computer spread-sheet program so it can be processed to derive the key indicators. Because the space required for the minimum essential data requirement is actually very small, the data can be collected in the form of a traditional paper log book… one row for each failure record.

 

Once the primary decision to keep detailed equipment failure records has been implemented, the small amount of extra time required to create each record on a separate piece of paper (ie a work order) rather than just add a row to a log book can be justified because once a separate work order is created more data than just the minimum allowed by the available space in a log book can be recorded with very little extra time incurred. Creating several work order copies at the same time (either by using carbon copy paper forms or by printing extra copies from a computer if the work orders are computer generated) allows copies to be used by the work control centre for scheduling, the parts department for materials tracking, and by the technician on the job site to record other job data such as parts and labour cost data, failure analysis data, charge-back data, job follow-up information, sketches and photographs, etc, etc.

 

The mandatory minimum job data which must be collected for each job on each separate piece of equipment is:

 

  • Failure/Job Identification Number; a sequential unique number used to identify the failure, and written on all the job documents (also known as a log row number, a work ticket number, or a work order number). The unique record identification number can be derived from the date and number of jobs that day, obtained from the page and row number of the log book, pre-printed on sequential paper work order forms (which can be multi-part), or automatically generated by the maintenance management computer program.

 

  • Equipment Identification Number; a unique number permanently marked on each piece of equipment so it can be identified quickly and accurately. The equipment identification number is the maintenance system link to all types of previously entered equipment data such as location, priority, responsibility, safety plan, name-plate data, technical manuals, technical drawings, warranty information, etc, etc, etc. As well it is often linked to the equipment location so it can be located on the facility/building as-built drawings.

 

  • Failure Start Date and Time; which is used to identify the weekly period the equipment went out of service (if the date and time the equipment went back in service are recorded too, then the time out of service can be calculated by subtracting the failure start time from the back-in-service time).

 

  • Time Out of Service during Scheduled Operating Time; also known as “unscheduled downtime” or just “downtime”, the length of time in minutes the equipment is out of service during scheduled operating time - from the moment it fails to the moment it is back in service. This is usually longer than the actual labour time the technicians need to carry out the repair job (unless they happen to be very near the equipment with the right parts and tools when the failure occurs).

 

  • Number of Failures: in order to carry out the performance calculations it is necessary to count the number of failures, which can be done simply by counting the log book rows or separate work orders that were recorded against the piece of equipment for the rollup period (ie weekly). A log record or work order must be created for every failure, even when no repair work is carried out.
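
As one possible illustration, the minimum data listed above could be held in a small Python record like the sketch below. The class name, field names, and sample values are assumptions, and the downtime calculation assumes a 24 hours/day scheduled operating time so that all elapsed time counts as time out of service.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FailureRecord:
    job_id: str                # failure/job identification number
    equipment_id: str          # equipment identification number
    failure_start: datetime    # moment the equipment failed
    back_in_service: datetime  # moment the equipment was back in service

    @property
    def minutes_out_of_service(self):
        """Elapsed minutes from failure to return to service."""
        return (self.back_in_service - self.failure_start).total_seconds() / 60


def count_failures(records, equipment_id):
    """Number of failure records logged against one piece of equipment."""
    return sum(1 for r in records if r.equipment_id == equipment_id)


# Hypothetical log with a single failure lasting 90 minutes.
log = [FailureRecord("2024-001", "P-101",
                     datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 30))]
print(log[0].minutes_out_of_service, count_failures(log, "P-101"))
```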

 

Indicator Reporting and Display

The results should be distributed weekly on printed paper tabular reports, enhanced with corresponding line or bar graphs (wherever possible use a standardized company-wide graph layout to enhance staff comprehension and subsequent decision-making… since line graphs with time (ie weeks) along the bottom (the “X” axis) are the most common, that’s what is recommended). Because many companies regard reliability as the most important key indicator, a large wall chart tracking the overall equipment reliability over the past several years to the present date can be a valuable addition to staff work areas or meeting rooms.

 

The most efficient use of the data is to display a series of indicator results for short periods of time in basic line graphs which show the results over longer periods, such as weekly results for a year (x axis = 52 weekly time periods, y axis = calculated weekly indicator).
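
A weekly series of this kind could be prepared for graphing with a short Python sketch such as the one below. It assumes a 24 hours/day schedule, uses ISO week numbers to define the weekly periods, and the function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEK_MINUTES = 60 * 24 * 7  # 10,080 minutes


def weekly_availability(records):
    """records: iterable of (failure date, minutes out of service).

    Returns a list of ((ISO year, ISO week), % availability) sorted by week.
    Weeks with no recorded failures are omitted (they would be 100%), and
    downtime is assigned entirely to the week in which the failure started.
    """
    downtime = defaultdict(float)
    for failure_date, minutes in records:
        iso = failure_date.isocalendar()
        downtime[(iso[0], iso[1])] += minutes
    return [(week, 100.0 * (WEEK_MINUTES - total) / WEEK_MINUTES)
            for week, total in sorted(downtime.items())]


# Hypothetical data: two failures in the first week of 2024, one in the second.
series = weekly_availability([
    (date(2024, 1, 1), 504),
    (date(2024, 1, 3), 504),
    (date(2024, 1, 8), 504),
])
print(series)
```

Each pair in the result gives one point on the line graph: the week along the X axis and the calculated indicator on the Y axis.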

 

Using the Indicators to Improve Equipment Performance

The PM program and equipment specifications should be adjusted or refined when the indicators are considered outside normal range (ie low availability or reliability, or high maintainability), or when the slope of the graph shows a steady trend towards a decrease in availability. Adjusting the PM program can range from reducing preventive maintenance on certain classes of low priority equipment which has near-perfect reliability and availability, to eliminating preventive maintenance on other classes of equipment on the grounds that it is a waste of valuable resources because even if the equipment does fail it can be repaired quickly, for example by simply changing the battery (called a “run-to-fail” philosophy), to increasing preventive maintenance on equipment which is not considered reliable or available enough.
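
One way to put a number on the slope of the graph is a least-squares trend line through the weekly indicator values. This is an illustrative technique rather than a method prescribed by the original document, and the function name and sample values are assumptions.

```python
def trend_slope(values):
    """Least-squares slope of equally spaced values (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two periods are needed to see a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical weekly availability (%) falling by one point every week.
slope = trend_slope([100.0, 99.0, 98.0, 97.0])
print(slope)  # a negative slope means availability is decreasing
```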

 

Implementation Steps (Usually implementation takes place over a period of 1 – 2 years)

  1. Carry out an inventory of all equipment, first marking each piece with a unique permanent identification number (a permanent ink felt pen works fine) and then recording the identification number with the make, model, location, year installed, serial number, size, etc. Leave space on the inventory form for the Scheduled Operating Time data – see #2 below.

  2. In order to be able to get accurate data for “Time Out of Service during Scheduled Operating Time” for each failure it is necessary to set an operating time schedule for each piece of equipment. This schedule is used both to determine the time out of service data for each failure, and to calculate a total time for a rollup period (ie weekly) which is then used in calculating the indicators. Data collection and calculations can be greatly simplified if the scheduled operating time is assumed to be 24 hours/day. The total for the period is expressed in minutes, ie as minutes per week (1 week = 60 X 24 X 7 = 10,080 minutes). Weeks are a good basic unit of time for equipment performance monitoring because they are consistently 7 days, while the number of minutes in months and years changes over time and causes fluctuations in indicators which can be confusing. Add this information to the equipment inventory data.

  3. Design a site specific log book or work order form which will capture the minimum data required to calculate the performance indicators, as well as any other data which the technicians feel would be helpful (ask them for their input, but try to keep extra fields to a minimum because too much data input will increase resistance to creating a record of every failure; see “Roadblocks” below). Phase in the use of log books or work orders by operational staff and technicians in stages over a year until they are finally recording every failure and the subsequent repairs (if any).

  4. Designate enough clerical staff to be able to finish inputting the previous week’s log book or work order data into a computer database or spreadsheet program (ie Microsoft Access or Excel) by the middle of the next week, and then do the calculations for the indicators in a day or two so that performance indicator reports and graphs can be produced and distributed to technicians and management by the end of that week, meaning that the latest performance indicators and reports will never be more than 2 weeks old.

  5. After performance indicators have been produced every week for several months install a chart showing the Reliability history in a main staff area. Soon after that management should start referring to the curve of the chart when discussing maintenance performance issues, and encourage supervisors and technicians to set their own goals and targets. Reward them publicly when the goals or targets are met.

  6. In order to ensure all the failure and repair data is being collected in the log books or work orders the maintenance manager should conduct regular spot checks by selecting a failure at random and then tracking the creation of the failure report, its input into the database which is used to produce the weekly reports, and the release of an accurate performance indicator report according to the weekly reporting schedule.

  7. Management should base supervisor performance goals and evaluations for raises and bonuses on the performance indicators for equipment which is their responsibility.

 

Implementation Issues

Many companies make the mistake of attempting to implement expensive and complicated specialized maintenance management computer software, and/or advanced maintenance functions such as failure analysis, predictive maintenance, and total productive maintenance even before they have a functional basic maintenance management system running using paper forms and providing regular accurate equipment performance indicators. It is impossible to implement specialized maintenance management computer software or advanced maintenance functions without a basic maintenance management system in place first, and attempts to do so will always end in a costly failure… ie “you have to walk before you can run”.

 

The main road-block to setting up an effective basic maintenance management system, whether computer-based or not, is staff resistance to the suggestion that collecting the failure data required to calculate the basic equipment performance indicators requires the creation of a separate record for each failure, no matter how small or unimportant it seems to them. The separate records can also be used to plan, schedule, and track jobs, as well as to collect other data (such as failure analysis, job cost, detailed lists of parts used, etc, etc), but the essential data required to calculate the basic performance indicators must be collected for every equipment failure and its subsequent repair job, or the maintenance system data cannot be used to evaluate the key maintenance performance indicators.

 

In some cases it may be necessary to designate operational staff (as opposed to maintenance staff, especially if the maintenance staff are contractors) to manage the data collection on a daily basis, since by omitting to record some failures (ie those which are quickly fixed, such as tightening a fastener or resetting breakers) the maintenance staff can skew the data to produce indicators which suggest things are better than they really are. In the same way that the amount of money in the bank must match the financial statement or no one can write a cheque without fear that there may not be sufficient funds to cover it, the number of failures and the related failure data must match the actual failures that occurred or the staff will quickly stop trusting or monitoring the performance indicator reports and graphs… and eventually stop completing the log book or work orders altogether, since even one missing failure makes all the indicators useless.

 

 

 

 

In progress

 

Preventive Maintenance

 

When new equipment is installed and tested, before it is put into service it should be run in (simulated) operational conditions so that its performance (ie output, amps, hp, speed, acceleration, deceleration, etc) can be tested and recorded, for use in establishing benchmark figures for future comparisons of similar units after repair and rebuild, or when planning the requirements for the purchase of similar equipment, or designing new facilities.


Other reports

 

Detailed budget for next 12 months

Total shop labour billed to total shop wage package paid including management wages

Total shop gross profit percentage

Accounts receivable balance as a % to total sales for the month

Consumables (oil, tire, battery, parts) inventory turnover trend

Site efficiency (formula? – billed labour hours/total labour hours)

Average labour hours billed per invoice

Total labour hours sold for each of maintenance, repairs, diagnostic

Average labour hours & $ billed per technician

Actual cost per billed hour

Actual cost per hour

 

Data needed?

 

Break out labour by pm, cm, diagnostic & not billed – labour rate for each?

Total work order sales by labour & parts

Parts by consumables & OEM & jobber