Catalogue of measures to eliminate the causes of malfunctions
Like Incident Management, Problem Management is a service management process within the ITIL (IT Infrastructure Library) framework. In contrast to Incident Management, however, it does not address acute incidents or malfunctions, but offers a comprehensive, standardised catalogue of measures to identify and permanently eliminate sources of problems. This allows the causes of recurring malfunctions to be eliminated at the root, so that such incidents in production and logistics (manufacturing and warehousing) are prevented in the future.
Rather than providing a troubleshooting hotline, as Incident Management does, Problem Management focuses on analysing the causes of errors so that they stay resolved. The identified problem is then corrected through a change (Change Management). While Incident Management usually has to operate under high time pressure, Problem Management activities tend to be less time-critical.
Nevertheless, they require specific know-how from their coordinators, both for the root cause analysis and for developing a suitable solution that resolves the underlying problem. The requirement profile is therefore not purely technical: it also covers managing complex fault situations at the IT services level.
Proven procedure model
The first step is to identify the problem and record it in detail. Immediately afterwards, it is categorised and prioritised. The terms of the service level agreement (SLA) are given special consideration here, particularly its time limits.
Furthermore, the scope of a problem is determined in the diagnostic process. If the effects on business operations are severe, it is classified as a "major problem", which must be resolved as a matter of priority. It may also be possible to use interim solutions until the problem has been finally resolved.
Such workarounds bridge the gap if the final resolution would take unacceptably long to find. Finally, both workarounds and clearly identified problems for which solutions have already been implemented, referred to as "known errors", are documented in a database.
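The procedure described above can be sketched as a small data model. This is a minimal illustration, not a prescribed ITIL or IGZ implementation: the class names, the 1 to 5 impact scale, the priority thresholds and the SLA hours are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, Optional


class Priority(Enum):
    MAJOR = "major"  # "major problem": severe business impact, resolved first
    HIGH = "high"
    LOW = "low"


@dataclass
class Problem:
    title: str
    category: str
    business_impact: int               # hypothetical scale: 1 (minor) to 5 (severe)
    recorded_at: datetime
    workaround: Optional[str] = None   # interim solution, if one exists
    root_cause: Optional[str] = None   # set once the cause is clearly identified

    @property
    def priority(self) -> Priority:
        # Assumed thresholds; in practice these would follow from the SLA.
        if self.business_impact >= 5:
            return Priority.MAJOR
        if self.business_impact >= 3:
            return Priority.HIGH
        return Priority.LOW

    @property
    def is_known_error(self) -> bool:
        # Here a known error is a problem whose cause has been identified.
        return self.root_cause is not None

    def due_by(self, sla_hours: Dict[Priority, int]) -> datetime:
        # Resolution deadline derived from the SLA time limit for this priority.
        return self.recorded_at + timedelta(hours=sla_hours[self.priority])


# Assumed SLA time limits in hours, for illustration only.
SLA_HOURS = {Priority.MAJOR: 8, Priority.HIGH: 72, Priority.LOW: 240}

problem = Problem(
    title="Conveyor controller loses connection",
    category="network",
    business_impact=5,
    recorded_at=datetime(2024, 1, 8, 9, 0),
)
problem.workaround = "Restart gateway service"
deadline = problem.due_by(SLA_HOURS)
```

Deriving the priority from the recorded impact, rather than storing it separately, keeps the classification consistent if the impact assessment changes during diagnosis.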
Active knowledge management for accelerated problem-solving
This database supports knowledge management, in particular by linking newly reported incidents with problems that have already been detected and solved. In this way, known solutions (known errors) and workarounds can be adopted directly from problem tickets.
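Linking a new incident to an existing known error could look roughly like the following sketch. The keyword-overlap matching, the `KnownError` structure and the sample entries are assumptions made for illustration; a real service desk tool would use its own matching logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class KnownError:
    problem_id: str            # hypothetical problem ticket reference
    keywords: FrozenSet[str]   # lower-case terms describing the symptom
    workaround: str


def match_known_error(
    incident_text: str,
    known_errors: List[KnownError],
    min_overlap: int = 2,
) -> Optional[KnownError]:
    """Return the known error sharing the most keywords with the incident."""
    words = set(incident_text.lower().split())
    best: Optional[KnownError] = None
    best_score = 0
    for known_error in known_errors:
        score = len(words & known_error.keywords)
        # Require a minimum overlap so unrelated incidents are not linked.
        if score >= min_overlap and score > best_score:
            best, best_score = known_error, score
    return best


# Invented sample entries, for illustration only.
known_errors = [
    KnownError("PRB-1", frozenset({"scanner", "timeout", "warehouse"}), "Re-pair the scanner"),
    KnownError("PRB-2", frozenset({"printer", "label", "offline"}), "Restart the print spooler"),
]

hit = match_known_error("Handheld scanner timeout in warehouse aisle 4", known_errors)
miss = match_known_error("Login page is slow", known_errors)
```

When a match is found, the incident can be resolved with the documented workaround and linked to the problem ticket; when none is found, the incident proceeds through normal handling.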
All activities are stored as history, so that traceability, and thus audit compliance, is guaranteed at all times. At the same time, this procedure involves several specialist departments.
The demands placed on the coordination expertise of those responsible for Problem Management are correspondingly high.
In addition to the exact definition of the service processes in Problem Management, control-relevant key figures are also defined. Evaluating these key figures by priority, by category and over time provides valuable information on the current state of Problem Management. At the same time, it allows statements to be made about the quality of the IT infrastructure in question.
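An evaluation of this kind might be sketched as follows. The record fields, the sample data and the chosen key figures (counts per priority and category, open problems, mean days to resolution, problems opened per month) are assumptions for illustration, not a fixed set of ITIL metrics.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Invented sample problem records; "closed" is None while a problem is open.
records = [
    {"priority": "major", "category": "network",
     "opened": datetime(2024, 1, 8), "closed": datetime(2024, 1, 10)},
    {"priority": "low", "category": "software",
     "opened": datetime(2024, 1, 20), "closed": datetime(2024, 1, 24)},
    {"priority": "low", "category": "network",
     "opened": datetime(2024, 2, 3), "closed": None},
]


def key_figures(problem_records):
    """Aggregate problem records by priority, by category and over time."""
    closed = [r for r in problem_records if r["closed"] is not None]
    days_to_resolve = [(r["closed"] - r["opened"]).days for r in closed]
    return {
        "by_priority": Counter(r["priority"] for r in problem_records),
        "by_category": Counter(r["category"] for r in problem_records),
        "open": len(problem_records) - len(closed),
        # None when nothing has been closed yet, to avoid averaging an empty list.
        "mean_days_to_resolve": mean(days_to_resolve) if days_to_resolve else None,
        "opened_per_month": Counter(r["opened"].strftime("%Y-%m") for r in problem_records),
    }


figures = key_figures(records)
```

Tracked over several reporting periods, figures such as these indicate whether problems are being resolved faster than new ones arise.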
Service operation processes
As a service operation process according to ITIL, Problem Management focuses on preventing the occurrence of incidents on the basis of a standardised catalogue of measures for root cause analysis.
IGZ will be happy to work with you to develop a suitable service catalogue: optimised in terms of performance and cost and individually tailored to your needs.