An ITIL project in the real world

Saturday, August 05, 2006

How to measure improvements brought by our Incident Mgt project (1).

It's now been almost a month since we rolled out changes to improve the way we handle interruptions of service. What improvements have we achieved so far?

  • MEASURE 1: Average time to resolve Incidents

    Definition: time between the initial record of the Incident and its resolution by the specialist (not including the extra time until the user acknowledges that the service is back up and running properly).

    So far, our average time to resolve Incidents hasn't changed much... This suggests that analysts haven't really changed their approach to Incident handling yet, i.e. they are not giving this task any more priority or effort today than before.

    How will we make improvements happen then?

    1. In a few weeks, we'll start automatic Service Desk escalation to management for every Incident that is not resolved as quickly as expected. This should be an incentive to improve behavior...
    2. We will work closely with the teams and analysts who do not resolve Incidents in time. Instead of bothering everyone, we will focus on teams that are not providing a proper response, and on Incidents that were not handled in time. This should help teams prioritize Incidents correctly - and will avoid unnecessary escalation to management.
    3. We will publish team performance figures. That will show management and teams that other teams can do better.
    Limitations of this measure

    Although that plain measure is very good from a high-level, customer perspective, we will need narrower measures to decide on specific actions to further improve performance. The problem with this global "average time to resolve" measure is that it is influenced by factors unrelated to the speed at which analysts try to resolve Incidents. For example:

    1. To what extent does level 1 support record Incidents?

      At our company, Service Desk specialists tend to overlook Incident recording, especially when an immediate resolution is provided! When management increases or decreases the Service Desk's focus on Incident recording, it has a dramatic impact on this measure - much more than any actual increase or decrease in service support quality!

      Of course we need to do as much as we can to facilitate Incident recording - at least to get a more accurate picture of the situation. This is actually part of our project. It will make us look more efficient, but in reality it will not provide any immediate improvement to the user community.
    2. Some Incidents can only be resolved with changes.

      If we really want to improve the situation when installations, code changes, etc. are required, we will need to review our procurement policies, contracts with suppliers, organization of support, and so on. This has not been tackled by our project, so there is little chance things will improve in this area right now.

    Also, it is an "average", which means that a few exceptionally bad or difficult Incidents can drag the results so far that the average is no longer representative of the quality of service provided. Moreover, it does not take into account that some Incidents need to be resolved extremely fast because they have a high impact, while other Incidents may be resolved later because they have less impact on our customers.

    For these reasons, we should probably pick another measure... why not "Percentage of Incidents resolved in time"?

    I'll discuss this in my next post.
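In the meantime, here is a quick numerical sketch of the point above - that one difficult Incident can drag the average while "percentage resolved in time" barely moves. All resolution times and the 4-hour target below are made up for illustration; they are not figures from our project:

```python
# Hypothetical resolution times (in hours) for seven recorded Incidents.
# Six routine ones, plus one difficult outlier that took two days.
resolution_hours = [1.5, 2.0, 0.5, 3.0, 1.0, 2.5, 48.0]

target_hours = 4.0  # assumed resolution target for this priority level

# Average time to resolve: heavily skewed by the single 48 h Incident.
average = sum(resolution_hours) / len(resolution_hours)

# Percentage resolved in time: each Incident counts once, so the
# outlier costs us one miss instead of dominating the whole figure.
within_target = sum(1 for h in resolution_hours if h <= target_hours)
pct_in_time = 100.0 * within_target / len(resolution_hours)

print(f"Average time to resolve: {average:.1f} h")  # about 8.4 h
print(f"Resolved in time: {pct_in_time:.0f}%")      # 6 of 7, about 86%
```

Without the outlier, the average would be under 2 hours - so one hard Incident more than quadruples it, even though users saw six out of seven Incidents handled well within target.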
