An ITIL project in the real world

Monday, December 25, 2006

My Incident Management project is now closed :-)

After our successful roll-out and a few months of closely watching how well each group and analyst follows the process, we have gradually reached the service level we targeted! It wasn't straightforward, and it was certainly challenging, but very rewarding.

The hardest part will be keeping up the good work ;-)

I hope you will have success with your initiatives too :D

Wednesday, November 29, 2006

How to put IE7 menu back on top, and hide that new search box

Microsoft made a pretty bold move in taking away the user's freedom to choose where the address bar and menu are displayed, and in forcing their search box to be displayed. Fortunately there are some ways around these changes: a few registry tweaks.

To put the IE7 menu back on top, create a .reg file with this content:

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Toolbar\WebBrowser]
"ITBar7Position"=dword:00000001

To revert the change:

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Toolbar\WebBrowser]
"ITBar7Position"=dword:00000000

To hide the extra search box they added, which is redundant if you already have, say... the Google toolbar:

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Policies\Microsoft\Internet Explorer\InfoDelivery\Restrictions]
"NoSearchBox"=dword:00000001

To bring it back:

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Policies\Microsoft\Internet Explorer\InfoDelivery\Restrictions]
"NoSearchBox"=dword:00000000

To allow 16 simultaneous connections per server (which speeds up page downloads):

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings]
"MaxConnectionsPerServer"=dword:00000010
"MaxConnectionsPer1_0Server"=dword:00000010
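To apply any of these tweaks, save the corresponding lines into a file with a .reg extension and double-click it; Registry Editor will ask you to confirm the merge. You may need to restart Internet Explorer before the toolbar or search box changes show up.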

Hope this helps you too ;-)

Wednesday, August 09, 2006

How to measure improvements brought by our Incident Mgt project (3).

MEASURE 3: Market Share

Definition:
Number of user calls recorded per user per month.

What's the use?
I am not sure that "Market Share" is a very good name for this. Does anybody know another name for it? Anyway, here's what it is good at: it gives you an idea of whether 1st level support is doing one of their key tasks, which is... recording calls! Although it may sound obvious that 1st level support should record all calls, if this is not made very clear by management (and supervisors), it is very easy for an analyst to slip away from it. Ex: "The user called and I solved her problem in 30 secs. Why would I record that? What's the use? It will take me more time to record it than it took me to resolve it!" It is not uncommon for an analyst to prefer being the "hero" who solves 10 things like Superman, rather than solving those 10 things and also recording what was done. What extra reward do you get from recording?

That's when this measure comes in handy. I've been told that the industry average for recorded calls is roughly 1.1 records / user / month. Of course it varies a lot from one industry to another, and depends on whether you have a majority of plant workers or a huge marketing team ;-) But it can give you some idea of where you're at with this activity.
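Just to make the arithmetic concrete, here is a minimal sketch in Python of how the figure could be computed from an export of recorded calls. The field names and the user count are invented for the example; they are not what our actual tool produces.

from collections import defaultdict

# Hypothetical export: one entry per recorded call, tagged with the month
# ("YYYY-MM") in which it was opened.
calls = [
    {"id": 1, "month": "2006-07"},
    {"id": 2, "month": "2006-07"},
    {"id": 3, "month": "2006-08"},
]
supported_users = 2500  # assumed constant over the period

calls_per_month = defaultdict(int)
for call in calls:
    calls_per_month[call["month"]] += 1

for month, count in sorted(calls_per_month.items()):
    per_user = count / supported_users
    # Compare against the rough 1.1 records/user/month industry figure above.
    print(f"{month}: {per_user:.2f} records per user")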

Limitations with this measure
The objective is not to record more calls than there really are, just to have nice numbers. Actually, if we had very good problem management, change management, availability and capacity management, etc., we should not have so many Incidents, no? Still, if your company is big enough, you may be able to use this measure to highlight which L1 support team is not doing that part of their job properly. And believe me, at my place, it is quite obvious.

Another limitation is that it does not take into account the quality of the calls recorded... that would require a softer measure.

Sunday, August 06, 2006

How to measure improvements brought by our Incident Mgt project (2).

MEASURE 2: Percentage of Incidents resolved in time

Definition: Number of Incidents resolved within the agreed OLAs & SLAs, divided by the total number of Incidents in the period.
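To illustrate the ratio, here is a minimal sketch in Python; the incident list and the target field are invented for the example and do not reflect our tool's actual data model.

# Hypothetical records: time to resolve (hours) and the OLA/SLA target
# (hours) that applies to each Incident's priority.
incidents = [
    {"id": "INC001", "resolved_in_h": 3.0, "target_h": 4.0},
    {"id": "INC002", "resolved_in_h": 9.5, "target_h": 8.0},
    {"id": "INC003", "resolved_in_h": 1.0, "target_h": 4.0},
]

resolved_in_time = sum(1 for i in incidents if i["resolved_in_h"] <= i["target_h"])
percentage = 100.0 * resolved_in_time / len(incidents)
print(f"{percentage:.1f}% of Incidents resolved in time")  # 66.7% in this example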

Will it help us? I think this can be an excellent measure. We can use it to identify services that lack proper support, or teams that do not react in a timely manner... It should be a great tool to fine-tune our process - but we won't be able to compare with the situation before the project, because we did not have any kind of OLA back then...

Limitations with this measure
I have to say I like this measure. If Incidents are not recorded, it shows badly. If you stay within targets, it has a positive impact. If you overdo it, it does not make the measure look any better.

However, that measure alone will not be sufficient. What if analysts do not record anything? Or record just a few Incidents that are resolved extremely fast? The numbers will not make any sense... We'll need some more measures. But I will keep this one.

Saturday, August 05, 2006

How to measure improvements brought by our Incident Mgt project (1).

It's now been almost a month since we rolled out changes to improve the way we handle interruptions of service. What improvements have we achieved so far?


  • MEASURE 1: Average time to resolve Incidents

    Definition: the time between the initial recording of the Incident and its resolution by the specialist (not including the extra time until the user acknowledges that the service is back up and running properly). A small calculation sketch follows at the end of this post.

    Altogether, our average time to resolve Incidents hasn't changed much yet... It means that analysts haven't really started to change anything in the way they handle Incidents, i.e. they are not giving this task more priority or effort today than before.

    How will we make improvements happen then?

    1. In a few weeks, we'll start automatic escalation (and Service Desk escalation) to management for each and every Incident that is not resolved as fast as expected. This should be an incentive to improve behavior...
    2. We will work closely with the teams and analysts that do not resolve Incidents in time. Instead of bothering everyone, we will focus on teams that are not providing a proper response, and focus on Incidents that were not handled in time. This should help teams focus on Incidents according to priorities - and will avoid unnecessary escalation to management.
    3. We will advertise team performance. That will show management and teams that other teams can do better.
    Limitations with this measure

    Although that plain measure is very good from a high-level and customer perspective, we will need narrower measures to decide on specific actions to further improve performance. The problem with this global "average time to resolve" measure is that it is influenced by factors that are not related to the speed at which analysts try to solve Incidents. Ex:

    1. To what extent does level 1 support record Incidents?

      At our company, Service Desk specialists tend to overlook Incident recording, especially when an immediate resolution is provided! When management increases or decreases SD focus on Incident recording, there is a dramatic impact on this measure - much more than from any actual increase or decrease in service support quality!

      Of course we need to make sure that we do as much as we can to facilitate Incident recording - at least to get a more accurate picture of the situation. This is actually part of our project. It will make us look more efficient, but in reality it will not provide an immediate improvement to the user community.
    2. Some Incidents can only be resolved with changes.

      If we want to really improve the situation when installations or code changes etc. are required, we will need to review our procurement policies, contracts with suppliers, organization of support, etc. This has not been tackled by our project, so there is little chance things will improve in this area right now.

    Also, it is an "average", which means that a few exceptionally bad/difficult Incidents can drag the results in such a way that the average is not representative of the quality of service provided. Next, it does not take into account that some Incidents need to be resolved extremely fast because they have a high impact, while other Incidents may be resolved later because they have less impact on our customers.

    For these reasons, we should probably pick another measure... why not, "Percentage of Incidents resolved in time"?

    I'll discuss this in my next post.
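
    As mentioned in the definition above, here is a minimal sketch in Python of the average calculation; the timestamps are invented, and in practice they would come from our tool's export.

    from datetime import datetime

    # Hypothetical (recorded, resolved) timestamps for two Incidents; the time
    # until the user confirms resolution is deliberately left out.
    incidents = [
        ("2006-08-01 09:00", "2006-08-01 11:30"),
        ("2006-08-02 14:00", "2006-08-03 10:00"),
    ]

    fmt = "%Y-%m-%d %H:%M"
    hours = [
        (datetime.strptime(done, fmt) - datetime.strptime(opened, fmt)).total_seconds() / 3600
        for opened, done in incidents
    ]
    print(f"Average time to resolve: {sum(hours) / len(hours):.1f} hours")
    # One long Incident (20h vs 2.5h) already dominates the result - the
    # "average is not representative" limitation discussed above.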




Friday, July 14, 2006

How the Incident Mgt changes were received by the teams...

Well, we had a mix of attitudes and reactions.

For most of them, it all sounded logical - some even questioned why we were not already doing it.

Some were more reluctant, focusing more on the "why we need to do this" than on the "how can I make it happen".

Some were even clearly against the changes they would need to go through. However, as they went through a process of analysing and accepting the move, they now seem to be the most supportive ones!

The most difficult part will be handling the "silent majority": those who do not have any comment, who seem to cope with the change, but who actually do not plan to change anything in their good old habits...

Thursday, July 13, 2006

The roll-out of our first Incident Management initiatives is completed!

We got everyone trained in our new processes and in the changes to the tool. This was not an easy task - by the time you get to the 20th training session, you feel like you are repeating yourself a little! The good thing, though, was that from session to session we were able to improve the balance between
  1. theory, i.e. "What is expected from all of us, why we need to change, what are the roles & responsibilities",
  2. fun, with a movie + exercise that demonstrate the shortcomings of the "quick and dirty" old way of dealing with Incidents,
  3. practice, with hands-on exercises on the tool,

shifting from 3-hour sessions with 60% theory, 15% fun and 25% exercises to 2h30 sessions with 40% theory, 20% fun and 40% exercises. For the unlucky remote teams that received remote training, we had to skip the fun part and replace the exercises with a quick demo, to squeeze the whole session down to 1h-1h30!

After completing this roll-out, we decided to delay two of the other initiatives by 2 to 3 months, in order to secure the first wave of deployments. Teaching about change, and making sure habits actually change, is not straightforward, and will require careful monitoring and tuning for a few months. If we don't want to waste what we have already done, we need to secure these changes!