Event Management Good Practices

ITIL v3 introduced events. Even though some time has passed, the distinction between events and incidents can still cause heated debate. In this post, I will attempt to show how you can benefit from incorporating events properly into your ITSM processes.

The Past

I remember being involved in monitoring a vast number of batch jobs per day. Each job generated a substantial number of entries in log tables and files. Certain events triggered automatic generation of incident tickets, but for the most part it was up to the console operator to determine whether anything was going wrong.

Well, the truth was that the only things that really mattered to the operators were job failures and significant processing delays. Only those events caused incidents. The rest were simply ignored, because there were just too many events to pay attention to. Before ITIL v3 they were not necessarily called events, and there was a reason for it: an event either caused an incident or wasn't worth noticing, much less naming.

The Present

You might wonder what kinds of events you should track, or when they become incidents. The fact is that it very much depends on your IT architecture. However, ITIL gives you a rule of thumb in its definition of an incident. Let me frame that in the Event Management context: an incident arises from an event, or a group of correlated events, that is causing or will cause a service interruption if not handled.

That means you need to:

  1. Take a careful look at various events which are generated and recorded in your IT environment.
  2. Determine which events need handling, i.e. come up with a set of conditions.
  3. Implement a notification mechanism that will raise an incident or problem based on the pre-defined conditions.

Essentially, you are setting up the Correlation Engine, as ITIL defines it. However, keep in mind that people can only handle so much, and the biggest strength lies in automation.
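To make the three steps a little more concrete, here is a minimal sketch in Python of what such a set of conditions could look like. It is not how ITIL or any particular monitoring product implements a Correlation Engine. The names Event, Rule and correlate, the event kinds, and the 30-minute delay threshold are all made up for illustration, loosely based on the batch job example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One recorded event (step 1). All fields are hypothetical."""
    source: str              # e.g. the name of a batch job
    kind: str                # e.g. "job_finished", "job_failed", "job_delayed"
    delay_minutes: int = 0   # only meaningful for "job_delayed"


@dataclass(frozen=True)
class Rule:
    """One condition that decides which events need handling (step 2)."""
    name: str
    matches: Callable[[Event], bool]
    threshold: int = 1       # matching events per source before we react


def correlate(events: Iterable[Event], rules: Iterable[Rule]) -> List[str]:
    """Return one incident description per (rule, source) that hits its threshold (step 3)."""
    events = list(events)    # allow several passes over the input
    incidents: List[str] = []
    for rule in rules:
        counts: Dict[str, int] = {}
        for event in events:
            if rule.matches(event):
                counts[event.source] = counts.get(event.source, 0) + 1
        for source, count in counts.items():
            if count >= rule.threshold:
                incidents.append(f"{rule.name}: {source}")
    return incidents


# Assumed conditions: any failure is an incident; a delay only counts
# when it is long (30+ minutes) and happens at least twice for the same job.
RULES = [
    Rule("Job failure", lambda e: e.kind == "job_failed"),
    Rule("Repeated delay",
         lambda e: e.kind == "job_delayed" and e.delay_minutes >= 30,
         threshold=2),
]

events = [
    Event("billing", "job_finished"),
    Event("billing", "job_delayed", 45),
    Event("payroll", "job_failed"),
    Event("billing", "job_delayed", 60),
    Event("reports", "job_delayed", 10),
]

incidents = correlate(events, RULES)
print(incidents)
```

With this sample input, only the payroll failure and the repeated billing delay would be raised; the finished job and the short delay in reports are recorded but ignored, much as the console operators used to do by hand. In a real environment the rules would come out of your own review of the events, and the print call would be replaced by whatever raises a ticket in your service desk tool.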

The Future

The IT Skeptic has an interesting discussion going about Event Management Best Practices, which actually led me to write this article. In one of the comments, John M. Worthington makes a statement about Service Monitoring Intelligence.

Those three words bring a lot of promise. Wouldn't you like to see more automation, where your support team is automatically alerted only when there is a need for human intervention? Or a self-learning Correlation Engine, which adapts based on past actions taken by the support staff?

OK, let's come back down to earth. The bottom line is: you will need to find out on your own which events are a cause for concern and which are not. Think of your event rules as device drivers and of ITIL as Windows. There is one Windows (well, sort of), but it runs on many devices, and each of them needs a driver written for it individually. So does your company.

