Tuesday, December 15, 2009

Network Troubleshooting - some thoughts

After doing several late-night and weekend cutover and integration projects, I realized that much of my network troubleshooting ability is based not on a specific checklist of items (though I am going to build one) but on disciplines I learned in engineering school.

Specifically, much of my time is spent gathering known variables, quickly writing up the problem statement and conditions, and then forming a hypothesis I can work from to solve the problem at hand. Keeping the scope of a problem small and discrete often helps: applying the engineering principle of KISS (Keep It Simple, Stupid) minimizes the feeling of being overwhelmed by an issue. This is especially true in stressful situations caused by limited time or the inability to roll back a change. Both are to be avoided at all costs, but in the world of consulting they are often the reason you are brought into a project in the first place. I guess that goes with the territory.

There are lots of great resources out there that define the principles of engineering, so I won't bother with links to those. I have found that the main process I use is not so different from the principles you use to design and deploy networks. Cisco has a whole methodology built around this, and those who have suffered through their study materials know the PPDIOO (Prepare, Plan, Design, Implement, Operate, Optimize) mantra.

I think I prefer the more classic engineering school outline, something like:
1. Identify and understand the problem
2. Gather information
3. Generate several solutions
4. Choose the best of those solutions (KISS)
5. Prototype the solution
6. Deploy the solution
7. Redesign/Retest
8. Report on results

Each problem you face can be broken down and solved with this method. The difference between those who troubleshoot networks well and those who do not seems to lie in the ability to quickly gather information and analyze a situation, pick the best solution, and then rapidly deploy and tweak it. This often comes with age and experience, but I am amazed at how many colleagues I have watched over the years who do not follow any of these principles while troubleshooting. I have mainly noticed the lack of this engineering discipline in those who did not go through formal engineering school or a technical trade. I have found that those with a military background follow a similar process, though slightly adjusted to the function the military serves. They also seem to do well at solving these sorts of technical problems, so it is no surprise that I meet so many former military people in the networking field.

Another funny thing I have noticed over the years is that no matter how much planning and scripting you do, it is the small things that seem to get you. I am not entirely sure why this is; perhaps they are easily overlooked during planning and deployment. Regardless, the KISS principle is remarkably useful for ferreting out those small problems.

I by no means claim that my engineering degree makes me a better troubleshooter than peers who do not have one. I have met some amazing people in this field, and clearly some people simply have the native instinct and problem-solving skills needed to outperform others.
- Ed

1 comment:

Network diagnostic tools said...

Networks are like children. There are days when they behave the way you expect, and there are days when they do anything but. You said it right: "no matter how much planning and scripting you do it is the small things that seem to get you".