
A bad habit I’ve noticed in many a techie:

The tendency to thrash around and wildly speculate about the root cause of whatever production issue they’re facing. They tweak code and configuration on the strength of one random hypothesis or another, hoping the issue will magically go away. Surely it’s clear that this is a horribly inefficient way to solve a problem.

What’s required is data: data we can use to home in on the source of the fault. We could wade through log files, but that is inefficient and ought to be the last resort. Ideally we’d have some idea of what to look for beforehand.

Instrumentation is one tool we can use to guide our efforts. It can tell us things like how much memory is in use, how much load there is, how many users are logged in, the rate and types of requests, cache hits, etc.
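Instrumenting our own code doesn’t have to be elaborate. The sketch below is a minimal, hypothetical in-process counter registry (the names `Metrics`, `metrics` and `handle_request` are illustrative, not from any particular library) showing how a request handler might record request rates by type and cache hits and misses. In practice you would expose `snapshot()` through whatever monitoring system you already run.

```python
import threading
from collections import Counter


class Metrics:
    """Hypothetical minimal registry of thread-safe, in-process counters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = Counter()

    def incr(self, name, amount=1):
        # Serialise updates so concurrent request threads don't lose increments.
        with self._lock:
            self._counters[name] += amount

    def snapshot(self):
        # Return a copy so callers can report it without holding the lock.
        with self._lock:
            return dict(self._counters)


metrics = Metrics()
_cache = {}


def handle_request(kind, key):
    """Hypothetical request handler instrumented with counters."""
    metrics.incr("requests.total")
    metrics.incr(f"requests.{kind}")  # request rate broken down by type
    if key in _cache:
        metrics.incr("cache.hits")
        return _cache[key]
    metrics.incr("cache.misses")
    value = key.upper()  # stand-in for an expensive lookup
    _cache[key] = value
    return value
```

After two `search` requests for the same key and one `login` request, the snapshot would show three requests in total, two misses and one cache hit: exactly the kind of number that rules hypotheses in or out without guessing.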

Self-tests are also useful, as they can exercise common operations, perform internal consistency checks and provide feedback on what is and isn’t working.
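A self-test facility can be as simple as a registry of named checks that a running service can execute on demand. The sketch below assumes a hypothetical order-processing application (`PRICES`, `inventory`, `open_orders` and `order_total` are all made up for illustration): one check exercises a common operation against a known answer, the other checks internal consistency.

```python
from typing import Callable, Dict

# Registry of named checks; a check passes if it returns without raising.
SELF_TESTS: Dict[str, Callable[[], None]] = {}


def self_test(func: Callable[[], None]) -> Callable[[], None]:
    """Decorator that registers a function as a self-test."""
    SELF_TESTS[func.__name__] = func
    return func


def check(condition: bool, message: str) -> None:
    # Explicit raise rather than `assert`, which `python -O` strips out.
    if not condition:
        raise RuntimeError(message)


def run_self_tests() -> Dict[str, str]:
    """Run every registered check and report what is and isn't working."""
    report = {}
    for name, run in SELF_TESTS.items():
        try:
            run()
            report[name] = "ok"
        except Exception as exc:  # one failing check must not stop the others
            report[name] = f"FAILED: {exc}"
    return report


# --- Hypothetical application code and state, for illustration only ---
PRICES = {"widgets": 2.50, "gadgets": 10.00}
inventory = {"widgets": 5, "gadgets": 2}
open_orders = [("widgets", 2), ("gadgets", 3)]


def order_total(item: str, qty: int) -> float:
    return PRICES[item] * qty


@self_test
def order_total_works() -> None:
    # Exercise a common operation against a known answer.
    check(order_total("widgets", 2) == 5.0, "order_total gave the wrong answer")


@self_test
def orders_within_inventory() -> None:
    # Internal consistency check: no open order exceeds the stock on hand.
    for item, qty in open_orders:
        have = inventory.get(item, 0)
        check(qty <= have, f"{item}: {qty} ordered but only {have} in stock")
```

With the sample data the consistency check deliberately fails, so `run_self_tests()` reports `order_total_works` as `ok` and names the gadgets shortfall under `orders_within_inventory`, pointing straight at the problem rather than leaving us to speculate.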

We can also take memory dumps of a running process, and there are tools like dtrace and tcpdump.

Given all these possibilities, why do we indulge in wild speculation? Perhaps it’s because we’ve foolishly left ourselves no choice:

  1. Instrumentation that should be a rich source of useful information is often limited to what is available from the operating system because we neglect to instrument our own code.
  2. As with instrumentation, we don’t make the time to implement self-test facilities.
  3. Only a few of us bother to learn about tools such as dtrace.
  4. Logging, even if we could wade through it all, is implemented in such a way that it cannot be turned on in production because the performance cost is too high.
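Logging can often be made cheap enough to leave in production code. The sketch below uses Python’s standard `logging` module: `%`-style arguments are only formatted if a record is actually emitted, `isEnabledFor` guards work that is expensive even to prepare, and levels can be raised per subsystem at runtime. The logger name `orders` and the functions `expensive_state_dump`, `process` and `set_debug` are hypothetical; how you would trigger `set_debug` (an admin endpoint, a signal handler) is left as an assumption.

```python
import logging

logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
log = logging.getLogger("orders")  # hypothetical per-subsystem logger


def expensive_state_dump(order):
    # Stand-in for something costly to compute, e.g. serialising a large object.
    return ", ".join(f"{k}={v}" for k, v in sorted(order.items()))


def process(order):
    # %-style arguments are only formatted if the record is actually emitted.
    log.debug("processing order %s", order["id"])
    # Guard work that is expensive even to *prepare* for a log call.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("order state: %s", expensive_state_dump(order))
    return order["qty"] * order["price"]


def set_debug(subsystem, enabled):
    """Hypothetical runtime switch, e.g. wired to an admin endpoint."""
    logging.getLogger(subsystem).setLevel(logging.DEBUG if enabled else logging.WARNING)
```

Because the switch is per logger, we can turn on detailed logging for just the suspect subsystem, collect the data we need, and turn it off again, instead of paying the cost everywhere or not logging at all.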
