Chaos

  • You don’t know whether a program has failed until a user tells you.
  • You have no idea why the program failed.
  • You don’t even know whether the program is wrong or the user’s expectations are wrong because the program or business process is completely undocumented.
  • Your support team can’t read the logs because they don’t have access to the server.

Normal

  • When a customer tells you that something isn’t working, you can look at the logs and sometimes find out where it failed.
  • Sometimes you have enough information in the logs to look in the database and find out why the program failed.
  • Sometimes you even know which users were affected.
  • The program filled up a server’s disk because circular logging wasn’t enabled and you caused an outage.
  • You enabled circular logging but set it to 10 MB, which only lasts for 60 seconds, so you can’t find anything useful anyway.
  • Your support team can’t read the logs because the logs are so big that their text editor crashes.
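
The two circular-logging failures above come down to arithmetic: how much disk the log set may use, and how fast the program writes. Here is a rough sketch using Python's standard `logging.handlers.RotatingFileHandler`. The logger name, file sizes, and write rate are illustrative assumptions, not measurements from any real system.

```python
import logging
from logging.handlers import RotatingFileHandler


def retention_seconds(max_bytes: int, backup_count: int, bytes_per_second: float) -> float:
    """Estimate how much history a rotating log set holds at a given write rate."""
    total_bytes = max_bytes * (backup_count + 1)  # active file plus rotated backups
    return total_bytes / bytes_per_second


def bytes_needed(bytes_per_second: float, seconds: float) -> float:
    """Estimate the total log capacity needed to cover a retention window."""
    return bytes_per_second * seconds


def build_rotating_logger(path: str, max_bytes: int, backup_count: int) -> logging.Logger:
    """Create a logger whose disk usage is capped at roughly max_bytes * (backup_count + 1)."""
    logger = logging.getLogger("app.rotating")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Note: RotatingFileHandler never rolls over if maxBytes or backupCount is 0.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Assumed write rate: 10 MiB every 60 seconds, as in the bullet above.
    rate = 10 * 1024 * 1024 / 60
    week = 7 * 24 * 3600
    print(f"10 MiB lasts about {retention_seconds(5 * 1024 * 1024, 1, rate):.0f} s")
    print(f"a week needs about {bytes_needed(rate, week) / 1024 ** 3:.1f} GiB")
```

At that assumed rate, a week of history needs about 10,080 times the space, on the order of 100 GB. That usually means logging less rather than buying more disk.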

Useful

  • Log messages are recorded at severity levels which match their actual severity, e.g. ERROR isn’t used for DEBUG messages.
  • Log messages contain enough information to investigate problems.
  • Log messages are retained for long enough to be useful.
  • When you get the time, you occasionally look at the logs and create a support ticket if you find an error or warning.
  • Your support team can’t read the logs because they can’t read a stack trace and infer meaning from it, other than “it’s broken”.
  • There are so many errors being logged that you wouldn’t dare to get the system to email you in case you crash the email server.
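
As a minimal sketch of the first two points, here is a function that logs at a level matching the outcome and includes enough context to investigate. It uses Python's standard `logging` module. The `orders` logger, `parse_quantity`, the identifiers, and the 1000 threshold are all hypothetical, chosen only for illustration.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical application name


def parse_quantity(raw: str, order_id: str, user_id: str) -> int:
    """Parse a quantity, logging at a level that matches what actually happened."""
    # Context that lets someone find the affected order and user later.
    context = f"order_id={order_id} user_id={user_id}"
    # Routine detail: DEBUG, so it can be switched off in production.
    logger.debug("parsing quantity %r (%s)", raw, context)
    try:
        quantity = int(raw)
    except ValueError:
        # The request cannot be completed: a genuine ERROR, logged with the stack trace.
        logger.exception("quantity %r is not a number (%s)", raw, context)
        raise
    if quantity > 1000:  # assumed threshold for "unusual"
        # Odd but handled: WARNING, not ERROR.
        logger.warning("unusually large quantity %d (%s)", quantity, context)
    return quantity
```

The message text is written for whoever reads it later, so it says what went wrong and to whom, rather than leaving the stack trace to speak for itself.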

Elite

  • Logs are collected centrally and retained for a set period using something like Logstash or a RabbitMQ log4net appender.
  • You can produce a graph of error rates for applications over time.
    • This is regularly reviewed and attempts are made to reduce the number of errors.
  • Your support team can read and understand the messages, because they make sense.
  • An error being logged is actually cause for concern.
  • Your support team contact customers who’ve experienced a problem to apologise and explain what’s being done to resolve the issue before the customer has had a chance to raise a support ticket.
  • You feel a deep sense of inner calm.
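
Once logs are in one place, the error-rate graph is mostly a counting exercise. The sketch below buckets ERROR lines per application per hour. It assumes a made-up line format of `<ISO timestamp> <LEVEL> <app> <message>`; a real central log store would have its own query language for this, so treat the function and the sample lines as hypothetical.

```python
from collections import Counter
from datetime import datetime


def error_counts_per_hour(lines):
    """Count ERROR lines per (application, hour) from centrally collected log lines."""
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # not in the assumed format
        timestamp, level, app = parts[0], parts[1], parts[2]
        if level != "ERROR":
            continue
        try:
            moment = datetime.fromisoformat(timestamp)
        except ValueError:
            continue  # unparseable timestamp, skip rather than guess
        hour = moment.replace(minute=0, second=0, microsecond=0)
        counts[(app, hour.isoformat())] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2024-03-01T10:05:00 ERROR billing payment declined",
        "2024-03-01T10:40:00 ERROR billing gateway timeout",
        "2024-03-01T10:10:00 INFO billing invoice sent",
        "2024-03-01T11:01:00 ERROR web template missing",
    ]
    for (app, hour), count in sorted(error_counts_per_hour(sample).items()):
        print(app, hour, count)
```

Plotting those counts over time gives the graph to review regularly. It is only meaningful if an ERROR really is an error, which is why the severity discipline has to come first.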