Syslog


Syslog is actually much tougher to talk about than most other monitoring subjects.  Why?  At its root it is incredibly simple.  Take a log entry, add a severity code, a facility code, a timestamp, and host information, then send it across the network to a central logging server via UDP on port 514.
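To show how little is involved, here is a minimal sketch that builds a legacy-style message and sends it as a single UDP datagram.  The number in angle brackets at the front is the priority, calculated as facility * 8 + severity.  The host name, tag, message text, and collector address are placeholders I made up for illustration.

```python
import socket
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def build_message(facility: int, severity: int, host: str, tag: str,
                  text: str, when: datetime) -> bytes:
    """Build a legacy-style syslog message: <PRI>TIMESTAMP HOST TAG: TEXT."""
    pri = facility * 8 + severity
    # Legacy timestamp format: "Mmm dd hh:mm:ss" with the day space-padded.
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {tag}: {text}".encode("ascii", "replace")

def send_udp(message: bytes, server: str, port: int = 514) -> None:
    """Hand the datagram to UDP; nothing tells us whether it arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (server, port))

# Facility 23 (local7) and severity 3 (error) give a priority of 187.
msg = build_message(23, 3, "edge-sw-01", "demo", "fan 2 failed",
                    datetime(2024, 1, 5, 14, 3, 9))
# To actually send it (placeholder address): send_udp(msg, "192.0.2.10")
```

That is the whole protocol from the sender's side: format a string and fire it at the collector.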

Besides being simple, syslog has never had an official standard that vendors consistently follow.  At its core, what I have described above is all there is to syslog.  Some vendors have added their own custom formatting within syslog messages.  There is an RFC (RFC 3164, which is informational only) that describes and tries to ‘contain’ syslog based on observations from many popular syslog implementations.  There is also an RFC (RFC 5424) that defines a standard for syslog going forward while still trying to support the legacy simplicity used by the majority of syslog implementations.  The problem is that syslog is so old, and just like trying to teach an old dog new tricks, most vendors aren’t going to rewrite their syslog implementations to follow new guidelines unless there is a compelling and valuable reason.  Compelling and valuable are not things that the new standard attempts to include.

So let’s get into it:

I mentioned that syslog uses UDP port 514 by default.  UDP is unreliable.  Syslog hands off the message to UDP and says send it.  There is no mechanism to verify it was received or that there is even a device on the remote side to receive it.  Too many times I have worked in an environment where the log server that devices were sending to didn’t exist, or it existed but was so overloaded that it was dropping messages during normal operation (imagine how many more messages would be dropped during a wide-scale event).  Over time, legacy configs were copied and more and more syslog messages were sent to nowhere or to a server that couldn’t handle the load.

To help solve the unreliable transport limitation of UDP, some implementations of syslog support TCP as the transport protocol.  If the sender and receiver both support syslog over TCP, you do get reliable delivery at the transport level (and longer message lengths, since a message no longer has to fit in a single UDP datagram).  Unfortunately, most syslog implementations simply hand the message down to the transport, now TCP, and say deliver.  If TCP can’t establish a connection, times out, or runs into another error, such as the remote side dropping the connection due to capacity or throttling, there is no mechanism for syslog to cache or hold messages to resend once connectivity is re-established.
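To make the missing piece concrete, here is a rough sketch of the kind of hold-and-resend logic that plain syslog lacks.  It is not how any particular syslog daemon works.  The BufferedSender class and the flaky_transport function are hypothetical names invented for this example, and the transport is simulated so the behaviour can be seen without a real collector.

```python
from collections import deque
from typing import Callable, Deque, List

class BufferedSender:
    """Hold messages locally until the transport accepts them (hypothetical)."""

    def __init__(self, transport: Callable[[bytes], None], max_pending: int = 1000):
        self._transport = transport
        # Once the buffer is full, the oldest queued messages are discarded.
        self._pending: Deque[bytes] = deque(maxlen=max_pending)

    def send(self, message: bytes) -> None:
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Try to deliver queued messages in order; return how many remain."""
        while self._pending:
            try:
                self._transport(self._pending[0])
            except OSError:
                break  # transport is down; keep the message for a later attempt
            self._pending.popleft()
        return len(self._pending)

# Simulated transport: raises while the "link" is down, records when it is up.
link = {"up": False}
delivered: List[bytes] = []

def flaky_transport(message: bytes) -> None:
    if not link["up"]:
        raise ConnectionRefusedError("collector unreachable")
    delivered.append(message)

sender = BufferedSender(flaky_transport)
sender.send(b"<187>first")
sender.send(b"<187>second")   # both stay queued while the link is down
link["up"] = True
remaining = sender.flush()    # both are delivered, in their original order
```

Even with something like this in place, a successful TCP write only means the local network stack accepted the bytes, not that the collector processed them, so it narrows the gap rather than closing it.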

There is also no security included in syslog.  Syslog is meant to simply send the message in plain text from a source to a destination in the easiest way possible.  Some devices and software do support adding encryption, like TLS, at the transport level.  Again, it must be supported and compatible on both the sending end and the receiving end.  TLS runs over TCP, so you do get all of the benefits of TCP listed above with TLS.
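As a sketch of what the sending side of syslog over TLS can look like, the example below uses Python's standard ssl module.  It assumes the collector listens on port 6514 and expects each message to be prefixed with its length (the octet-counting framing described in RFC 5425).  Whether your collector is set up that way is something to confirm, not something this code can tell you.  The function names are my own.

```python
import socket
import ssl

def frame(message: bytes) -> bytes:
    """Octet-counting framing: message length, a space, then the message."""
    return str(len(message)).encode("ascii") + b" " + message

def open_tls(server: str, port: int = 6514) -> ssl.SSLSocket:
    """Open a TLS connection that verifies the collector's certificate."""
    context = ssl.create_default_context()  # checks certificate and host name
    raw = socket.create_connection((server, port), timeout=5)
    try:
        return context.wrap_socket(raw, server_hostname=server)
    except Exception:
        raw.close()  # do not leak the TCP socket if the handshake fails
        raise

framed = frame(b"<13>hello")

# Usage, assuming a reachable collector (host name is a placeholder):
#   with open_tls("logs.example.com") as conn:
#       conn.sendall(framed)
```

The length prefix matters because TCP delivers a stream of bytes, so the receiver needs some way to tell where one message ends and the next begins.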

From a monitoring perspective, besides all the problems above, the message that is sent for a particular event is unique to the device that sends it.  No two vendors write syslog messages the same way.  To make matters worse, some vendors don’t use the same messages across their own product lines.  Also, syslog messages can change with software upgrades (not that this is common).  To complicate matters, there is no dictionary of syslog messages that I have ever found from any vendor (if you know where to find one, please let me know).  To determine what syslog message is sent for a particular event, you need to replicate that event and see what syslog message is returned.  That is easy if it is an interface going down.  It is more complicated if it is a fan failure or a temperature-exceeded event, which takes more effort to simulate.  There are some configuration and support guides that can help with finding the syslog messages generated for specific events, but they will only get you part of the way.  The rest comes from testing.
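In practice, the result of all that testing tends to end up as a lookup table of patterns, one per event you care about.  The sketch below shows the idea.  The patterns and sample messages are illustrative only, not a vendor dictionary, and the event names are ones I invented.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical table built by triggering events in a lab and recording what
# each device actually sent.  Every entry here is an illustrative sample.
KNOWN_EVENTS = [
    ("interface_down", re.compile(r"Interface (?P<port>[^,\s]+), changed state to down")),
    ("interface_up", re.compile(r"Interface (?P<port>[^,\s]+), changed state to up")),
    ("fan_failure", re.compile(r"fan (?P<unit>\d+) failed", re.IGNORECASE)),
]

def classify(message: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (event_name, captured_fields) for the first matching pattern."""
    for name, pattern in KNOWN_EVENTS:
        match = pattern.search(message)
        if match:
            return name, match.groupdict()
    return None  # unknown message: a candidate for more testing

result = classify("%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down")
unknown = classify("a message nobody has catalogued yet")
```

Anything that falls through to None is worth logging separately, since it is either noise or an event you have not mapped yet.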

I do believe that syslog has a key part in an organization’s comprehensive monitoring plan, but not as the primary source of event detection.  A comparison of all monitoring protocols and where they fit into a comprehensive plan will follow in a future article.

Two recommendations that I follow, when available, for devices configured to send syslog messages:

  1. Set time to UTC if you are a global organization.  If your company has one office or all offices in the same time zone, this is overkill.
  2. Add sequence numbers to syslog messages.  This allows you to determine if messages are missing when reviewing the logs. 
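As a small illustration of the second recommendation, the sketch below takes the sequence numbers received from one device and reports the ranges that are missing.  It assumes the numbers come from a single device and that the counter has not wrapped or reset (for example, after a reboot).  The function name and sample numbers are made up.

```python
from typing import Iterable, List, Tuple

def find_gaps(sequence_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) ranges between received numbers."""
    gaps = []
    previous = None
    # Sort and de-duplicate, since UDP may reorder or duplicate datagrams.
    for number in sorted(set(sequence_numbers)):
        if previous is not None and number > previous + 1:
            gaps.append((previous + 1, number - 1))
        previous = number
    return gaps

received = [101, 102, 105, 106, 110]
gaps = find_gaps(received)   # [(103, 104), (107, 109)]
```

A gap does not tell you what was lost, only that something was, which is often enough to know the logs for that window can't be fully trusted.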
