Monitoring Your Colocation (Part 1)

So you’ve got your equipment moved into the data center – your networking is configured, and everything is up and running smoothly. Time to kick back and crack open a beer, right? Well yes, do reward yourself for a job well done… but much as we’d like colocation to be a “set it and forget it” proposition, you do need a way to monitor a few aspects of your colocation, as keeping an eye on things is one of the ongoing responsibilities that you bear.

Monitoring is an often-neglected part of the overall colocation strategy. Basic monitoring does not take very much effort to set up, and it can be a life-saver when the chips are down. Consider this: would you rather find out about problems as soon as they arise (from your automated monitoring systems), or – ugh – from your clients complaining about sluggish or completely unavailable systems, perhaps hours into the problem? The former, I’m sure, so read on.

With colocation, monitoring tasks fall into three major groups: usage, performance, and uptime. In this topic we’ll take a look at each, and review some options for addressing monitoring needs.

Monitoring Usage

The two resources you’ll need to monitor on an ongoing basis are bandwidth usage and power usage. As these resources are being sold to you by your provider(s), you can rest assured that they are already monitoring them. Some providers may make their monitoring data available to you, giving you an easy solution for keeping an eye on these factors. However, provider monitoring data may lack important detail, and thus may not be your best choice. For example, providers will typically monitor bandwidth purely in terms of raw bits in and bits out. Additional data, such as the type of traffic (HTTP, SMTP, DNS, etc.) and the remote host addresses involved, is unlikely to be available. With regard to power, your provider will monitor your sustained usage, but may have no information on variations in usage as a function of time, or historical data on when changes in sustained usage occurred – both can be important when attempting to analyze and understand changes in demand for power.

If you are happy with the monitoring data from your provider, so be it. Otherwise, consider implementing your own monitoring to supplement the data from your provider.

One of the most popular monitoring software solutions is Cacti. Cacti contains a wealth of features for monitoring a wide variety of devices – just about anything that supports SNMP (Simple Network Management Protocol) can be monitored and graphed by Cacti. Cacti also supports alerts, so with a properly configured system, email alerts can be generated for a variety of events, such as an interface going down, usage crossing a threshold, etc. Cacti is community supported and free (donations gladly accepted).

Bandwidth usage graph from Cacti. Note: 95th percentile usage denoted by red line.
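
To make the 95th percentile figure less mysterious, here is a minimal Python sketch of how such a number can be derived from polled interface counters. It is an illustration under stated assumptions, not Cacti's own code: the function names are made up for this example, the counters are assumed to be 32-bit SNMP octet counters that wrap around, and the percentile uses the simple "nearest rank" method. Your provider's billing calculation may differ in its details.

```python
COUNTER_32_MAX = 2**32  # assumption: 32-bit octet counters, which wrap back to zero


def rates_bps(samples, counter_max=COUNTER_32_MAX):
    """Turn (timestamp_seconds, octet_counter) polls into bits-per-second rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0
        if delta < 0:  # the counter wrapped between two polls
            delta += counter_max
        rates.append(delta * 8 / (t1 - t0))  # octets -> bits, per second
    return rates


def percentile_95(rates):
    """Nearest-rank 95th percentile: ignore the top 5% of samples, keep the highest left."""
    if not rates:
        raise ValueError("no samples")
    ordered = sorted(rates)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


# Hypothetical polls taken every 300 seconds (5 minutes)
polls = [(0, 0), (300, 37_500_000), (600, 75_000_000)]
print(rates_bps(polls))                    # two rates of 1,000,000 bits per second
print(percentile_95(rates_bps(polls)))
```

The point of the percentile is that short bursts in the top 5% of samples do not affect the result, which is why the red line on a graph like the one above usually sits below the highest peaks.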

Cacti (and most other systems) can also monitor power usage, provided that your power is supplied by a metered PDU with a network interface that supports SNMP. Cacti can graph your power usage over time, just like a bandwidth graph, and provide alerting features as well. Other popular monitoring tools include Nagios, Zabbix, and the venerable oldster of the group, MRTG. All of these systems have their individual strengths and weaknesses, and choosing one over the other comes down to your individual needs.
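
For a feel of what a threshold alert on power boils down to, here is a small Python sketch. It is not how any of the tools above implement alerting; the function name, the 16 amp limit, and the "three polls in a row" rule are all illustrative assumptions. It scans a series of current readings (as you might collect from a metered PDU) and reports the spans where usage stayed above the limit long enough to count as sustained rather than a momentary spike.

```python
def sustained_breaches(readings, limit_amps, min_consecutive=3):
    """Return (start_index, end_index) spans where readings stay above
    limit_amps for at least min_consecutive polls in a row."""
    spans = []
    start = None  # index where the current run above the limit began
    for i, amps in enumerate(readings):
        if amps > limit_amps:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_consecutive:
                spans.append((start, i - 1))
            start = None
    # A run that is still above the limit at the end of the data
    if start is not None and len(readings) - start >= min_consecutive:
        spans.append((start, len(readings) - 1))
    return spans


# Hypothetical amp readings, one per poll
readings = [10, 17, 18, 17, 12, 17, 12, 17, 17, 17]
print(sustained_breaches(readings, limit_amps=16))
```

Requiring several consecutive readings is one simple way to avoid being emailed about every brief spike; the right limit and duration depend on your circuit and your provider's terms.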

Another thing these systems have in common is that they don’t gather Layer 3 (IP level) traffic data. Bandwidth data collected and graphed will be of the “bits in, bits out” variety, just like what your provider will offer. To get more info than that, you’ll need a system that gathers and analyzes system logs from the Layer 3 devices that are receiving or forwarding the traffic (your servers and/or routers). There are many popular solutions for that task, including Apache Flume, fluentd, and Scribe. Each of these tools is designed to gather logs from many different devices, store them in one place, and provide (or work with) analysis tools. If that is more power than you need, a one-off analysis tool can be installed directly on the server receiving traffic. One such tool that is quite popular is Wireshark, which captures and inspects the network packets themselves rather than logs.
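
As a rough illustration of the per-host detail that logs can give you, the following Python sketch totals the bytes sent to each remote address from web server access log lines. It assumes the lines follow the widely used Common Log Format and that the request field contains no embedded quotes; the function name and sample lines are made up for this example, and real log formats vary, so treat it as a starting point only.

```python
import re
from collections import Counter

# Assumed layout: host ident user [timestamp] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def bytes_by_host(lines):
    """Sum the response size per remote host, skipping lines that do not parse."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not in the assumed format
        size = match.group("size")
        totals[match.group("host")] += 0 if size == "-" else int(size)
    return totals


# Hypothetical log lines using documentation-only IP addresses
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /big.iso HTTP/1.1" 200 1048576',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 -',
    'not a log line',
]
for host, total in bytes_by_host(sample).most_common():
    print(host, total)
```

Sorting by total makes the heaviest remote hosts float to the top, which is exactly the detail a “bits in, bits out” graph cannot show.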

A final point to consider: the tools discussed thus far are self-hosted solutions, meaning that you will install, configure and run them on your own server(s). If you don’t wish to take on the responsibility of administering your own monitoring systems, you can instead choose a hosted or cloud-based subscription monitoring service. This type of service provides you with a suite of ready-to-go monitoring and alerting tools, which you manage via a web interface, in exchange for a monthly fee. One such service is Anturis.