Joe Miller

Ops/Engineering. Continuous DevOping at Webscale

In a previous post I described two methods for routing metrics generated by Sensu clients to Graphite:

  • Use a pipe handler to send metrics via TCP to Graphite
  • Use Graphite’s AMQP (RabbitMQ) support

Method #1 was simply described for completeness. It is not scalable and shouldn’t be used except for very small workloads. Pipe handlers involve a fork() by sensu-server for every metric received.

At the time I recommended method #2, which was more efficient: Sensu would simply copy the metric from its own results queue to another queue that Graphite would be listening on, since both Sensu and Graphite can talk to RabbitMQ.

However, Graphite’s AMQP support is fairly lacking, in my opinion. It does not seem to be getting much attention on the regular Graphite support forums and the code around AMQP has not changed much. The docs section describing its configuration remains an empty TODO.

The main reason I don’t like the AMQP approach anymore is that it does not work well with Graphite clusters. I prefer to build a Graphite cluster where each node is identically configured. Ideally, each node would connect to an AMQP queue, pop a metric off of the queue in a load-balanced fashion, then let carbon-relay’s routing rules figure out where to send the metric. Unfortunately, it does not work this way. Instead, each Graphite node pulls every metric posted to the queue, duplicating effort on each node in the cluster. This is wasteful and needlessly limits the capacity of the cluster.
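To make the cost concrete, here is a minimal sketch that counts how many metrics each node has to process under the two delivery models. It is a toy simulation, not Graphite or RabbitMQ code; the function name, node names, and the round-robin policy are assumptions made for illustration.

```ruby
# Count how many metrics each node must process under two delivery models.
# Toy simulation only: node names and round-robin balancing are assumptions.
def deliveries_per_node(metric_count, nodes, mode)
  counts = Hash.new(0)
  metric_count.times do |i|
    case mode
    when :every_node_gets_a_copy
      # What the AMQP setup effectively does: each node sees every metric.
      nodes.each { |node| counts[node] += 1 }
    when :load_balanced
      # What I would prefer: each metric is handed to exactly one node.
      counts[nodes[i % nodes.size]] += 1
    else
      raise ArgumentError, "unknown mode: #{mode}"
    end
  end
  counts
end

nodes = %w[graphite1 graphite2 graphite3]
puts deliveries_per_node(9000, nodes, :every_node_gets_a_copy).inspect
puts deliveries_per_node(9000, nodes, :load_balanced).inspect
```

With three nodes, the first model has every node handling all 9000 metrics, while the load-balanced model leaves each node with a third of them.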

Newer, better ways

My new preferred method for sending metrics to Graphite is to use TCP with a load-balancer in front of Graphite’s carbon-relay instances in the case of a multi-node cluster.

This was not really possible when the initial blog post was written, but since then Sensu has added support for extension handlers in addition to the original pipe handlers. Extensions are Ruby code that is loaded and run inside the sensu-server process. They are much more efficient than fork()’ing to handle each event.
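The difference between the two handler styles can be sketched in a few lines of plain Ruby. This is not Sensu's implementation: the event hash is simplified, both method names are made up, and the child process is a throwaway Ruby one-liner standing in for a real handler script.

```ruby
require "json"
require "rbconfig"

# Pipe-style handling: start a child process for the event and write the
# event JSON to its STDIN. A new process is created on every call.
def handle_with_pipe(event)
  child = [RbConfig.ruby, "-rjson", "-e", 'print JSON.parse(STDIN.read)["output"]']
  IO.popen(child, "r+") do |io|
    io.write(JSON.generate(event))
    io.close_write
    io.read
  end
end

# Extension-style handling: an ordinary method call inside the process
# that is already running, with no fork and no serialization.
def handle_in_process(event)
  event["output"]
end

# Simplified, hypothetical event; real Sensu events carry much more data.
event = { "output" => "stats.web01.cpu.user 12.5 1400000000\n" }
puts handle_with_pipe(event)
puts handle_in_process(event)
```

Both calls produce the same string, but the first pays for process startup on every event, which is the overhead that extensions avoid.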

There are two extension handlers available for sending metrics to Graphite:

  • Sensu-server TCP handler: Ships with sensu-server. Very simple, takes the event['output'] string and sends it untouched over a TCP socket to a destination.
  • @grepory’s WizardVan: More features, supports OpenTSDB and Graphite, buffering support, re-connect, backoff, etc.

Here is a quick example of configuring each of these extension handlers.

Sensu-server TCP Handler

Configuring the TCP handler that ships with Sensu is easy and is documented in the handlers section of the Sensu docs.

The TCP handler is very basic and will simply copy the output of the check directly over the socket. This works out fine for most Sensu metric checks, since the de facto standard is to output Graphite’s line-oriented format.
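For reference, Graphite's line-oriented (plaintext) format is one metric per line: a dotted metric path, a value, and a Unix timestamp, separated by spaces and terminated by a newline. Here is a minimal sketch of building such lines and writing them to a socket; the helper names and the metric path are my own, chosen for illustration.

```ruby
require "socket"

# Build one line of Graphite's plaintext format: "<path> <value> <timestamp>\n".
def graphite_line(path, value, timestamp = Time.now.to_i)
  "#{path} #{value} #{timestamp}\n"
end

# Write a batch of lines to a carbon listener over TCP.
# The host is whatever your carbon-cache or carbon-relay answers on;
# 2003 is the usual plaintext port.
def send_to_graphite(lines, host, port = 2003)
  TCPSocket.open(host, port) { |sock| sock.write(lines.join) }
end

puts graphite_line("stats.web01.cpu.user", 12.5, 1400000000)
```

A metric check that prints lines in this shape needs no translation at all, which is why the TCP handler can pass the output through untouched.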

Example TCP handler:

{
  "handlers": {
    "graphite_line_tcp": {
      "type": "tcp",
      "socket": {
        "host": "metrics.dom.tld",
        "port": 2003
      }
    }
  }
}

Add the graphite_line_tcp handler to your metric checks:

{
  "checks": {
    "vmstat_metrics": {
      "type": "metric",
      "handlers": ["graphite_line_tcp"],
      "command": "/etc/sensu/plugins/vmstat-metrics.rb --scheme stats.:::name:::",
      "interval": 60,
      "subscribers": ["webservers"]
    }
  }
}
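The :::name::: part of the command is a token that gets replaced with an attribute of the client running the check, so each client reports under its own name. The idea can be sketched like this. It is a simplified illustration, not Sensu's actual code: it only handles flat keys, and the method name and client hash are made up.

```ruby
# Replace each :::key::: token with the matching client attribute.
# Simplified sketch: flat keys only, and unknown tokens raise an error.
def substitute_tokens(command, client)
  command.gsub(/:::(.+?):::/) do
    key = Regexp.last_match(1)
    client.fetch(key) { raise KeyError, "no client attribute for token #{key}" }.to_s
  end
end

# Hypothetical client definition with a "name" attribute.
client = { "name" => "web01" }
puts substitute_tokens("/etc/sensu/plugins/vmstat-metrics.rb --scheme stats.:::name:::", client)
```

For a client named web01, the check would run with --scheme stats.web01, so its metrics land under stats.web01 in Graphite.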

WizardVan (aka, sensu-metrics-relay) Extension

A more advanced TCP extension handler is available from @grepory and goes by the code-name WizardVan or sensu-metrics-relay (same thing, but I was confused for a moment).

WizardVan does not ship with Sensu, but installation instructions are available on its GitHub page. In the future it may become easier to install if it is shipped as a rubygem.

WizardVan also takes advantage of another newer Sensu feature known as mutators, which allow it to send metrics to Graphite, OpenTSDB, or both.

By default, WizardVan assumes that metrics are in Graphite format, so configuring it for use with Graphite is straightforward. Here is a general example:

{
  "handlers": {
    "relay": {
        "graphite": {
            "host": "graphite.dom.tld",
            "port": 2003
        },
        "opentsdb": {
            "host": "tsdb.dom.tld",
            "port": 4424
        }
    }
  }
}

For further information on configuring WizardVan see the README.

NOTE: Unless you have a very high rate of metrics (hundreds per second), you may need to lower WizardVan’s MAX_QUEUE_SIZE to something less than 16KB (try 128). Hopefully this will soon be configurable instead of hardcoded.
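Here is a rough sketch of why the queue size matters at low metric rates. It assumes the buffer is flushed only once it reaches a size threshold, which is what the note above implies. This is not WizardVan's actual code, and the class and method names are hypothetical.

```ruby
# A buffer that only flushes once it holds at least max_bytes of data.
# Hypothetical illustration; not taken from WizardVan.
class SizeTriggeredBuffer
  attr_reader :flushed

  def initialize(max_bytes)
    @max_bytes = max_bytes
    @queue = "".dup
    @flushed = [] # stands in for "written to the Graphite socket"
  end

  def push(line)
    @queue << line
    flush if @queue.bytesize >= @max_bytes
  end

  def pending_bytes
    @queue.bytesize
  end

  private

  def flush
    @flushed << @queue
    @queue = "".dup
  end
end

line = "stats.web01.cpu.user 12.5 1400000000\n" # 37 bytes
big = SizeTriggeredBuffer.new(16 * 1024)
small = SizeTriggeredBuffer.new(128)
10.times { big.push(line); small.push(line) }
puts "16KB threshold: #{big.flushed.size} flushes, #{big.pending_bytes} bytes still waiting"
puts "128 threshold: #{small.flushed.size} flushes, #{small.pending_bytes} bytes still waiting"
```

With a large threshold and only a trickle of metrics, everything sits in the buffer and nothing reaches Graphite until enough data has piled up. A small threshold flushes after a handful of lines.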