numbersd has aggregation characteristics identical to StatsD's, but differs significantly in philosophy and intended usage. Some of the available behaviours are described below.
A listener is a scheme, host, and port specification for a listen socket which will accept and parse metrics from incoming connections. Listeners are specified with either a `tcp://` or `udp://` scheme to control the type of listening socket.
Multiple listeners can be passed as a comma separated list to the `--listeners` flag to listen on multiple ports and protocols simultaneously.
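As a rough illustration, the Python sketch below emits a metric to a listener. It assumes the StatsD line format (`key:value|type`), which follows from numbersd sharing StatsD's aggregation behaviour, and it assumes metrics sent to a `tcp://` listener are newline delimited. `format_metric` and `send_metric` are hypothetical helpers, not part of numbersd.

```python
import socket
from urllib.parse import urlparse


def format_metric(key: str, value: float, metric_type: str) -> bytes:
    """Encode a single StatsD-style line, e.g. b"app.requests:1|c"."""
    # Common StatsD types: c (counter), ms (timer), g (gauge), s (set).
    if metric_type not in ("c", "ms", "g", "s"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{key}:{value}|{metric_type}".encode("ascii")


def send_metric(uri: str, payload: bytes) -> None:
    """Send one encoded metric to a listener given as tcp://host:port or udp://host:port."""
    parsed = urlparse(uri)
    address = (parsed.hostname, parsed.port)
    if parsed.scheme == "udp":
        # Fire-and-forget datagram: there is no delivery guarantee.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, address)
    elif parsed.scheme == "tcp":
        # Assumption: metrics on a TCP stream are newline delimited.
        with socket.create_connection(address, timeout=5) as sock:
            sock.sendall(payload + b"\n")
    else:
        raise ValueError(f"scheme must be tcp or udp, got {parsed.scheme!r}")
```

For example, `send_metric("udp://127.0.0.1:8125", format_metric("app.requests", 1, "c"))` targets a listener on the default port.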
If an HTTP port is specified, numbersd will start an embedded HTTP server. GET requests to the following paths are answered with an appropriate content type:
- `/overview.json`: Internal counters and runtime information.
- `/numbersd.whisper`: Low resolution time series in Graphite compatible format (identical to `&rawData=true`).
- `/numbersd.json`: JSON representation of the `.whisper` format above.
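Since the `.whisper` body is described as identical to Graphite's `&rawData=true` output, each line should look like `target,start,end,step|v1,v2,...`, with `None` marking missing datapoints. A minimal sketch of fetching and parsing it follows; the port `8080` stands in for whatever was passed to `--http`, and `parse_raw` and `fetch_series` are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple
from urllib.request import urlopen

# (start, end, step, values) for one target.
Series = Tuple[int, int, int, List[Optional[float]]]


def parse_raw(body: str) -> Dict[str, Series]:
    """Parse Graphite rawData lines of the form target,start,end,step|v1,v2,..."""
    result: Dict[str, Series] = {}
    for line in body.strip().splitlines():
        header, _, data = line.partition("|")
        # rsplit keeps any commas inside the target name intact.
        target, start, end, step = header.rsplit(",", 3)
        values = [None if v == "None" else float(v) for v in data.split(",") if v]
        result[target] = (int(start), int(end), int(step), values)
    return result


def fetch_series(host: str = "127.0.0.1", port: int = 8080) -> Dict[str, Series]:
    """GET /numbersd.whisper from a local numbersd (the port is an assumption)."""
    with urlopen(f"http://{host}:{port}/numbersd.whisper", timeout=5) as resp:
        return parse_raw(resp.read().decode("utf-8"))
```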
The `.whisper` response type is intended to be used from Nagios or other monitoring tools to connect directly to a numbersd instance running alongside an application.
There are a number of `check_graphite` Nagios NRPE plugins available, which should work exactly as they do when pointed directly at an instance of Graphite.
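A check in the spirit of `check_graphite` boils down to comparing recent datapoints against thresholds and exiting with the standard Nagios plugin codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below assumes the values have already been read from the `.whisper` response; averaging them is an arbitrary choice for illustration, and real `check_graphite` plugins differ in how they reduce a series. `evaluate` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(values: List[Optional[float]], warn: float, crit: float) -> Tuple[int, str]:
    """Compare the mean of the non-missing datapoints against upper thresholds."""
    points = [v for v in values if v is not None]
    if not points:
        # No data at all is reported as UNKNOWN rather than OK.
        return UNKNOWN, "UNKNOWN - no datapoints"
    mean = sum(points) / len(points)
    if mean >= crit:
        return CRITICAL, f"CRITICAL - mean {mean:.2f} >= {crit}"
    if mean >= warn:
        return WARNING, f"WARNING - mean {mean:.2f} >= {warn}"
    return OK, f"OK - mean {mean:.2f}"
```

An NRPE wrapper would print the returned message and pass the status to `sys.exit`.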
Graphites (yes, plural): as with all list-style command flags, a list of `tcp` schemed URIs can be passed to `--graphites` to connect to multiple backend Graphite instances simultaneously.
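To illustrate the fan-out, the sketch below renders metrics in Carbon's plaintext protocol (`path value timestamp`, one per line) and sends the same payload to every URI in a `--graphites` style list. Whether numbersd itself uses the plaintext protocol is an assumption here, and `plaintext_lines` and `push_to_graphites` are hypothetical helpers.

```python
import socket
import time
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlparse


def plaintext_lines(metrics: Iterable[Tuple[str, float]], timestamp: Optional[int] = None) -> bytes:
    """Render (path, value) pairs as Carbon plaintext lines: 'path value timestamp'."""
    ts = int(time.time()) if timestamp is None else timestamp
    return "".join(f"{path} {value} {ts}\n" for path, value in metrics).encode("ascii")


def push_to_graphites(uris: List[str], payload: bytes) -> List[str]:
    """Send the same payload to every tcp:// URI and return the URIs that failed."""
    failed = []
    for uri in uris:
        parsed = urlparse(uri)
        if parsed.scheme != "tcp":
            failed.append(uri)  # Graphite targets are tcp schemed URIs only.
            continue
        try:
            with socket.create_connection((parsed.hostname, parsed.port), timeout=5) as sock:
                sock.sendall(payload)
        except OSError:
            # One unreachable Graphite should not stop delivery to the others.
            failed.append(uri)
    return failed
```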
Broadcasters behave identically to StatsD's `repeater` backend: they simply forward received metrics to a list of `tcp` and `udp` schemed URIs.
The intent is that you can listen on TCP and broadcast over UDP, or vice versa.
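A repeater of this kind can be sketched in a few lines of Python: receive raw packets and forward each one, unaggregated, to every target. `parse_targets`, `broadcast`, and `relay` are hypothetical helpers, and appending a newline for TCP targets is an assumption about how stream consumers delimit metrics; this is not numbersd's implementation.

```python
import socket
from typing import List, Tuple
from urllib.parse import urlparse

Target = Tuple[str, Tuple[str, int]]  # (scheme, (host, port))


def parse_targets(uris: List[str]) -> List[Target]:
    """Split tcp://host:port and udp://host:port URIs into (scheme, address) pairs."""
    targets = []
    for uri in uris:
        parsed = urlparse(uri)
        if parsed.scheme not in ("tcp", "udp"):
            raise ValueError(f"unsupported scheme in {uri!r}")
        targets.append((parsed.scheme, (parsed.hostname, parsed.port)))
    return targets


def broadcast(packet: bytes, targets: List[Target]) -> None:
    """Forward one raw, unaggregated packet to every target."""
    for scheme, address in targets:
        if scheme == "udp":
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.sendto(packet, address)
        else:
            # Assumption: TCP consumers expect newline-delimited metrics.
            # A real repeater would keep connections open rather than reconnecting.
            line = packet if packet.endswith(b"\n") else packet + b"\n"
            with socket.create_connection(address, timeout=5) as sock:
                sock.sendall(line)


def relay(listen_port: int, uris: List[str]) -> None:
    """Receive UDP packets on listen_port and re-broadcast each one (runs forever)."""
    targets = parse_targets(uris)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server:
        server.bind(("0.0.0.0", listen_port))
        while True:
            packet, _ = server.recvfrom(65535)
            broadcast(packet, targets)
```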
Downstreams also take a list of `tcp` and `udp` schemed URIs; the closest StatsD equivalent is the `statsd-backend` plugin.
Metrics that can be safely aggregated without losing precision or causing 'slopes' (such as counters) are aggregated and forwarded upon flush; all others are forwarded unmodified.
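The split between summable and non-summable metrics can be sketched as follows, assuming StatsD line syntax including the optional `|@rate` sample rate on counters. `flush_for_downstream` is a hypothetical helper; numbersd's actual flush logic lives in its Haskell source.

```python
from typing import Dict, List


def _format_number(x: float) -> str:
    # Render whole numbers without a trailing ".0" (e.g. 10.0 -> "10").
    return str(int(x)) if x.is_integer() else repr(x)


def flush_for_downstream(lines: List[str]) -> List[str]:
    """Collapse counters into one line per key; pass every other metric through unmodified."""
    counters: Dict[str, float] = {}  # insertion ordered (Python 3.7+)
    passthrough: List[str] = []
    for line in lines:
        key, sep, rest = line.partition(":")
        fields = rest.split("|")
        if not sep or len(fields) < 2 or fields[1] != "c":
            # Timers, gauges and sets cannot be summed without losing information.
            passthrough.append(line)
            continue
        try:
            value = float(fields[0])
            # A sample rate such as "hits:1|c|@0.1" means the 1 stands for 10 hits.
            if len(fields) > 2 and fields[2].startswith("@"):
                value /= float(fields[2][1:])
        except (ValueError, ZeroDivisionError):
            passthrough.append(line)  # malformed counter: forward as-is
            continue
        counters[key] = counters.get(key, 0.0) + value
    aggregated = [f"{key}:{_format_number(total)}|c" for key, total in counters.items()]
    return aggregated + passthrough
```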
The intent of many of the behaviours above was to provide more granular mechanisms for scaling and organising a hierarchy of metric aggregators. Here are some scenarios that prompted the development of numbersd.
Using UDP for stats delivery is great: it makes it very easy to write a client and emit metrics. But the lack of reliable transmission makes it unsuitable for more critical tasks, even on a network under your control.
An example monitoring workflow I've observed in production looks something like:
- Application emits unreliable UDP packets that are (hopefully) delivered to a monolithic aggregator instance.
- The aggregator sends packets over a TCP connection to Graphite.
- Nagios invokes an NRPE check on the application host.
- The NRPE check reaches out across the network to the Graphite API to quantify application health.
There are five actors involved in Figure 1: the Application, Network, Aggregator, Graphite, and Nagios.
For monitoring to be (remotely) reliable we have to make some assumptions, so in this case let's remove the Network (assume reliable UDP transmission) and Nagios (13-year-old software always works) from the equation.
If either the aggregator or Graphite is temporarily unavailable, the NRPE check local to the application will fail and potentially raise a warning or critical alert.
By removing both the aggregator and Graphite from the monitoring workflow, it becomes a romantic dinner date for two between the application and Nagios:
- The application emits UDP packets via the loopback interface to a local numbersd daemon.
- numbersd pushes metrics over a TCP connection to Graphite.
- Nagios invokes an NRPE check on the application host.
- The NRPE check calls the local numbersd daemon's `/numbersd.whisper` time series API.
This has two primary advantages. Firstly, reliability: UDP packets are only ever transmitted over the loopback interface. Secondly, by separating the concerns of metric durability, storage, and visualisation from monitoring, two single points of failure have been removed from the monitoring workflow.
Multiple Graphites
A conceited federation hierarchy
Broadcast metrics to a single point, where the monitoring check happens
At present, it is assumed the user knows some of the Haskell ecosystem, in particular wrangling `cabal-dev` to obtain dependencies. I plan to offer pre-built binaries for x86_64 OSX and Linux in the future.
You will need reasonably new versions of GHC and the Haskell Platform, which you can obtain here. Then run `make install` in the root directory to compile numbersd.
There is also a Chef Cookbook which can be used to install numbersd, if that's how you swing: https://github.com/brendanhay/numbersd-cookbook
Command line flags are used to configure numbersd; a full table of all the flags follows.
Flag | Default | Format | About | Statsd Equivalent |
---|---|---|---|---|
`--listeners` | `udp://0.0.0.0:8125` | `URI,...` | Incoming stats UDP address and port | `address`, `port` |
`--http` | | `PORT` | HTTP port to serve the overview and time series on | `mgmt_address`, `mgmt_port` |
`--resolution` | `60` | `INT` | Resolution in seconds for time series data | |
`--interval` | `10` | `INT` | Interval in seconds between key flushes to subscribed sinks | `flushInterval` |
`--percentiles` | `90` | `INT,...` | Calculate the Nth percentile(s) for timers | `percentThreshold` |
`--events` | | `EVENT,...` | Combination of receive, invalid, parse, or flush events to log | `debug`, `dumpMessages` |
`--prefix` | | `STR` | Prepended to keys in the HTTP interfaces and Graphite | `log` |
`--graphites` | | `URI,...` | Graphite hosts to deliver metrics to | `graphiteHost`, `graphitePort` |
`--broadcasts` | | `URI,...` | Hosts to broadcast raw, unaggregated packets to | `repeater` |
`--downstreams` | | `URI,...` | Hosts to forward aggregated, StatsD formatted counters to | `statsd-backend` |
- `URI`: Combination of scheme, host, and port. The scheme must be one of `(tcp|udp)`.
- `PORT`: Port number. Must be within the valid bindable range for non-root users.
- `INT`: A valid Haskell `Int` type.
- `STR`: An ASCII encoded string.
- `EVENT`: Internal event type, which must be one of `(receive|invalid|parse|flush)`.
- `[...]`: All list types are specified as a comma separated string containing no spaces. For example, `--listeners udp://0.0.0.0:8125,tcp://0.0.0.0:8126` is a valid `[URI]` list.
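The value formats above are simple enough to validate before launching numbersd, for instance from a deployment script. The sketch below is a hypothetical pre-flight check, not numbersd's own parser; in particular, it assumes "valid bindable range for non-root users" means ports 1024 to 65535, as on typical Unix systems.

```python
from typing import List, Tuple
from urllib.parse import urlparse

EVENTS = {"receive", "invalid", "parse", "flush"}


def parse_list(value: str) -> List[str]:
    """Split a [...] flag value: comma separated, no spaces allowed."""
    if not value or " " in value:
        raise ValueError(f"expected a comma separated list without spaces: {value!r}")
    return value.split(",")


def parse_uri(value: str) -> Tuple[str, str, int]:
    """Validate a URI of the form (tcp|udp)://host:port."""
    parsed = urlparse(value)
    # .port itself raises ValueError for non-numeric or out-of-range ports.
    if parsed.scheme not in ("tcp", "udp") or not parsed.hostname or parsed.port is None:
        raise ValueError(f"invalid URI: {value!r}")
    return parsed.scheme, parsed.hostname, parsed.port


def parse_port(value: str) -> int:
    """PORT: assumed here to mean 1024-65535, bindable without root on Unix."""
    port = int(value)
    if not 1024 <= port <= 65535:
        raise ValueError(f"port outside the non-root range: {port}")
    return port


def parse_events(value: str) -> List[str]:
    """EVENT,...: every entry must be one of receive, invalid, parse, flush."""
    events = parse_list(value)
    unknown = [e for e in events if e not in EVENTS]
    if unknown:
        raise ValueError(f"unknown events: {unknown}")
    return events
```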
After a successful compile, the `./numbersd` symlink should point to the built binary.
For any problems, comments or feedback please create an issue here on GitHub.
numbersd is released under the Mozilla Public License Version 2.0