Skip to content

A consistent-hashing relay for statsd and carbon metrics

License

Notifications You must be signed in to change notification settings

simonhf/statsrelay

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Statsrelay

Statsrelay is a consistent-hashing relay for statsd and carbon metrics.

Build Status Coverity Status Mailing List

License

MIT License Copyright (c) 2016 Uber Technologies, Inc.

Build

Dependencies:

  • automake
  • pkg-config
  • libev (>= 4.11)
  • libyaml

On Debian/Ubuntu:

apt-get install automake pkg-config libev-dev libyaml-devel

./autogen.sh
./configure
make clean
make
make check
make install

Use

Usage: statsrelay [options]
  -h, --help                   Display this message
  -v, --verbose                Write log messages to stderr in addition to syslog
                               syslog
  -l, --log-level              Set the logging level to DEBUG, INFO, WARN, or ERROR
                               (default: INFO)
  -c, --config=filename        Use the given hashring config file
                               (default: /etc/statsrelay.yaml)
  -t, --check-config=filename  Check the config syntax
                               (default: /etc/statsrelay.yaml)
  --version                    Print the version
statsrelay --config=/path/to/statsrelay.yaml

This process will run in the foreground. If you need to daemonize, use start-stop-script, daemontools, supervisord, upstart, systemd, or your preferred service watchdog.

By default statsrelay binds to 127.0.0.1:8125 for statsd proxying, and it binds to 127.0.0.1:2003 for carbon proxying.

For each line that statsrelay receives in the statsd format "statname.foo.bar:1|c\n", the key will be hashed to determine which backend server the stat will be relayed to. If no connection to that backend is open, the line is queued and a connection attempt is started. Once a connection is established, all queued metrics are relayed to the backend and the queue is emptied. If the backend connection fails, the queue persists in memory and the connection will be retried after one second. Any stats received for that backend during the retry window are added to the queue.

Each backend has its own send queue. If a send queue reaches max-send-queue bytes (default: 128MB) in size, new incoming stats are dropped until a backend connection is successful and the queue begins to drain.

All log messages are sent to syslog with the INFO priority.

Upon SIGHUP, the config file will be reloaded and all backend connections closed. Note that any stats in the send queue at the time of SIGHUP will be dropped.

If SIGINT or SIGTERM are caught, all connections are killed, send queues are dropped, and memory freed. statsrelay exits with return code 0 if all went well.

To retrieve server statistics, connect to TCP port 8125 and send the string "status" followed by a newline '\n' character. The end of the status output is denoted by two consecutive newlines "\n\n"

stats example:

$ echo status | nc localhost 8125

global bytes_recv_udp gauge 0
global bytes_recv_tcp gauge 41
global total_connections gauge 1
global last_reload timestamp 0
global malformed_lines gauge 0
backend:127.0.0.2:8127:tcp bytes_queued gauge 27
backend:127.0.0.2:8127:tcp bytes_sent gauge 27
backend:127.0.0.2:8127:tcp relayed_lines gauge 3
backend:127.0.0.2:8127:tcp dropped_lines gauge 0

Config Options

There are a few options you can use to control the behavior of statsrelay, which can be set in /etc/statsrelay.yaml. Here is a minimal config:

carbon:
  bind: 127.0.0.1:9085
  tcp_cork: true
  validate: true
  shard_map:
    0: 10.10.10.10:9085

Besides the bind and shard_map settings (explained elsewhere), here is what the boolean options do:

  • tcp_cork enabled the TCP_CORK option on TCP sockets; it's enabled by default and in many cases will significantly decrease the number of small TCP sockets that statsrelay emits (at a small penalty of up to 200ms latency in some cases).
  • validate tries to validate incoming data before forwarding it to statsd or carbon; it's on by default

Scaling With Virtual Shards

Statsrelay implements a virtual sharding scheme, which allows you to easily scale your statsd and carbon backends by reassigning virtual shards to actual statsd/carbon instance or servers. This technique also applies to alternative statsd implementations like statsite.

Consider the following simplified example with this config file:

statsd:
  bind: 127.0.0.1:8125
  validate: true
  shard_map:
    0: 10.0.0.1:9000
    1: 10.0.0.1:9000
    2: 10.0.0.1:9001
    3: 10.0.0.1:9001
    4: 10.0.0.2:9000
    5: 10.0.0.2:9000
    6: 10.0.0.2:9001
    7: 10.0.0.2:9001
carbon:
  ...

In this file we've defined two actual backend hosts (10.0.0.1 and 10.0.0.2). Each of these hosts is running two statsd instances, one on port 9000 and one on port 9001 (this is a good way to scale statsd, since statsd and alternative implementations like statsite are typically single threaded). In a real setup, you'd likely be running more statsd instances on each server, and you'd likely have more repeated lines to assign more virtual shards to each statsd instance. At Uber we use 4096 virtual shards, with a much smaller number of actual backend instances.

Internally statsrelay assigns a zero-indexed virtual shard to each line in the file; so 10.0.0.1:9000 has virtual shards 0 and 1, 10.0.0.1:9001 has virtual shards 2 and 3, and so on.

Let's say that the backend server 10.0.0.1 has become overloaded, and we want to add a new server to the configuration. We might do that like this:

statsd:
  bind: 127.0.0.1:8125
  validate: true
  shard_map:
    0: 10.0.0.1:9000
    1: 10.0.0.3:9000
    2: 10.0.0.1:9001
    3: 10.0.0.3:9001
    4: 10.0.0.2:9000
    5: 10.0.0.2:9000
    6: 10.0.0.2:9001
    7: 10.0.0.2:9001
carbon:
  ...

In the new configuration we've moved one of the two virtual shards for 10.0.0.1:9000 to 10.0.0.3:9000, and we've moved one of the two virtual shards for 10.0.0.1:9001 to 10.0.0.3:9001. In other words, we've reassigned the mapping for virtual shard 1 and virtual shard 3. Note that when you do this, you want to maintain the same number of virtual shards always, so you probably want to pick a large number of virtual shards to start (say, 1024 virtual shards, meaning the configuration file should have 1024 lines). You should have many duplicated lines in the config file when you do this.

To do optimal shard assignment, you'll want to write a program that looks at the CPU usage of your shards and figures out the optimal distribution of shards. How you do that is up to you, but a good technique is to start by generating a statsrelay config that has many virtual shards evenly assigned, and then periodically have a script that finds which actual backends are overloaded and reassigns some of the virtual shards on those hosts to less loaded hosts (or to new hosts).

If you don't initially assign enough virtual shards and then later expand to more, everything will work, but data migration for carbon will be a bit trickier; see below.

A Note On Carbon Scaling

Statsrelay can do relaying for carbon lines just like statsd. The strategy for scaling carbon using virtual shards is exactly the same. One important difference, however, is that when you move a carbon shard you'll want to move the associated whisper files as well. You can do this using the stathasher binary that is built by statsrelay. By pointing that command at your statsrelay config, you can send it key names on stdin and have the virtual shard ids printed to stdout.

Using this technique you can script the reassignment of whisper files. The general idea is to walk the filesystem and gather all of the unique keys stored in carbon backends on a host. You can then get an idea for how expensive each virtual shard is based on the storage space, number of whisper files, and possibly I/O metrics for each virtual shard. By gathering the weights for each virtual shard on a host, you can figure out the optimal way to redistribute the mapping of virtual shards to actual carbon backends.

Note that when you move carbon instances, you also probably want to migrate the whisper files as well. This ensures that you retain historical data, and that graphite will get the right answer if it queries multiple carbon backends. You can migrate the whisper files by rsyncing the files you've identified as belonging to a moved virtual shard using the stathasher binary described above. Remember to take care to ensure that the old whisper files are deleted on the old host.

About

A consistent-hashing relay for statsd and carbon metrics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C 73.5%
  • Python 24.4%
  • Other 2.1%