Skip to content

virtua-network/watchman

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

86 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Watchman

A file watching service.

Purpose

Watchman exists to watch files and record when they actually change. It can also trigger actions (such as rebuilding assets) when matching files change.

System Requirements

Watchman is known to compile and pass its test suite on:

  • Linux systems with inotify
  • OS X and BSDish systems (FreeBSD 9.1, OpenBSD 5.2) that have the kqueue(2) facility
  • Illumos and Solaris style systems that have port_create(3C)

Watchman relies on the operating system facilities for file notification, which means that you will likely have very poor results using it on any kind of remote or distributed filesystem.

Concepts

  • Watchman does not follow symlinks. It knows they exist, but they show up the same as any other file in its reporting.
  • Watchman encourages you to avoid race conditions by providing an abstract clock notation, named cursors to aid in maintaining proper state around polling operations and a command trigger facility.
  • Watchman waits for a watched directory to settle down before it will start to trigger notifications or command execution.
  • Watchman is conservative, preferring to err on the side of caution, which means that it considers the files to be freshly changed when you start to watch them.

Usage

Watchman is provided as a single executable binary that acts as both the watchman service and a client of that service. The simplest usage is entirely from the command line, but real world scenarios will typically be more complex and harness the streaming JSON protocol directly.

Command Line Options

The following options are recognized if they are present on the command line before any non-option arguments.

 -U, --sockname=PATH   Specify alternate sockname

 -o, --logfile=PATH    Specify path to logfile

 -p, --persistent      Persist and wait for further responses

 -n, --no-save-state   Don't save state between invocations

 --statefile=PATH      Specify path to file to hold watch and trigger state

 -f, --foreground      Run the service in the foreground

 -s, --settle          Number of milliseconds to wait for filesystem to settle

 -j, --json-command    Instead of parsing CLI arguments, take a single json
                       object from stdin

 --no-pretty           Don't pretty print the JSON response
  • See log-level for an example of when you might use the persistent flag
  • The default settle period is 20 milliseconds

Environment and Files

Watchman will use the $USER or $LOGNAME environmental variables to determine the current userid, and the $TMPDIR or $TMP environmental variables to determine a temporary directory, falling back to /tmp if neither are set.

If you configured watchman using the --enable-statedir option:

  • The default sockname is <STATEDIR>/$USER
  • The default logfile is <STATEDIR>/$USER.log
  • The default statefile is <STATEDIR>/$USER.state. You can turn off the use of the state file via the --no-save-state option. The statefile is used to persist watches and triggers across process restarts.

Otherwise:

  • The default sockname is $TMP/.watchman.$USER
  • The default logfile is $TMP/.watchman.$USER.log
  • The default statefile is $TMP/.watchman.$USER.state. You can turn off the use of the state file via the --no-save-state option. The statefile is used to persist watches and triggers across process restarts.

Available Commands

A summary of the watchman commands follows. The commands can be executed either by the command line tool or via the JSON protocol. When using the command line, be aware of shell quoting; you should make a point of quoting any filename patterns that you want watchman to process, otherwise the shell will expand the pattern and pass a literal list of the files that matched at the time you ran the command, which may be very different from the set of commands that match at the time a change is detected!

A quick note on JSON: in this documentation we show JSON in a human readable pretty-printed form. The watchman client executable itself will pretty-print its output too. The actual JSON protocol uses newlines to separate JSON packets. If you're implementing a JSON client, make sure you read the section on the JSON protocol carefully to make sure you get it right!

Where you see [patterns] in the command syntax, we allow filename patterns that match according the following rules:

Pattern syntax

  • We support fnmatch(3) glob style pattern matches.
  • A ! followed by space followed by a pattern will negate the sense of the pattern match, so ! *.js matches any file(s) that do not match the glob *.js.
  • A -p indicates that the following pattern is a perl compatible regular expression. For example, \.c$ matches files with a .c suffix. You need to specify the -p switch for each pattern, although it is recommended that you condense a series of patterns into an alternation instead, as this is faster. For example, strongly prefer a single (foo|bar) pcre over two separate foo and bar patterns.
  • A -P indicates that the following pattern is a perl compatible with the PCRE_CASELESS flag set. In other words, the pattern will be matched case insensitively.
  • A -X switches the pattern parser into exclusion mode. Exclusion mode remains enabled until it is turned off by -I. Patterns that parsed in exclusion mode will cause matched files to be excluded from the set of files that we return in the match results. This can be useful to specify a set of files that you don't care about at the start of your pattern set, then later on use -I to indicate the set that you do want to include.
  • A -I switches the pattern parser into inclusion mode, which is the default.
  • A -- indicates the end of the set of patterns.

Command: watch

Requests that the specified dir is watched for changes. Watchman will track all files and dirs rooted at the specified path.

From the command line:

watchman watch ~/www

Note that, when you're using the CLI, you can specify the root as ~/www because the shell will resolve ~/www to /home/wez/www, but when you use the JSON protocol, you are responsible for supplying an absolute path.

JSON:

["watch", "/home/wez/www"]

Watchman will realpath(3) the directory and start watching it if it isn't already. A newly watched directory is processed in a couple of stages:

  • Establishes change notification for the directory with the kernel
  • Queues up a request to crawl the directory
  • As the directory contents are resolved, those are watched in a similar fashion
  • All newly observed files are considered changed

Unless no-state-save is in use, watches are saved and re-established across a process restart.

Command: trigger

Sets up a trigger such that if any files under the specified dir that match the specified set of patterns change, then the specified command will be run and passed the list of matching files.

From the command line:

watchman -- trigger /path/to/dir triggername [patterns] -- [cmd]

Note that the first -- is to distinguish watchman CLI switches from the second --, which delimits patterns from the trigger command. This is only needed when using the CLI, not when using the JSON protocol.

JSON:

["trigger", "/path/to/dir", "triggername", <patterns>, "--", <cmd>]

For example:

watchman -- trigger ~/www jsfiles '*.js' -- ls -l

or in JSON:

["trigger", "/home/wez/www", "jsfiles", "*.js", "--", "ls", "-l"]

If, say, ~/www/scripts/foo.js is changed, then watchman will chdir to ~/www then invoke ls -l scripts/foo.js.

Watchman waits for the filesystem to settle before processing any triggers, batching the list of changed files together before invoking the registered command.

Deleted files are counted as changed files and are passed the command in exactly the same way.

Watchman will arrange for the standard input stream of the triggered process to read from a file that contains a JSON array of changed file information. This is in the same format as the file list returned by the since command.

Watchman will limit the length of the command line so that it does not exceed the system configured argument length limit, which you can see for yourself by running getconf ARG_MAX.

If the length limit would be exceeded, watchman simply stops appending arguments. Any that are not appended are discarded; watchman will not queue up and execute a second instance of the trigger process. It cannot do this in a race-free manner. If you are watching a large directory and there is a risk that you'll exceed the command line length limit, then you should use the JSON input stream instead, as this has no size limitation.

Watchman will only run a single instance of the trigger process at a time. That avoids fork-bomb style behavior in cases where your trigger also modifies files. When the process terminates, watchman will re-evaluate the trigger criteria based on the clock at the time the process was last spawned; if a file list is generated watchman will spawn a new child with the files that changed in the meantime.

Unless no-save-state is in use, triggers are saved and re-established across a process restart.

Command: trigger-list

Returns the set of registered triggers associated with a root directory.

watchman trigger-list /root

Command: find

Finds all files that match the optional list of patterns under the specified dir. If no patterns were specified, all files are returned.

watchman find /path/to/dir [patterns]

Command: query (beta starting in v1.5)

watchman -j <<-EOT
["query", "/path/to/root", {
  "suffix": "php",
  "expression": ["allof",
    ["type", "f"],
    ["not", "empty"],
    ["ipcre", "test", "basename"]
  ],
  "fields": ["name"]
}]
EOT

Executes a query against the specified root. This example uses the -j flag to the watchman binary that tells it to read stdin and interpret it as the JSON request object to send to the watchman service. This flag allows you to send in a pretty JSON object (as shown above), but if you're using the socket interface you must still format the object as a single line JSON request as documented in the protocol spec.

The first argument to query is the path to the watched root. The second argument holds a JSON object describing the query to be run. The query object is processed by passing it to the query engine (see File queries below) which will generate a set of matching files.

The query command will then consult the fields member of the query object; if it is not present it will default to:

"fields": ["name", "exists", "new", "size", "mode"]

For each file in the result set, the query command will generate a JSON object value populated with the requested fields. For example, the default set of fields will return a response something like:

{
    "version": "1.5",
    "clock": "c:80616:59",
    "files": [
        {
            "exists": true,
            "mode": 33188,
            "name": "argv.c",
            "size": 1340,
        }
    ]
}

If the fields member consists of a single entry, the files result will be a simple array of values; "fields": ["name"] produces:

{
    "version": "1.5",
    "clock": "c:80616:59",
    "files": ["argv.c", "foo.c"]
}

Command: since

watchman since /path/to/dir <clockspec> [patterns]

Finds all files that were modified since the specified clockspec that match the optional list of patterns. If no patterns are specified, all modified files are returned.

The response includes a files array, each element of which is an object with fields containing information about the file:

{
    "version": "1.1",
    "clock": "c:80616:59",
    "files": [
        {
            "atime": 1357797739,
            "cclock": "c:80616:1",
            "ctime": 1357617635,
            "dev": 16777220,
            "exists": true,
            "gid": 100,
            "ino": 20161390,
            "mode": 33188,
            "mtime": 1357617635,
            "name": "argv.c",
            "nlink": 1,
            "oclock": "c:80616:39",
            "size": 1340,
            "uid": 100
        }
    ]
}

The fields should be largely self-explanatory; they correspond to fields from the underlying struct stat, but a couple need special mention:

  • cclock - The "created" clock; the clock value representing the time that this file was first observed, or the clock value where this file changed from deleted to non-deleted state.
  • oclock - The "observed" clock; the clock value representing the time that this file was last observed to have changed.
  • exists - whether we believe that the file exists on disk or not. If this is false, most of the other fields will be omitted.
  • new - this is only set in cases where the file results were generated as part of a time or clock based query, such as the since command. If the cclock value for the file is newer than the time you specified then the file entry is marked as new. This allows you to more easily determine if the file was newly created without having to maintain a lot of state.

Command: log-level

Changes the log level of your connection to the watchman service.

From the command line:

watchman --persistent log-level debug

JSON:

["log-level", "debug"]

This command changes the log level of your client session. Whenever watchman writes to its log, it walks the list of client sessions and also sends a log packet to any that have their log level set to match the current log event.

Valid log levels are:

  • debug - receive all log events
  • error - receive only important log events
  • off - receive no log events

Note that you cannot tap into the output of triggered processes using this mechanism.

Log events are sent unilaterally by the server as they happen, and have the following structure:

{
  "version": "1.0",
  "log": "log this please"
}

Command: log

Generates a log line in the watchman log.

watchman log debug "log this please"

Command: shutdown-server

This command causes your watchman service to exit with a normal status code.

Command: subscribe

Available starting in version 1.6

Subscribes to changes against a specified root and requests that they be sent to the client via its connection. The updates will continue to be sent while the connection is open. If the connection is closed, the subscription is implicitly removed.

This makes the most sense in an application connecting via the socket interface, but you may also subscribe via the command line tool if you're interested in observing the changes for yourself:

watchman -j -p <<-EOT
["subscribe", "/path/to/root", "mysubscriptionname", {
  "expression": ["allof",
    ["type", "f"],
    ["not", "empty"],
    ["suffix", "php"]
  ],
  "fields": ["name"]
}]
EOT

The example above registers a subscription against the specified root with the name mysubscriptionname.

The response to a subscribe command looks like this:

{
  "version":   "1.6",
  "subscribe": "mysubscriptionname"
}

When the subscription is first established, the expression term is evaluated and if any files match, a subscription notification packet is generated and sent, unilaterally to the client.

Then, each time a change is observed, and after the settle period has passed, the expression is evaluated again. If any files are matched, the server will unilaterally send the query results to the client with a packet that looks like this:

{
  "version": "1.6",
  "clock": "c:1234:123",
  "files": ["one.php"],
  "root":  "/path/being/watched",
  "subscription": "mysubscriptionname"
}

The subscribe command object allows the client to specify a since parameter; if present in the command, the initial set of subscription results will only include files that changed since the specified clockspec, equivalent to using the query command with the since generator.

["subscribe", "/path/to/root", "myname", {
  "since": "c:1234:123",
  "expression": ["not", "empty"],
  "fields": ["name"]
}]

The suggested mode of operation is for the client process to maintain its own local copy of the last "clock" value and use that to establish the subscription when it first connects.

Command: unsubscribe

Available starting in version 1.6

Cancels a named subscription against the specified root. The server side will no longer generate subscription packets for the specified subscription.

["unsubscribe", "/path/to/root", "mysubscriptionname"]

Clockspec

For commands that query based on time, watchman offers a couple of different ways to measure time.

  • number of seconds since the unix epoch (unix time_t style)
  • clock id of the form c:123:234
  • recommended: a named cursor of the form n:whatever

The first and most obvious is passing a unix timestamp. Watchman records the observed time that files change and allows you to find file that have changed since that time. Using a timestamp is prone to race conditions in understanding the complete state of the file tree.

Using an abstract clock id insulates the client from these race conditions by ticking as changes are detected rather than as time moves. Watchman returns the current clock id when it delivers match results; you can use that value as the clockspec in your next time relative query to get a race free assessment of changed files.

As a convenience, watchman can maintain the last observed clock for a client by associating it with a client defined cursor name. For example, you could enumerate all the "C" source files on your first invocation of:

watchman since /path/to/src n:c_srcs *.c

and when you run it a second time, it will show you only the "C" source files that changed since the last time that someone queried using "n:c_srcs" as the clock spec.

File queries

This section documents the new file query engine and syntax that is present starting in version 1.5.

Watchman file queries consist of 1 or more generators that feed files through the expression evaluator.

Generators

Watchman provides 4 generators:

  • since: generates a list of files that were modified since a specific clockspec
  • suffix: generates a list of files that have a particular suffix
  • path: generates a list of files based on their path and depth
  • all: generates a list of all known files

Generators are analagous to the list of paths that you specify when using the find(1) utility, but are implemented in watchman with a bit of a twist because watchman doesn't need to crawl the filesystem in realtime and instead maintains a couple of indexes over the tree.

A query may specify any number of generators; each generator will emit its list of files and this may mean that you see the same file output more than once if you specified the use of multiple generators that all produce the same file.

Expressions

A watchman query expression consists of 0 or more expression terms. If no terms are provided then each file evaluated is considered a match (equivalent to specifying a single true expression term).

Otherwise, the expression is evaluated against the file and produces a boolean result. If that result is true then the file is considered a match and is added to the output set.

An expression term is caonically represented as a JSON array whose zeroth element is a string containing the term name.

["termname", arg1, arg2]

If the term accepts no arguments you may use a short form that consists of just the term name expressed as a string:

"true"

Expressions that match against file names may match against either the basename or the wholename of the file. The basename is the name of the file within its containing directory. The wholename is the name of the file relative to the watched root.

allof

The allof expression term evaluates as true if all of the grouped expressions also evaluated as true. For example, this expression matches only files whose name ends with .txt and that are not empty files:

["allof", ["match", "*.txt"], ["not", "empty"]]

Each array element after the term name is evaluated as an expression of its own:

["allof", expr1, expr2, ... exprN]

Evaluation of the subexpressions stops at the first one that returns false.

anyof

The anyof expression term evaluates as true if any of the grouped expressions also evaluated as true. The following expression matches files whose name ends with either .txt or .md:

["anyof", ["match", "*.txt"], ["match", "*.md"]]

Each array element after the term name is evaluated as an expression of its own:

["anyof", expr1, expr2, ... exprN]

Evaluation of the subexpressions stops at the first one that returns true.

not

The not expression inverts the result of the subexpression argument:

["not", "empty"]

true

The true expression always evaluates as true.

"true"
["true"]

false

The false expression always evaluates as false.

"false"
["false"]

suffix

The suffix expression evaluates true if the file suffix matches the second argument. This matches files name foo.php and foo.PHP but not foophp:

["suffix", "php"]

Suffix expression matches are case insensitive.

match and imatch

The match expression performs an fnmatch(3) match against the basename of the file, evaluating true if the match is successful.

["match", "*.txt"]

You may optionally provide a third argument to change the scope of the match from the basename to the wholename of the file.

["match", "*.txt", "basename"]
["match", "dir/*.txt", "wholename"]

match is case sensitive; for case insensitive matching use imatch instead; it behaves identically to match except that the match is performed ignoring case.

pcre and ipcre

The pcre expression performs a Perl Compatible Regular Expression match against the basename of the file. This pattern matches test_plan.php but not mytest_plan:

["pcre", "^test_"]

You may optionally provide a third argument to change the scope of the match from the basename to the wholename of the file.

["pcre", "txt", "basename"]
["pcre", "txt", "wholename"]

pcre is case sensitive; for case insensitive matching use ipcre instead; it behaves identically to pcre except that the match is performed ignoring case.

To use this feature, you must configure watchman --with-pcre.

name and iname

The name expression performs exact matches against file names. By default it is scoped to the basename of the file:

["name", "Makefile"]

You may specify multiple names to match against by setting the second argument to an array:

["name", ["foo.txt", "Makefile"]]

This second form can be accelerated and is preferred over an anyof construction.

You may change the scope of the match via the optional third argument:

["name", "path/to/file.txt", "wholename"]
["name", ["path/to/one", "path/to/two"], "wholename"]

Finally, you may specify case insensitive evaluation by using iname instead of name.

type

Evaluates as true if the type of the file matches that specified by the second argument; this matches regular files:

["type", "f"]

Possible types are:

  • b: block special file
  • c: character special file
  • d: directory
  • f: regular file
  • p: named pipe (fifo)
  • l: symbolic link
  • s: socket
  • D: Solaris Door
empty

Evaluates as true if the file exists, has size is 0 and is a regular file or directory.

"empty"
["empty"]
exists

Evaluates as true if the file exists

"exists"
["exists"]

Build/Install

You can use these steps to get watchman built:

./autogen.sh
./configure
make

System Specific Preparation

Linux inotify Limits

The inotify(7) subsystem has three important tunings that impact watchman.

  • /proc/sys/fs/inotify/max_user_instances impacts how many different root dirs you can watch.
  • /proc/sys/fs/inotify/max_user_watches impacts how many dirs you can watch across all watched roots.
  • /proc/sys/fs/inotify/max_queued_events impacts how likely it is that your system will experience a notification overflow.

You obviously need to ensure that max_user_instances and max_user_watches are set so that the system is capable of keeping track of your files.

max_queued_events is important to size correctly; if it is too small, the kernel will drop events and watchman won't be able to report on them. Making this value bigger reduces the risk of this happening.

Watchman has two simple strategies for mitigating an overflow of max_queued_events:

  • It uses a dedicated thread to consume kernel events as quickly as possible
  • When the kernel reports an overflow, watchman will assume that all the files have been modified and will re-crawl the directory tree as though it had just started watching the dir.

This means that if an overflow does occur, you won't miss a legitimate change notification, but instead will get spurious notifications for files that haven't actually changed.

Max OS File Descriptor Limits

The default per-process descriptor limit on current versions of OS X is extremely low (256!). Since kqueue() requires an open descriptor for each watched file, you will very quickly run into resource limits if you do not raise them in your system configuration.

Watchman will attempt to raise its descriptor limit to match kern.maxfilesperproc when it starts up, so you shouldn't need to mess with ulimit; just raising the sysctl should do the trick.

The following will raise the limits to allow 10 million files total, with 1 million files per process until your next reboot.

sudo sysctl -w kern.maxfiles=10485760
sudo sysctl -w kern.maxfilesperproc=1048576

Putting the following into a file named /etc/sysctl.conf on OS X will cause these values to persist across reboots:

kern.maxfiles=10485760
kern.maxfilesperproc=1048576

Implementation details

Watchman runs as a single long-lived server process per user. The watchman command will take care of spawning the server process if necessary.

The server process listens on a unix domain socket and processes commands using a simple line based protocol. More advanced tools can talk to this socket directly. You can also do it yourself:

nc -U /tmp/.watchman.$USER

Streaming JSON protocol

Watchman uses a stream of JSON object or array values as its protocol.

  • Each request is a JSON array followed by a newline.
  • Each response is a JSON object followed by a newline.

By default, the protocol is request-response based, but some commands can enable a mode wherein the server side can unilaterally decide to send JSON object responses that contain log or trigger information.

Clients that enable these modes will need to be prepared to receive one or more of these unilateral responses at any time. Note that the server will ensure that only complete JSON objects are sent; it will not emit one in the middle of another JSON object response.

Request format

Sending the since command is simply a matter of formatting it as JSON. Note that the JSON text must be a single line (don't send a pretty printed version of it!) and be followed by a newline \n character:

["since", "/path/to/src", "n:c_srcs", "*.c"] <NEWLINE>

Contributing?

If you're thinking of hacking on watchman we'd love to hear from you! Feel free to use the Github issue tracker and pull requests discuss and submit code changes.

We (Facebook) occasionally have to ask for a "Contributor License Agreement" from someone who sends in a patch or code that we want to include in the codebase. This is a legal requirement (a similar situation applies in other projects, such as Apache and other ASF projects or the Linux kernel). It doesn't apply in all cases, just for "big" patches.

If we ask you to fill out a CLA we'll direct you to our online CLA page where you can complete it easily. We use the same form as the Apache CLA so that friction is minimal.

Tools

We use a tool called arc to run tests and perform lint checks. arc is part of Phabricator and can be installed by following these steps:

mkdir /somewhere
cd /somewhere
git clone git://github.com/facebook/libphutil.git
git clone git://github.com/facebook/arcanist.git

Add arc to your path:

export PATH="$PATH:/somewhere/arcanist/bin/"

With arc in your path, re-running configure will detect it and adjust the makefile so that arc lint will be run as part of make, but you can run it yourself outside of make.

You can run the unit tests using:

arc unit

If you'd like to contribute a patch to watchman, we'll ask you to make sure that arc unit still passess successfully and we'd ideally like you to augment the test suite to cover the functionality that you're adding or changing.

Future

  • add a command to remove registered triggers
  • add a hybrid of since and trigger that allows a connected client to receive notifications of file changes over their unix socket connection as they happen. Disconnecting the client will disable those particular notifications.
  • Watchman does not currently follow symlinks. It would be nice if it did, but doing so will add some complexity.

License

Watchman is made available under the terms of the Apache License 2.0. See the LICENSE file that accompanies this distribution for the full text of the license.

About

watches files and takes action when they change

Resources

License

Stars

Watchers

Forks

Packages

No packages published