Skip to content

Commit

Permalink
watchman: try to detect symlink changes more reliably on osx
Browse files Browse the repository at this point in the history
Summary: Looking for a way to mitigate
facebook#172 wherein fsevents won't
send us notifications about certain classes of changes involving
symlinks; the main one we're aware of is where a symlink is changed to
point to a file that doesn't (yet) exist on the filesystem.

The approach in this diff is to add a configuration option
(`recheck_dirs_interval_ms`) to specify how long we'll wait before we go
back and re-examine candidate files.

This parameter is checked after we've settled and defaults to the
trigger settle interval on mac, but is turned off for Linux.

Once we reach this point in time, we perform a query over the recently
changed file nodes and collect all dir nodes or the dir-node parents of
the files that have changed since we last checked for this category of
change.

If this list is empty then nothing has changed so we have no further
work to do.

Otherwise, we then search all symlink file nodes and add them to the
set, and we then re-examine this complete list of file nodes and
directories to try to discover changes that fsevents failed to inform us
of.

After this check we look to see how many nodes were fixed up; if there
were any fixed up nodes we'll re-set the settle to its base.

I found a couple of bugs while looking at this:

1. We didn't respect the recursive flag in `crawler` and would
unconditionally recursively crawl every item contained in a dir.

2. If we re-examined a file node that was already !file->exists and
realized that it still didn't exist, we'd observe a change for it
even though it logically could not have changed (it was deleted and
still is deleted).

3. YATOCTOU (Yet Another...) when we `lstat(2)`; we can traverse
symlinks in the path, so we add `w_lstat` which uses our strict
checking for the path.  This fixes a bug where we'd want to assess
a path that had previously been deleted when its parent transitioned
from a dir to a symlink and then we'd flip-flop between exists and
!exists.

4. I added an additional fstat in our opendir implementation on mac;
there is a TOCTOU-style issue where we can return a dir handle for
something that isn't actually a dir any more (for dir->symlink
transitions) and the deferred detection of the error condition is
suboptimal.  We now stand a greater chance of detecting and dealing with
this condition when we open the dir.

Test Plan: `make integration` has no regressions.  Then I repeatedly
checked out between two revisions in a large repo where there are a
large number of dir and symlink transitions:

```
$ hg co master ; rm .hg/watchman.state ; hg status --time
$ hg co arcpatch-XYZ ; rm .hg/watchman.state ; hg status --time
```

without this diff, the second step would consistently report 3 files
as modified in the `hg status` output.  A subsequent `hg status`
wouldn't show the same files due to the way that it records and
re-checks notable files.

To validate that we're not busy waiting or polling too aggressively:

```
$ watchman -p --server-encoding=json log-level debug | jq .log
```

is quiet once we've settled.

Activity monitor shows 0% CPU in this state.
  • Loading branch information
wez committed Nov 30, 2015
1 parent ae8e197 commit 7edcb21
Show file tree
Hide file tree
Showing 3 changed files with 209 additions and 14 deletions.
61 changes: 60 additions & 1 deletion opendir.c
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,46 @@ static int open_dir_as_fd_no_symlinks(const char *path, int flags) {
#endif
}

// Like lstat, but strict about symlinks in any path component
int w_lstat(const char *path, struct stat *st) {
#ifdef HAVE_OPENAT
char *pathcopy = strdup(path);
char *slash;
int dir_fd, res, err;

if (!pathcopy) {
return -1;
}

slash = strrchr(pathcopy, '/');
if (!slash) {
free(pathcopy);
errno = EINVAL;
return -1;
}

*slash = '\0';
dir_fd = open_dir_as_fd_no_symlinks(pathcopy, O_NOFOLLOW|O_CLOEXEC);
if (dir_fd == -1) {
free(pathcopy);
errno = ENOTDIR;
return -1;
}

errno = 0;
res = fstatat(dir_fd, slash + 1, st, AT_SYMLINK_NOFOLLOW);
err = errno;

free(pathcopy);
close(dir_fd);

errno = err;
return res;
#else
return lstat(path, st); // tadaa!
#endif
}

/* Opens a directory making sure it's not a symlink */
DIR *opendir_nofollow(const char *path)
{
Expand Down Expand Up @@ -169,6 +209,8 @@ struct watchman_dir_handle *w_dir_open(const char *path) {
// bulkstat and need to disable it. We will remove this option in
// a future release of watchman
if (cfg_get_bool(NULL, "_use_bulkstat", true)) {
struct stat st;

dir->fd = open_dir_as_fd_no_symlinks(path,
O_NOFOLLOW | O_CLOEXEC | O_RDONLY);
if (dir->fd == -1) {
Expand All @@ -178,6 +220,21 @@ struct watchman_dir_handle *w_dir_open(const char *path) {
return NULL;
}

if (fstat(dir->fd, &st)) {
err = errno;
close(dir->fd);
free(dir);
errno = err;
return NULL;
}

if (!S_ISDIR(st.st_mode)) {
close(dir->fd);
free(dir);
errno = ENOTDIR;
return NULL;
}

dir->attrlist.bitmapcount = ATTR_BIT_MAP_COUNT;
dir->attrlist.commonattr = ATTR_CMN_RETURNED_ATTRS |
ATTR_CMN_ERROR |
Expand Down Expand Up @@ -223,8 +280,10 @@ struct watchman_dir_ent *w_dir_read(struct watchman_dir_handle *dir) {
retcount = getattrlistbulk(dir->fd, &dir->attrlist,
dir->buf, sizeof(dir->buf), FSOPT_PACK_INVAL_ATTRS);
if (retcount == -1) {
int err = errno;
w_log(W_LOG_ERR, "getattrlistbulk: error %d %s\n",
errno, strerror(errno));
errno, strerror(err));
errno = err;
return NULL;
}
if (retcount == 0) {
Expand Down
160 changes: 147 additions & 13 deletions root.c
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,7 @@ static bool apply_ignore_vcs_configuration(w_root_t *root, char **errmsg)
// While we're at it, see if we can find out where to put our
// query cookie information
if (root->query_cookie_dir == NULL &&
lstat(fullname->buf, &st) == 0 && S_ISDIR(st.st_mode)) {
w_lstat(fullname->buf, &st) == 0 && S_ISDIR(st.st_mode)) {
// root/{.hg,.git,.svn}
root->query_cookie_dir = w_string_path_cat(root->root_path, name);
}
Expand Down Expand Up @@ -839,7 +839,7 @@ static void stat_path(w_root_t *root,
err = 0;
} else {
struct stat struct_stat;
res = lstat(path, &struct_stat);
res = w_lstat(path, &struct_stat);
err = res == 0 ? 0 : errno;
w_log(W_LOG_DBG, "lstat(%s) file=%p dir=%p\n", path, file, dir_ent);
if (err == 0) {
Expand All @@ -859,8 +859,12 @@ static void stat_path(w_root_t *root,
stop_watching_dir(root, dir_ent);
}
if (file) {
w_log(W_LOG_DBG, "lstat(%s) -> %s so marking %.*s deleted\n",
path, strerror(err), file->name->len, file->name->buf);
if (file->exists) {
w_log(W_LOG_DBG, "lstat(%s) -> %s so marking %.*s deleted\n",
path, strerror(err), file->name->len, file->name->buf);
file->exists = false;
w_root_mark_file_changed(root, file, now);
}
} else {
// It was created and removed before we could ever observe it
// in the filesystem. We need to generate a deleted file
Expand All @@ -869,9 +873,9 @@ static void stat_path(w_root_t *root,
file = w_root_resolve_file(root, dir, file_name, now);
w_log(W_LOG_DBG, "lstat(%s) -> %s and file node was NULL. "
"Generating a deleted node.\n", path, strerror(err));
file->exists = false;
w_root_mark_file_changed(root, file, now);
}
file->exists = false;
w_root_mark_file_changed(root, file, now);
} else if (res) {
w_log(W_LOG_ERR, "lstat(%s) %d %s\n",
path, err, strerror(err));
Expand Down Expand Up @@ -1126,11 +1130,13 @@ void w_root_process_path(w_root_t *root,
}

if (w_string_equal(full_path, root->root_path)
|| flags & W_PENDING_CRAWL_ONLY) {
crawler(root, coll, full_path, now, flags & W_PENDING_RECURSIVE);
|| (flags & W_PENDING_CRAWL_ONLY) == W_PENDING_CRAWL_ONLY) {
crawler(root, coll, full_path, now,
(flags & W_PENDING_RECURSIVE) == W_PENDING_RECURSIVE);
} else {
stat_path(root, coll, full_path, now, flags & W_PENDING_RECURSIVE,
flags & W_PENDING_VIA_NOTIFY, pre_stat);
stat_path(root, coll, full_path, now,
(flags & W_PENDING_RECURSIVE) == W_PENDING_RECURSIVE,
(flags & W_PENDING_VIA_NOTIFY) == W_PENDING_VIA_NOTIFY, pre_stat);
}
}

Expand Down Expand Up @@ -1339,8 +1345,12 @@ static void crawler(w_root_t *root, struct watchman_pending_collection *coll,
w_string_t *full_path = w_string_path_cat_cstr(dir->path,
dirent->d_name);
if (full_path) {
w_log(W_LOG_DBG, "in crawler[%.*s], calling process_path on %.*s\n",
dir->path->len, dir->path->buf,
full_path->len, full_path->buf);
w_root_process_path(root, coll, full_path, now,
W_PENDING_RECURSIVE, dirent);
(recursive || !file || !file->exists) ?
W_PENDING_RECURSIVE : 0, dirent);
w_string_delref(full_path);
} else {
w_log(W_LOG_ERR, "OOM during crawl\n");
Expand Down Expand Up @@ -1496,6 +1506,105 @@ static void process_triggers(w_root_t *root)
root->last_trigger_tick = root->pending_trigger_tick;
}

/* fsevents won't tell us about creation events for dangling symlinks;
* we have to check to find those for ourselves. To manage this, every
* time we transition into being initially settled, we'll collect a
* list of dirs that were modified since the last settle event and rescan
* them (non-recursive).
* We'll do this inspection in the context of the IO thread.
* If we return true it means that we found something that the watcher
* missed.
*/
static bool recheck_dirs(w_root_t *root,
struct watchman_pending_collection *coll) {
struct watchman_file *f;
struct timeval now;

if (root->last_recheck_tick >= root->ticks) {
return false;
}

w_log(W_LOG_DBG, "recheck!, last_recheck_tick=%d root->ticks=%d\n",
root->last_recheck_tick, root->ticks);

gettimeofday(&now, NULL);

// First pass: collect a list of recently changed dirs
for (f = root->latest_file; f; f = f->next) {
// check dirs, but only if we've seen them change recently
if (f->otime.ticks <= root->last_recheck_tick) {
continue;
}

if (S_ISDIR(f->stat.mode)) {
// This was a dir, so check it again
w_pending_coll_add_rel(coll, f->parent, f->name->buf, now, 0);
} else {
// Crawl the parent dir
w_pending_coll_add(coll, f->parent->path, now, 0);
}
}

if (w_pending_coll_size(coll) == 0) {
// We're all up to date
root->last_recheck_tick = root->ticks;
w_log(W_LOG_DBG,
"recheck complete, last_recheck_tick=%d root->ticks=%d\n",
root->last_recheck_tick, root->ticks);
return false;
}

w_log(W_LOG_DBG, "Re-checking %d dirs (before considering symlinks)\n",
w_pending_coll_size(coll));

// Now that we know that something recently changed, let's go looking
// for symlinks and explicitly check their containing dirs for changes too
for (f = root->latest_file; f; f = f->next) {
if (S_ISLNK(f->stat.mode) && f->exists) {
// Re-examine this symlink
w_pending_coll_add_rel(coll, f->parent, f->name->buf, now, 0);
// and re-crawl its parent dir to discover other potentially newly
// created symlinks that were previously dangling
w_pending_coll_add(coll, f->parent->path, now, 0);
}
}

w_log(W_LOG_DBG, "Re-checking %d dirs (after considering symlinks)\n",
w_pending_coll_size(coll));
// Move the recheck window forward, and bump the tick counter
// ready to observe anything new in the crawl(s) that we trigger next
root->last_recheck_tick = root->ticks++;

// Now re-examine the list of dirs
while (w_root_process_pending(root, coll, false)) {
;
}
w_log(W_LOG_DBG, "recheck complete, last_recheck_tick=%d root->ticks=%d\n",
root->last_recheck_tick, root->ticks);

// This is a bit icky, but let's find out how many things we observed
// change as a result of this check; if none, then we return false
for (f = root->latest_file; f; f = f->next) {
if (f->otime.ticks <= root->last_recheck_tick) {
// No more changes in the appropriate time range;
// nothing changed as a result of this recheck
return false;
}
if (f->otime.ticks == root->ticks) {
// Yep, something was updated.
// Log what it was so that we can debug situations where we don't
// really settle.
w_log(W_LOG_DBG, "Re-check: %.*s/%.*s was updated at tick=%d\n",
f->parent->path->len, f->parent->path->buf,
f->name->len, f->name->buf,
f->otime.ticks);
return true;
}
}

return false;
}

static bool handle_should_recrawl(w_root_t *root)
{
if (root->should_recrawl && !root->cancelled) {
Expand Down Expand Up @@ -1803,7 +1912,7 @@ static void notify_thread(w_root_t *root)

static void io_thread(w_root_t *root)
{
int timeoutms, biggest_timeout;
int timeoutms, biggest_timeout, recheck_timeout;
struct watchman_pending_collection pending;

timeoutms = root->trigger_settle;
Expand All @@ -1820,6 +1929,19 @@ static void io_thread(w_root_t *root)
// And convert to milliseconds
biggest_timeout *= 1000;

recheck_timeout = (int)cfg_get_int(root, "recheck_dirs_interval_ms",
#ifdef __APPLE__
// default to "on" for mac because of fsevents bugs. We pick the settle
// period as the default so that we check just before we dispatch the
// first round of notifications; this gives us a chance to detect
// dangling symlink changes before we tell folks about them.
root->trigger_settle
#else
0
#endif
);
recheck_timeout = MIN(recheck_timeout, biggest_timeout);

w_pending_coll_init(&pending);

while (!root->cancelled) {
Expand All @@ -1839,6 +1961,8 @@ static void io_thread(w_root_t *root)
;
}
root->done_initial = true;
// We just crawled everything, no need to recheck right now
root->last_recheck_tick = root->ticks + 1;
w_root_unlock(root);
if (cfg_get_bool(root, "iothrottle", false)) {
w_ioprio_set_normal();
Expand All @@ -1858,7 +1982,7 @@ static void io_thread(w_root_t *root)

if (!pinged && w_pending_coll_size(&pending) == 0) {
// No new pending items were given to us, so consider that
// we may not be settled.
// we may now be settled.

w_root_lock(root);
if (!root->done_initial) {
Expand All @@ -1867,6 +1991,16 @@ static void io_thread(w_root_t *root)
continue;
}

// If we just settled, the timeout will be the base timeout.
// This is an appropriate time to second guess the watcher
// and locate anything that we think we may have missed
if (recheck_timeout > 0 && timeoutms >= recheck_timeout) {
if (recheck_dirs(root, &pending)) {
// We're no longer settled, so reset the timeout
timeoutms = root->trigger_settle;
}
}

process_subscriptions(root);
process_triggers(root);
if (consider_reap(root)) {
Expand Down
2 changes: 2 additions & 0 deletions watchman.h
Original file line number Diff line number Diff line change
Expand Up @@ -521,6 +521,7 @@ struct watchman_root {
uint32_t pending_trigger_tick;
uint32_t pending_sub_tick;
uint32_t last_age_out_tick;
uint32_t last_recheck_tick;
time_t last_age_out_timestamp;
time_t last_cmd_timestamp;
time_t last_reap_timestamp;
Expand Down Expand Up @@ -969,6 +970,7 @@ void handle_open_errno(w_root_t *root, struct watchman_dir *dir,
void stop_watching_dir(w_root_t *root, struct watchman_dir *dir);
DIR *opendir_nofollow(const char *path);
uint32_t u32_strlen(const char *str);
int w_lstat(const char *path, struct stat *st);

extern struct watchman_ops fsevents_watcher;
extern struct watchman_ops kqueue_watcher;
Expand Down

0 comments on commit 7edcb21

Please sign in to comment.