Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Magic rollback gets confused with multiplexed SSH connections #106

Open
brprice opened this issue Jul 23, 2021 · 4 comments
Open

Magic rollback gets confused with multiplexed SSH connections #106

brprice opened this issue Jul 23, 2021 · 4 comments

Comments

@brprice
Copy link

brprice commented Jul 23, 2021

[ I'm not sure whether this should be considered a bug, or a PEBKAC. If the latter, it would be nice to document as a pitfall to avoid]

In my .ssh/config, I have set up automatic connection sharing via

Host *
        ControlMaster auto
        ControlPath ~/.ssh/%r@%h:%p

Thus, all ssh sessions to the same host go via the same connection. However, this means that the ssh session initiated by the rollback checker does not necessarily test whether ssh still works - it may just be multiplexed into an existing connection!

Consider the following (slightly contrived) scenario:

  • You are about to do a deploy which will break ssh so that new connections are rejected but existing ones carry on working (e.g. by disabling the ssh daemon)
  • Before doing so, you ssh <deploy-host> for some reason (to check disk usage, perhaps)
  • You leave that ssh session open, and in another terminal you run the deploy
  • The deploy's ssh sessions are multiplexed into the existing one. This doesn't matter for the copy and activate, but means that the rollback-check ssh erroneously succeeds (as it uses the still running existing connection is still up, and thus doesn't check whether a fresh connection will succeed)
  • Thus the deploy is activated and seems successful
  • You now close the manually-opened ssh connection ("disk usage checker" in this example)
  • Access is now lost to the remote box! Any new ssh <deploy-host> will now fail, as it won't accept the new connection.

I have seen this scenario happen when testing in a VM. I have not seen any issue without the manual ssh connection. (But since deploy-rs will ssh multiple times, I am slightly concerned there could be a race condition where the "activate" connection is not closed in time and thus is shared with the "rollback check" session resulting in the same issue. However, I have not checked the code - this concern may well be baseless)

The solution would be to add the -S none option to (the rollback-check's) ssh options, if we decide this is a bug we should fix, rather than user error.

@2xsaiko
Copy link
Contributor

2xsaiko commented May 15, 2022

I made this 2xsaiko@2dcbd4c to improve this issue, so that you have the option of using different SSH arguments and a different user for the SSH connection for the rollback check. This actually allows connection multiplexing to work correctly (I hope) instead of disabling it. (I actually wanted to get this to work correctly because of #54, I want to use a Yubikey as well)

Snippet from my configuration:

{
  deploy.nodes = optionalAttrs (hasAttr name network) {
    ${name} = {
      hostname = network.${name}.dnsName;
      sshUser = "root";
      user = "root";
      sshOpts = ["-i" "admin/id" "-S" "admin/%r@%h:%p" "-o" "ControlMaster=auto" "-o" "ControlPersist=60"];
      checkSshUser = "saiko";
      checkSshOpts = [];

      profiles.system = {
        path = deploy-rs.lib.${system}.activate.nixos self.nixosConfigurations.${name};
      };
    };
  };
}

Note the new checkSshUser and checkSshOpts options. I haven't submitted a PR yet though since I want to improve it a bit more.

@tecosaur
Copy link

Is there a reason why sshOpts = ["-o" "ControlMaster=no"] isn't a good solution?

@brprice
Copy link
Author

brprice commented Oct 17, 2022

Is there a reason why sshOpts = ["-o" "ControlMaster=no"] isn't a good solution?

That certainly works, and I would be happy if we decided that was the recommended solution and documented this pitfall to avoid.

Personally I'd prefer to automatically disable multiplexing for magic rollback, since I don't see any situation where one would want to multiplex that connection, and it would be nice (as a user) to not have to be aware that I should add that to every deploy-rs config (or not enable multiplexing in .ssh/config globally).

@brprice
Copy link
Author

brprice commented Jul 1, 2023

Is there a reason why sshOpts = ["-o" "ControlMaster=no"] isn't a good solution?

Actually, on second thought, I think this should be ControlPath=none.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants