Skip to content

Latest commit

 

History

History
364 lines (280 loc) · 12.2 KB

0031-dynamic-app-db-registration.md

File metadata and controls

364 lines (280 loc) · 12.2 KB
authors state
Roman Tkachenko ([email protected])
implemented

RFD 31 - Dynamic Registration for Apps and Databases

What

Provide ability for users to register/deregister web apps (for App Access) and databases (for Database Access) without having to modify static yaml configuration or spin up/bring down app/database agents.

Why

Currently, users have two options for adding/removing web apps and databases in a Teleport cluster:

  • Update yaml configuration of a particular app_service or db_service to add/remove an app or a database entry and restart the service.
  • Bring up a new instance of an app_service or a db_service agent to register a new app/database, or stop an existing one to deregister.

This is inconvenient for multiple reasons:

  • Updating static yaml configuration and restarting the agent is manual and affects all other services served by this agent (incl. other apps/databases).
  • Bringing up or stopping an app/database service agent can be challenging from operational perspective.
  • It is unfriendly to automation and from the integration standpoint i.e. no easy way for a CI tool or an external plugin to connect a new app/database.
  • Similar to above, it makes it impossible for tools like our Terraform provider to manage apps/databases as resources.

Scope

Given the shortcomings of the current approach, this RFD proposes the way to:

  • Allow Teleport users with appropriate permissions to add/remove web apps and databases using CLI and API.

The following is out of scope of the initial implementation:

  • Providing ability to add/remove web apps and databases via Teleport web UI.
  • Auto-discovering app and database endpoints in Kubernetes clusters is kept out of initial implementation scope but we do discuss future work options (see relevant Github issue and Kubernetes section).

How

The basic design premise is to employ a resource-based approach, already familiar to Teleport users:

  • Web apps and databases are represented as yaml resources that can be managed using tctl create/get/rm commands.
  • App/database service agents watch their respective resources and update their proxied app/database configurations appropriately.
  • We will use labels to instruct app/database service agents to watch for specific app/database resources.

UX

Before diving into implementation details, let's explore a few scenarios of how different UX personas may use this feature. For all scenarios we are assuming that Teleport cluster has an app/database service agent running and configured to watch for appropriate resources.

Cluster admin

Teleport cluster admin will use tctl resource commands to create a yaml resource representing a web app or a database, or remove it.

$ tctl create grafana.yaml
$ tctl rm db_servers/aurora

When the resource is created, an appropriate app/database service agent will pick it up and start proxying.

In the future, it might make sense for us to expose this functionality in web UI, in which case interactive Teleport users with appropriate permissions will be able to do it as well.

CI system

An external system or a user can run tctl commands using identity files and otherwise is no different than using tctl locally on the auth server.

$ tctl auth sign --ttl=24h --user=drone --out=drone.pem
# Via auth server.
$ tctl --auth-server=192.168.0.1:3025 -i drone.pem create grafana.yaml
# Via proxy (e.g. for Cloud).
$ tctl --auth-server=192.168.0.1:3080 -i drone.pem create grafana.yaml

API user

Teleport API client will provide methods for integrators to create/update/delete web apps and databases. Once created or deleted, the change will be picked up by an appropriate agent.

See API changes section below for more details.

Terraform provider

Terraform provider uses API client for resource management so it will use the client's get, create and delete methods as any other API user.

This, again, assumes that there is a app/database service in the cluster that adds/removes web apps and databases.

Cloud user

Cloud users should be able to use tctl commands with impersonation which is similar to the CI/external user scenario.

Managing app/database resources

The specs of app/database resources closely resemble their respective yaml configuration sections.

Application resource example:

# grafana-app.yaml
kind: app
version: v3
metadata:
  name: grafana
  description: Grafana
  labels:
    env: dev
spec:
  uri: http://localhost:3000
  public_addr: grafana.example.com
  rewrite:
    headers:
    - name: X-Custom-Trait-Env
      value: "{{external.env}}"
  dynamic_labels:
    date:
      command: ["/bin/date"]
      period: 1m

Database resource example:

# redshift-db.yaml
kind: db
version: v3
metadata:
  name: redshift
  description: Amazon Redshift
  labels:
    env: aws
spec:
  protocol: postgres
  uri: redshift-cluster-1.abcdefg.us-east-1.redshift.amazonaws.com:5439
  ca_cert: pem data
  aws:
    region: us-east-1
    redshift:
      cluster_id: redshift-cluster-1
  dynamic_labels:
    date:
      command: ["/bin/date"]
      period: 1m

Users can use tctl commands to manage these resources.

Create app/database resource:

$ tctl create grafana-app.yaml
$ tctl create redshift-db.yaml

View app/database resource:

$ tctl get app
$ tctl get db

Remove app/database resource:

$ tctl get app/grafana
$ tctl rm db/redshift

Configuring app/database service

In addition to apps/databases from static yaml configuration, app/database services can be configured to watch for the above resources with specific labels.

App agent:

app_service:
  enabled: "yes"

  # Watch for apps with app=ops&env=dev or env=test labels.
  resources:
  - labels:
      "app": "ops"
      "env": "dev"
  - labels:
      "env": "test"

  # Static apps configuration, optional.
  apps:
    ...

Database agent:

db_service:
  enabled: "yes"

  # Watch for any database with env label set.
  resources:
  - labels:
      "env": "*"

  # Static databases configuration, optional.
  databases:
    ...

If no resource matchers are specified, no resources are being watched. To make an agent watch for any app/database, an explicit wildcard selector must be specified:

resources:
- labels:
    "*": "*"

Static vs dynamic configuration

There must be a way to distinguish between apps and databases from the static yaml configuration vs ones registered as dynamic resources. This is needed, for example, to prevent users from deleting those registered statically.

We will use the same approach as in RFD16 Dynamic Configuration and use teleport.dev/origin label to denote whether the app/database resource comes from static configuration or created dynamically. The rules are:

  • Users can't delete apps/databases with teleport.dev/origin: config-file label.
  • Users can't set teleport.dev/origin label in their resources - it will be auto-set to dynamic when they're created.
  • Users can't create apps/databases with the same name as one of existing static apps/databases (to avoid various conflict resolution issues), and vice versa.

Multiple replicas

When multiple app/database agents pick up the same web app or database resource, they all add it to their proxying configuration. The connections that go to this app/database follow the same load-balancing rules which already exist today. This is no different than registering the same app/database multiple times via static configuration today.

Security implications

In the current Teleport security model, the trust is established between components of the system i.e. when a new app or database service agent joins the cluster, it needs to present a valid auth token. Once the service has registered, updating its configuration to connect new web apps or databases does not require re-registering the service with the cluster.

Dynamic app/database registration plugs into this model - when a new app or database resource is created, it is picked up by a service that is already part of the cluster. Managing app/database resources themselves is subject to RBAC checks so only users with appropriate permissions to app_server and db_server resources are allowed to create/remove them.

Another potential implication is that an attacker with the privileged enough access can replace an existing app/database definition, for example with the intention to route traffic to a different endpoint. This scenario wouldn't be different from the attacker gaining permissions to modify any other part of the Teleport cluster so proper usage of Teleport RBAC system and access requests should alleviate this concern.

API changes

Teleport API client currently provides the following methods for managing app/database servers (copied from api/client/client.go):

GetAppServers(ctx context.Context, namespace string, skipValidation bool) ([]types.Server, error)
UpsertAppServer(ctx context.Context, server types.Server) (*types.KeepAlive, error)
DeleteAppServer(ctx context.Context, namespace, name string) error
GetDatabaseServers(ctx context.Context, namespace string, skipValidation bool) ([]types.DatabaseServer, error)
UpsertDatabaseServer(ctx context.Context, server types.DatabaseServer) (*types.KeepAlive, error)
DeleteDatabaseServer(ctx context.Context, namespace, hostID, name string) error

An application or a database server represents a single proxied instance of an application or a database. These resources and their API methods were designed for internal usage by Teleport components (e.g. when an agent registers an application or a database with the cluster) and aren't supposed to be managed by users directly.

To support managing application and database resources defined above, a new set of API methods is introduced:

CreateApp(ctx context.Context, app types.App) error
UpdateApp(ctx context.Context, app types.App) error
GetApps(ctx context.Context) ([]types.App, error)
GetApp(ctx context.Context, name string) (types.App, error)
DeleteApp(ctx context.Context, name string) error
CreateDatabase(ctx context.Context, db types.Database) error
UpdateDatabase(ctx context.Context, db types.Database) error
GetDatabases(ctx context.Context) ([]types.Database, error)
GetDatabase(ctx context.Context, name string) (types.Database, error)
DeleteDatabase(ctx context.Context, name string) error

Kubernetes

Discovering web app and database endpoints in Kubernetes clusters is kept out of the initial implementation scope. Initially, users can use CLI or API approach described in this RFD to add or remove web app and database Teleport resources, which works for clusters deployed both on-prem and in Kubernetes.

There are options though to make the process of adding new apps and databases friendlier and more "idiomatic" for Kubernetes users, by implementing a controller that will be monitoring specific K8s resources.

The simpler option is to monitor Service objects with specific labels or annotations. It may work for very simple use-cases but the downside is that Service object can't fully express Teleport application (e.g. public address, rewrite configuration, etc.) or a database endpoint (e.g. protocol, cloud specific settings, etc.). Unless maybe we use annotations to encode this information which isn't ideal.

A better approach is to use Kubernetes CRDs and define custom resources e.g. teleport.sh/app and teleport.sh/db which provides versioning and ability to define all custom fields these resources need. These resources will be registered when users install our Helm chart.

When run inside Kubernetes, app and database service agents will start a K8s controller which will be watching these custom resources using selector described above to filter them by labels, so no new configuration is necessary.

Kubernetes improvements are tracked in #4832.