0097-teleport-connect-usage-metrics.md

authors: Grzegorz Zdunek ([email protected])
state: implemented (v12.1.1)

RFD 97 - Teleport Connect usage metrics

Required Approvers

  • Engineering: @ravicious @zmb3
  • Product: @klizhentas @xinding33

What

Collect downloads and usage metrics of Teleport Connect.

Why

Currently, the team has no information on how many users download, install and use Teleport Connect on a daily basis. To plan the development of the product effectively, the team should also know how widely new features are adopted, which ones are the most popular and which are problematic for users.

Details

Collecting events

Events for Connect will be collected on the client side. The TypeScript code will have a stateless metrics service that forwards events to a gRPC handler exposed by the tsh daemon, which will ultimately submit them to a service called prehog. To avoid flooding the backend with a large number of small requests, events will be batched before being sent to prehog. The batching mechanism has already been implemented in UsageReporter, which will be used for collecting cluster events. The tsh daemon will reuse as much of that code as possible by providing its own batching parameters and submit function. Events will be sent once every hour (this may change) and before the app closes.

NOTE: At the time of implementing this RFD, the prehog service does not support requests containing batched events. Instead, events are sent one by one from the client side when a "batch" is ready. Having the batching mechanism in place will allow us to quickly enable real batching once the server side adds support.
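The batching behaviour described above can be sketched as follows. This is only an illustration in TypeScript, not the UsageReporter code that the tsh daemon reuses. The names `UsageEvent`, `SubmitFn` and `EventBatcher` are hypothetical, and putting unsent events back in the queue after a failure is an assumption of this sketch.

```typescript
// Hypothetical event shape, used only for this illustration.
interface UsageEvent {
  name: string;
  properties: Record<string, unknown>;
}

// Hypothetical submit function: sends a single event to the backend.
type SubmitFn = (event: UsageEvent) => Promise<void>;

class EventBatcher {
  private queue: UsageEvent[] = [];
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  private readonly submit: SubmitFn;
  private readonly flushIntervalMs: number;

  // The default interval is one hour, as proposed in the text.
  constructor(submit: SubmitFn, flushIntervalMs: number = 60 * 60 * 1000) {
    this.submit = submit;
    this.flushIntervalMs = flushIntervalMs;
  }

  // Starts the periodic flush.
  start(): void {
    if (this.timer === undefined) {
      this.timer = setInterval(() => {
        void this.flush();
      }, this.flushIntervalMs);
    }
  }

  // Queues an event without sending it.
  add(event: UsageEvent): void {
    this.queue.push(event);
  }

  // Sends the queued events one by one, because the backend does not
  // accept batched requests yet.
  async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    for (let i = 0; i < batch.length; i++) {
      try {
        await this.submit(batch[i]);
      } catch {
        // Assumption of this sketch: keep unsent events for the next flush.
        this.queue = batch.slice(i).concat(this.queue);
        return;
      }
    }
  }

  // Stops the timer and sends whatever is left, e.g. before the app closes.
  async close(): Promise<void> {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    await this.flush();
  }
}
```

Once the server accepts batched requests, only the body of `flush` would need to change in a design like this; callers of `add` and `close` would stay the same.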

Using an authorized endpoint provided by the cluster's Auth Server was considered, but it does not work well for Connect for a few reasons:

  • Some events may not belong to any cluster (at the time of writing this RFD there is no such event, but the solution should be future-proof).
  • A batch can contain events from multiple clusters.
  • A batch can be sent after the session expires.

Anonymization

NOTE: The anonymization solution described below applies only to events that are associated with a cluster. Events that do not belong to any cluster but contain sensitive data will have to be anonymized in a different way.

Each event that contains sensitive data, such as a cluster name, needs to be anonymized. This will be done in the tsh daemon, the same way as in the Auth Server: using HMAC with the unique cluster ID as the key. Connect will reuse the same code. The only issue with anonymizing events client-side is that the cluster ID is kept in the Auth Server and is not available to the client. To remedy this, when the app starts and retrieves cluster information, it should also retrieve the cluster ID. The Electron app will then send the cluster ID to the daemon with each event that needs to be anonymized, and the daemon will anonymize the event before sending it to prehog.
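The keyed-hash idea can be illustrated with a short sketch. In the design above the anonymization runs in the tsh daemon and reuses the Auth Server code, so the TypeScript below is only an illustration of HMAC keyed by the cluster ID. The choice of SHA-256, the base64 output and the names `anonymize`, `LoginEventFields` and `anonymizeLoginEvent` are assumptions of this sketch.

```typescript
import { createHmac } from "crypto";

// Anonymizes a value with HMAC keyed by the cluster ID.
// Assumption: SHA-256 as the hash function and base64 as the output encoding.
function anonymize(clusterId: string, value: string): string {
  return createHmac("sha256", clusterId).update(value).digest("base64");
}

// Hypothetical subset of the login event fields.
interface LoginEventFields {
  clusterName: string;
  userName: string;
  connectorType: string;
}

// Returns a copy of the event with the sensitive fields anonymized.
// Non-sensitive fields, such as the connector type, are left untouched.
function anonymizeLoginEvent(
  clusterId: string,
  event: LoginEventFields
): LoginEventFields {
  return {
    ...event,
    clusterName: anonymize(clusterId, event.clusterName),
    userName: anonymize(clusterId, event.userName),
  };
}
```

Because the key is the cluster ID, the same user name in the same cluster always maps to the same anonymized value, while the same name in a different cluster maps to a different one.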

Storing events

Events will be sent to a public endpoint in prehog (intended for use only by Connect) that translates them into PostHog's data model.

Connect events will share the same project with cluster and website events. This will allow queries that need both sources of data, such as calculating the percentage of users who log in to a cluster with Connect.

Some event properties, such as the OS, can be saved as user properties. These properties are then stored directly on each event. For example, when the first emitted event sets the user property os: windows, every subsequent event will have this property set.

To differentiate events coming from multiple application instances, each event needs a distinct_id field. It will be populated with a UUID generated by Connect (with a connect. prefix added by prehog). The value will be created on the first start and stored in a file in the app data directory, so it will not change between restarts.

NOTE: As stated above, Connect events are tied to the application instance (or just the client machine). This means that PostHog's Person for Connect and PostHog's Person for a cluster are different entities. They should not be merged.
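The persistent instance identifier used for distinct_id could be handled roughly like this. It is a minimal sketch, assuming a Node.js environment such as the Electron main process; the file name `instance_id` and the function name `loadOrCreateInstanceId` are hypothetical. The connect. prefix is not added here, because the text says prehog adds it.

```typescript
import { randomUUID } from "crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
import { join } from "path";

// Hypothetical file name; the RFD only says that the value is stored
// in a file in the app data directory.
const INSTANCE_ID_FILE = "instance_id";

// Returns the stored identifier, creating and persisting it if it does
// not exist yet, so that the value survives restarts.
function loadOrCreateInstanceId(appDataDir: string): string {
  const filePath = join(appDataDir, INSTANCE_ID_FILE);
  if (existsSync(filePath)) {
    const stored = readFileSync(filePath, "utf8").trim();
    if (stored !== "") {
      return stored;
    }
  }
  const id = randomUUID();
  mkdirSync(appDataDir, { recursive: true });
  writeFileSync(filePath, id, "utf8");
  return id;
}
```

Calling the function twice with the same directory returns the same UUID, which is what keeps events from one installation grouped under one PostHog Person.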

User agreement

On start, Teleport Connect will ask the user to opt in to sharing anonymized metrics and usage data with the standard message "Do you agree to Teleport Connect collecting anonymized usage data? This will help us to improve the product". It will also show a link to the FAQ.

If the user refuses, Connect will not send any usage data.
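The opt-in rule can be expressed as a small gate in front of the metrics service. This is a sketch under stated assumptions: the type `UsageReportingConsent` and both function names are hypothetical, and treating "not asked yet" the same as a refusal is an assumption that the RFD does not spell out.

```typescript
// Hypothetical consent states. "not-asked" covers the time before the
// user has answered the opt-in question.
type UsageReportingConsent = "not-asked" | "accepted" | "declined";

// Usage data may be sent only after the user has explicitly opted in.
function shouldReportUsage(consent: UsageReportingConsent): boolean {
  return consent === "accepted";
}

// Forwards the event to `send` only when reporting is allowed.
// Returns true if the event was forwarded.
function reportIfAllowed<T>(
  consent: UsageReportingConsent,
  event: T,
  send: (event: T) => void
): boolean {
  if (!shouldReportUsage(consent)) {
    return false;
  }
  send(event);
  return true;
}
```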

How will collecting metrics support product development?

In the initial version, it should help with getting answers to the following questions:

How many unique users download and use Teleport Connect today?

To answer the first part of the question, download counts from goteleport.com/download are needed. These will be collected from the CloudFront CDN access logs.

To calculate how many users use Teleport Connect on a daily basis, a metric like DAU (Daily Active Users) can be used. This metric can be based on a specific event, but in this case it should be calculated using any event. For example, a user who logged in only once on a given day to refresh certs for a DB proxy connection can still be considered active.
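To make the definition concrete, the sketch below counts DAU from a list of events: a user is active on a given day if at least one event of any kind carries their distinct_id on that day. In practice this would be a PostHog query rather than client code. The `RecordedEvent` shape, the function name and the use of UTC days are assumptions of this sketch.

```typescript
// Hypothetical minimal event record.
interface RecordedEvent {
  distinctId: string;
  // ISO 8601 timestamp, e.g. "2023-01-01T10:00:00Z".
  timestamp: string;
}

// Returns the number of unique users per UTC day (keyed as YYYY-MM-DD).
// Any event counts, regardless of its type.
function dailyActiveUsers(events: RecordedEvent[]): Map<string, number> {
  const usersByDay = new Map<string, Set<string>>();
  for (const event of events) {
    const day = new Date(event.timestamp).toISOString().slice(0, 10);
    let users = usersByDay.get(day);
    if (users === undefined) {
      users = new Set<string>();
      usersByDay.set(day, users);
    }
    users.add(event.distinctId);
  }
  const counts = new Map<string, number>();
  usersByDay.forEach((users, day) => {
    counts.set(day, users.size);
  });
  return counts;
}
```

A user who emits several events on the same day is still counted once for that day.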

What features are the most popular?

Usage of each feature will be measured based on the events listed in the Events section. They will make it possible to generate various statistics, such as the most common kinds of connections, or simply to show the usage of particular features like Access Requests.

What platforms are the most popular?

This will be measured in two ways.

  • Based on the download count for each platform.
  • Based on real usage: every event will contain the OS field. These events will then be aggregated by unique users.

How does usage grow or shrink over time?

PostHog allows creating Trends based on DAU. They can be used to show how usage changes over a given period of time.

Events

connect.cluster.login

Successful login to a cluster.

Event properties:

  • tp.cluster_name: string (anonymized)
  • tp.user_name: string (anonymized)
  • tp.connector_type: string
  • connect.os: string (set as a user property)
  • connect.arch: string (set as a user property) - CPU architecture
  • connect.os_version: string (set as a user property)
  • connect.app_version: string (set as a user property)

connect.protocol.use

Connecting to a resource over one of the supported protocols.

Event properties:

  • tp.cluster_name: string (anonymized)
  • tp.user_name: string (anonymized)
  • connect.protocol: one of ssh/db/kube

connect.accessRequest.create

Creating an access request.

Event properties:

  • tp.cluster_name: string (anonymized)
  • tp.user_name: string (anonymized)
  • connect.access_request_kind: one of role, resource

connect.accessRequest.review

Reviewing an access request.

Event properties:

  • tp.cluster_name: string (anonymized)
  • tp.user_name: string (anonymized)

connect.accessRequest.assumeRole

Assuming a requested role.

Event properties:

  • tp.cluster_name: string (anonymized)
  • tp.user_name: string (anonymized)

connect.fileTransfer.run

Running a file transfer.

Event properties:

  • tp.cluster_name: string (anonymized)
  • tp.user_name: string (anonymized)
  • connect.file_transfer_is_upload: boolean

Updating the user's job role.

Event properties:

  • connect.user.job_role: string (set as a user property)
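The event list above can be summarized as TypeScript types, which is one way the client-side metrics service might keep event payloads consistent. This is a sketch, not the actual type definitions: the names `ClusterProperties`, `UserProperties`, `ConnectEvent` and `protocolUseEvent` are hypothetical. Only events whose names appear in this section are included, so the job role update is left out.

```typescript
// Properties shared by all events that are associated with a cluster.
// Both values are anonymized before they are sent to prehog.
interface ClusterProperties {
  "tp.cluster_name": string;
  "tp.user_name": string;
}

// Properties that the login event sets as user properties.
interface UserProperties {
  "connect.os": string;
  "connect.arch": string; // CPU architecture
  "connect.os_version": string;
  "connect.app_version": string;
}

type Protocol = "ssh" | "db" | "kube";

// One variant per named event, discriminated by the event name.
type ConnectEvent =
  | {
      event: "connect.cluster.login";
      properties: ClusterProperties & UserProperties & { "tp.connector_type": string };
    }
  | {
      event: "connect.protocol.use";
      properties: ClusterProperties & { "connect.protocol": Protocol };
    }
  | {
      event: "connect.accessRequest.create";
      properties: ClusterProperties & { "connect.access_request_kind": "role" | "resource" };
    }
  | { event: "connect.accessRequest.review"; properties: ClusterProperties }
  | { event: "connect.accessRequest.assumeRole"; properties: ClusterProperties }
  | {
      event: "connect.fileTransfer.run";
      properties: ClusterProperties & { "connect.file_transfer_is_upload": boolean };
    };

// Hypothetical helper that builds a connect.protocol.use event.
function protocolUseEvent(cluster: ClusterProperties, protocol: Protocol): ConnectEvent {
  return {
    event: "connect.protocol.use",
    properties: { ...cluster, "connect.protocol": protocol },
  };
}
```

With a union like this, the compiler rejects an event that is missing a required property or uses a protocol outside ssh, db and kube.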