| authors | state |
|---|---|
| Grzegorz Zdunek ([email protected]) | implemented (v12.1.1) |
- Engineering: @ravicious @zmb3
- Product: @klizhentas @xinding33
Collect download and usage metrics for Teleport Connect.
Currently, the team has no information on how many users download, install, and use Teleport Connect on a daily basis. To plan the development of the product effectively, the team should also know how well new features are adopted, which ones are the most popular, and which are problematic for users.
Events for Connect will be collected on the client side. The TypeScript code will have a stateless metrics service that will forward them to the gRPC handler exposed by the `tsh` daemon, which will ultimately submit them to a service called `prehog`. To prevent flooding the backend with a large number of small requests, events will be batched before being sent to `prehog`. The batching mechanism has already been implemented in `UsageReporter`, which is used for collecting cluster events. The `tsh` daemon will reuse that code as much as possible (by providing its own batching parameters and submit function). Events will be sent once every hour (this may change) and before closing the app.
NOTE: At the time of implementing this RFD, the `prehog` service does not support requests containing batched events. Instead, events are sent one by one from the client side when a "batch" is ready. Having the batching mechanism ready will allow us to quickly enable it when the server side adds support.
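A minimal sketch of the batching approach, in TypeScript for illustration. The class name, batch limit, and submit signature are assumptions; the real implementation reuses `UsageReporter`'s batching code in the `tsh` daemon.

```typescript
// Sketch of an event batcher. Names and limits are hypothetical; the
// actual implementation reuses UsageReporter's batching code with its
// own batching parameters and submit function.
type SubmitFn = (events: unknown[]) => Promise<void>;

class EventBatcher {
  private queue: unknown[] = [];

  // The daemon would also call flush() on a timer (once every hour)
  // and before the app closes, so no events are lost.
  constructor(private submit: SubmitFn, private maxBatchSize = 50) {}

  add(event: unknown): void {
    this.queue.push(event);
    if (this.queue.length >= this.maxBatchSize) {
      void this.flush();
    }
  }

  async flush(): Promise<void> {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    // prehog does not accept batched requests yet, so when a batch
    // is ready its events are submitted one by one.
    for (const event of batch) {
      await this.submit([event]);
    }
  }
}
```

Collecting events into a queue and draining it in `flush()` keeps the switch to true server-side batching trivial: only the loop body changes.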
It was considered to use an authorized endpoint provided by the cluster's Auth Server, but that does not work well for Connect for a few reasons:
- Some events may not belong to any cluster (at the time of writing this RFD there is no such event, but the solution should be future-proof).
- Batch can contain events from multiple clusters.
- Batch can be sent after the session expires.
NOTE: The anonymization solution described below applies only to events that are associated with a cluster. Events that do not belong to any cluster but contain sensitive data will have to be anonymized in a different way.
Each event that contains sensitive data, like the cluster name, needs to be anonymized. It will be done in the `tsh` daemon the same way as in the Auth Server: using HMAC with the unique cluster id as the key. Connect will reuse the same code.
The only issue with anonymizing events client-side is the lack of the cluster id, which is kept in the Auth Server. To remedy this, when the app starts and retrieves cluster information, it should also retrieve the cluster id. The Electron app will then send the cluster id to the daemon with each event that needs to be anonymized. The daemon will anonymize the events before sending them to `prehog`.
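A sketch of what the daemon-side anonymization amounts to, using Node's crypto for illustration. Teleport's real implementation is in Go; the hash function and output encoding here are assumptions.

```typescript
import { createHmac } from "node:crypto";

// Anonymize a sensitive value with HMAC, keyed by the cluster id the
// Electron app passed along with the event. The same input always maps
// to the same opaque value for a given cluster, so per-cluster
// aggregation still works without revealing the original name.
function anonymize(value: string, clusterId: string): string {
  return createHmac("sha256", clusterId).update(value).digest("base64");
}
```

Because the key is the cluster id rather than a shared secret, the same user name in two different clusters produces two unrelated anonymized values.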
Events will be sent to a public endpoint in `prehog` (intended for use only by Connect) that translates them into PostHog's data model.
Connect events will share the same project with cluster and website events. This will allow running queries that need both sources of data, like calculating what percentage of users log in to a cluster with Connect.
Some event properties, like the OS, can be saved as user properties. These properties are then stored directly on each event. For example, when the first emitted event sets the user property `os: windows`, each subsequent event will have this property set.
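Conceptually, such a payload could look like the sketch below. The event name and the use of PostHog's `$set` convention for person properties are assumptions about how `prehog` maps Connect data:

```typescript
// Sketch of an event that also sets a user property. In PostHog,
// properties sent under $set are stored on the Person; the exact
// payload shapes produced by prehog are assumptions.
interface ConnectEvent {
  event: string;
  distinct_id: string;
  properties: Record<string, unknown>;
}

function loginEvent(distinctId: string, os: string): ConnectEvent {
  return {
    event: "tp.ui.login", // hypothetical event name
    distinct_id: distinctId,
    properties: {
      $set: { "connect.os": os }, // stored as a user property
    },
  };
}
```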
To differentiate events coming from multiple application instances, each event needs a `distinct_id` field. It will be supplied with a UUID generated by Connect (with a `connect.` prefix added by `prehog`). The value will be created on first start and stored in a file in the app data directory, so it will not change between restarts.
NOTE: As stated above, Connect events are tied to the application instance (or just the client machine). This means that PostHog's Person for Connect and for a cluster will be different things. They should not be merged.
On start, Teleport Connect will ask the user to opt in to sharing anonymized metrics and usage data with the standard message "Do you agree to Teleport Connect collecting anonymized usage data? This will help us improve the product." It will also show a link to the FAQ.
If the user refuses, Connect will not send any usage data.
In the initial version, it should help answer the following questions:
To answer the first part of the question, download counts from goteleport.com/download are needed. These will be collected from CloudFront CDN access logs.
To calculate how many users use Teleport Connect on a daily basis, a metric like DAU (Daily Active Users) can be used. This metric can be based on a specific event, but in this case it should be calculated using any event. For example, a user who logged in only once in a given day to refresh certs for a DB proxy connection can still be considered active.
Usage of each feature will be measured based on the events from the events section. They will allow generating various statistics, like the most common kinds of connections, or simply showing the usage of particular features like Access Requests.
This will be measured in two ways.
- Based on the download count for each platform.
- Based on real usage: every event will contain the OS field. These events will then be aggregated by unique users.
PostHog allows creating Trends based on DAU. They can be used to show how usage changes over a given period of time.
Successful login to a cluster.
Event properties:
- `tp.cluster_name`: string (anonymized)
- `tp.user_name`: string (anonymized)
- `tp.connector_type`: string
- `connect.os`: string (set on user properties)
- `connect.arch`: string (set on user properties) - CPU architecture
- `connect.os_version`: string (set on user properties)
- `connect.app_version`: string (set on user properties)
Connecting to the protocol.
Event properties:
- `tp.cluster_name`: string (anonymized)
- `tp.user_name`: string (anonymized)
- `connect.protocol`: one of `ssh`/`db`/`kube`
Creating an access request.
Event properties:
- `tp.cluster_name`: string (anonymized)
- `tp.user_name`: string (anonymized)
- `connect.access_request_kind`: one of `role`, `resource`
Reviewing an access request.
Event properties:
- `tp.cluster_name`: string (anonymized)
- `tp.user_name`: string (anonymized)
Assuming a requested role.
Event properties:
- `tp.cluster_name`: string (anonymized)
- `tp.user_name`: string (anonymized)
Running file transfer.
Event properties:
- `tp.cluster_name`: string (anonymized)
- `tp.user_name`: string (anonymized)
- `connect.file_transfer_is_upload`: boolean
Updating user job role.
Event properties:
- `connect.user.job_role`: string (set on user properties)