Tempo is a Jaeger/Zipkin/OpenCensus-compatible backend. It is not OpenTelemetry compatible only because an OpenTelemetry format does not exist yet. Tempo ingests batches in any of these formats, buffers them, and then writes them to GCS.
See the example folder for various ways to get started running Tempo locally.
Tempo is built around the Cortex architecture. It vendors Cortex primarily for the ring/lifecycler code.
Distributors vendor the OpenTelemetry Collector to reuse its receiver code. They then use consistent ring hashing on the trace ID to split each batch and push the pieces to the ingesters that own those traces.
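As a rough illustration of how a batch can be split by trace ID, here is a minimal consistent-hash ring in Go. `Span`, `Ring`, `NewRing`, `Owner` and `SplitBatch` are hypothetical names, FNV-1a is an assumed hash, and the real Cortex ring also handles replication factor and ingester health, which this sketch omits.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Span is a hypothetical, minimal stand-in for one span in an incoming batch.
type Span struct {
	TraceID string
	Name    string
}

// ringEntry maps one token on the ring to the ingester that owns it.
type ringEntry struct {
	token    uint32
	ingester string
}

// Ring is a toy consistent-hash ring where each ingester owns several tokens.
type Ring struct {
	entries []ringEntry
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places tokensPerIngester tokens per ingester on the ring, sorted by token.
func NewRing(ingesters []string, tokensPerIngester int) *Ring {
	r := &Ring{}
	for _, ing := range ingesters {
		for i := 0; i < tokensPerIngester; i++ {
			r.entries = append(r.entries, ringEntry{hash32(fmt.Sprintf("%s-%d", ing, i)), ing})
		}
	}
	sort.Slice(r.entries, func(a, b int) bool { return r.entries[a].token < r.entries[b].token })
	return r
}

// Owner returns the ingester holding the first token >= hash(key), wrapping around the ring.
func (r *Ring) Owner(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.entries), func(i int) bool { return r.entries[i].token >= h })
	if i == len(r.entries) {
		i = 0
	}
	return r.entries[i].ingester
}

// SplitBatch groups spans by the ingester that owns their trace ID.
func SplitBatch(r *Ring, batch []Span) map[string][]Span {
	out := make(map[string][]Span)
	for _, s := range batch {
		ing := r.Owner(s.TraceID)
		out[ing] = append(out[ing], s)
	}
	return out
}

func main() {
	ring := NewRing([]string{"ingester-0", "ingester-1", "ingester-2"}, 16)
	batch := []Span{{"trace-a", "GET /"}, {"trace-b", "db.query"}, {"trace-a", "render"}}
	for ing, spans := range SplitBatch(ring, batch) {
		fmt.Println(ing, len(spans))
	}
}
```

Because ownership depends only on the trace ID, every span of a trace lands on the same ingester, which is what lets an ingester assemble whole traces.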
Ingesters batch traces until a configurable timeout is hit and then push them into a head block. Blocks are cut periodically and shipped to the backend (GCS).
Queriers request a trace ID from both the ingesters and the backend and return the set of batches matching that trace ID.
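A minimal sketch of the querier's fan-out and merge, assuming a batch can appear both in an ingester and in a recently flushed block, so the results are de-duplicated into a set. `Batch`, `Source`, `mapSource` and `FindTrace` are hypothetical; a real querier would issue these requests in parallel over the network.

```go
package main

import (
	"fmt"
	"strings"
)

// Batch is a hypothetical, simplified batch: a trace ID plus span names.
type Batch struct {
	TraceID string
	Spans   []string
}

// Source is anything the querier can ask for a trace: an ingester or the backend.
type Source interface {
	Find(traceID string) []Batch
}

// mapSource is an in-memory Source used only for illustration.
type mapSource map[string][]Batch

func (m mapSource) Find(traceID string) []Batch { return m[traceID] }

// FindTrace asks every source in turn and returns the de-duplicated batches for traceID.
func FindTrace(traceID string, sources ...Source) []Batch {
	seen := map[string]bool{}
	var out []Batch
	for _, src := range sources {
		for _, b := range src.Find(traceID) {
			key := b.TraceID + "\x00" + strings.Join(b.Spans, "\x00")
			if seen[key] {
				continue // same batch already returned by another source
			}
			seen[key] = true
			out = append(out, b)
		}
	}
	return out
}

func main() {
	ingester := mapSource{"t1": {{"t1", []string{"a", "b"}}}}
	backend := mapSource{"t1": {{"t1", []string{"a", "b"}}, {"t1", []string{"c"}}}}
	fmt.Println(len(FindTrace("t1", ingester, backend))) // prints: 2
}
```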
Compactors iterate over all blocks looking for compaction candidates. They are scalable and use a consistent ring to decide which compactor owns a given set of blocks.
tempo-query is jaeger-query with a HashiCorp go-plugin that lets it query Tempo.
tempo-vulture is Tempo's bird-themed consistency-checking tool. It queries Loki, extracts trace IDs, and then queries Tempo for each one. It records metrics for 404s and for traces with missing spans.
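A sketch of the classification step a vulture-style checker might perform for each trace ID it pulls from Loki. This assumes the expected span IDs are known (for example, logged alongside the trace ID), which is an assumption, not something stated above; `Result`, `Counters` and `Check` are hypothetical, and the plain integer fields stand in for exported metrics.

```go
package main

import "fmt"

// Result is a hypothetical outcome of querying Tempo for one trace ID found in Loki.
type Result struct {
	StatusCode int
	SpanIDs    []string
}

// Counters stands in for the metrics a real checker would export.
type Counters struct {
	NotFound     int
	MissingSpans int
	OK           int
}

// Check classifies one trace as a 404, incomplete (some expected span absent), or complete.
func (c *Counters) Check(expected []string, r Result) {
	if r.StatusCode == 404 {
		c.NotFound++
		return
	}
	got := make(map[string]bool, len(r.SpanIDs))
	for _, id := range r.SpanIDs {
		got[id] = true
	}
	for _, id := range expected {
		if !got[id] {
			c.MissingSpans++
			return
		}
	}
	c.OK++
}

func main() {
	var c Counters
	c.Check([]string{"s1", "s2"}, Result{StatusCode: 200, SpanIDs: []string{"s1", "s2"}})
	c.Check([]string{"s1", "s2"}, Result{StatusCode: 200, SpanIDs: []string{"s1"}})
	c.Check([]string{"s1"}, Result{StatusCode: 404})
	fmt.Printf("%+v\n", c) // prints: {NotFound:1 MissingSpans:1 OK:1}
}
```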
tempo-cli is the place to put any utility functionality related to Tempo. Currently it only supports dumping header information for all blocks in GCS, for example:
go run ./cmd/tempo-cli -gcs-bucket ops-tools-tracing-ops -tenant-id single-tenant
TempoDB is contained in the Tempo repository but is meant to be a standalone key-value database built on top of cloud object storage (GCS/S3).
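To make the "key-value database on object storage" idea concrete, here is a toy model in which each immutable block keeps its records sorted by ID, so a lookup is a binary search per block. The in-memory `objectStore` stands in for GCS/S3, and `record`, `block`, `WriteBlock` and `Find` are hypothetical; the real TempoDB block format, indexes and any filtering are not modeled here.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// record is one key/value pair; in TempoDB the key would be a trace ID.
type record struct {
	ID    []byte
	Value []byte
}

// block is a hypothetical immutable block whose records are sorted by ID.
type block struct {
	records []record
}

// objectStore stands in for GCS/S3: completed blocks keyed by block name.
type objectStore map[string]*block

// WriteBlock copies and sorts records by ID, then stores them as a new block.
func (s objectStore) WriteBlock(name string, recs []record) {
	sorted := append([]record(nil), recs...)
	sort.Slice(sorted, func(i, j int) bool { return bytes.Compare(sorted[i].ID, sorted[j].ID) < 0 })
	s[name] = &block{records: sorted}
}

// Find binary-searches every block and returns all values stored under id.
func (s objectStore) Find(id []byte) [][]byte {
	var out [][]byte
	for _, b := range s {
		i := sort.Search(len(b.records), func(i int) bool { return bytes.Compare(b.records[i].ID, id) >= 0 })
		for ; i < len(b.records) && bytes.Equal(b.records[i].ID, id); i++ {
			out = append(out, b.records[i].Value)
		}
	}
	return out
}

func main() {
	store := objectStore{}
	store.WriteBlock("block-1", []record{{[]byte("t2"), []byte("batch-x")}, {[]byte("t1"), []byte("batch-y")}})
	store.WriteBlock("block-2", []record{{[]byte("t1"), []byte("batch-z")}})
	fmt.Println(len(store.Find([]byte("t1")))) // prints: 2
}
```

A trace ID can appear in several blocks until the compactor merges them, which is why `Find` collects values from every block rather than stopping at the first hit.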
If you are getting into the project, it is worth reviewing the list of issues to get a feel for existing work on Tempo. Below are some of the most important issues and features to resolve before considering Tempo beta.
- Determine and fix the reason for partial traces
- Provide a "meta" configuration layer that tightens up config and protects Tempo from upstream changes in Cortex config. This also includes the decision about whether or not Tempo should support ring storage mechanisms besides gossip.
- Organize data storage around pages and implement a page-aligned cache.
- Clean up bad blocks with the compactor. This is important because otherwise bad blocks live forever.
- Update the otelcol dependency and move to the otel proto. This should bring significant performance gains.
- Add a code of conduct, contributing guidelines and changelog.
And then, in order to offer hosted Tempo, we would need to work out integration with Grafana.com APIs, authorization, and other things I'm not thinking of.