Depending on where you work, the data in your data warehouse is often several hours to days old. This problem gets exacerbated as data volumes grow.
Artie Transfer reads from the change data capture (CDC) stream and provides an easy, out-of-the-box solution that only requires a simple configuration file to replicate the data in your transactional database to your data warehouse. To do this, Transfer has the following features built in:
- Automatic retries & idempotency. We take reliability seriously and it's feature 0. Latency reduction is nice, but it doesn't matter if the data is wrong. We provide automatic retries and idempotency so that we always achieve eventual consistency.
- Automatic table creation. Transfer will create the table in the designated database if the table doesn't exist.
- Error reporting. Provide your Sentry API key and errors from data processing will appear in your Sentry project.
- Schema detection. Transfer will automatically detect column changes and apply them to the destination.
- Scalable architecture. Transfer's architecture stays the same whether we’re dealing with 1GB or 100+ TB of data.
- Sub-minute latency. Transfer is built with a consumer framework and is constantly streaming messages in the background. Say goodbye to schedulers!
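Schema detection can be pictured as a diff between the columns seen in incoming CDC messages and the columns the destination table already has. This is a hypothetical sketch that only handles added columns; `Column`, `missingColumns`, and `addColumnDDL` are illustrative names, and the generated `ALTER TABLE ... ADD COLUMN` statement is a generic form, not necessarily the exact DDL Transfer emits for each warehouse.

```go
package main

import (
	"fmt"
	"sort"
)

// Column is a hypothetical column descriptor: a name plus a destination type.
type Column struct {
	Name string
	Type string
}

// missingColumns returns the columns present in the incoming schema but absent
// from the destination table, sorted by name so the output is deterministic.
func missingColumns(incoming, existing []Column) []Column {
	have := make(map[string]bool, len(existing))
	for _, c := range existing {
		have[c.Name] = true
	}
	var missing []Column
	for _, c := range incoming {
		if !have[c.Name] {
			missing = append(missing, c)
		}
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i].Name < missing[j].Name })
	return missing
}

// addColumnDDL renders one generic ALTER TABLE statement per missing column.
func addColumnDDL(table string, cols []Column) []string {
	stmts := make([]string, 0, len(cols))
	for _, c := range cols {
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, c.Name, c.Type))
	}
	return stmts
}

func main() {
	existing := []Column{{"id", "INT"}, {"email", "STRING"}}
	incoming := []Column{{"id", "INT"}, {"email", "STRING"}, {"signup_at", "TIMESTAMP"}, {"plan", "STRING"}}
	for _, stmt := range addColumnDDL("users", missingColumns(incoming, existing)) {
		fmt.Println(stmt)
	}
	// Prints:
	// ALTER TABLE users ADD COLUMN plan STRING
	// ALTER TABLE users ADD COLUMN signup_at TIMESTAMP
}
```

Dropped or retyped columns are deliberately left out of this sketch; how Transfer handles those cases is not covered here.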
Take a look at the Getting started guide to begin using Artie Transfer!
As you can see from the architecture above, Transfer sits behind Kafka and expects CDC messages to be in a particular format. See the Supported section for the sources and destinations that are currently supported.
The optimal set-up looks something like this:
- One Kafka topic per table (so we can tune the number of partitions based on throughput)
- Partition key is the primary key for the table (so we avoid out-of-order writes at the row level)
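The reason for keying messages by primary key is that Kafka only guarantees ordering within a partition. The sketch below is a simplified model rather than Kafka's real partitioner: it hashes each event's primary key to pick a partition. `Event`, `partitionFor`, and `route` are hypothetical names, and FNV-1a stands in for the murmur2 hash that Kafka's default Java partitioner uses, so actual partition numbers will differ. Every change to `user-1` lands in the same partition, so a consumer sees its insert, update, and delete in order.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Event is a hypothetical CDC event for one row.
type Event struct {
	PK  string // primary key of the changed row, used as the message key
	Op  string // insert, update, or delete
	Seq int    // order in which the change happened upstream
}

// partitionFor maps a message key to a partition. FNV-1a is a stand-in here;
// Kafka's default Java partitioner uses murmur2 instead.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

// route appends each event to its partition's log, preserving arrival order
// within a partition, which is the only ordering Kafka guarantees.
func route(events []Event, numPartitions int) [][]Event {
	logs := make([][]Event, numPartitions)
	for _, e := range events {
		p := partitionFor(e.PK, numPartitions)
		logs[p] = append(logs[p], e)
	}
	return logs
}

func main() {
	events := []Event{
		{PK: "user-1", Op: "insert", Seq: 1},
		{PK: "user-2", Op: "insert", Seq: 2},
		{PK: "user-1", Op: "update", Seq: 3},
		{PK: "user-1", Op: "delete", Seq: 4},
	}
	for p, entries := range route(events, 3) {
		fmt.Println(p, entries)
	}
}
```

Different primary keys may share a partition, which is fine; what matters is that a single key never spans two partitions, so its changes cannot be applied out of order.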
To see the currently supported databases, check out the Supported section.
To run Artie Transfer's stack locally, please refer to the examples folder.
Transfer aims to provide coverage across all OLTP and OLAP databases. Currently, Transfer supports:
- Message Queues
  - Kafka (default)
  - Google Pub/Sub
- Destinations
  - Snowflake
  - BigQuery
- Sources
  - MongoDB
  - PostgreSQL, with the following replication slot plug-ins: pgoutput, decoderbufs, wal2json
  - MySQL
If the database you are using is not on the list, feel free to file a feature request.
Note: If any of these limitations are blocking you from using Transfer, feel free to contribute or file a bug and we'll prioritize it!
The long-term goal for Artie Transfer is to extend the service so that it has as few of these limitations as possible.
For details on telemetry, see Artie Transfer's telemetry guide.
Transfer is written in Go and uses counterfeiter to generate mocks. To run the tests, run the following commands:
make generate
make test
Artie Transfer is released through GoReleaser, which we use to cross-compile the binaries for our releases as well as our Docker Hub images. If your operating system or architecture is not supported, please file a feature request!
Artie Transfer is licensed under ELv2. Please see the LICENSE file for additional information. If you have any licensing questions, please email [email protected].