Klepto is a tool for copying and anonymising data
Klepto is a tool that copies and anonymises data from other sources.
- Readme Languages
- Intro
- Requirements
- Installation
- Usage
- Steal Options
- Configuration File Options
- Examples
- Contributing
- License
Klepto helps you to keep the data in your environment as consistent as possible by copying it from another environment's database.
You can use Klepto to get production data, stripped of sensitive customer information, for testing or local debugging.
- Copy data to your local database or to stdout, stderr
- Filter the source data
- Anonymise the source data
- PostgreSQL
- MySQL
If you need to get data from a database type that you don't see here, build it yourself and add it to this list. Contributions are welcomed :)
- Active connection to the IT VPN
- Latest version of pg_dump installed (Only required when working with PostgreSQL databases)
Klepto is written in Go with support for multiple platforms. Pre-built binaries are provided for the following:
- macOS (Darwin) for x64, i386, and ARM architectures
- Windows
- Linux
You can download the binary for your platform of choice from the releases page.
Once downloaded, the binary can be run from anywhere. For easy use, we recommend that you move it into a directory on your $PATH, which is usually /usr/local/bin.
Klepto uses a configuration file called .klepto.toml to define your table structure. If your table is normalized, the structure can be detected automatically.
For dumping the last 10 created active users, your file will look like this:
[[Tables]]
Name = "users"
[Tables.Anonymise]
email = "EmailAddress"
username = "FirstName"
password = "SimplePassword"
[Tables.Filter]
Match = "users.status = 'active'"
Limit = 10
[Tables.Filter.Sorts]
created_at = "desc"
After you have created the file, run:
Postgres:
klepto steal \
--from="postgres://user:pass@localhost/fromDB?sslmode=disable" \
--to="postgres://user:pass@localhost/toDB?sslmode=disable"
MySQL:
klepto steal \
--from="user:pass@tcp(localhost:3306)/fromDB?sslmode=disable" \
--to="user:pass@tcp(localhost:3306)/toDB?sslmode=disable"
Behind the scenes, Klepto establishes connections to the source and target databases using the given parameters, and dumps the tables.
Run klepto steal -h to see the available options.
We recommend always setting the following parameters:
- concurrency to alleviate the pressure on both the source and target databases.
- read-max-conns to limit the number of open connections, so that the source database does not get overloaded.
You can set a number of keys in the configuration file. Below is a list of all configuration options, followed by some examples of specific keys.
- Matchers - Variables to store filter data. You can declare a filter once and reuse it among tables.
- Tables - A Klepto table definition.
  - Name - The table name.
  - IgnoreData - A flag to indicate whether data should be imported or not. If set to true, it will dump the table structure without importing data.
  - Filter - A Klepto definition to filter results.
    - Match - A condition field to dump only a certain subset of the data. The value should correspond to an existing Matchers definition.
    - Limit - The number of results to be fetched.
    - Sorts - Defines how the table is sorted.
  - Anonymise - Indicates which columns to anonymise.
  - Relationships - Represents a relationship between the table and referenced table.
    - Table - The table name.
    - ForeignKey - The table's foreign key.
    - ReferencedTable - The referenced table name.
    - ReferencedKey - The referenced table primary key.
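As a rough sketch of how these keys nest inside one file, the hypothetical configuration below combines them. The table names, column names, and the ActiveUsers matcher are assumptions made up for illustration; they do not come from a real schema, and the matcher body simply reuses the condition style shown earlier in this README.

```toml
# Hypothetical example: all table, column, and matcher names are illustrative.

# A reusable filter, referenced by name from Tables.Filter.Match.
[[Matchers]]
  ActiveUsers = "users.status = 'active'"

# Dump up to 50 active users, newest first, with anonymised emails.
[[Tables]]
  Name = "users"
  [Tables.Anonymise]
    email = "EmailAddress"
  [Tables.Filter]
    Match = "ActiveUsers"
    Limit = 50
    [Tables.Filter.Sorts]
      created_at = "desc"

# Dump orders, related to users through orders.user_id = users.id.
[[Tables]]
  Name = "orders"
  [[Tables.Relationships]]
    ForeignKey = "user_id"
    ReferencedTable = "users"
    ReferencedKey = "id"
  [Tables.Filter]
    Match = "ActiveUsers"

# Dump only the structure of this table, without any rows.
[[Tables]]
  Name = "audit_logs"
  IgnoreData = true
```

Each [[Tables]] entry starts a new table definition, so the Anonymise, Filter, and Relationships sections that follow it apply to that table only.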
You can dump the database structure without importing data by setting the IgnoreData value to true.
[[Tables]]
Name = "logs"
IgnoreData = true
Matchers are variables to store filter data. You can declare a filter once and reuse it among tables:
[[Matchers]]
Latest100Users = "ORDER BY users.created_at DESC LIMIT 100"
[[Tables]]
Name = "users"
[Tables.Filter]
Match = "Latest100Users"
[[Tables]]
Name = "orders"
[[Tables.Relationships]]
ForeignKey = "user_id"
ReferencedTable = "users"
ReferencedKey = "id"
[Tables.Filter]
Match = "Latest100Users"
See examples for more.
You can anonymise specific columns in your table using the Anonymise key. Anonymisation is performed by running a Faker against the specified column.
[[Tables]]
Name = "customers"
[Tables.Anonymise]
email = "EmailAddress"
firstName = "FirstName"
[[Tables]]
Name = "users"
[Tables.Anonymise]
email = "EmailAddress"
password = "literal:1234"
This replaces these four columns in the customers and users tables. The email and firstName columns are filled by running fake.EmailAddress and fake.FirstName against them respectively. Use literal:[some-constant-value] to write a constant to a column. In this case, password = "literal:1234" writes 1234 to every row in the password column of the users table.
Available data types can be found in fake.go. This file is generated from https://github.com/icrowley/fake (it must be generated because it is written in such a way that Go cannot reflect upon it).
We generate the file with the following:
$ go get github.com/ungerik/pkgreflect
$ pkgreflect -notypes -novars -norecurs vendor/github.com/icrowley/fake/
A column's value can be conditionally anonymised by writing an anonymisation expression. Expressions are evaluated with the Expr package; see the Expr language definition for the syntax.
To conditionally anonymise a column, prefix the anonymisation expression with cond: as seen in the example below. You can access the other columns of the row within the anonymisation expression.
[[Tables]]
Name = "Account"
[Tables.Anonymise]
account_title = 'cond:Value(row, "account_type_id") == "2" ? Anon("FullName") : Skip()'
contact_email = 'cond:IsNil(row, "contact_email") ? Skip() : Anon("EmailAddress")'
Each expression is evaluated in an isolated environment. The environment comes with a few predefined variables and functions which we can use.
The variables are:
- row: a variable of type database.Row which contains all the columns of the row we're anonymising
- column: a variable containing the value of the column we're currently anonymising
The functions are:
- Value(row database.Row, columnName string) string will return the value of the given row and column; the function will always return a string, regardless of the underlying database type
- Anon(fakerType string) *Option will return a value generated by the faker function of the given name (just like using the faker without a conditional expression)
- Skip() *Option will return an empty anonymisation value, signaling the anonymisation engine to skip anonymisation of the current column. No data replacement will be done
- IsNil(row database.Row, columnName string) bool will return a boolean indicating whether the column value of the given row is a nil value
- Literal(str string) *Option will return the string argument as an anonymisation value
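To illustrate how Literal() might be combined with the other functions, here is a small sketch. The contact_name and contact_phone columns and the placeholder strings are assumptions for illustration only; the function names and the cond: prefix are the ones described above.

```toml
[[Tables]]
  Name = "Account"
  [Tables.Anonymise]
    # Hypothetical column: write a constant for accounts of type "2",
    # and a generated full name for every other account.
    # Value() always returns a string, so the comparison is against "2", not 2.
    contact_name = 'cond:Value(row, "account_type_id") == "2" ? Literal("Redacted") : Anon("FullName")'
    # Hypothetical column: leave NULL phone numbers untouched,
    # and overwrite the rest with a fixed placeholder.
    contact_phone = 'cond:IsNil(row, "contact_phone") ? Skip() : Literal("000-000-0000")'
```

The expressions are wrapped in single quotes (TOML literal strings) so that the double quotes inside them do not need escaping.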
The Relationships key represents a relationship between the table and referenced table.
To dump the latest 100 users with their orders:
[[Tables]]
Name = "users"
[Tables.Filter]
Limit = 100
[Tables.Filter.Sorts]
created_at = "desc"
[[Tables]]
Name = "orders"
[[Tables.Relationships]]
# behind the scenes klepto will create an inner join between orders and users
ForeignKey = "user_id"
ReferencedTable = "users"
ReferencedKey = "id"
[Tables.Filter]
Limit = 100
[Tables.Filter.Sorts]
created_at = "desc"
For Linux:
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o klepto-linux-amd64
For macOS:
GOOS=darwin GOARCH=amd64 CGO_ENABLED=0 go build -o klepto-darwin-amd64
Example configuration files for intfood and the ordering tool can be found on Klepto Examples on Confluence.
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
This project is licensed under the MIT License. See the LICENSE file for details.