This sample demonstrates how to use customer-managed encryption keys (CMEK) with the I/O connectors in an Apache Beam pipeline. For more information, see the Using customer-managed encryption keys docs page.
Follow the Getting started with Google Cloud Dataflow page, and make sure you have a Google Cloud project with billing enabled and a service account JSON key set up in your GOOGLE_APPLICATION_CREDENTIALS environment variable.
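As an optional sanity check (not part of the sample), you can verify from Python that Application Default Credentials resolve, for example via the GOOGLE_APPLICATION_CREDENTIALS variable:

```py
# Optional check (not part of the sample): confirm that Application Default
# Credentials are found, e.g. through GOOGLE_APPLICATION_CREDENTIALS.
import google.auth

credentials, project = google.auth.default()
print("Authenticated; default project:", project)
```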
Additionally, for this sample you need the following:
- Enable the BigQuery and Cloud KMS APIs.
- Create a Cloud Storage bucket.

  ```sh
  export BUCKET=your-gcs-bucket
  gsutil mb gs://$BUCKET
  ```
- Create a symmetric key ring and key. For best results, use a regional location. This example uses a `global` key for simplicity. (If you prefer to do this from Python, see the sketch after this list.)

  ```sh
  export KMS_KEYRING=samples-keyring
  export KMS_KEY=samples-key

  # Create a key ring.
  gcloud kms keyrings create $KMS_KEYRING --location global

  # Create a key.
  gcloud kms keys create $KMS_KEY --location global \
    --keyring $KMS_KEYRING --purpose encryption
  ```

  Note: Although you can destroy the key version material, you cannot delete keys and key rings. Key rings and keys do not have billable costs or quota limitations, so their continued existence does not impact costs or production limits.
- Grant Encrypter/Decrypter permissions to the Dataflow, Compute Engine, and BigQuery service accounts. This grants those service accounts permission to encrypt and decrypt with the CMEK you specify. The Dataflow workers use these service accounts when running the pipeline, which is different from the user service account used to start the pipeline.

  ```sh
  export PROJECT=$(gcloud config get-value project)
  export PROJECT_NUMBER=$(gcloud projects list --filter $PROJECT --format "value(PROJECT_NUMBER)")

  # Grant Encrypter/Decrypter permissions to the Dataflow service account.
  gcloud projects add-iam-policy-binding $PROJECT \
    --member serviceAccount:service-$PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter

  # Grant Encrypter/Decrypter permissions to the Compute Engine service account.
  gcloud projects add-iam-policy-binding $PROJECT \
    --member serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter

  # Grant Encrypter/Decrypter permissions to the BigQuery service account.
  gcloud projects add-iam-policy-binding $PROJECT \
    --member serviceAccount:bq-$PROJECT_NUMBER@bigquery-encryption.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter
  ```
- Clone the python-docs-samples repository.

  ```sh
  git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
  ```
- Navigate to the sample code directory.

  ```sh
  cd python-docs-samples/dataflow/encryption-keys
  ```
- Create a virtual environment and activate it.

  ```sh
  virtualenv env
  source env/bin/activate
  ```

  Once you are done, you can deactivate the virtualenv and go back to your global Python environment by running `deactivate`.

- Install the sample requirements.

  ```sh
  pip install -U -r requirements.txt
  ```
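If you would rather create the key ring and key from Python instead of gcloud, the following is a minimal sketch using the google-cloud-kms client library (not part of this sample; `pip install google-cloud-kms`). The project ID is a placeholder you would replace; the key ring, key, and location match the gcloud commands above.

```py
# Minimal sketch (not part of this sample): create the key ring and key with
# the google-cloud-kms client library instead of gcloud.
from google.cloud import kms

project_id = "your-project-id"  # placeholder: use your own project ID
location = "global"

client = kms.KeyManagementServiceClient()
location_name = f"projects/{project_id}/locations/{location}"

# Create a key ring.
key_ring = client.create_key_ring(
    request={"parent": location_name, "key_ring_id": "samples-keyring", "key_ring": {}}
)

# Create a symmetric encryption/decryption key in that key ring.
key = client.create_crypto_key(
    request={
        "parent": key_ring.name,
        "crypto_key_id": "samples-key",
        "crypto_key": {"purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT},
    }
)

# The key's full resource name is what the pipeline's --kms_key flag expects.
print(key.name)
```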
The bigquery_kms_key.py sample gets some data from the NASA wildfires public BigQuery dataset using a customer-managed encryption key, and dumps that data into the specified `output_bigquery_table` using the same customer-managed encryption key.
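As a rough illustration of the approach, rather than the sample's exact code, the core of such a pipeline looks something like the sketch below. It assumes that your Apache Beam release's `ReadFromBigQuery` and `WriteToBigQuery` transforms accept a `kms_key` argument and that the public table and columns named here exist; check the sample source for the authoritative version.

```py
# Rough sketch of the approach (not the sample's exact code). Assumes your
# Apache Beam release's BigQuery transforms accept a kms_key argument.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(output_bigquery_table, kms_key, pipeline_args=None):
    # Table and column names are illustrative; see the sample for the real query.
    query = """
        SELECT latitude, longitude
        FROM `bigquery-public-data.nasa_wildfire.past_week`
        LIMIT 10
    """
    options = PipelineOptions(pipeline_args, save_main_session=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # Read with the customer-managed key protecting temporary exports.
            | "Read from BigQuery" >> beam.io.ReadFromBigQuery(
                query=query, use_standard_sql=True, kms_key=kms_key)
            # Write to the output table, created with the same key.
            | "Write to BigQuery" >> beam.io.WriteToBigQuery(
                output_bigquery_table,
                schema="latitude:FLOAT,longitude:FLOAT",
                kms_key=kms_key,
            )
        )
```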
Make sure you have the following variables set up:
```sh
# Set the project ID, GCS bucket and KMS key.
export PROJECT=$(gcloud config get-value project)
export BUCKET=your-gcs-bucket

# Set the region for the Dataflow job.
# https://cloud.google.com/compute/docs/regions-zones/
export REGION=us-central1

# Set the KMS key ID.
export KMS_KEYRING=samples-keyring
export KMS_KEY=samples-key
export KMS_KEY_ID=$(gcloud kms keys list --location global --keyring $KMS_KEYRING --filter $KMS_KEY --format "value(NAME)")

# Output BigQuery dataset and table name.
export DATASET=samples
export TABLE=dataflow_kms
```
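The `--format "value(NAME)"` flag makes gcloud print the key's full resource name, so `echo $KMS_KEY_ID` should show something like `projects/$PROJECT/locations/global/keyRings/samples-keyring/cryptoKeys/samples-key`; if it is empty, re-check the key ring and key creation step.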
Create the BigQuery dataset where the output table resides.
```sh
# Create the BigQuery dataset.
bq mk --dataset $PROJECT:$DATASET
```
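Equivalently, if you are scripting in Python, a small sketch with the google-cloud-bigquery client library (not part of the sample) would be:

```py
# Sketch (not part of the sample): create the output dataset from Python.
from google.cloud import bigquery

project = "your-project-id"  # placeholder: use your own project ID
client = bigquery.Client(project=project)

# exists_ok=True makes the call idempotent if the dataset is already there.
client.create_dataset(f"{project}.samples", exists_ok=True)
```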
To run the sample using the Dataflow runner:

```sh
python bigquery_kms_key.py \
  --output_bigquery_table $PROJECT:$DATASET.$TABLE \
  --kms_key $KMS_KEY_ID \
  --project $PROJECT \
  --runner DataflowRunner \
  --temp_location gs://$BUCKET/samples/dataflow/kms/tmp \
  --region $REGION
```
Note: To run locally, you can omit the `--runner` command-line argument; it defaults to the `DirectRunner`.
You can check your submitted Cloud Dataflow jobs on the GCP Console Dataflow page or by using `gcloud`.

```sh
gcloud dataflow jobs list
```
Finally, check the contents of the BigQuery table. (The backticks are escaped so the shell does not treat them as command substitution.)

```sh
bq query --use_legacy_sql=false "SELECT * FROM \`$PROJECT.$DATASET.$TABLE\`"
```
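To confirm programmatically that the output table was created with your key, a small sketch using the google-cloud-bigquery client library (not part of the sample) could look like this:

```py
# Sketch (not part of the sample): check the output table's CMEK and preview rows.
from google.cloud import bigquery

project = "your-project-id"  # placeholders: use your own values
dataset = "samples"
table = "dataflow_kms"

client = bigquery.Client(project=project)
tbl = client.get_table(f"{project}.{dataset}.{table}")

# The encryption configuration carries the KMS key the table was created with.
config = tbl.encryption_configuration
print("KMS key:", config.kms_key_name if config else None)

# Preview a few rows.
for row in client.list_rows(tbl, max_results=5):
    print(dict(row))
```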
To avoid incurring charges to your GCP account for the resources used:
```sh
# PROJECT, BUCKET, DATASET, and TABLE are set in the variables step above;
# PROJECT_NUMBER comes from the permissions step.
export PROJECT_NUMBER=$(gcloud projects list --filter $PROJECT --format "value(PROJECT_NUMBER)")

# Remove only the files created by this sample.
gsutil -m rm -rf "gs://$BUCKET/samples/dataflow/kms"

# [optional] Remove the Cloud Storage bucket.
gsutil rb gs://$BUCKET

# Remove the BigQuery table.
bq rm -f -t $PROJECT:$DATASET.$TABLE

# [optional] Remove the BigQuery dataset and all its tables.
bq rm -rf -d $PROJECT:$DATASET

# Revoke Encrypter/Decrypter permissions from the Dataflow service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:service-$PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter

# Revoke Encrypter/Decrypter permissions from the Compute Engine service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter

# Revoke Encrypter/Decrypter permissions from the BigQuery service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:bq-$PROJECT_NUMBER@bigquery-encryption.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter
```