Using customer-managed encryption keys

This sample demonstrates how to use cryptographic encryption keys for the I/O connectors in an Apache Beam pipeline. For more information, see the Using customer-managed encryption keys docs page.

Before you begin

Follow the Getting started with Google Cloud Dataflow page, and make sure you have a Google Cloud project with billing enabled and a service account JSON key set up in your GOOGLE_APPLICATION_CREDENTIALS environment variable. Additionally, for this sample you need the following:

  1. Enable the BigQuery and Cloud KMS APIs.

  2. Create a Cloud Storage bucket.

    export BUCKET=your-gcs-bucket
    gsutil mb gs://$BUCKET
  3. Create a key ring and a symmetric key. For best results, use a regional location. This example uses the global location for simplicity.

    export KMS_KEYRING=samples-keyring
    export KMS_KEY=samples-key
    
    # Create a key ring.
    gcloud kms keyrings create $KMS_KEYRING --location global
    
    # Create a key.
    gcloud kms keys create $KMS_KEY --location global \
      --keyring $KMS_KEYRING --purpose encryption

    Note: Although you can destroy the key version material, you cannot delete keys and key rings. Key rings and keys do not have billable costs or quota limitations, so their continued existence does not impact costs or production limits.

  4. Grant Encrypter/Decrypter permissions to the Dataflow, Compute Engine, and BigQuery service accounts so that they can encrypt and decrypt with the customer-managed encryption key (CMEK) you specify. The Dataflow workers use these service accounts when running the pipeline; they are different from the user service account used to start the pipeline.

    export PROJECT=$(gcloud config get-value project)
    export PROJECT_NUMBER=$(gcloud projects list --filter $PROJECT --format "value(PROJECT_NUMBER)")
    
    # Grant Encrypter/Decrypter permissions to the Dataflow service account.
    gcloud projects add-iam-policy-binding $PROJECT \
      --member serviceAccount:service-$PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
      --role roles/cloudkms.cryptoKeyEncrypterDecrypter
    
    # Grant Encrypter/Decrypter permissions to the Compute Engine service account.
    gcloud projects add-iam-policy-binding $PROJECT \
      --member serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
      --role roles/cloudkms.cryptoKeyEncrypterDecrypter
    
    # Grant Encrypter/Decrypter permissions to the BigQuery service account.
    gcloud projects add-iam-policy-binding $PROJECT \
      --member serviceAccount:bq-$PROJECT_NUMBER@bigquery-encryption.iam.gserviceaccount.com \
      --role roles/cloudkms.cryptoKeyEncrypterDecrypter
  5. Clone the python-docs-samples repository.

    git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
  6. Navigate to the sample code directory.

    cd python-docs-samples/dataflow/encryption-keys
  7. Create a virtual environment and activate it.

    virtualenv env
    source env/bin/activate

    Once you are done, you can deactivate the virtualenv and go back to your global Python environment by running deactivate.

  8. Install the sample requirements.

    pip install -U -r requirements.txt
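
The three grants in step 4 differ only in the service account address, so they can be expressed as a loop. The following is a minimal sketch, not part of the sample: `cmek_service_accounts` and `grant_cmek_roles` are hypothetical helper names, and the addresses are the ones used in step 4.

```shell
# Print the three service accounts from step 4 for a given project number ($1).
cmek_service_accounts() {
  echo "service-$1@dataflow-service-producer-prod.iam.gserviceaccount.com"
  echo "service-$1@compute-system.iam.gserviceaccount.com"
  echo "bq-$1@bigquery-encryption.iam.gserviceaccount.com"
}

# Grant the Encrypter/Decrypter role to each account.
# Hypothetical helper: $1 is the project ID, $2 is the project number.
grant_cmek_roles() {
  for sa in $(cmek_service_accounts "$2"); do
    gcloud projects add-iam-policy-binding "$1" \
      --member "serviceAccount:$sa" \
      --role roles/cloudkms.cryptoKeyEncrypterDecrypter
  done
}
```

With the variables from step 4 set, this would be called as `grant_cmek_roles "$PROJECT" "$PROJECT_NUMBER"`.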

BigQuery KMS Key example

The following sample reads some data from the NASA wildfires public BigQuery dataset using a customer-managed encryption key, and writes that data into the specified output_bigquery_table using the same customer-managed encryption key.

Make sure you have the following variables set up:

# Set the project ID and GCS bucket.
export PROJECT=$(gcloud config get-value project)
export BUCKET=your-gcs-bucket

# Set the region for the Dataflow job.
# https://cloud.google.com/compute/docs/regions-zones/
export REGION=us-central1

# Set the KMS key ID.
export KMS_KEYRING=samples-keyring
export KMS_KEY=samples-key
export KMS_KEY_ID=$(gcloud kms keys list --location global --keyring $KMS_KEYRING --filter $KMS_KEY --format "value(NAME)")

# Output BigQuery dataset and table name.
export DATASET=samples
export TABLE=dataflow_kms
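
The value stored in KMS_KEY_ID is the full Cloud KMS resource name of the key, not the short key name. As a sketch of that format, the hypothetical helper below (`kms_key_id` is an illustrative name, not part of the sample) assembles the same string from its parts, which can be useful for checking what the `gcloud kms keys list` command returned.

```shell
# Build a Cloud KMS key resource name.
# Hypothetical helper: $1 project ID, $2 location, $3 key ring, $4 key.
kms_key_id() {
  echo "projects/$1/locations/$2/keyRings/$3/cryptoKeys/$4"
}

# Example with placeholder values; prints
# projects/my-project/locations/global/keyRings/samples-keyring/cryptoKeys/samples-key
kms_key_id my-project global samples-keyring samples-key
```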

Create the BigQuery dataset where the output table resides.

# Create the BigQuery dataset.
bq mk --dataset $PROJECT:$DATASET

Run the sample using the Dataflow runner:

python bigquery_kms_key.py \
  --output_bigquery_table $PROJECT:$DATASET.$TABLE \
  --kms_key $KMS_KEY_ID \
  --project $PROJECT \
  --runner DataflowRunner \
  --temp_location gs://$BUCKET/samples/dataflow/kms/tmp \
  --region $REGION

Note: To run locally you can omit the --runner command line argument and it defaults to the DirectRunner.

You can check your submitted Cloud Dataflow jobs in the GCP Console Dataflow page or by using gcloud.

gcloud dataflow jobs list

Finally, check the contents of the BigQuery table.

bq query --use_legacy_sql=false "SELECT * FROM \`$PROJECT.$DATASET.$TABLE\`"
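
To confirm that the output table is protected by your key, you can also inspect the table metadata. This is a rough sketch under two assumptions: that `bq show --format=prettyjson` prints the table resource with one field per line, and that the key appears in the `kmsKeyName` field of `encryptionConfiguration`. `extract_kms_key_name` is a hypothetical helper, not part of the sample.

```shell
# Read table metadata JSON on stdin and print the value of "kmsKeyName", if any.
extract_kms_key_name() {
  sed -n 's/.*"kmsKeyName": *"\([^"]*\)".*/\1/p'
}

# Usage (requires the bq CLI and the variables set earlier):
# bq show --format=prettyjson "$PROJECT:$DATASET.$TABLE" | extract_kms_key_name
```

If the printed name matches $KMS_KEY_ID, the table was written with the key you passed to the pipeline.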

Cleanup

To avoid incurring charges to your GCP account for the resources used:

# Remove only the files created by this sample.
gsutil -m rm -rf "gs://$BUCKET/samples/dataflow/kms"

# [optional] Remove the Cloud Storage bucket.
gsutil rb gs://$BUCKET

# Remove the BigQuery table.
bq rm -f -t $PROJECT:$DATASET.$TABLE

# [optional] Remove the BigQuery dataset and all its tables.
bq rm -rf -d $PROJECT:$DATASET

# Revoke Encrypter/Decrypter permissions from the Dataflow service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:service-$PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter

# Revoke Encrypter/Decrypter permissions from the Compute Engine service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:service-$PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter

# Revoke Encrypter/Decrypter permissions from the BigQuery service account.
gcloud projects remove-iam-policy-binding $PROJECT \
  --member serviceAccount:bq-$PROJECT_NUMBER@bigquery-encryption.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter