We are introducing the ability to use Snowflake as the storage and computing platform for Theia. When using Snowflake, it is no longer necessary to run ClickHouse DB (flow records are stored in a Snowflake database) or Spark (flow processing is done by Snowflake virtual warehouses). Note that using Snowflake for Theia means you have to bring your own AWS and Snowflake accounts (you will be charged for resource usage), and that some features available with "standard" Theia are not yet available with Snowflake.
Theia with Snowflake requires Antrea >= v1.9.0 and Theia >= v0.3.0.
Because the theia-sf CLI is not yet distributed as a release asset, you will need to build it yourself, which requires Git, Golang, and Make.
git clone [email protected]:antrea-io/theia.git
cd theia/snowflake
make
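After the build completes, the CLI binary is available at ./bin/theia-sf. You should be able to list the available commands with:
./bin/theia-sf help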
Follow the steps in the AWS documentation to specify credentials that can be used by the AWS Go SDK. Either configure the ~/.aws/credentials file or set the required environment variables.
You can also export the AWS_REGION environment variable, set to your preferred region.
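For example, if you opt for environment variables instead of a credentials file, you can set the standard AWS SDK variables (the values below are placeholders):
export AWS_ACCESS_KEY_ID=<YOUR ACCESS KEY ID>
export AWS_SECRET_ACCESS_KEY=<YOUR SECRET ACCESS KEY>
export AWS_REGION=us-west-2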
Export the following environment variables: SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD.
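For example (the values below are placeholders):
export SNOWFLAKE_ACCOUNT=<YOUR SNOWFLAKE ACCOUNT>
export SNOWFLAKE_USER=<YOUR SNOWFLAKE USER>
export SNOWFLAKE_PASSWORD=<YOUR SNOWFLAKE PASSWORD>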
You may skip this step if you already have a bucket that you want to use.
./bin/theia-sf create-bucket
Retrieve the bucket name output by the command.
You may skip this step if you already have a KMS key that you want to use. If you choose not to use a KMS key, Snowflake credentials will be stored in your S3 bucket in clear text.
./bin/theia-sf create-kms-key
Retrieve the key ID output by the command.
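For convenience, you can store the bucket name and key ID in shell variables and substitute them in the commands below (the values are placeholders):
export BUCKET_NAME=<BUCKET NAME>
export KEY_ID=<KEY ID>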
./bin/theia-sf onboard --bucket-name <BUCKET NAME> --key-id <KEY ID>
The command output will include a table like this one, with important information:
+----------------------------+------------------------------------------------------------------+
| Region | us-west-2 |
| Bucket Name | antrea-flows-93e9ojn80fgn5vwt |
| Bucket Flows Folder | flows |
| Snowflake Database Name | ANTREA_93E9OJN80FGN5VWT |
| Snowflake Schema Name | THEIA |
| Snowflake Flows Table Name | FLOWS |
| SNS Topic ARN | arn:aws:sns:us-west-2:867393676014:antrea-flows-93e9ojn80fgn5vwt |
| SQS Queue ARN | arn:aws:sqs:us-west-2:867393676014:antrea-flows-93e9ojn80fgn5vwt |
+----------------------------+------------------------------------------------------------------+
helm repo add antrea https://charts.antrea.io
helm repo update
helm install antrea antrea/antrea -n kube-system --set featureGates.FlowExporter=true
helm install flow-aggregator antrea/flow-aggregator \
--set s3Uploader.enable=true \
--set s3Uploader.bucketName=<BUCKET NAME> \
--set s3Uploader.bucketPrefix=flows \
--set s3Uploader.awsCredentials.aws_access_key_id=<AWS ACCESS KEY ID> \
--set s3Uploader.awsCredentials.aws_secret_access_key=<AWS SECRET ACCESS KEY> \
-n flow-aggregator --create-namespace
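You can check that the Flow Aggregator is running correctly with:
kubectl -n flow-aggregator get pods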
Follow these steps if you want to delete all resources created by theia-sf. Just like for onboarding, AWS credentials and Snowflake credentials are required.
# always call offboard with the same arguments as onboard!
./bin/theia-sf offboard --bucket-name <BUCKET NAME> --key-id <KEY ID>
# if you created a KMS key and you want to schedule it for deletion
./bin/theia-sf delete-kms-key --key-id <KEY ID>
# if you created an S3 bucket to store infra state and you want to delete it
./bin/theia-sf delete-bucket --name <BUCKET NAME>
NetworkPolicy Recommendation recommends NetworkPolicy configurations to secure your Kubernetes network and applications: it analyzes the network flows stored in the Snowflake database and generates Kubernetes NetworkPolicies or Antrea NetworkPolicies.
# make sure you have called onboard before running policy-recommendation
./bin/theia-sf policy-recommendation --database-name <SNOWFLAKE DATABASE NAME> > recommended_policies.yml
The database name can be found in the output of the onboard command.
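After reviewing the recommended policies, you can apply them to your cluster with kubectl:
kubectl apply -f recommended_policies.yml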
NetworkPolicy Recommendation requires a Snowflake warehouse to execute and may take seconds to minutes depending on the number of flows. We recommend using a Medium size warehouse if you are working on a big dataset. If no warehouse is provided with the --warehouse-name option, we will create a temporary X-Small size warehouse by default. Running NetworkPolicy Recommendation will consume Snowflake credits; the amount will depend on the size of the warehouse and the contents of the database.
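For example, to use an existing Medium size warehouse instead of the temporary default one (the warehouse name below is hypothetical):
./bin/theia-sf policy-recommendation --database-name <SNOWFLAKE DATABASE NAME> --warehouse-name MY_MEDIUM_WH > recommended_policies.yml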
The Abnormal Traffic Drop Detector scans the flow records stored in the database and, for each endpoint, detects an abnormally large number of flows dropped or blocked by NetworkPolicies, then reports alerts to users. It helps identify potential issues with NetworkPolicies and potential security threats, and alerts admins so that they can take appropriate action to mitigate them.
# make sure you have called onboard before running drop-detection
./bin/theia-sf drop-detection --database-name <SNOWFLAKE DATABASE NAME>
The database name can be found in the output of the onboard command.
Just like NetworkPolicy Recommendation, the Abnormal Traffic Drop Detector requires a Snowflake warehouse to execute and may take seconds to minutes depending on the number of flows. If no warehouse is provided with the --warehouse-name option, we will create a temporary X-Small size warehouse by default. Running the Abnormal Traffic Drop Detector will consume Snowflake credits; the amount will depend on the size of the warehouse and the contents of the database.
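As with NetworkPolicy Recommendation, you can provide an existing warehouse (the warehouse name below is hypothetical):
./bin/theia-sf drop-detection --database-name <SNOWFLAKE DATABASE NAME> --warehouse-name MY_MEDIUM_WH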
We use Grafana to query data from Snowflake and visualize the network flows in your cluster(s).
Export the following environment variables:
| Name | Description |
|------|-------------|
| SNOWFLAKE_ACCOUNT | Specifies the full name of your account (provided by Snowflake). |
| SNOWFLAKE_USER | Specifies the login name of the user for the connection. |
| SNOWFLAKE_PASSWORD | Specifies the password for the specified user. |
| SNOWFLAKE_WAREHOUSE | Specifies the virtual warehouse to use once connected. |
| SNOWFLAKE_DATABASE | Specifies the default database to use once connected. |
| SNOWFLAKE_ROLE (optional) | Specifies the default access control role to use in the Snowflake session initiated by Grafana. |
The database name can be found in the output of the onboard command.
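For example (the values below are placeholders; SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, and SNOWFLAKE_PASSWORD may already be set from the earlier steps):
export SNOWFLAKE_WAREHOUSE=<YOUR WAREHOUSE>
export SNOWFLAKE_DATABASE=<SNOWFLAKE DATABASE NAME>
export SNOWFLAKE_ROLE=<YOUR ROLE> # optional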
We suggest running Grafana using the official Docker image.
Before running Grafana, we need to download the Snowflake datasource plugin. We suggest creating your own plugin directory.
mkdir your-plugin-path && cd your-plugin-path
wget https://github.com/michelin/snowflake-grafana-datasource/releases/download/v1.2.0/snowflake-grafana-datasource.zip
unzip snowflake-grafana-datasource.zip
export GF_PLUGINS=$(pwd)
Then run Grafana with Docker:
cd theia/snowflake
docker run -d \
-p 3000:3000 \
-v "${GF_PLUGINS}":/var/lib/grafana/plugins \
-v "$(pwd)"/grafana/provisioning:/etc/grafana/provisioning \
--name=grafana \
-e "GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=michelin-snowflake-datasource" \
-e "GF_INSTALL_PLUGINS=https://downloads.antrea.io/artifacts/grafana-custom-plugins/theia-grafana-sankey-plugin-1.0.2.zip;theia-grafana-sankey-plugin,https://downloads.antrea.io/artifacts/grafana-custom-plugins/theia-grafana-chord-plugin-1.0.1.zip;theia-grafana-chord-plugin" \
-e "GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/etc/grafana/provisioning/dashboards/homepage.json" \
-e "SNOWFLAKE_ACCOUNT=${SNOWFLAKE_ACCOUNT}" \
-e "SNOWFLAKE_USER=${SNOWFLAKE_USER}" \
-e "SNOWFLAKE_PASSWORD=${SNOWFLAKE_PASSWORD}" \
-e "SNOWFLAKE_WAREHOUSE=${SNOWFLAKE_WAREHOUSE}" \
-e "SNOWFLAKE_DATABASE=${SNOWFLAKE_DATABASE}" \
-e "SNOWFLAKE_ROLE=${SNOWFLAKE_ROLE}" \
grafana/grafana:9.1.6
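You can verify that Grafana started successfully and that the plugins were loaded by checking the container logs:
docker logs grafana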
Open your web browser, go to http://localhost:3000/, and log in with:
- username: admin
- password: admin
You will see the home dashboard after logging in. It provides an overview of the monitored Kubernetes cluster, a short introduction, and links to the other pre-built dashboards for more specific flow visualizations. A detailed introduction to these dashboards can be found in Pre-built Dashboards.
Currently, with the Snowflake data source, we have built the Home Dashboard, Flow Records Dashboard, Pod-to-Pod Flows Dashboard, and Network-Policy Flows Dashboard.
The dashboards allow you to select and view flow data from one or more clusters: a dropdown menu in the top-left corner of each dashboard lets you select the clusters, and the filter affects all panels in the dashboard.