Document various available configuration. (airbytehq#9249)
- Add comments to the interface methods in Configs.java.
- Add new document on configuring airbyte. Transfer the non internal-only variables to this document.
davinchia authored Jan 4, 2022
1 parent a24a287 commit 8c3c68c
Showing 6 changed files with 328 additions and 12 deletions.
2 changes: 2 additions & 0 deletions airbyte-commons/src/main/resources/log4j2.xml
@@ -14,6 +14,8 @@
This is useful if you want to override the environment variables at runtime (or if you don't have
access to the necessary information at the point where you are setting environment variables).
Please update configuring-airbyte.md if the names of any of the below variables change.
-->

<!-- Always log INFO by default. -->
210 changes: 201 additions & 9 deletions airbyte-config/models/src/main/java/io/airbyte/config/Configs.java
@@ -12,125 +12,317 @@
import java.util.Map;
import java.util.Set;

/**
* This interface defines the general variables for configuring Airbyte.
* <p>
* Please update the configuring-airbyte.md document when modifying this file.
* <p>
* Please also add one of the following tags to the env var accordingly:
* <p>
* 1. 'Internal-use only' if a var is mainly for Airbyte-only configuration. e.g. tracking, test or
* Cloud related etc.
* <p>
* 2. 'Alpha support' if a var does not have proper support and should be used with care.
*/
public interface Configs {

// CORE
// General
/**
* Distinguishes internal Airbyte deployments. Internal-use only.
*/
String getAirbyteRole();

/**
* Defines the Airbyte deployment version.
*/
AirbyteVersion getAirbyteVersion();

String getAirbyteVersionOrWarning();

/**
* Defines the bucket for caching specs. This immensely speeds up spec operations. This is updated
* when new versions are published.
*/
String getSpecCacheBucket();

/**
* Distinguishes internal Airbyte deployments. Internal-use only.
*/
DeploymentMode getDeploymentMode();

/**
* Defines if the deployment is Docker or Kubernetes. Airbyte behaves accordingly.
*/
WorkerEnvironment getWorkerEnvironment();

/**
* Defines the configs directory. Applies only to Docker, and is present in Kubernetes for backward
* compatibility.
*/
Path getConfigRoot();

/**
* Defines the Airbyte workspace directory. Applies only to Docker, and is present in Kubernetes for
* backward compatibility.
*/
Path getWorkspaceRoot();

// Docker Only
/**
* Defines the name of the Airbyte Docker volume.
*/
String getWorkspaceDockerMount();

/**
* Defines the name of the Docker mount used for local file handling. On Docker, this allows
* connector containers to interact with a volume for "local file" operations.
*/
String getLocalDockerMount();

/**
* Defines the Docker network that jobs are launched on with the new scheduler.
*/
String getDockerNetwork();

Path getLocalRoot();

// Secrets
/**
* Defines the GCP Project to store secrets in. Alpha support.
*/
String getSecretStoreGcpProjectId();

/**
* Define the JSON credentials used to read/write Airbyte Configuration to Google Secret Manager.
* These credentials must have Secret Manager Read/Write access. Alpha support.
*/
String getSecretStoreGcpCredentials();

/**
* Defines the Secret Persistence type. None by default. Set to GOOGLE_SECRET_MANAGER to use Google
* Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database, for testing purposes only.
* Alpha support. Undefined behavior will result if this is turned on and then off.
*/
SecretPersistenceType getSecretPersistenceType();

// Database
/**
* Define the Jobs Database user.
*/
String getDatabaseUser();

/**
* Define the Jobs Database password.
*/
String getDatabasePassword();

/**
* Define the Jobs Database url in the form of
* jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}. Do not include username or
* password.
*/
String getDatabaseUrl();

/**
* Define the minimum Flyway migration version the Jobs Database must be at. If this is not
* satisfied, applications will not successfully connect. Internal-use only.
*/
String getJobsDatabaseMinimumFlywayMigrationVersion();

/**
* Define the total time to wait for the Jobs Database to be initialized. This includes migrations.
*/
long getJobsDatabaseInitializationTimeoutMs();

/**
* Define the Configs Database user. Defaults to the Jobs Database user if empty.
*/
String getConfigDatabaseUser();

/**
* Define the Configs Database password. Defaults to the Jobs Database password if empty.
*/
String getConfigDatabasePassword();

/**
* Define the Configs Database url in the form of
* jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}. Defaults to the Jobs Database
* url if empty.
*/
String getConfigDatabaseUrl();

/**
* Define the minimum Flyway migration version the Configs Database must be at. If this is not
* satisfied, applications will not successfully connect. Internal-use only.
*/
String getConfigsDatabaseMinimumFlywayMigrationVersion();

/**
* Define the total time to wait for the Configs Database to be initialized. This includes
* migrations.
*/
long getConfigsDatabaseInitializationTimeoutMs();

/**
* Define if the Bootloader should run migrations on startup.
*/
boolean runDatabaseMigrationOnStartup();

// Airbyte Services
/**
* Define the url where Temporal is hosted, including the port. Airbyte services use this
* information.
*/
String getTemporalHost();

/**
* Define the url where the Airbyte Server is hosted. Airbyte services use this information.
* Manipulates the `INTERNAL_API_HOST` variable.
*/
String getAirbyteApiHost();

/**
* Define the port the Airbyte Server is hosted on. Airbyte services use this information.
* Manipulates the `INTERNAL_API_HOST` variable.
*/
int getAirbyteApiPort();

/**
* Define the url the Airbyte Webapp is hosted at. Airbyte services use this information.
*/
String getWebappUrl();

// Jobs
/**
* Define the maximum number of attempts a sync makes before failing.
*/
int getSyncJobMaxAttempts();

/**
* Define the number of days a sync job will execute for before timing out.
*/
int getSyncJobMaxTimeoutDays();

/**
* Define the job container's minimum CPU usage. Units follow either Docker or Kubernetes, depending
* on the deployment. Defaults to none.
*/
String getJobMainContainerCpuRequest();

/**
* Define the job container's maximum CPU usage. Units follow either Docker or Kubernetes, depending
* on the deployment. Defaults to none.
*/
String getJobMainContainerCpuLimit();

/**
* Define the job container's minimum RAM usage. Units follow either Docker or Kubernetes, depending
* on the deployment. Defaults to none.
*/
String getJobMainContainerMemoryRequest();

/**
* Define the job container's maximum RAM usage. Units follow either Docker or Kubernetes, depending
* on the deployment. Defaults to none.
*/
String getJobMainContainerMemoryLimit();

// Jobs - Kube only
/**
* Define one or more Job pod tolerations. Tolerations are separated by ';'. Each toleration
* consists of `k=v` pairs, separated by `,`, specifying some or all of key, effect, operator and value.
*/
List<TolerationPOJO> getJobKubeTolerations();

/**
* Define one or more Job pod node selectors. Each kv-pair is separated by a `,`.
*/
Map<String, String> getJobKubeNodeSelectors();

/**
* Define the Job pod connector image pull policy.
*/
String getJobKubeMainContainerImagePullPolicy();

/**
* Define the Job pod connector image pull secret. Useful when hosting private images.
*/
String getJobKubeMainContainerImagePullSecret();

/**
* Define the Job pod socat image.
*/
String getJobKubeSocatImage();

/**
* Define the Job pod busybox image.
*/
String getJobKubeBusyboxImage();

/**
* Define the Job pod curl image.
*/
String getJobKubeCurlImage();

/**
* Define the Kubernetes namespace Job pods are created in.
*/
String getJobKubeNamespace();

- String getJobMainContainerCpuRequest();

- String getJobMainContainerCpuLimit();

- String getJobMainContainerMemoryRequest();

- String getJobMainContainerMemoryLimit();

// Logging/Monitoring/Tracking
/**
* Define either S3, Minio or GCS as a logging backend. Kubernetes only. Multiple variables are
* involved here. Please see {@link CloudStorageConfigs} for more info.
*/
LogConfigs getLogConfigs();

/**
* Define either S3, Minio or GCS as a state storage backend. Multiple variables are involved here.
* Please see {@link CloudStorageConfigs} for more info.
*/
CloudStorageConfigs getStateStorageCloudConfigs();

/**
* Determine if Datadog tracking events should be published. Mainly for Airbyte internal use.
*/
boolean getPublishMetrics();

/**
* Define whether to publish tracking events to Segment or log-only. Airbyte internal use.
*/
TrackingStrategy getTrackingStrategy();

// APPLICATIONS
// Worker
/**
* Define the maximum number of workers each Airbyte Worker container supports. Multiple variables
* are involved here. Please see {@link MaxWorkersConfig} for more info.
*/
MaxWorkersConfig getMaxWorkers();

// Worker - Kube only
/**
* Define the local ports the Airbyte Worker pod uses to connect to the various Job pods.
*/
Set<Integer> getTemporalWorkerPorts();

// Scheduler
/**
* Define how and how often the Scheduler sweeps its local disk for old configs. Multiple variables
* are involved here. Please see {@link WorkspaceRetentionConfig} for more info.
*/
WorkspaceRetentionConfig getWorkspaceRetentionConfig();

/**
* Define the maximum number of concurrent jobs the Scheduler schedules. Defaults to 5.
*/
String getSubmitterNumThreads();

// Container Orchestrator

/**
* Define if Airbyte should use Scheduler V2. Internal-use only.
*/
boolean getContainerOrchestratorEnabled();

enum TrackingStrategy {
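The Kube-only tolerations and node-selector settings documented above are supplied as delimited strings. The following is a minimal sketch of how those formats could be parsed, assuming exactly the delimiters the Javadoc describes (`;` between tolerations, `,` between `k=v` pairs). The `KubeSettingsParser` class and its `Toleration` record are hypothetical stand-ins for illustration, not Airbyte's `TolerationPOJO` or the parsing code in `EnvConfigs`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: Toleration is a hypothetical stand-in for Airbyte's
// TolerationPOJO, and this is not the parser EnvConfigs actually uses.
final class KubeSettingsParser {

  record Toleration(String key, String effect, String operator, String value) {}

  // Tolerations are separated by ';'. Each one is a ','-separated list of k=v pairs
  // naming some or all of key, effect, operator and value; missing fields stay null.
  static List<Toleration> parseTolerations(String raw) {
    List<Toleration> tolerations = new ArrayList<>();
    if (raw == null || raw.isBlank()) {
      return tolerations;
    }
    for (String entry : raw.split(";")) {
      if (entry.isBlank()) {
        continue;
      }
      Map<String, String> fields = parsePairs(entry);
      tolerations.add(new Toleration(
          fields.get("key"), fields.get("effect"), fields.get("operator"), fields.get("value")));
    }
    return tolerations;
  }

  // Node selectors are ','-separated k=v pairs.
  static Map<String, String> parseNodeSelectors(String raw) {
    if (raw == null || raw.isBlank()) {
      return new HashMap<>();
    }
    return parsePairs(raw);
  }

  // Splits "k1=v1,k2=v2" into a map; pairs without '=' are skipped in this sketch.
  private static Map<String, String> parsePairs(String raw) {
    Map<String, String> pairs = new HashMap<>();
    for (String pair : raw.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length == 2) {
        pairs.put(kv[0].trim(), kv[1].trim());
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    System.out.println(parseTolerations("key=dedicated,operator=Equal,value=airbyte,effect=NoSchedule"));
    System.out.println(parseNodeSelectors("type=airbyte-job"));
  }
}
```

Skipping malformed pairs keeps the sketch short; a production parser would more likely reject them so that a misconfigured variable fails loudly instead of silently dropping a toleration.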
@@ -84,6 +84,7 @@ public class EnvConfigs implements Configs {
private static final String CONFIGS_DATABASE_INITIALIZATION_TIMEOUT_MS = "CONFIGS_DATABASE_INITIALIZATION_TIMEOUT_MS";
private static final String JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION = "JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION";
private static final String JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS = "JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS";
private static final String CONTAINER_ORCHESTRATOR_ENABLED = "CONTAINER_ORCHESTRATOR_ENABLED";

private static final String STATE_STORAGE_S3_BUCKET_NAME = "STATE_STORAGE_S3_BUCKET_NAME";
private static final String STATE_STORAGE_S3_REGION = "STATE_STORAGE_S3_REGION";
@@ -544,7 +545,7 @@ public String getSubmitterNumThreads() {

@Override
public boolean getContainerOrchestratorEnabled() {
- return getEnvOrDefault("CONTAINER_ORCHESTRATOR_ENABLED", false, Boolean::valueOf);
+ return getEnvOrDefault(CONTAINER_ORCHESTRATOR_ENABLED, false, Boolean::valueOf);
}

// Helpers
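The `getEnvOrDefault(CONTAINER_ORCHESTRATOR_ENABLED, false, Boolean::valueOf)` call in the hunk above suggests a helper that takes a variable name, a default, and a parser. The sketch below mirrors that call shape to illustrate two behaviors documented in `Configs.java`: the Configs Database settings falling back to the Jobs Database settings when empty, and the credential-free JDBC url format. It reads from an injected map rather than the real environment so it can be exercised in isolation. The class name, the blank-means-unset rule, and the `DATABASE_USER` / `CONFIG_DATABASE_USER` variable names are assumptions, not confirmed by this commit.

```java
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: mirrors the getEnvOrDefault(key, default, parser) call shape
// seen in EnvConfigs, but reads from an injected map instead of System.getenv().
// DATABASE_USER and CONFIG_DATABASE_USER are assumed variable names for this example.
final class EnvConfigsSketch {

  private final Map<String, String> env;

  EnvConfigsSketch(Map<String, String> env) {
    this.env = env;
  }

  // Returns the parsed value of key, or defaultValue when the variable is unset or blank.
  <T> T getEnvOrDefault(String key, T defaultValue, Function<String, T> parser) {
    String raw = env.get(key);
    return (raw == null || raw.isBlank()) ? defaultValue : parser.apply(raw);
  }

  String getDatabaseUser() {
    return getEnvOrDefault("DATABASE_USER", "", Function.identity());
  }

  // The Configs Database user falls back to the Jobs Database user when empty.
  String getConfigDatabaseUser() {
    return getEnvOrDefault("CONFIG_DATABASE_USER", getDatabaseUser(), Function.identity());
  }

  boolean getContainerOrchestratorEnabled() {
    return getEnvOrDefault("CONTAINER_ORCHESTRATOR_ENABLED", false, Boolean::valueOf);
  }

  // Builds jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}; no credentials.
  static String jdbcUrl(String host, int port, String database) {
    return "jdbc:postgresql://" + host + ":" + port + "/" + database;
  }

  public static void main(String[] args) {
    EnvConfigsSketch configs = new EnvConfigsSketch(Map.of("DATABASE_USER", "docker"));
    System.out.println(configs.getConfigDatabaseUser()); // prints docker (fallback)
    System.out.println(jdbcUrl("db", 5432, "airbyte")); // prints jdbc:postgresql://db:5432/airbyte
  }
}
```

Passing the parser as a function keeps each getter a one-liner and lets typed values such as the orchestrator flag share the same lookup and default logic as plain strings.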
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -256,6 +256,7 @@
* [Change Data Capture (CDC)](understanding-airbyte/cdc.md)
* [Namespaces](understanding-airbyte/namespaces.md)
* [Json to Avro Conversion](understanding-airbyte/json-avro-conversion.md)
* [Configuring Airbyte](understanding-airbyte/configuring-airbyte.md)
* [Glossary of Terms](understanding-airbyte/glossary.md)
* [API documentation](api-documentation.md)
* [Project Overview](project-overview/README.md)
4 changes: 2 additions & 2 deletions docs/operator-guides/scaling-airbyte.md
@@ -53,9 +53,9 @@ This is a **non-issue** for users running Airbyte Docker.

### Temporal DB

- Temporal maintains multiple idle connexions. By the default value is `20` and you may want to lower or increase this number. One issue we noticed is
+ Temporal maintains multiple idle connections. The default value is `20`, and you may want to lower or increase this number. One issue we noticed is
that Temporal creates multiple pools and the number specified in the `SQL_MAX_IDLE_CONNS` environment variable of the `docker.compose.yaml` file
- might end up allowing 4-5 times more connexions than expected.
+ might end up allowing 4-5 times more connections than expected.

If you want to increase the number of allowed idle connections, you will also need to increase `SQL_MAX_CONNS`, because `SQL_MAX_IDLE_CONNS`
is capped by `SQL_MAX_CONNS`.
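The Temporal DB guidance above comes down to simple arithmetic: each pool's idle limit is capped by `SQL_MAX_CONNS`, and multiple pools multiply the total. The sketch below works through that estimate, assuming the limits apply per pool and that there are 4 or 5 pools (inferred only from the guide's 4-5x observation). `TemporalIdleConnEstimate` and the `SQL_MAX_CONNS` values used are hypothetical, not taken from Temporal or Airbyte.

```java
// Back-of-the-envelope sketch of the guidance above, not Temporal's pooling code.
// poolCount is an assumption: the guide only reports roughly 4-5x more idle
// connections than SQL_MAX_IDLE_CONNS alone suggests.
final class TemporalIdleConnEstimate {

  // SQL_MAX_IDLE_CONNS is capped by SQL_MAX_CONNS, so raising only the former
  // has no effect once it exceeds the latter.
  static int effectiveIdlePerPool(int sqlMaxIdleConns, int sqlMaxConns) {
    return Math.min(sqlMaxIdleConns, sqlMaxConns);
  }

  // Assumes each pool applies the limits separately, so the total scales with poolCount.
  static int estimatedIdleTotal(int sqlMaxIdleConns, int sqlMaxConns, int poolCount) {
    return effectiveIdlePerPool(sqlMaxIdleConns, sqlMaxConns) * poolCount;
  }

  public static void main(String[] args) {
    // Default of 20 idle connections, a hypothetical SQL_MAX_CONNS of 100, and 4 or 5 pools.
    System.out.println(estimatedIdleTotal(20, 100, 4)); // 80
    System.out.println(estimatedIdleTotal(20, 100, 5)); // 100
    // Raising SQL_MAX_IDLE_CONNS to 50 while SQL_MAX_CONNS stays at 30 still caps each pool at 30.
    System.out.println(effectiveIdlePerPool(50, 30)); // 30
  }
}
```

This is why the guide pairs the two settings: the per-pool cap means `SQL_MAX_CONNS` must be raised alongside `SQL_MAX_IDLE_CONNS`, while the pool multiplier means the database should be sized for several times the configured idle count.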