syedqasimahmed/dremio-bigquery-connector

 
 

Dremio BigQuery Connector

Overview

This is a community-based BigQuery connector for Dremio, built with the ARP framework. See Dremio Hub for more examples and the ARP docs for documentation.

What is Dremio?

Dremio delivers lightning-fast query speed and a self-service semantic layer operating directly against your data lake storage and other sources. There is no need to move data into proprietary data warehouses or to create cubes, aggregation tables, and BI extracts. It offers flexibility and control for data architects, and self-service for data consumers.

Use Cases

  • Join data from BigQuery with other sources (on-premises or cloud)
  • Interactive SQL performance with Data Reflections
  • Offload BigQuery tables using CTAS to your cheap data lake storage - HDFS, S3, ADLS
  • Curate Datasets easily through the self-service platform

Usage

Creating a new BigQuery Source

Required Parameters

  • Google Cloud Project ID
    • Ex: my-big-project-name.
  • Service Account Email & JSON Key
    • You will need to generate an IAM service account with access to the BigQuery resources you want to query. You will need the contents of the JSON key for that account, as well as the email address associated with it.
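One possible way to create such a service account is with the gcloud CLI. The sketch below is not part of this connector: the function names, the account name, and the choice of the `roles/bigquery.dataViewer` and `roles/bigquery.jobUser` roles are assumptions for illustration, so adjust them to the access your queries actually need. The email format shown applies to ordinary (non domain-scoped) project IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and roles are illustrative assumptions.

# Print the email address of a service account in a standard project:
# <account-name>@<project-id>.iam.gserviceaccount.com
service_account_email() {
  local account_name="$1" project_id="$2"
  printf '%s@%s.iam.gserviceaccount.com\n' "$account_name" "$project_id"
}

# Create the account, grant it read and job-run access to BigQuery,
# and write a JSON key to the given file. Requires an authenticated gcloud.
create_bigquery_service_account() {
  local account_name="$1" project_id="$2" key_file="$3"
  local email role
  email="$(service_account_email "$account_name" "$project_id")"

  gcloud iam service-accounts create "$account_name" --project="$project_id"

  # Assumed roles: read table data and run query jobs.
  for role in roles/bigquery.dataViewer roles/bigquery.jobUser; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:$email" --role="$role"
  done

  # The contents of this JSON file are what the source form asks for.
  gcloud iam service-accounts keys create "$key_file" --iam-account="$email"
  echo "$email"
}
```

For example, `create_bigquery_service_account dremio-reader my-big-project-name key.json` would print the email to enter in the source form and leave the key in `key.json`.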

Development

Building and Installation

  1. Download and install the Google Simba BigQuery JDBC driver from the Google website. Install the main JAR file into your local Maven repository with the following (update the path to match the download path):
    mvn install:install-file \
        -Dfile=/Users/build/Downloads/SimbaJDBCDriverforGoogleBigQuery42_1.2.11.1014/GoogleBigQueryJDBC42.jar \
        -DgroupId=com.simba.googlebigquery \
        -DartifactId=googlebigquery-jdbc42 \
        -Dversion=1.2.11.1014 \
        -Dpackaging=jar \
        -DgeneratePom=true
  2. Generate a shaded BigQuery JDBC client JAR file by running mvn clean install inside the bigquery-driver-shade directory.
  3. In the root directory (the one containing pom.xml), run mvn clean install -DskipTests.
  4. Copy the resulting .jar file from the target folder into the <DREMIO_HOME>\jars folder.
  5. Restart Dremio.
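The copy into the jars folder can be scripted. The helper below is a minimal sketch, not part of the repository: `install_connector_jar` is a hypothetical name, and it assumes a Unix-style layout in which the build output lives in `target/` under the project root and Dremio's plugin directory is `jars/` under the Dremio home directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the built connector JAR(s) into Dremio's jars folder.
# Arguments: <project-root> <dremio-home>
install_connector_jar() {
  local project_root="$1" dremio_home="$2"
  local jar
  for jar in "$project_root"/target/*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and does not exist.
    if [ ! -e "$jar" ]; then
      echo "no JAR found under $project_root/target" >&2
      return 1
    fi
    # Copies every JAR in target/; narrow the glob if the build leaves more than one.
    cp "$jar" "$dremio_home/jars/"
  done
}
```

After `mvn clean install -DskipTests` finishes in the root directory, this could be called as `install_connector_jar . "$DREMIO_HOME"`, followed by a Dremio restart.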

Debugging

To debug pushdowns for queries, add the following logger to logback.xml:

  <logger name="com.dremio.exec.store.jdbc">
    <level value="${dremio.log.level:-trace}"/>
  </logger>

Lines like the one below will then appear in the server.log file. Use them to find unsupported operators, then revisit the YAML file to add the missing pushdowns:

- 2019-07-11 18:56:24,001 [22d879a7-ce3d-f2ca-f380-005a88865700/0:foreman-planning] DEBUG c.d.e.store.jdbc.dialect.arp.ArpYaml - Operator / not supported. Aborting pushdown.

You can also take a look at the planning tab/visualized plan of the profile to determine if everything is pushed down or not.
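To collect every operator that caused an aborted pushdown in one pass, the log can be filtered from the command line. This is a minimal sketch: `list_unsupported_operators` is a hypothetical helper, and it assumes all such messages have the same shape as the sample line above ("Operator X not supported. Aborting pushdown.").

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct operator reported as unsupported.
# Argument: path to Dremio's server.log
list_unsupported_operators() {
  local log_file="$1"
  # Keep only matching lines and reduce each one to the operator name,
  # then de-duplicate with a locale-independent sort.
  sed -n 's/.*Operator \(.*\) not supported\. Aborting pushdown\..*/\1/p' "$log_file" \
    | LC_ALL=C sort -u
}
```

Running `list_unsupported_operators /path/to/server.log` yields one operator per line, which can serve as a checklist of entries to add to the ARP YAML file.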

Contribution

Submitting an issue

Pull Requests

PRs are welcome. When submitting a PR make sure of the following:

  • Try to follow Google's Java style guide when creating or modifying Java code.
  • Use a YAML linter to check the syntactic correctness of the YAML file.
  • Make sure the build passes
  • Run basic queries at least to ensure things are working properly
