# Databricks

Databricks is a unified data intelligence platform.
## Prerequisites

- A JDK installation
- The JDBC URL for the Databricks cluster
## Setup

### Environment Variables

Add the following to a `.env` file in your Cube project:
```dotenv
CUBEJS_DB_TYPE=databricks-jdbc
# CUBEJS_DB_NAME is optional
CUBEJS_DB_NAME=default
# You can find this inside the cluster's configuration
CUBEJS_DB_DATABRICKS_URL=jdbc:databricks://dbc-XXXXXXX-XXXX.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXX/XXXXX;AuthMech=3;UID=token
# You can also specify the personal access token separately from `CUBEJS_DB_DATABRICKS_URL`:
CUBEJS_DB_DATABRICKS_TOKEN=XXXXX
# This accepts the Databricks usage policy; it must be set to `true` to use the Databricks JDBC driver
CUBEJS_DB_DATABRICKS_ACCEPT_POLICY=true
```
### Docker

Create a `.env` file as above, then extend the `cubejs/cube:jdk` Docker image tag to build a Cube image with the JDBC driver:
```dockerfile
FROM cubejs/cube:jdk
COPY . .
RUN npm install
```
You can then build and run the image using the following commands:
```bash
docker build -t cube-jdk .
docker run -it -p 4000:4000 --env-file=.env cube-jdk
```
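Alternatively, if you run Cube with Docker Compose, a minimal `docker-compose.yml` along these lines builds and runs the same image (the service name is illustrative):

```yaml
# Minimal sketch, assuming the Dockerfile above sits next to this file
services:
  cube:
    build: .
    env_file: .env
    ports:
      - "4000:4000"
```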
## Environment Variables
| Environment Variable | Description | Possible Values | Required |
| --- | --- | --- | --- |
| `CUBEJS_DB_NAME` | The name of the database to connect to | A valid database name | ✅ |
| `CUBEJS_DB_DATABRICKS_URL` | The URL for a JDBC connection | A valid JDBC URL | ✅ |
| `CUBEJS_DB_DATABRICKS_ACCEPT_POLICY` | Whether or not to accept the license terms for the Databricks JDBC driver | `true`, `false` | ✅ |
| `CUBEJS_DB_DATABRICKS_TOKEN` | The personal access token used to authenticate the Databricks connection | A valid token | ✅ |
| `CUBEJS_DB_DATABRICKS_CATALOG` | The name of the Databricks catalog to connect to | A valid catalog name | ❌ |
| `CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR` | The path for the Databricks DBFS mount | A valid mount path | ❌ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `2` | A valid number | ❌ |
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `8` | A valid number | ❌ |
## Pre-Aggregation Feature Support

### count_distinct_approx

Measures of type `count_distinct_approx` can be used in pre-aggregations when using Databricks as a source database. To learn more about Databricks' support for approximate aggregate functions, refer to the Databricks documentation.
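For instance, a measure of this type could be defined in a data model like the following sketch, where the `orders` table and `user_id` column are illustrative assumptions:

```yaml
# Illustrative data model; the orders table and user_id column are assumptions
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: unique_users
        sql: user_id
        type: count_distinct_approx
```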
## Pre-Aggregation Build Strategies

To learn more, see the documentation on pre-aggregation build strategies.
| Feature | Works with read-only mode? | Is default? |
| --- | --- | --- |
| Simple | ✅ | ✅ |
| Export Bucket | ❌ | ❌ |
By default, Databricks JDBC uses a simple strategy to build pre-aggregations.
### Simple

No extra configuration is required for simple pre-aggregation builds with Databricks.
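As a sketch, extending the illustrative `orders` cube from above with a rollup that Cube would build using the simple strategy (all table, column, and member names remain assumptions):

```yaml
# Illustrative rollup; built with the simple strategy, no extra setup needed
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: count
        type: count

    dimensions:
      - name: status
        sql: status
        type: string

      - name: created_at
        sql: created_at
        type: time

    pre_aggregations:
      - name: orders_by_status
        measures:
          - CUBE.count
        dimensions:
          - CUBE.status
        time_dimension: CUBE.created_at
        granularity: day
```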
### Export Bucket

Databricks supports using both AWS S3 and Azure Blob Storage for export bucket functionality.
#### AWS S3

To use AWS S3 as an export bucket, first complete the Databricks guide on mounting S3 buckets to Databricks DBFS.

Ensure the AWS credentials are correctly configured in IAM to allow reads and writes to the export bucket in S3.
Then, add the following to your `.env` file:

```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
CUBEJS_DB_EXPORT_BUCKET=s3://my.bucket.on.s3
CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<AWS_KEY>
CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<AWS_SECRET>
CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<AWS_REGION>
```
#### Azure Blob Storage

To use Azure Blob Storage as an export bucket, follow the Databricks guide on mounting Azure Blob Storage to Databricks DBFS.

Retrieve the storage account access key from your Azure account and use it as follows:
```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=azure
CUBEJS_DB_EXPORT_BUCKET=wasbs://my-bucket@my-account.blob.core.windows.net
CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY=<AZURE_STORAGE_ACCOUNT_ACCESS_KEY>
```
## SSL/TLS

Cube does not require any additional configuration to enable SSL/TLS for Databricks JDBC connections.
## Additional Configuration

### Cube Cloud

To accurately show partition sizes in the Cube Cloud APM, an export bucket must be configured.