Databricks

Databricks is a unified data intelligence platform.

Prerequisites

Setup

Environment Variables

Add the following to a .env file in your Cube project:

CUBEJS_DB_TYPE=databricks-jdbc
# CUBEJS_DB_NAME is optional
CUBEJS_DB_NAME=default
# You can find this inside the cluster's configuration
CUBEJS_DB_DATABRICKS_URL=jdbc:databricks://dbc-XXXXXXX-XXXX.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXX/XXXXX;AuthMech=3;UID=token
# You can specify the personal access token separately from `CUBEJS_DB_DATABRICKS_URL` by doing this:
CUBEJS_DB_DATABRICKS_TOKEN=XXXXX
# This accepts the Databricks usage policy and must be set to `true` to use the Databricks JDBC driver
CUBEJS_DB_DATABRICKS_ACCEPT_POLICY=true
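
A malformed JDBC URL is an easy mistake to make when copying it from the cluster configuration. The following is a minimal sketch, not part of Cube, that splits a URL of the shape shown above into its host, port, schema, and `key=value` properties so they can be inspected before starting Cube. The function name `parse_databricks_jdbc_url` is hypothetical, and the parser assumes only the `jdbc:databricks://host:port/schema;key=value;...` layout used in the example.

```python
def parse_databricks_jdbc_url(url: str) -> dict:
    """Split a jdbc:databricks:// URL into host, port, schema and properties.

    Hypothetical helper for illustration; it only understands the layout
    used in the example .env file above.
    """
    prefix = "jdbc:databricks://"
    if not url.startswith(prefix):
        raise ValueError(f"URL must start with {prefix!r}")

    # Everything before the first ';' is "host:port/schema"; the rest are properties.
    location, _, raw_properties = url[len(prefix):].partition(";")
    authority, _, schema = location.partition("/")
    host, _, port = authority.partition(":")

    properties = {}
    for pair in raw_properties.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"Property {pair!r} is not in key=value form")
        properties[key] = value

    return {
        "host": host,
        "port": int(port) if port else None,
        "schema": schema or None,
        "properties": properties,
    }


if __name__ == "__main__":
    example = (
        "jdbc:databricks://dbc-XXXXXXX-XXXX.cloud.databricks.com:443/default;"
        "transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXX/XXXXX;"
        "AuthMech=3;UID=token"
    )
    print(parse_databricks_jdbc_url(example))
```

Checking that `httpPath`, `AuthMech`, and `UID` are present in the parsed properties is a quick way to catch a URL that was truncated during copy and paste.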

Docker

Create a .env file as above, then extend the cubejs/cube:jdk Docker image to build a Cube image that includes the JDBC driver:

FROM cubejs/cube:jdk
 
COPY . .
RUN npm install

You can then build and run the image using the following commands:

docker build -t cube-jdk .
docker run -it -p 4000:4000 --env-file=.env cube-jdk
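
Once the container is running, it can be useful to wait for the API to come up before sending queries. The sketch below polls a readiness URL until it answers with HTTP 200. It assumes that the container exposes a `/readyz` endpoint on the published port 4000; treat that path, and the helper names `http_probe` and `wait_until_ready`, as assumptions to adapt to your deployment.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or non-2xx status: not ready yet.
        return False


def wait_until_ready(
    url: str = "http://localhost:4000/readyz",  # assumed readiness endpoint
    attempts: int = 30,
    delay: float = 2.0,
    probe=http_probe,
    sleep=time.sleep,
) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "not ready after waiting")
```

The `probe` and `sleep` parameters exist so the polling loop can be exercised without a running container.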

Environment Variables

| Environment Variable | Description | Possible Values | Required |
| --- | --- | --- | --- |
| CUBEJS_DB_NAME | The name of the database to connect to | A valid database name | |
| CUBEJS_DB_DATABRICKS_URL | The URL for a JDBC connection | A valid JDBC URL | |
| CUBEJS_DB_DATABRICKS_ACCEPT_POLICY | Whether or not to accept the license terms for the Databricks JDBC driver | true, false | |
| CUBEJS_DB_DATABRICKS_TOKEN | The personal access token used to authenticate the Databricks connection | A valid token | |
| CUBEJS_DB_DATABRICKS_CATALOG | The name of the Databricks catalog to connect to | A valid catalog name | |
| CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR | The path for the Databricks DBFS mount | A valid mount path | |
| CUBEJS_CONCURRENCY | The number of concurrent connections each queue has to the database. Default is 2 | A valid number | |
| CUBEJS_DB_MAX_POOL | The maximum number of concurrent database connections to pool. Default is 8 | A valid number | |
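
To see which values a given .env file would effectively produce, the following sketch reads the file contents, applies the two defaults documented above (2 for `CUBEJS_CONCURRENCY`, 8 for `CUBEJS_DB_MAX_POOL`), and flags a driver policy that has not been accepted. It is an illustration only: `parse_dotenv` and `effective_settings` are hypothetical helpers, and the parser handles just plain `KEY=value` lines and `#` comments.

```python
# Defaults taken from the table above.
DOCUMENTED_DEFAULTS = {"CUBEJS_CONCURRENCY": "2", "CUBEJS_DB_MAX_POOL": "8"}


def parse_dotenv(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as JDBC URLs stay intact.
        key, separator, value = line.partition("=")
        if separator:
            values[key.strip()] = value.strip()
    return values


def effective_settings(env: dict):
    """Return (settings with defaults applied, list of detected problems)."""
    settings = {**DOCUMENTED_DEFAULTS, **env}
    problems = []
    if settings.get("CUBEJS_DB_DATABRICKS_ACCEPT_POLICY") != "true":
        problems.append("CUBEJS_DB_DATABRICKS_ACCEPT_POLICY must be set to true")
    for name in DOCUMENTED_DEFAULTS:
        if not settings[name].isdigit():
            problems.append(f"{name} must be a number, got {settings[name]!r}")
    return settings, problems


if __name__ == "__main__":
    sample = "CUBEJS_DB_TYPE=databricks-jdbc\nCUBEJS_DB_DATABRICKS_ACCEPT_POLICY=true\n"
    print(effective_settings(parse_dotenv(sample)))
```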

Pre-Aggregation Feature Support

count_distinct_approx

Measures of type count_distinct_approx can be used in pre-aggregations when using Databricks as a source database. To learn more about Databricks' support for approximate aggregate functions, see the Databricks documentation on approximate aggregate functions.
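
Approximate distinct counts suit pre-aggregations because the underlying sketches can be merged: per-partition results combine into a correct overall estimate, whereas exact per-partition distinct counts cannot simply be summed. The following is a minimal HyperLogLog sketch written for illustration. It is not the implementation used by Databricks or Cube, and the register count and hash function are arbitrary choices.

```python
import hashlib
import math

P = 12          # number of index bits (illustrative choice)
M = 1 << P      # number of registers


def _hash64(value: str) -> int:
    """Derive a 64-bit hash from SHA-1 (any well-mixed hash would do)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


def build(values) -> list:
    """Build HyperLogLog registers from an iterable of strings."""
    registers = [0] * M
    for value in values:
        h = _hash64(value)
        index = h >> (64 - P)                  # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)       # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)
    return registers


def merge(a: list, b: list) -> list:
    """Merging two sketches is an element-wise maximum of their registers."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(registers: list) -> float:
    """Estimate the number of distinct values seen by the sketch."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)  # small-range (linear counting) correction
    return raw


if __name__ == "__main__":
    monday = [f"user-{i}" for i in range(0, 15000)]
    tuesday = [f"user-{i}" for i in range(10000, 25000)]
    combined = merge(build(monday), build(tuesday))
    print(round(estimate(combined)))  # close to 25000, the true distinct count
```

Here the two daily sketches overlap by 5,000 users. Summing the two daily estimates would count those users twice, while the merged sketch is identical to one built from both days at once.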

Pre-Aggregation Build Strategies

To learn more about pre-aggregation build strategies, see the Cube documentation on pre-aggregations.

| Feature | Works with read-only mode? | Is default? |
| --- | --- | --- |
| Simple | | Yes |
| Export Bucket | | No |

By default, Databricks JDBC uses the simple strategy to build pre-aggregations.

Simple

Simple pre-aggregation builds for Databricks require no extra configuration.

Export Bucket

Databricks supports using both AWS S3 and Azure Blob Storage for export bucket functionality.

AWS S3

To use AWS S3 as an export bucket, first complete the Databricks guide on mounting S3 buckets to Databricks DBFS.

Ensure the AWS credentials are correctly configured in IAM to allow reads and writes to the export bucket in S3.

CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
CUBEJS_DB_EXPORT_BUCKET=s3://my.bucket.on.s3
CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<AWS_KEY>
CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<AWS_SECRET>
CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<AWS_REGION>

Azure Blob Storage

To use Azure Blob Storage as an export bucket, follow the Databricks guide on mounting Azure Blob Storage to Databricks DBFS.

Retrieve the storage account access key from your Azure account and use it as follows:

CUBEJS_DB_EXPORT_BUCKET_TYPE=azure
CUBEJS_DB_EXPORT_BUCKET=wasbs://my-bucket@my-account.blob.core.windows.net
CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY=<AZURE_STORAGE_ACCOUNT_ACCESS_KEY>
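
Because the two export bucket types use different variables, a quick consistency check can catch a missing key before a pre-aggregation build fails. The sketch below mirrors the two examples above. The helper `check_export_bucket` is hypothetical, and treating every variable from the examples as required, along with the `s3://` and `wasbs://` prefixes, is an assumption drawn from those examples rather than a documented rule.

```python
# Variables taken from the S3 and Azure examples above (assumed to be required).
REQUIRED_BY_TYPE = {
    "s3": [
        "CUBEJS_DB_EXPORT_BUCKET",
        "CUBEJS_DB_EXPORT_BUCKET_AWS_KEY",
        "CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET",
        "CUBEJS_DB_EXPORT_BUCKET_AWS_REGION",
    ],
    "azure": [
        "CUBEJS_DB_EXPORT_BUCKET",
        "CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY",
    ],
}

# URL schemes as they appear in the examples above (assumption).
EXPECTED_SCHEME = {"s3": "s3://", "azure": "wasbs://"}


def check_export_bucket(env: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    bucket_type = env.get("CUBEJS_DB_EXPORT_BUCKET_TYPE")
    if bucket_type not in REQUIRED_BY_TYPE:
        return [f"CUBEJS_DB_EXPORT_BUCKET_TYPE must be one of {sorted(REQUIRED_BY_TYPE)}"]

    problems = [
        f"{name} is not set"
        for name in REQUIRED_BY_TYPE[bucket_type]
        if not env.get(name)
    ]
    bucket = env.get("CUBEJS_DB_EXPORT_BUCKET", "")
    if bucket and not bucket.startswith(EXPECTED_SCHEME[bucket_type]):
        problems.append(
            f"CUBEJS_DB_EXPORT_BUCKET should start with {EXPECTED_SCHEME[bucket_type]}"
        )
    return problems


if __name__ == "__main__":
    import os

    for problem in check_export_bucket(dict(os.environ)):
        print(problem)
```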

SSL/TLS

Cube does not require any additional configuration to enable SSL/TLS for Databricks JDBC connections.

Additional Configuration

Cube Cloud

To accurately show partition sizes in the Cube Cloud APM, an export bucket must be configured.