DuckDB
DuckDB is an in-process SQL OLAP database management system that supports
querying data in CSV, JSON, and Parquet formats from AWS S3-compatible
blob storage. This means you can query data stored in AWS S3,
Google Cloud Storage, or Cloudflare R2.
You can also use the CUBEJS_DB_DUCKDB_DATABASE_PATH environment variable to connect to a local DuckDB database.
Cube can also connect to MotherDuck, a cloud-based serverless analytics platform built on DuckDB. When connected to MotherDuck, DuckDB uses hybrid execution and routes queries to S3 through MotherDuck for better performance.
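As a minimal sketch of the two non-S3 connection modes described above, a .env file might look like the following. The file path and the token value are placeholders for illustration, not values taken from this page, and the two options are shown as alternatives.

```dotenv
CUBEJS_DB_TYPE=duckdb

# Option 1: a local DuckDB database file (hypothetical path)
CUBEJS_DB_DUCKDB_DATABASE_PATH=/path/to/analytics.duckdb

# Option 2: MotherDuck (placeholder token; uncomment to use)
# CUBEJS_DB_DUCKDB_MOTHERDUCK_TOKEN=your_motherduck_service_token
```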
Prerequisites
- A set of IAM credentials that allow access to the S3-compatible data source. Credentials are only required for private S3 buckets.
- The region of the bucket
- The name of a bucket to query data from
Setup
Manual
Add the following to a .env file in your Cube project:
CUBEJS_DB_TYPE=duckdb
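For a private S3 bucket, the prerequisites above map onto the S3 variables listed under Environment Variables. A minimal sketch, assuming placeholder credentials and an example region:

```dotenv
CUBEJS_DB_TYPE=duckdb

# Placeholder IAM credentials; only required for private buckets
CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID=your_access_key_id
CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY=your_secret_access_key

# Example region; replace with the region of your bucket
CUBEJS_DB_DUCKDB_S3_REGION=us-east-1
```

This page lists no variable for the bucket name. In DuckDB, the bucket normally appears in the s3:// path of the file being queried.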
Cube Cloud
In Cube Cloud, select DuckDB when creating a new deployment and fill in the required fields.
If you are not using MotherDuck, leave the MotherDuck Token field blank.
You can also explore how DuckDB works with Cube if you create a demo deployment in Cube Cloud.
Environment Variables
Environment Variable | Description | Possible Values | Required |
---|---|---|---|
CUBEJS_DB_DUCKDB_MEMORY_LIMIT | The maximum memory limit for DuckDB. Equivalent to SET memory_limit=<MEMORY_LIMIT>. Default is 75% of available RAM | A valid memory limit | ❌ |
CUBEJS_DB_DUCKDB_SCHEMA | The default search schema | A valid schema name | ❌ |
CUBEJS_DB_DUCKDB_MOTHERDUCK_TOKEN | The service token to use for connections to MotherDuck | A valid MotherDuck service token | ❌ |
CUBEJS_DB_DUCKDB_DATABASE_PATH | The file path of the local database to connect to | A valid DuckDB database file path | ❌ |
CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID | The Access Key ID to use for database connections | A valid Access Key ID | ❌ |
CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY | The Secret Access Key to use for database connections | A valid Secret Access Key | ❌ |
CUBEJS_DB_DUCKDB_S3_ENDPOINT | The S3 endpoint | A valid S3 endpoint | ❌ |
CUBEJS_DB_DUCKDB_S3_REGION | The region of the bucket | A valid AWS region | ❌ |
CUBEJS_CONCURRENCY | The number of concurrent connections each queue has to the database. Default is 2 | A valid number | ❌ |
CUBEJS_DB_DUCKDB_S3_USE_SSL | Whether to use SSL for the connection | A boolean | ❌ |
CUBEJS_DB_DUCKDB_S3_URL_STYLE | The S3 URL style (vhost or path) | 'vhost' or 'path' | ❌ |
CUBEJS_DB_DUCKDB_S3_SESSION_TOKEN | The token for the S3 session | A valid Session Token | ❌ |
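To show how the optional variables combine, here is a sketch for a non-AWS, S3-compatible endpoint with a memory cap. Every value is an illustrative assumption: the endpoint host is hypothetical, whether path-style URLs are needed depends on the storage provider, and the memory and concurrency figures are arbitrary.

```dotenv
CUBEJS_DB_TYPE=duckdb

# Hypothetical S3-compatible endpoint; replace with your provider's host
CUBEJS_DB_DUCKDB_S3_ENDPOINT=storage.example.com
# Path-style URLs are assumed here; use vhost if your provider expects it
CUBEJS_DB_DUCKDB_S3_URL_STYLE=path
CUBEJS_DB_DUCKDB_S3_USE_SSL=true

# Arbitrary example: cap DuckDB memory instead of using the default
CUBEJS_DB_DUCKDB_MEMORY_LIMIT=4GB
# main is DuckDB's default schema; shown only to illustrate the setting
CUBEJS_DB_DUCKDB_SCHEMA=main
# Arbitrary example: raise the per-queue connection count from the default of 2
CUBEJS_CONCURRENCY=4
```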
Pre-Aggregation Feature Support
count_distinct_approx
Measures of type count_distinct_approx can be used in pre-aggregations when
using DuckDB as a source database. To learn more about DuckDB's support for
approximate aggregate functions, see the DuckDB documentation.
Pre-Aggregation Build Strategies
To learn more about pre-aggregation build strategies, see the pre-aggregation documentation.
Feature | Works with read-only mode? | Is default? |
---|---|---|
Batching | ❌ | ✅ |
Export Bucket | - | - |
By default, DuckDB uses a batching strategy to build pre-aggregations.
Batching
No extra configuration is required to use batching with DuckDB.
Export Bucket
DuckDB does not support export buckets.
SSL
Cube does not require any additional configuration to enable SSL as DuckDB connections are made over HTTPS.