AWS Athena
Prerequisites
- A set of IAM credentials that allow access to AWS Athena
- The AWS region
- The S3 bucket on AWS to store query results
Setup
Manual
Add the following to a .env file in your Cube project:
CUBEJS_DB_TYPE=athena
CUBEJS_AWS_KEY=AKIA************
CUBEJS_AWS_SECRET=****************************************
CUBEJS_AWS_REGION=us-east-1
CUBEJS_AWS_S3_OUTPUT_LOCATION=s3://my-athena-output-bucket
CUBEJS_AWS_ATHENA_WORKGROUP=primary
CUBEJS_AWS_ATHENA_CATALOG=AwsDataCatalog
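Each setting above is a plain KEY=VALUE line. The hypothetical parse_dotenv helper below can serve as a quick sanity check of a .env file's shape: it reads the sample into a dictionary. This is a simplified sketch that ignores quoting, escapes, and variable expansion. It is not the parser Cube uses.

```python
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Quoting, escapes, and variable expansion are deliberately not handled.
    """
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        key, sep, value = line.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        env[key.strip()] = value.strip()
    return env


# The sample settings from this page (credentials are placeholders).
SAMPLE = """\
CUBEJS_DB_TYPE=athena
CUBEJS_AWS_KEY=AKIA************
CUBEJS_AWS_SECRET=****************************************
CUBEJS_AWS_REGION=us-east-1
CUBEJS_AWS_S3_OUTPUT_LOCATION=s3://my-athena-output-bucket
CUBEJS_AWS_ATHENA_WORKGROUP=primary
CUBEJS_AWS_ATHENA_CATALOG=AwsDataCatalog
"""

env = parse_dotenv(SAMPLE)
print(env["CUBEJS_DB_TYPE"], env["CUBEJS_AWS_REGION"])  # athena us-east-1
```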
Cube Cloud
In Cube Cloud, select AWS Athena when creating a new deployment and fill in the required fields.
In some cases, you'll need to allow connections from your Cube Cloud deployment's IP address to your database. You can copy the IP address either from the Database Setup step in deployment creation or from Settings → Configuration in your deployment.
Cube Cloud also supports connecting to data sources within private VPCs if dedicated infrastructure is used. Check out the VPC connectivity guide for details.
Environment Variables
Environment Variable | Description | Possible Values | Required |
---|---|---|---|
CUBEJS_AWS_KEY | The AWS Access Key ID to use for database connections | A valid AWS Access Key ID | ✅ |
CUBEJS_AWS_SECRET | The AWS Secret Access Key to use for database connections | A valid AWS Secret Access Key | ✅ |
CUBEJS_AWS_REGION | The AWS region of the Cube deployment | A valid AWS region | ✅ |
CUBEJS_AWS_S3_OUTPUT_LOCATION | The S3 path to store query results made by the Cube deployment | A valid S3 path | ❌ |
CUBEJS_AWS_ATHENA_WORKGROUP | The name of the workgroup in which the query is being started | A valid Athena Workgroup | ❌ |
CUBEJS_AWS_ATHENA_CATALOG | The name of the catalog to use by default | A valid Athena Catalog name | ❌ |
CUBEJS_DB_SCHEMA | The name of the schema to use as an information_schema filter. Reduces the number of tables loaded during schema generation. | A valid schema name | ❌ |
CUBEJS_CONCURRENCY | The number of concurrent connections each queue has to the database. Default is 5. | A valid number | ❌ |
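The table can be turned into a pre-flight check that runs before starting Cube. The hypothetical check_athena_env helper below fails fast when a required (✅) variable is missing and applies the documented default of 5 for CUBEJS_CONCURRENCY. It makes one assumption: it treats "a valid S3 path" as a string that starts with s3://. This is an illustration only and is not how Cube validates its own configuration.

```python
from typing import Dict, Mapping, Optional

# Variables marked as required (✅) in the table.
REQUIRED = ("CUBEJS_AWS_KEY", "CUBEJS_AWS_SECRET", "CUBEJS_AWS_REGION")
DEFAULT_CONCURRENCY = 5  # documented default for CUBEJS_CONCURRENCY


def check_athena_env(env: Mapping[str, str]) -> Dict[str, Optional[object]]:
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")

    # Assumption: a "valid S3 path" is taken to mean an s3:// URI.
    output = env.get("CUBEJS_AWS_S3_OUTPUT_LOCATION")
    if output is not None and not output.startswith("s3://"):
        raise ValueError("CUBEJS_AWS_S3_OUTPUT_LOCATION must be an s3:// path")

    raw = env.get("CUBEJS_CONCURRENCY", str(DEFAULT_CONCURRENCY))
    concurrency = int(raw)  # raises ValueError for non-numeric input
    if concurrency < 1:
        raise ValueError("CUBEJS_CONCURRENCY must be a positive integer")

    # Optional variables stay None when unset.
    return {
        "region": env["CUBEJS_AWS_REGION"],
        "output_location": output,
        "workgroup": env.get("CUBEJS_AWS_ATHENA_WORKGROUP"),
        "catalog": env.get("CUBEJS_AWS_ATHENA_CATALOG"),
        "schema": env.get("CUBEJS_DB_SCHEMA"),
        "concurrency": concurrency,
    }


config = check_athena_env(
    {"CUBEJS_AWS_KEY": "key", "CUBEJS_AWS_SECRET": "secret", "CUBEJS_AWS_REGION": "us-east-1"}
)
print(config["concurrency"])  # 5
```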
Pre-Aggregation Feature Support
count_distinct_approx
Measures of type count_distinct_approx can be used in pre-aggregations when using AWS Athena as a source database. To learn more about AWS Athena's support for approximate aggregate functions, refer to the AWS Athena documentation.
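Approximate distinct counts are usually computed with a HyperLogLog-style sketch. Presto and Trino, the engines Athena is built on, document that algorithm for approx_distinct. The code below is a toy HyperLogLog for illustration only. It is not Athena's or Cube's implementation, and the class and method names are hypothetical. It demonstrates one property that plausibly makes such measures a good fit for pre-aggregations: two sketches merge by taking the register-wise maximum. Partial results, such as per-month rollups, can therefore be combined without re-reading the raw rows.

```python
import hashlib
import math
from typing import Iterable, List


class HyperLogLog:
    """Toy HyperLogLog sketch with 2**p registers (p=10 gives 1024)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p
        self.m = 1 << p
        self.registers: List[int] = [0] * self.m

    def add(self, item: object) -> None:
        # 64-bit hash taken from the first 8 bytes of a SHA-1 digest.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits pick a register
        w = h & ((1 << (64 - self.p)) - 1)          # remaining 64-p bits
        rank = (64 - self.p) - w.bit_length() + 1   # leading zeros in w, plus 1
        self.registers[idx] = max(self.registers[idx], rank)

    def update(self, items: Iterable[object]) -> "HyperLogLog":
        for item in items:
            self.add(item)
        return self

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max: the result equals the sketch of the union.
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # small-range (linear counting) correction
        return raw


# Two overlapping "partitions" of user IDs; the true distinct count is 10000.
jan = HyperLogLog().update(range(0, 6000))
feb = HyperLogLog().update(range(4000, 10000))
total = jan.merge(feb).estimate()  # approximately 10000 (standard error about 3%)
```

With 1024 registers, the standard error is roughly 1.04 / sqrt(1024), or about 3.3%. That error is the accuracy traded for a fixed, small state that can be merged.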
Pre-Aggregation Build Strategies
To learn more about pre-aggregation build strategies, refer to the pre-aggregation build strategies documentation.
Feature | Works with read-only mode? | Is default? |
---|---|---|
Batching | ❌ | ✅ |
Export Bucket | ❌ | ❌ |
By default, AWS Athena uses a batching strategy to build pre-aggregations.
Batching
No extra configuration is required to enable batching for AWS Athena.
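Cube handles batching internally, so nothing here needs to be written by hand. Purely to illustrate the general idea, the hypothetical batched helper below splits a large result set into fixed-size chunks, so no single step has to hold every row at once. It is not Cube's code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows from `rows`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))  # take up to `size` rows
        if not chunk:
            return  # source exhausted
        yield chunk


sizes = [len(batch) for batch in batched(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]
```

Python 3.12 and later ship a similar itertools.batched, which yields tuples instead of lists.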
Export Bucket
AWS Athena only supports using AWS S3 for export buckets.
AWS S3
Ensure the AWS credentials are correctly configured in IAM to allow reads from and writes to the export bucket in S3.
For improved pre-aggregation performance with large datasets, enable the export bucket functionality by configuring Cube with the following environment variables:
CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
CUBEJS_DB_EXPORT_BUCKET=my.bucket.on.s3
CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<AWS_KEY>
CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<AWS_SECRET>
CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<AWS_REGION>
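The IAM requirement for the export bucket can be expressed as an identity-based policy. You would attach it to the IAM user whose keys are set in CUBEJS_DB_EXPORT_BUCKET_AWS_KEY and CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET. The hypothetical export_bucket_policy helper below builds such a policy document. The exact action list (list the bucket; get, put, and delete objects) is an assumption, so verify it against your own setup before relying on it.

```python
import json
from typing import Dict, List

# Assumed set of S3 actions for reading and writing the export bucket;
# verify before production use.
BUCKET_ACTIONS: List[str] = ["s3:ListBucket"]
OBJECT_ACTIONS: List[str] = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]


def export_bucket_policy(bucket: str) -> Dict[str, object]:
    """Build an IAM policy document granting read/write access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # bucket-level actions apply to the bucket ARN itself
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # object-level actions apply to every key in the bucket
                "Effect": "Allow",
                "Action": OBJECT_ACTIONS,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = export_bucket_policy("my.bucket.on.s3")
print(json.dumps(policy, indent=2))
```

Bucket-level and object-level actions are split into separate statements because S3 evaluates them against different resource ARNs.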
SSL
Cube does not require any additional configuration to enable SSL as AWS Athena connections are made over HTTPS.