Querying concurrency
All queries to data APIs are processed asynchronously via a query queue. The queue helps balance load and improve query performance.
Query queue
The query queue deduplicates queries sent to API instances and insulates upstream data sources from query spikes. It also executes queries against data sources concurrently for increased performance.
By default, Cube uses a single query queue for queries from all API instances and the refresh worker to all configured data sources.
You can read more about the query queue in this blog post.
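The deduplication and concurrency limiting described above can be illustrated with a small sketch. This is not Cube's implementation: the QueryQueue class, its execute method, and the fake data source are hypothetical names used only to show the idea of sharing one in-flight execution among identical queries while capping how many queries reach the data source at once.

```python
import asyncio


class QueryQueue:
    """Toy in-memory queue: shares identical in-flight queries and caps concurrency."""

    def __init__(self, concurrency):
        # Create the queue inside a running event loop (see demo below).
        self._slots = asyncio.Semaphore(concurrency)
        self._in_flight = {}  # query text -> task currently executing it

    async def execute(self, sql, run_query):
        # Identical queries reuse the task that is already running.
        task = self._in_flight.get(sql)
        if task is None:
            task = asyncio.ensure_future(self._run(sql, run_query))
            self._in_flight[sql] = task
        return await task

    async def _run(self, sql, run_query):
        try:
            # At most `concurrency` queries hit the data source at a time.
            async with self._slots:
                return await run_query(sql)
        finally:
            self._in_flight.pop(sql, None)


async def demo():
    calls = []

    async def fake_data_source(sql):
        # Stand-in for a real data source; records each actual execution.
        calls.append(sql)
        await asyncio.sleep(0.01)
        return f"rows for {sql}"

    queue = QueryQueue(concurrency=2)
    results = await asyncio.gather(
        *(queue.execute("SELECT 1", fake_data_source) for _ in range(5))
    )
    return results, calls


if __name__ == "__main__":
    results, calls = asyncio.run(demo())
    print(len(results), "callers served by", len(calls), "data source call(s)")
```

In the demo, five callers submit the same query at once, and the data source is expected to run it only once while every caller receives the shared result.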
Multiple query queues
You can use the context_to_orchestrator_id configuration option to route queries to multiple queues based on the security context.
If you're configuring multiple connections to data sources via the driver_factory configuration option, you must also configure context_to_orchestrator_id to ensure that queries are routed to the correct queues.
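As a rough sketch of how the two options fit together, the functions below derive both the queue identifier and the data source connection from the same security context. The tenant_id field, the CUBE_APP_ prefix, the "default" fallback, and the returned connection settings are illustrative assumptions, and the step that registers these functions in the Cube configuration file is omitted.

```python
def context_to_orchestrator_id(ctx: dict) -> str:
    """Return the queue identifier for a request.

    Assumes the security context carries a hypothetical `tenant_id` field.
    """
    tenant_id = ctx.get("securityContext", {}).get("tenant_id", "default")
    return f"CUBE_APP_{tenant_id}"


def driver_factory(ctx: dict) -> dict:
    """Return connection settings for the same tenant.

    The driver type and database naming scheme are made up for illustration.
    """
    tenant_id = ctx.get("securityContext", {}).get("tenant_id", "default")
    return {"type": "postgres", "database": f"tenant_{tenant_id}"}
```

Because both functions key off the same field, queries for one tenant's database always land in that tenant's queue rather than sharing a queue with another connection.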
Data sources
Cube supports various kinds of data sources, ranging from cloud data warehouses to embedded databases. Each data source scales differently, so Cube provides sound defaults for each kind of data source out of the box.
Data source concurrency
By default, Cube uses the following concurrency settings for data sources:
Data source | Default concurrency |
---|---|
Amazon Athena | 10 |
Amazon Redshift | 5 |
Apache Pinot | 10 |
ClickHouse | 10 |
Databricks | 10 |
Firebolt | 10 |
Google BigQuery | 10 |
Snowflake | 8 |
All other data sources | 5, or less if specified in the driver |
You can use the CUBEJS_CONCURRENCY environment variable to adjust the maximum number of concurrent queries to a data source. It's recommended to use the default configuration unless you're sure that your data source can handle more concurrent queries.
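The relationship between the defaults in the table and the CUBEJS_CONCURRENCY override can be modeled as a simple lookup. This is only a mental model, not Cube's internal logic: the short data source keys and the effective_concurrency helper are hypothetical, and driver-specific limits below 5 are not modeled.

```python
import os

# Defaults copied from the table above; the keys are illustrative short names.
DEFAULT_CONCURRENCY = {
    "athena": 10,
    "redshift": 5,
    "pinot": 10,
    "clickhouse": 10,
    "databricks": 10,
    "firebolt": 10,
    "bigquery": 10,
    "snowflake": 8,
}
FALLBACK_CONCURRENCY = 5  # "All other data sources" row


def effective_concurrency(data_source, env=None):
    """Return the override from CUBEJS_CONCURRENCY if set, else the table default."""
    env = os.environ if env is None else env
    override = env.get("CUBEJS_CONCURRENCY")
    if override:
        return int(override)
    return DEFAULT_CONCURRENCY.get(data_source, FALLBACK_CONCURRENCY)
```

For example, effective_concurrency("snowflake", env={}) falls back to the table value, while passing env={"CUBEJS_CONCURRENCY": "12"} yields the override for any data source.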
Connection pooling
For data sources that support connection pooling, the maximum number of concurrent connections to the database can also be set by using the CUBEJS_DB_MAX_POOL environment variable. If changing this from the default, you must ensure that the new value is greater than the number of concurrent connections used by Cube's query queues and the refresh worker.
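The pool-size rule above amounts to a quick arithmetic check before changing CUBEJS_DB_MAX_POOL. The helper below is a hypothetical sketch, and it assumes that the connections used by the query queues and by the refresh worker add up; verify how your deployment actually shares connections before relying on it.

```python
def pool_is_large_enough(max_pool, queue_concurrency, refresh_worker_concurrency):
    """Check that the pool exceeds the connections Cube may hold at once.

    Assumption: query queue and refresh worker connections are summed.
    """
    return max_pool > queue_concurrency + refresh_worker_concurrency
```

With a queue concurrency of 8 and a refresh worker concurrency of 8, a pool of 16 would fail this check, while a pool of 17 or more would pass.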
Refresh worker
By default, the refresh worker uses the same concurrency settings as API instances. However, you can override this behavior in the refresh worker configuration.