Connecting to data sources
Choose a data source below to get started.
Note that Cube also supports connecting to multiple data sources out of the box.
Data warehouses
Query engines
Transactional databases
Time series databases
Streaming
Other data sources
API endpoints
Cube is designed to work with data sources that allow querying them with SQL.
Cube is not designed to access data files directly or to fetch data from REST, GraphQL, or any other API. To use Cube in that way, you need to either use a supported data source (e.g., use DuckDB to query Parquet files on Amazon S3) or create a custom data source driver.
Data source drivers
Driver support
Most of the drivers for data sources are supported either directly by the Cube team or by their vendors. The rest are community-supported and are marked as such on their respective pages.
You can find the source code of the drivers that are part of the Cube distribution in cubejs-*-driver folders on GitHub.
Third-party drivers
The following drivers were contributed by the Cube community. They are not part of the Cube distribution, but they can still be used with Cube:
- ArangoDB
- CosmosDB
- CrateDB
- Dremio
- Dremio ODBC
- OpenDistro Elastic
- SAP Hana
- Trino
- Vertica
You need to configure driver_factory to use a third-party driver.
Currently unsupported data sources
If you'd like to connect to a data source that is not yet listed on this page, please see the list of requested drivers and file an issue on GitHub.
You're more than welcome to contribute new drivers as well as new features and patches to existing drivers. Please check the contribution guidelines and join the #contributing-to-cube channel in our Slack community.
You can contact us to discuss an integration with a currently unsupported data source. We might be able to assist Cube Cloud users on the Enterprise Premier product tier.
Concurrency and pooling
All Cube database drivers come with presets for concurrency and pooling that work out of the box. The following information is included as a reference.
For increased performance, Cube uses multiple concurrent connections to configured data sources. The CUBEJS_CONCURRENCY environment variable controls concurrency settings for query queues and the refresh scheduler as well as the maximum concurrent connections.
For databases that support connection pooling, the maximum number of concurrent connections to the database can also be set with the CUBEJS_DB_MAX_POOL environment variable. If you change this from the default, you must ensure that the new value is greater than the number of concurrent connections used by Cube's query queues and refresh scheduler.
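The pool size constraint can be checked before deployment. The following is a minimal sketch, not part of Cube: check_pool_settings is a hypothetical helper, and it assumes that the number of concurrent connections used by the query queues and refresh scheduler can be approximated by the value of CUBEJS_CONCURRENCY. If either variable is unset, the driver presets apply and the helper reports nothing.

```python
import os
from typing import Mapping, Optional


def check_pool_settings(env: Mapping[str, str]) -> Optional[str]:
    """Return a warning if the pool looks too small, otherwise None.

    Hypothetical helper. Assumption: CUBEJS_CONCURRENCY approximates the
    number of concurrent connections used by Cube's query queues and
    refresh scheduler.
    """
    concurrency = env.get("CUBEJS_CONCURRENCY")
    max_pool = env.get("CUBEJS_DB_MAX_POOL")

    # If either variable is unset, driver presets are in effect and this
    # script has no values to compare.
    if concurrency is None or max_pool is None:
        return None

    # The pool must be strictly greater than the concurrency value.
    if int(max_pool) <= int(concurrency):
        return (
            f"CUBEJS_DB_MAX_POOL ({max_pool}) should be greater than "
            f"CUBEJS_CONCURRENCY ({concurrency})"
        )
    return None


if __name__ == "__main__":
    # Check the variables of the current process environment.
    warning = check_pool_settings(os.environ)
    print(warning or "Pool settings look consistent.")
```

Running the script in the same environment as the Cube deployment prints a warning when the two variables are both set and the pool is not larger than the concurrency value.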