SQL API
SQL API enables Cube to deliver data over the Postgres-compatible protocol (opens in a new tab) to certain kinds of data applications, including but not limited to the following ones:
- Most commonly, business intelligence tools, e.g., Tableau, Power BI, ThoughtSpot, Sigma, Looker Studio, Superset / Preset, and Metabase
- Data notebooks, e.g., Jupyter, Hex, or Deepnote
- Reverse ETL tools, e.g., Census or Hightouch
Often, the SQL API is used to enable internal or self-serve business intelligence (opens in a new tab) use cases.
Please use this GitHub issue (opens in a new tab) to suggest tools of your interest and vote for already proposed ones.
Example request
You can use the psql
utility (opens in a new tab) to connect to the SQL API:
# Cube Core
PGPASSWORD=password \
psql -h localhost \
-p 15432 \
-U user cube
# Cube Cloud
PGPASSWORD=password \
psql -h awesome-ecom.sql.gcp-us-central1.cubecloudapp.dev \
-p 5432 \
-U cube awesome-ecom
Then, you can run queries in the Postgres dialect, just like the following one:
SELECT
users.state,
users.city,
orders.status,
MEASURE(orders.count)
FROM orders
CROSS JOIN users
WHERE
users.state != 'us-wa'
AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;
You can also introspect the data model in a Postgres-native way by querying tables
in information_schema
(opens in a new tab) or using backslash
commands (opens in a new tab):
SELECT *
FROM information_schema.tables
WHERE table_schema = 'public';
\d
Check query format for details about supported queries. Also, check SQL API reference for the list of supported SQL commands, functions, and operators.
Fundamentals
In the SQL API, each cube or view from the data model is represented as a table. Measures, dimensions, and segments are represented as columns in these tables. SQL API can execute regular queries, queries with post-processing, and queries with pushdown that can reference these tables and columns.
Under the hood, the SQL API uses Apache DataFusion (opens in a new tab) as its query engine. It's responsible for query planning and execution. As part of query planning, the SQL API also uses egg (opens in a new tab) (an e-graph term rewriting (opens in a new tab) library) to analyze incoming SQL queries and find the best query plan out of a wide variety of possible plans to execute.
Overall, query planning is a seamless process. SQL API does its best to execute a query as a regular query or a query with post-processing. If that is not possible, then the query would be executed as a query with pushdown. There are trade-offs associated with each query type:
Query type | In-memory cache | Pre-aggregations | SQL support |
---|---|---|---|
Regular queries | β Used | β Used | π‘ Very limited |
Queries with post-processing | β Used | β Used | π‘ Limited |
Queries with pushdown | β Used | β Not used | β Extensive |
Query pushdown in the SQL API is available in public preview. Read more (opens in a new tab) in the blog.
Query pushdown is disabled by default. You should explicitly enable it. In future versions, it will be enabled by default. Also, enabling query pushdown would affect how ungrouped queries are executed; check query format for details.
Configuration
Cube Core
SQL API is disabled by default. To enable the SQL API, set CUBEJS_PG_SQL_PORT
to a port number you'd like to connect to with a Postgres-compatible tool.
Credential | Environment variable, etc. |
---|---|
Host | Host you're running Cube at |
Port | Set via CUBEJS_PG_SQL_PORT |
User name | Set via CUBEJS_SQL_USER |
Password | Set via CUBEJS_SQL_PASSWORD |
Database | Any valid string, e.g., cube |
You can also use
checkSqlAuth
,
canSwitchSqlUser
, and CUBEJS_SQL_SUPER_USER
to configure custom authentication.
Example
The following Docker Compose file will run Cube with the SQL API enabled on
port 15432, accessible using user
as the user name, password
as the password,
and any string as the database name:
services:
cube:
image: cubejs/cube:latest
ports:
- 4000:4000
- 15432:15432
environment:
- CUBEJS_DEV_MODE=true
- CUBEJS_API_SECRET=SECRET
- CUBEJS_DB_USER=cube
- CUBEJS_DB_PASS=12345
- CUBEJS_DB_HOST=demo-db-examples.cube.dev
- CUBEJS_DB_NAME=ecom
- CUBEJS_DB_TYPE=postgres
- CUBEJS_PG_SQL_PORT=15432 # SQL API credential
- CUBEJS_SQL_USER=user # SQL API credential
- CUBEJS_SQL_PASSWORD=password # SQL API credential
volumes:
- .:/cube/conf
After running it with docker compose up
, you can finally connect and execute
an example request.
Cube Cloud
SQL API is enabled by default. To find your SQL API endpoint and credentials in Cube Cloud, go to the Overview page, click API credentials, and choose the SQL API tab.
By default, the SQL API is enabled on port 5432, the user name is cube
, and
a random string is generated for the password. You can customize these with
CUBEJS_PG_SQL_PORT
, CUBEJS_SQL_USER
, and CUBEJS_SQL_PASSWORD
environment
variables by navigating toΒ Settings β Configration.
Query planning
By default, the SQL API executes queries as regular queries
or queries with post-processing. Such queries support only a limited
set of SQL functions and operators, and sometimes you can get the following error:
Error during rewrite: Can't detect Cube query and it may be not supported yet.
You can use the CUBESQL_SQL_PUSH_DOWN
environment variable to instruct the SQL API
to execute such queries as queries with pushdown.
Query pushdown in the SQL API is available in public preview. Read more (opens in a new tab) in the blog.
Query planning is a resource-intensive task, and sometimes you can get the following
error: Error during rewrite: Can't find rewrite due to 10002 AST node limit reached.
Use the following environment variables to allocate more resources for query planning:
CUBESQL_REWRITE_MAX_NODES
, CUBESQL_REWRITE_MAX_ITERATIONS
, CUBESQL_REWRITE_TIMEOUT
.
Streaming
By default, query results are loaded in a single batch. However, a more effective
streaming mode can be used for large result sets. To enable it, set the
CUBESQL_STREAM_MODE
environment variable to true
.
When the streaming mode is enabled, the maximum row limit does not apply to SQL API queries. They can return an unlimited number of rows.
Session limit
Each concurrent connection to the SQL API consumes some resources and attempting
to establish too many connections at once can lead to an out-of-memory crash.
You can use the CUBEJS_MAX_SESSIONS
environment variable to adjust the session
limit.