Skip to Content

Joins in the SQL API

Views

The best practice is to use views to specify explicit join paths for SQL API queries.

While BI tools would see a view as a table, in fact, no materialization is performed until Cube is queried through the SQL API. At the query time, Cube will try to maximize member pushdown, so only required parts of the view are materialized at query time.

Cube also solves fan and chasm traps based on the dimensions selected in the query, so if measure aggregation types are properly set up, you will see correct results in BI tools even though cubes and views are seen just as tables.

Consider the following data model:

views: - name: orders_users cubes: - join_path: orders includes: - status - count - join_path: orders.users prefix: true includes: - id - city - state
view(`orders_users`, { cubes: [ { join_path: orders, includes: ['status', 'count'] }, { join_path: orders.users, prefix: true, includes: ['id', 'city', 'state'] } ] })

With this data model, here’s how you can get orders count by users’ cities:

cube=> SELECT MEASURE(count) AS count, users_city FROM orders_users GROUP BY 2; count | users_city -------+--------------- 1416 | Los Angeles 1412 | Seattle 1365 | Mountain View 1263 | New York 1220 | Austin 1164 | Chicago 1101 | San Francisco 1059 | Palo Alto (8 rows)

CROSS JOIN and __cubeJoinField

The SQL API also supports joins through the __cubeJoinField virtual column, which allows to control how specific cubes are joined. This is considered an advanced functionality, and views should be used where possible. Join can also be done through CROSS JOIN.

Usage of CROSS JOIN or __cubeJoinField instructs Cube to perform join as it’s defined in a data model while using provided cubes as join hints.

For example, the following query joins the orders and products tables under the hood on orders.product_id = products.id, exactly the same way as the REST API query does:

cube=> SELECT p.name, SUM(o.count) FROM orders o LEFT JOIN products p ON o.__cubeJoinField = p.__cubeJoinField GROUP BY 1 LIMIT 5; name | SUM(o.count) --------------------------+-------------- Tasty Plastic Mouse | 121 Intelligent Cotton Ball | 119 Ergonomic Steel Tuna | 116 Intelligent Rubber Pants | 116 Generic Wooden Gloves | 116 (5 rows)

Or through CROSS JOIN:

cube=> SELECT p.name, sum(o.count) FROM orders o CROSS JOIN products p GROUP BY 1 LIMIT 5; name | SUM(o.count) --------------------------+-------------- Tasty Plastic Mouse | 121 Intelligent Cotton Ball | 119 Ergonomic Steel Tuna | 116 Intelligent Rubber Pants | 116 Generic Wooden Gloves | 116 (5 rows)

In the resulting query plan, you won’t see any joins as you can’t see those for REST API queries either:

cube=> EXPLAIN SELECT p.name, sum(o.count) FROM orders o LEFT JOIN products p ON o.__cubeJoinField = p.__cubeJoinField GROUP BY 1 LIMIT 5; plan_type | plan ---------------+----------------------- logical_plan | CubeScan: request={ + | "measures": [ + | "orders.count" + | ], + | "dimensions": [ + | "products.name" + | ], + | "segments": [], + | "limit": 5 + | } physical_plan | CubeScanExecutionPlan+ | (2 rows)

This feature allows you to join cubes even joined transitively only.

In most of the BI tools you’d use __cubeJoinField to define joins between cube tables. In tools that allow defining custom SQL datasets, you can use joined tables as a dataset SQL. For example:

SELECT o.count as count, p.name as product_name, p.description as product_description FROM orders o CROSS JOIN products p;

Please note we use aliasing to avoid name clashing between cube members in a resulting data set. In this case, wrapped SQL will be properly processed by Cube, pushing down all operations to Cube query:

cube=> SELECT product_name, SUM(count) FROM ( SELECT o.count as count, p.name as product_name, p.description as product_description FROM orders o CROSS JOIN products p ) joined GROUP BY 1 ORDER BY 2 DESC LIMIT 5; product_name | SUM(joined.count) --------------------------+------------------- Tasty Plastic Mouse | 121 Intelligent Cotton Ball | 119 Ergonomic Steel Tuna | 116 Intelligent Rubber Pants | 116 Generic Wooden Gloves | 116 (5 rows)

We can see this by introspecting the EXPLAIN plan for this query:

cube=> EXPLAIN SELECT product_name, SUM(count) FROM ( SELECT o.count as count, p.name as product_name, p.description as product_description FROM orders o CROSS JOIN products p ) joined GROUP BY 1 ORDER BY 2 DESC LIMIT 5; plan_type | plan ---------------+----------------------- logical_plan | CubeScan: request={ + | "measures": [ + | "orders.count" + | ], + | "dimensions": [ + | "products.name" + | ], + | "segments": [], + | "order": [ + | [ + | "orders.count",+ | "desc" + | ] + | ], + | "limit": 5 + | } physical_plan | CubeScanExecutionPlan+ | (2 rows)

Please note that, even if product_description is in the inner selection, it isn’t evaluated in the final query as it isn’t used in any way.

Was this page useful?