Using dynamic union tables
Use case
Sometimes, you may have a lot of tables in a database, which actually relate to the same entity.
For example, you can have “per client” tables with the same data, but related to
different customers: elon_musk_table
, john_doe_table
, steve_jobs_table
,
etc. In this case, it would make sense to create a single cube
for customers, which should be backed by a union table from all customers tables.
Data modeling
You can use the sql
parameter to define a cube over an
arbitrary SQL query, e.g., a query that includes UNION
or UNION ALL
operators:
cubes:
- name: customers
sql: >
SELECT *, 'Einstein' AS name FROM einstein_data UNION ALL
SELECT *, 'Pascal' AS name FROM pascal_data UNION ALL
SELECT *, 'Newton' AS name FROM newton_data
measures:
- name: count
type: count
dimensions:
- name: name
sql: name
type: string
However, it can be quite annoying to write the SQL to union all tables manually. Luckily, you can use dynamic data modeling to generate necessary SQL based on a list of tables:
{%- set customer_tables = {
"einstein_data": "Einstein",
"pascal_data": "Pascal",
"newton_data": "Newton"
} -%}
cubes:
- name: customers
sql: >
{%- for table, name in customer_tables | items %}
SELECT *, '{{ name | safe }}' AS name FROM {{ table | safe }}
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %}
measures:
- name: count
type: count
dimensions:
- name: name
sql: name
type: string
Result
Querying count
and name
members of the dynamically defined customers
cube
would result in the following generated SQL:
SELECT
"customers".name "customers__name",
count(*) "customers__count"
FROM
(
SELECT
*,
'Einstein' AS name
FROM
einstein_data
UNION ALL
SELECT
*,
'Pascal' AS name
FROM
pascal_data
UNION ALL
SELECT
*,
'Newton' AS name
FROM
newton_data
) AS "customers"
GROUP BY
1
ORDER BY
2 DESC