Joining data from multiple data sources
Use case
Let's imagine we store information about products and their suppliers in separate databases. We want to aggregate data from these data sources while having decent performance. In the recipe below, we'll learn how to create a rollup join between two databases to achieve our goal.
Configuration
First of all, we should define our database connections with the dataSource
option in the cube.js
configuration file:
module.exports = {
driverFactory: ({ dataSource }) => {
if (dataSource === "suppliers") {
return {
type: "postgres",
database: "recipes",
host: "demo-db-recipes.cube.dev",
user: "cube",
password: "12345",
port: "5432",
};
}
if (dataSource === "products") {
return {
type: "postgres",
database: "ecom",
host: "demo-db-recipes.cube.dev",
user: "cube",
password: "12345",
port: "5432",
};
}
throw new Error("dataSource is undefined");
},
};
Data modeling
First, we'll define
rollup
pre-aggregations for products
and suppliers
. Note that these
pre-aggregations should contain the dimension on which they're joined. In this
case, it's the supplier_id
dimension in the products
cube, and the id
dimension in the suppliers
cube:
cubes:
- name: products
# ...
pre_aggregations:
- name: products_rollup
type: rollup
dimensions:
- name
- supplier_id
indexes:
- name: category_index
columns:
- supplier_id
joins:
suppliers:
sql: "{supplier_id} = ${suppliers.id}"
relationship: many_to_one
cubes:
- name: suppliers
# ...
pre_aggregations:
- name: suppliers_rollup
type: rollup
dimensions:
- id
- company
- email
indexes:
- name: category_index
columns:
- id
Then, we'll also define a rollup_join
pre-aggregation in the products
cube,
which will enable aggregating data from multiple data sources:
cubes:
- name: products
# ...
pre_aggregations:
- name: combined_rollup
type: rollup_join
dimensions:
- suppliers.email
- suppliers.company
- name
rollups:
- suppliers.suppliers_rollup
- products_rollup
Query
Let's get the product names and their suppliers' info, such as company name and email, with the following query:
{
"order": {
"products.name": "asc"
},
"dimensions": ["products.name", "suppliers.company", "suppliers.email"],
"limit": 3
}
Result
We'll get the data from two pre-aggregations joined into one rollup_join
:
[
{
"products.name": "Awesome Cotton Sausages",
"suppliers.company": "Justo Eu Arcu Inc.",
"suppliers.email": "id.risus@luctuslobortisClass.net"
},
{
"products.name": "Awesome Fresh Keyboard",
"suppliers.company": "Quisque Purus Sapien Limited",
"suppliers.email": "Cras@consectetuercursuset.co.uk"
},
{
"products.name": "Awesome Rubber Soap",
"suppliers.company": "Tortor Inc.",
"suppliers.email": "Mauris@ac.com"
}
]
Source code
Please feel free to check out the
full source code (opens in a new tab)
or run it with the docker-compose up
command. You'll see the result, including
queried data, in the console.