Create your first data model
Cube follows a dataset-oriented data modeling approach, which is inspired by and expands upon dimensional modeling. Cube incorporates this approach and provides a practical framework for implementing dataset-oriented data modeling.
When building a data model in Cube, you work with two dataset-centric objects: cubes and views. Cubes usually represent business entities such as customers, line items, and orders. In cubes, you define all the calculations within the measures and dimensions of these entities. Additionally, you define relationships between cubes, such as "an order has many line items" or "a user may place multiple orders."
Views sit on top of a data graph of cubes and create a facade of your entire data model, with which data consumers can interact. You can think of views as the final data products for your data consumers - BI users, data apps, AI agents, etc. When building views, you select measures and dimensions from different connected cubes and present them as a single dataset to BI or data apps.
Working with cubes
To begin building your data model, click on Enter Development Mode in Cube Cloud. This will take you to your personal developer space, where you can safely make changes to your data model without affecting the production environment.
In the previous section, we generated four cubes from the Snowflake schema. To see the data graph of these four cubes and how they are connected to each other, click the Show Graph button on the Data Model page.
Let's review the orders
cube first and update it with additional dimensions
and measures.
Once you are in developer mode, navigate to the Data Model and click
on the orders.yml
file in the left sidebar inside the model/cubes
directory
to open it.
You should see the following content of model/cubes/orders.yml
file.
cubes:
- name: orders
sql_table: ECOM.ORDERS
joins:
- name: users
sql: "{CUBE}.USER_ID = {users}.ID"
relationship: many_to_one
dimensions:
- name: status
sql: STATUS
type: string
- name: id
sql: ID
type: number
primary_key: true
- name: created_at
sql: CREATED_AT
type: time
- name: completed_at
sql: COMPLETED_AT
type: time
measures:
- name: count
type: count
As you can see, we already have a count
measure that we can use to calculate
the total count of our orders.
Let's add an additional measure to the orders
cube to calculate only
completed orders. The status
dimension in the orders
cube reflects the
three possible statuses: processing, shipped, or completed. We will
create a new measure completed_count
by using a filter on that dimension. To
do this, we will use a
filter parameter of the
measure and refer to the existing dimension.
Add the following measure definition to your model/cubes/orders.yml
file. It
should be included within the measures
block.
- name: completed_count
type: count
filters:
- sql: "{CUBE}.status = 'completed'"
With these two measures in place, count
and completed_count
, we can create a
derived measure. Derived measures are measures that you can create based on
existing measures. Let's create the completed_percentage
derived measure.
Add the following measure definition to your model/cubes/orders.yml
file
within the measures
block.
- name: completed_percentage
type: number
sql: "(100.0 * {CUBE.completed_count} / NULLIF({CUBE.count}, 0))"
format: percent
Below you can see what your updated orders
cube should look like with two new
measures. Feel free to copy this code and paste it into your
model/cubes/order.yml
file.
cubes:
- name: orders
sql_table: ECOM.ORDERS
joins:
- name: users
sql: "{CUBE}.USER_ID = {users}.ID"
relationship: many_to_one
dimensions:
- name: status
sql: STATUS
type: string
- name: id
sql: ID
type: number
primary_key: true
- name: created_at
sql: CREATED_AT
type: time
- name: completed_at
sql: COMPLETED_AT
type: time
measures:
- name: count
type: count
- name: completed_count
type: count
filters:
- sql: "{CUBE}.status = 'completed'"
- name: completed_percentage
type: number
sql: "(100.0 * {CUBE.completed_count} / NULLIF({CUBE.count}, 0))"
format: percent
Click Save All in the upper corner to save changes to the data model. Now, you can navigate to Cube’s Playground. The Playground is a web-based tool that allows you to query your data without connecting any tools or writing any code. It's the fastest way to explore and test your data model.
You can select measures and dimensions from different cubes in playground,
including your newly created completed_percentage
measure.
Working with views
When building views, we recommend following entity-oriented design and structuring your views around your business entities. Usually, cubes tend to be normalized entities without duplicated or redundant members, while views are denormalized entities where you pick as many measures and dimensions from multiple cubes as needed to describe a business entity.
Let's create our first view, which will provide all necessary measures and
dimensions to explore orders. Views are usually located in the views
folder
and have a _view
postfix.
Create model/views/orders_view.yml
with the following content:
views:
- name: orders_view
cubes:
- join_path: orders
includes:
- status
- created_at
- count
- completed_count
- completed_percentage
- join_path: orders.users
prefix: true
includes:
- city
- age
- state
When building views, you can leverage the cubes
parameter, which enables you
to include measures and dimensions from other cubes in the view. You can build
your view by combining multiple joined cubes and specifying the path by which
they should be joined for that particular view.
After saving, you can experiment with your newly created view in the Playground.
In the next section, we will learn how to query our orders_view
using a BI
tool.