Concepts

Cube borrows a lot of terminology from OLAP theory, and this document is intended to help both newcomers and regular users refresh their understanding.

We'll use a sample e-commerce database with two tables, orders and line_items, to illustrate the concepts throughout this page:

orders

id	status	completed_at	created_at
1	completed	2019-01-05 00:00:00.000000	2019-01-02 00:00:00.000000
2	shipped	2019-01-17 00:00:00.000000	2019-01-02 00:00:00.000000
3	completed	2019-01-27 00:00:00.000000	2019-01-02 00:00:00.000000
4	shipped	2019-01-09 00:00:00.000000	2019-01-02 00:00:00.000000
5	processing	2019-01-29 00:00:00.000000	2019-01-02 00:00:00.000000

line_items

id	product_id	order_id	quantity	price	created_at
1	31	1	1	275	2019-01-31 00:00:00.000000
2	49	2	6	248	2021-01-20 00:00:00.000000
3	89	3	6	197	2021-11-25 00:00:00.000000
4	71	4	8	223	2019-12-23 00:00:00.000000
5	64	5	5	75	2019-04-20 00:00:00.000000

We'll be following the entity-first approach to designing the data model.

See this recipe to learn about entity-first and metrics-first approaches.

Cubes

Cubes represent datasets in Cube and are conceptually similar to views in SQL. Cubes are usually declared in separate files with one cube per file. Typically, a cube points to a single table in your database using the sql_table property:

YAML

cubes:
  - name: orders
    sql_table: orders

If you're using dbt, see this recipe to streamline defining cubes on top of dbt models.

You can also use the sql property to accommodate more complex SQL queries:

YAML

cubes:
  - name: orders
    sql: >
      SELECT *
      FROM orders, line_items
      WHERE orders.id = line_items.order_id

Each cube contains the definitions of its members: dimensions, measures, and segments. You can control the access to cubes and their members by configuring the member-level security.

Joins are used to define relations between cubes. Pre-aggregations are used to accelerate queries to cubes. Cubes and their members can be further referenced by views.

Note that cubes support extension, polymorphism, and data blending. Cubes can be defined statically and you can also build dynamic data models.
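
As a minimal sketch of extension, one cube can inherit the members of another through the extends parameter. The cube names and the double_count measure below are hypothetical and serve only to illustrate the mechanism on top of the sample orders table:

```yaml
cubes:
  - name: order_facts
    sql_table: orders

    measures:
      - name: count
        type: count

  # Hypothetical child cube: inherits `sql_table` and `count`
  # from order_facts and adds one more measure on top
  - name: extended_order_facts
    extends: order_facts

    measures:
      - name: double_count
        sql: "{count} * 2"
        type: number
```

The child cube only declares what differs from the parent, which keeps shared definitions in one place.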

Cube supports data models that consist of thousands of cubes and views. For large multi-tenancy configurations, e.g., with more than 100 tenants, consider using a multi-cluster deployment.

See the reference documentation for the full list of cube parameters.

Views

Views sit on top of the data graph of cubes and create a facade of your whole data model with which data consumers can interact. They are useful for defining metrics, managing governance and data access, and controlling ambiguous join paths.

Views do not define their own members. Instead, they reference cubes by specific join paths and include their members.

In the example below, we create the orders view, which includes selected members from the base_orders, products, and users cubes:

YAML

views:
  - name: orders
 
    cubes:
      - join_path: base_orders
        includes:
          - status
          - created_date
          - total_amount
          - total_amount_shipped
          - count
          - average_order_value
 
      - join_path: base_orders.line_items.products
        includes:
          - name: name
            alias: product
 
      - join_path: base_orders.users
        prefix: true
        includes: "*"
        excludes:
          - company

Views do not define any pre-aggregations. Instead, they reuse pre-aggregations from underlying cubes.

Views can be defined statically, and you can also build dynamic data models.

See the reference documentation for the full list of view parameters.

Folders

Optionally, members of a view can be organized into folders. Each folder would contain a subset of members of the view.

Cube supports both flat and nested folder structures, which can be used with various visualization tools. If a specific tool does not support nested folders, they are exposed to that tool as an equivalent flat structure. Check APIs & Integrations for details on nested folder support.
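
A flat folder layout might be sketched as follows. This assumes the folders parameter of a view accepts a list of entries with name and includes; the folder names are hypothetical, and the members are a subset of the orders view shown earlier:

```yaml
views:
  - name: orders

    cubes:
      - join_path: base_orders
        includes:
          - status
          - created_date
          - total_amount
          - count

    folders:
      # Hypothetical folder names; each folder lists members
      # that are already included in the view
      - name: Order details
        includes:
          - status
          - created_date

      - name: Order metrics
        includes:
          - total_amount
          - count
```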

Dimensions

Dimensions represent the properties of a single data point in the cube. The orders table contains only dimensions, so representing them in the orders cube is straightforward:

YAML

cubes:
  - name: orders
    # ...
 
    dimensions:
      - name: id
        sql: id
        type: number
        # Here we explicitly let Cube know this field is the primary key
        # This is required for de-duplicating results when using joins
        primary_key: true
 
      - name: status
        sql: status
        type: string

Note that the id dimension is defined as a primary key. It is also possible to have more than one primary key dimension in a cube if you'd like them all to be parts of a composite key.
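
A composite key could be sketched like this. The cube below is a hypothetical variant of line_items that has no id column of its own and assumes each (order_id, product_id) pair is unique:

```yaml
cubes:
  - name: line_items_by_product
    sql_table: line_items

    dimensions:
      # Assumption: the pair (order_id, product_id) identifies a row,
      # so both dimensions are marked as parts of the primary key
      - name: order_id
        sql: order_id
        type: number
        primary_key: true

      - name: product_id
        sql: product_id
        type: number
        primary_key: true
```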

The line_items table also has a couple of dimensions which can be represented as follows:

YAML

cubes:
  - name: line_items
    # ...
 
    dimensions:
      - name: id
        sql: id
        type: number
        # Again, we explicitly let Cube know this field is the primary key
        # This is required for de-duplicating results when using joins
        primary_key: true
 
      - name: order_id
        sql: order_id
        type: number

If needed, dimensions can be organized into hierarchies. Also, proxy dimensions are helpful for code reusability and subquery dimensions can be used to join cubes implicitly.
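
As an illustration of a subquery dimension, the sketch below brings an aggregated value from line_items into orders. It assumes that the line_items cube defines a total measure (a sum of price) and that a join between orders and line_items exists; the line_items_total name is hypothetical:

```yaml
cubes:
  - name: orders
    # ...

    dimensions:
      # References a measure of another cube and exposes it
      # as a per-order dimension via `sub_query`
      - name: line_items_total
        sql: "{line_items.total}"
        type: number
        sub_query: true
```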

See the reference documentation for the full list of dimension parameters.

Dimension types

Dimensions can be of different types, e.g., string, number, or time. Often, data types in SQL are mapped to dimension types in the following way:

Data type in SQL	Dimension type in Cube
`timestamp`, `date`, `time`	`time`
`text`, `varchar`	`string`
`integer`, `bigint`, `decimal`	`number`
`boolean`	`boolean`

See the dimension type reference for details.
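
The type describes the result of the sql expression rather than the type of a single column. A minimal sketch, using a hypothetical is_completed dimension derived from the status column of the sample orders table:

```yaml
cubes:
  - name: orders
    # ...

    dimensions:
      # The comparison yields true or false, hence the `boolean` type
      - name: is_completed
        sql: "{CUBE}.status = 'completed'"
        type: boolean
```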

Time dimensions

Time-based properties are modeled using dimensions of the time type. They allow grouping the result set by a unit of time (e.g., days, weeks, or months), also known as the time dimension granularity.

The following granularities are available by default for any time dimension: year, quarter, month, week (starting on Monday), day, hour, minute, second. You can also define custom granularities and optionally expose them via proxy dimensions in case you need to use weeks starting on Sunday, fiscal years, etc.

See the following recipes:

For a custom granularity example.
For a custom calendar example.
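
The Sunday-based week mentioned above could be sketched as a custom granularity that shifts the default Monday-based week back by one day. The granularity and proxy dimension names are hypothetical, and the reference syntax in the proxy dimension is an assumption that should be checked against the reference documentation:

```yaml
cubes:
  - name: orders
    # ...

    dimensions:
      - name: created_at
        sql: created_at
        type: time
        granularities:
          # One-week interval, starting a day earlier than Monday
          - name: week_starting_on_sunday
            interval: 1 week
            offset: -1 day

      # Assumed syntax: a proxy dimension that exposes the custom
      # granularity as a regular time dimension
      - name: created_at_sunday_week
        sql: "{created_at.week_starting_on_sunday}"
        type: time
```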

Here's how we can add time dimensions to the data model:

YAML

cubes:
  - name: orders
    # ...
 
    dimensions:
      - name: created_at
        sql: created_at
        type: time
        # You can use this time dimension with all default granularities:
        # year, quarter, month, week, day, hour, minute, second
 
      - name: completed_at
        sql: completed_at
        type: time
        # You can use this time dimension with all default granularities
        # and an additional custom granularity defined below
        granularities:
          - name: fiscal_year_starting_on_february_01
            interval: 1 year
            offset: 1 month

Time dimensions are essential to enabling performance boosts such as partitioned pre-aggregations and incremental refreshes.
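
As a rough sketch of how a time dimension enables both features, the pre-aggregation below is partitioned by month on created_at and refreshed incrementally. It assumes the orders cube has a count measure and a status dimension; the refresh interval and update window are arbitrary illustrative values:

```yaml
cubes:
  - name: orders
    # ...

    pre_aggregations:
      - name: orders_by_day
        measures:
          - count
        dimensions:
          - status
        time_dimension: created_at
        granularity: day
        # One partition per month of `created_at`
        partition_granularity: month
        refresh_key:
          every: 1 hour
          # Only refresh partitions within the update window
          incremental: true
          update_window: 7 day
```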

Measures

Measures represent the properties of a set of data points in the cube. To add a measure called count to our orders cube, for example, we can do the following:

YAML

cubes:
  - name: orders
    # ...
 
    measures:
      - name: count
        type: count

In our line_items cube, we can also create a measure to sum up the total value of line items sold:

YAML

cubes:
  - name: line_items
    # ...
 
    measures:
      - name: total
        sql: price
        type: sum

Calculated measures and subquery dimensions can be used for measure composition. Multi-stage calculations enable data modeling of more sophisticated measures.
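
A calculated measure might look like the following sketch, where completed_ratio is built from two other measures rather than from a column. The completed_count and completed_ratio names are hypothetical:

```yaml
cubes:
  - name: orders
    # ...

    measures:
      - name: count
        type: count

      # Counts only the rows that match the filter
      - name: completed_count
        type: count
        filters:
          - sql: "{CUBE}.status = 'completed'"

      # Calculated measure: references other measures, not columns
      - name: completed_ratio
        sql: "1.0 * {completed_count} / {count}"
        type: number
```

With the sample orders table, 2 of the 5 orders have the completed status, so this ratio would be 0.4 when queried without any dimensions.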

See the reference documentation for the full list of measure parameters.

Measure types

Measures can be of different types, e.g., count, sum, or number. Often, aggregate functions in SQL are mapped to measure types in the following way:

Aggregate function in SQL	Measure type in Cube
`AVG`	`avg`
`BOOL_AND`, `BOOL_OR`	`boolean`
`COUNT`	`count`
`COUNT(DISTINCT …)`	`count_distinct`
`APPROX_COUNT_DISTINCT`	`count_distinct_approx`
`MAX`	`max`
`MIN`	`min`
`PERCENTILE_CONT`, `MEDIAN`	`number`
`STRING_AGG`, `LISTAGG`	`string`
`SUM`	`sum`
Any function returning a timestamp, e.g., `MAX(time)`	`time`

See the measure type reference for details.

See the following recipes:

To learn how to define average and percentile measures,
To learn how to calculate period-over-period changes.

Measure additivity

Additivity is a property of measures that determines whether measure values, once calculated for a set of dimensions, can be further aggregated to calculate measure values for a subset of these dimensions.

Measure additivity has an impact on pre-aggregation matching.

Additivity of a measure depends on its type. Only measures with the following types are considered additive: count, count_distinct_approx, min, max, sum. Measures with all other types are considered non-additive.

Example

Consider the following cube:

YAML

cubes:
  - name: employees
    sql: >
      SELECT 1 AS id, 'Ali' AS first_name, 20 AS age, 'Los Gatos' AS city UNION ALL
      SELECT 2 AS id, 'Bob' AS first_name, 30 AS age, 'San Diego' AS city UNION ALL
      SELECT 3 AS id, 'Eve' AS first_name, 40 AS age, 'San Diego' AS city
 
    measures:
      - name: count
        type: count
 
      - name: avg_age
        sql: age
        type: avg
 
    dimensions:
      - name: city
        sql: city
        type: string

If we run a query that includes city as a dimension and count and avg_age as measures, we'll get the following results:

city	count	avg_age
Los Gatos	1	20
San Diego	2	35

Then, if we remove the city dimension from the query, we'll get the following results:

count	avg_age
3	30

As you can see, the value of the count measure in the second query could have been calculated from the results of the first one: 1 + 2 = 3. This explains why the count measure, having the count type, is considered additive.

However, the value of the avg_age measure in the second query can't be calculated from the avg_age values of the first one alone: averaging 20 and 35 gives 27.5, not 30, because a per-city average does not carry the number of rows behind it. This is why the avg_age measure, having the avg type, is considered non-additive.
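
One common way around this, sketched here under the assumption that it suits your use case, is to decompose the average into additive parts and recombine them in a calculated measure. The age_sum name is hypothetical:

```yaml
cubes:
  - name: employees
    # ...

    measures:
      - name: count
        type: count

      # Additive: sums can be summed again across cities
      - name: age_sum
        sql: age
        type: sum

      # Recomputed from the two additive measures above
      - name: avg_age
        sql: "1.0 * {age_sum} / {count}"
        type: number
```

For the data above, the per-city sums are 20 and 70 and the per-city counts are 1 and 2, so the overall average can be rebuilt as (20 + 70) / (1 + 2) = 30.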

Leaf measures

Measures that do not reference other measures are considered leaf measures.

By definition, all measures that only reference SQL columns and expressions are leaf measures. On the other hand, calculated measures might not necessarily be leaf measures because they can reference other measures.

Whether a query contains only additive leaf measures has an impact on pre-aggregation matching.

Joins

Joins define the relationships between cubes, which then allows accessing and comparing properties from two or more cubes at the same time. In Cube, all joins are LEFT JOINs.

An INNER JOIN can be replicated with Cube; when making a Cube query, add a filter for IS NOT NULL on the required column.

In the following example, we are left-joining the line_items cube onto our orders cube:

YAML

cubes:
  - name: orders
    # ...
 
    joins:
      - name: line_items
        # Here we use the `CUBE` global to refer to the current cube,
        # so the following is equivalent to `orders.id = line_items.order_id`
        sql: "{CUBE}.id = {line_items.order_id}"
        relationship: one_to_many

There are three types of join relationships (one_to_one, one_to_many, and many_to_one) and a few other concepts, such as the direction of joins and transitive join pitfalls.
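
Because joins are directional, the same relation can also be declared from the other side. A minimal sketch, assuming the join is defined in the line_items cube; each line item belongs to exactly one order, so the relationship seen from line_items is many_to_one:

```yaml
cubes:
  - name: line_items
    # ...

    joins:
      - name: orders
        # Equivalent to `line_items.order_id = orders.id`
        sql: "{CUBE}.order_id = {orders.id}"
        relationship: many_to_one
```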

See the reference documentation for the full list of join parameters.

Segments

Segments are pre-defined filters that are kept within the data model instead of a Cube query. They help to simplify queries and make it easy to reuse common filters across a variety of queries.

To add a segment which limits results to completed orders, we can do the following:

YAML

cubes:
  - name: orders
    # ...
 
    segments:
      - name: only_completed
        sql: "{CUBE}.status = 'completed'"

See the reference documentation for the full list of segment parameters.

Pre-aggregations

Pre-aggregations provide a powerful way to accelerate frequently used queries and keep the cache up-to-date. Within a data model, they are defined using the pre_aggregations property:

YAML

cubes:
  - name: orders
    # ...
 
    pre_aggregations:
      - name: main
        measures:
          - count
        dimensions:
          - status
        time_dimension: created_at
        granularity: day

A more thorough introduction can be found in Getting Started with Pre-Aggregations.

See the reference documentation for the full list of pre-aggregation parameters.
