Syntax

Entities within the data model (e.g., cubes, views, etc.) should be placed under the model folder, follow naming conventions, and be defined using a supported syntax.

Folder structure

Data model files should be placed inside the model folder. You can use the schema_path configuration option to override the folder name or the repository_factory configuration option to dynamically define the folder name and data model file contents.

It’s recommended to place each cube or view in a separate file, in model/cubes and model/views folders, respectively. Example:


model
├── cubes
│   ├── orders.yml
│   ├── products.yml
│   └── users.yml
└── views
    └── revenue.yml

Model syntax

Cube supports two ways to define data model files: with YAML or JavaScript syntax. YAML data model files should have the .yml extension, whereas JavaScript data model files should end with .js. You can mix YAML and JavaScript files within a single data model.


cube(`orders`, {
  sql: `
    SELECT *
    FROM orders, line_items
    WHERE orders.id = line_items.order_id
  `
})


cubes:
  - name: orders
    sql: |
      SELECT *
      FROM orders, line_items
      WHERE orders.id = line_items.order_id

You can define the data model statically or build dynamic data models programmatically. YAML data models use Jinja and Python whereas JavaScript data models use JavaScript.

It is recommended to default to YAML syntax because of its simplicity and readability. However, JavaScript might provide more flexibility for dynamic data modeling.

See Cube style guide for more recommendations on syntax and structure.

Naming

Common rules apply to names of entities within the data model. All names must:

Start with a letter.
Consist of letters, numbers, and underscore (_) symbols only.
Not be a reserved keyword in Python , e.g., from, return, or yield.
When using the DAX API, not clash with the names of columns in date hierarchies.

It is also recommended that names use snake case .

Good examples of names:

orders, stripe_invoices, or base_payments (cubes)
opportunities, cloud_accounts, or arr (views)
count, avg_price, or total_amount_shipped (measures)
name, is_shipped, or created_at (dimensions)
main, orders_by_status, or lambda_invoices (pre-aggregations)

SQL expressions

When defining cubes, you would often provide SQL snippets in sql and sql_table parameters.

Provided SQL expressions should match your database SQL dialect, e.g., to aggregate a list of strings, you would probably pick the LISTAGG function in Snowflake and the STRING_AGG function in BigQuery.


cubes:
  - name: orders
    sql_table: orders
 
    measures:
      - name: statuses
        sql: "STRING_AGG(status)"
        type: string
 
    dimensions:
      - name: status
        sql: "UPPER(status)"
        type: string


cube(`orders`, {
  sql_table: `orders`,
 
  measures: {
    statuses: {
      sql: `STRING_AGG(status)`,
      type: `string`
    }
  },
 
  dimensions: {
    status: {
      sql: `UPPER(status)`,
      type: `string`
    }
  }
})

Currently, Cube does not wrap your SQL snippets in parentheses during SQL generation. In case of non-trivial snippets, this may lead to unexpected results. Please track this issue .

User-defined functions

If you have created a user-defined function (UDF) in your data source, you can use it in the sql parameter as well.

Case sensitivity

If your database uses case-sensitive identifiers, make sure to properly quote table and column names. For example, here’s how you can reference a Postgres table that contains uppercase letters in its name:


cubes:
  - name: orders
    sql_table: 'public."Orders"'


cube(`orders`, {
  sql_table: `public."Orders"`
})

References

To write versatile data models, it is important to be able to reference members of cubes and views, such as measures or dimensions, as well as table columns. Cube supports the following syntax for references.

`column`

Most commonly, you would use bare column names in the sql parameter of measures or dimensions. In the following example, name references the respective column of the users table.


cubes:
  - name: users
    sql_table: users
 
    dimensions:
      - name: name
        sql: name
        type: string


cube(`users`, {
  sql_table: `users`,
 
  dimensions: {
    name: {
      sql: `name`,
      type: `string`
    }
  }
})

This syntax works great for simple use cases. However, if your cubes have joins and joined cubes have columns with the same name, the generated SQL query might become ambiguous. See below how to work around that.

`{member}`

When defining measures and dimensions, you can also reference other members of the same cube by wrapping their names in curly braces. In the following example, the full_name dimension references name and surname dimensions of the same cube.


cubes:
  - name: users
    sql_table: users
 
    dimensions:
      - name: name
        sql: name
        type: string
 
      - name: surname
        sql: "UPPER(surname)"
        type: string
 
      - name: full_name
        sql: "CONCAT({name}, ' ', {surname})"
        type: string


cube(`users`, {
  sql_table: `users`,
 
  dimensions: {
    name: {
      sql: `name`,
      type: `string`
    },
 
    surname: {
      sql: `UPPER(surname)`,
      type: `string`
    },
 
    full_name: {
      sql: `CONCAT(${name}, ' ', ${surname})`,
      type: `string`
    }
  }
})

This syntax works great for simple use cases. However, there are cases (like subquery) when you’d like to reference members of other cubes. See below how to do that.

`{time_dimension.granularity}`

When referencing a time dimension, you can specificy a granularity to refer to a time value with that specific granularity. It can be one of the default granularities (e.g., year or week) or a custom granularity:


cubes:
  - name: users
    sql_table: users
 
    dimensions:
      - name: created_at
        sql: created_at
        type: time
 
        granularities:
          - name: sunday_week
            interval: 1 week
            offset: -1 day
 
      - name: created_at__year
        sql: "{created_at.year}"
        type: time
 
      - name: created_at__sunday_week
        sql: "{created_at.sunday_week}"
        type: time


cube(`users`, {
  sql_table: `users`,
 
  dimensions: {
    created_at: {
      sql: `created_at`,
      type: `time`,
 
      granularities: {
        sunday_week: {
          interval: `1 week`,
          offset: `-1 day`
        }
      }
    },
 
    created_at__year: {
      sql: `${created_at.year}`,
      type: `time`
    },
 
    created_at__sunday_week: {
      sql: `${created_at.sunday_week}`,
      type: `time`
    }
  }
})

`{cube}.column`, `{cube.member}`

You can qualify column and member names with the name of a cube to remove the ambiguity when cubes are joined and reference members of other cubes.


cubes:
  - name: users
    sql_table: users
 
    joins:
      - name: contacts
        sql: "{users}.contact_id = {contacts.id}"
        relationship: one_to_one
 
    dimensions:
      - name: id
        sql: "{users}.id"
        type: number
        primary_key: true
 
      - name: name
        sql: "COALESCE({users.name}, {contacts.name})"
        type: string
 
  - name: contacts
    sql_table: contacts
 
    dimensions:
      - name: id
        sql: "{contacts}.id"
        type: number
        primary_key: true
 
      - name: name
        sql: "{contacts}.name"
        type: string


cube(`users`, {
  sql_table: `users`,
 
  joins: {
    contacts: {
      sql: `${users}.contact_id = ${contacts.id}`,
      relationship: `one_to_one`
    }
  },
 
  dimensions: {
    id: {
      sql: `${users}.id`,
      type: `number`,
      primary_key: true
    },
 
    name: {
      sql: `COALESCE(${users}.name, ${contacts.name})`,
      type: `string`
    }
  }
})
 
cube(`contacts`, {
  sql_table: `contacts`,
 
  dimensions: {
    id: {
      sql: `${contacts}.id`,
      type: `number`,
      primary_key: true
    },
 
    name: {
      sql: `${contacts}.name`,
      type: `string`
    }
  }
})

In production, using fully-qualified names is generally encouraged since it removes the ambiguity and keeps data model code maintainable as it grows. However, always referring to the current cube by its name leads to code repetition and violates the DRY principle. See below how to solve that.

`{cube1.cube2.member}`

You can also qualify member names with more than one cube name, separated by a dot, to provide a join path and remove the ambiguity of join resolution. Join paths can be used in calculated members, views, and pre-aggregation definitions.

In the following example, four cubes are joined together, forming a diamond subgraph in the join tree: a joins to b and c, and both b and c join to d. As you can see, join paths in the d_via_b and d_via_c dimensions of a are then used to specify which intermediate cube to use when resolving the dimension from d.


cubes:
  - name: a
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
 
      - name: d_via_b
        sql: "{b.d.id}"
        type: number
 
      - name: d_via_c
        sql: "{c.d.id}"
        type: number
 
    joins:
      - name: b
        sql: "{a.id} = {b.id}"
        relationship: one_to_one
 
      - name: c
        sql: "{a.id} = {c.id}"
        relationship: one_to_one
 
  - name: b
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
 
    joins:
      - name: d
        sql: "{b.id} = {d.id}"
        relationship: one_to_one
 
  - name: c
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
 
    joins:
      - name: d
        sql: "{c.id} = {d.id}"
        relationship: one_to_one
 
  - name: d
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true


cube(`a`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,
 
  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },
 
    d_via_b: {
      sql: `${b.d.id}`,
      type: `number`
    },
 
    d_via_c: {
      sql: `${c.d.id}`,
      type: `number`
    }
  },
 
  joins: {
    b: {
      sql: `${a.id} = ${b.id}`,
      relationship: `one_to_one`
    },
 
    c: {
      sql: `${a.id} = ${c.id}`,
      relationship: `one_to_one`
    }
  }
})
 
cube(`b`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,
 
  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    }
  },
 
  joins: {
    d: {
      sql: `${b.id} = ${d.id}`,
      relationship: `one_to_one`
    }
  }
})
 
cube(`c`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,
 
  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    }
  },
 
  joins: {
    d: {
      sql: `${c.id} = ${d.id}`,
      relationship: `one_to_one`
    }
  }
})
 
cube(`d`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,
 
  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    }
  }
})

`{CUBE}` variable

You can use a handy {CUBE} context variable (mind the uppercase) to reference the current cube so you don’t have to repeat the its name over and over. It works both for column and member references.


cube(`users`, {
  sql_table: `users`,
 
  joins: {
    contacts: {
      sql: `${CUBE}.contact_id = ${contacts.id}`,
      relationship: `one_to_one`
    }
  },
 
  dimensions: {
    id: {
      sql: `${CUBE}.id`,
      type: `number`,
      primary_key: true
    },
 
    name: {
      sql: `COALESCE(${CUBE}.name, ${contacts.name})`,
      type: `string`
    }
  }
})
 
cube(`contacts`, {
  sql_table: `contacts`,
 
  dimensions: {
    id: {
      sql: `${CUBE}.id`,
      type: `number`,
      primary_key: true
    },
 
    name: {
      sql: `${CUBE}.name`,
      type: `string`
    }
  }
})


cubes:
  - name: users
    sql_table: users
 
    joins:
      - name: contacts
        sql: "{CUBE}.contact_id = {contacts.id}"
        relationship: one_to_one
 
    dimensions:
      - name: id
        sql: "{CUBE}.id"
        type: number
        primary_key: true
 
      - name: name
        sql: "COALESCE({CUBE.name}, {contacts.name})"
        type: string
 
  - name: contacts
    sql_table: contacts
 
    dimensions:
      - name: id
        sql: "{CUBE}.id"
        type: number
        primary_key: true
 
      - name: name
        sql: "{CUBE}.name"
        type: string

Check the {users.name} dimension. Referencing another cube in the dimension definition instructs Cube to make an implicit join to that cube. For example, using the data model above, we can make the following query:


{
  "dimensions": ["users.name"]
}

The resulting generated SQL query would look like this:


SELECT COALESCE("users".name, "contacts".name) "users__name"
FROM users "users"
LEFT JOIN contacts "contacts"
  ON "users".contact_id = "contacts".id

`{cube.sql()}` function

When defining a cube, you can reference the sql parameter of another cube, effectively reusing the SQL query it’s defined on. This is particularly useful when defining polymorphic cubes or using data blending.

Consider the following data model:


cube(`organisms`, {
  sql_table: `organisms`
})
 
cube(`animals`, {
  sql: `
    SELECT *
    FROM ${organisms.sql()}
    WHERE kingdom = 'animals'
  `
})
 
cube(`dogs`, {
  sql: `
    SELECT *
    FROM ${animals.sql()}
    WHERE species = 'dogs'
  `,
 
  measures: {
    count: {
      type: `count`
    }
  }
})


cubes:
  - name: organisms
    sql_table: organisms
 
  - name: animals
    sql: |
      SELECT *
      FROM {organisms.sql()}
      WHERE kingdom = 'animals'
 
  - name: dogs
    sql: |
      SELECT *
      FROM {animals.sql()}
      WHERE species = 'dogs'
 
    measures:
      - name: count
        type: count

If you query for dogs.count, Cube will generate the following SQL:


SELECT count(*) "dogs__count"
FROM (
  SELECT *
  FROM (
    SELECT *
    FROM organisms
    WHERE kingdom = 'animals'
  )
  WHERE species = 'dogs'
) AS "dogs"

Curly braces and escaping

As you can see in the examples above, within SQL expressions, curly braces are used to reference cubes and members.

In YAML data models, use {reference}:


cubes:
  - name: orders
    sql: |
      SELECT id, created_at
      FROM {other_cube.sql()}
 
    dimensions:
      - name: status
        sql: status
        type: string
 
      - name: status_x2
        sql: "{status} || ' ' || {status}"
        type: string

In JavaScript data models, use ${reference} in JavaScript template literals (mind the dollar sign):


cube(`orders`, {
  sql: `
    SELECT id, created_at
    FROM ${other_cube.sql()}
  `,
 
  dimensions: {
    status: {
      sql: `status`,
      type: `string`
    },
 
    status_x2: {
      sql: `${status} || ' ' || ${status}`,
      type: `string`
    }
  }
})

If you need to use literal, non-referential curly braces in YAML, e.g., to define a JSON object, you can escape them with a backslash:


cubes:
  - name: json_object_in_postgres
    sql: SELECT CAST('\{"key":"value"\}'::JSON AS TEXT) AS json_column
 
  - name: csv_from_s3_in_duckdb
    sql: |
      SELECT *
      FROM read_csv(
        's3://bbb/aaa.csv',
        delim = ',',
        header = true,
        columns=\{'time':'DATE','count':'NUMERIC'\}
      )

Non-SQL references

Outside SQL expressions, column is not recognized as a column name; it is rather recognized as a member name. It means that, outside sql and sql_table parameters, you can skip the curly braces and reference members by their names directly: member, cube_name.member, or CUBE.member.


cubes:
  - name: orders
    sql_table: orders
 
    dimensions:
      - name: status
        sql: status
        type: string
 
    measures:
      - name: count
        type: count
 
    pre_aggregations:
      - name: orders_by_status
        dimensions:
          - CUBE.status
        measures:
          - CUBE.count


cube(`orders`, {
  sql_table: `orders`,
 
  dimensions: {
    status: {
      sql: `status`,
      type: `string`
    }
  },
 
  measures: {
    count: {
      type: `count`
    }
  },
 
  pre_aggregations: {
    orders_by_status: {
      dimensions: [CUBE.status],
      measures: [CUBE.count]
    }
  }
})

Context variables

In addition to the CUBE variable, you can also use a few more context variables within your data model. They are generally useful for two purposes: optimizing generated SQL queries and defining dynamic data models.

Troubleshooting

`Can't parse timestamp`

Sometimes, you might come across the following error message: Can't parse timestamp: 2023-11-07T14:33:23.16.000.

It indicates that the data source was unable to recognize the value of a time dimension as a timestamp. Please check that the SQL expression of this time dimension evaluates to a TIMESTAMP type.

Check this recipe to see how you can work around string values in time dimensions.

Was this page useful?