Providing a custom data model for each tenant
Use case
We have multiple users and we would like them to have different data models. These data models can be completely different or have something in common.
Configuration
Let's assume that we have two users: Alice and Bob. We'll refer to them as tenants. We're going to provide custom data models for these tenants by implementing multitenancy.
Multitenancy
First of all, we need to define the following configuration options so that Cube knows how to distinguish between your tenants:
- context_to_app_id to derive tenant identifiers from security contexts.
- scheduled_refresh_contexts to provide a list of security contexts.
Put the following code into your cube.py or cube.js configuration file:
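A minimal sketch of such a `cube.py` file, assuming the tenant identifier lives under `tenant_id` in the security context (the same key the data models below read from `COMPILE_CONTEXT`). The `try`/`except` fallback is not part of Cube; it is a stand-in added only so that the sketch can also be run outside a Cube deployment.

```python
try:
    from cube import config  # provided by Cube inside a deployment
except ImportError:
    # Stand-in decorator (an assumption for standalone runs only):
    # it returns the function unchanged.
    def config(name):
        def register(fn):
            return fn
        return register


@config('context_to_app_id')
def context_to_app_id(ctx: dict) -> str:
    # Derive the tenant identifier from the security context.
    return ctx['securityContext']['tenant_id']


@config('scheduled_refresh_contexts')
def scheduled_refresh_contexts() -> list:
    # One security context per tenant.
    return [
        {'securityContext': {'tenant_id': 'Alice'}},
        {'securityContext': {'tenant_id': 'Bob'}},
    ]
```

With this in place, each tenant identifier maps to its own compiled copy of the data model, and the scheduled refresh knows which security contexts to cover.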
Data modeling
Customizing publicity
The simplest way to customize the data models is by changing the publicity of data model entities. It works great for use cases when tenants share parts of their data models.
By setting the public parameter of cubes, views, measures, dimensions, and segments, you can ensure that each tenant has its unique perspective of the whole data model.
With the following data model, Alice will only have access to cube_a, Bob will only have access to cube_b, and they both will have access to select members of cube_x:
{% set tenant_id = COMPILE_CONTEXT['securityContext']['tenant_id'] %}

cubes:
  - name: cube_a
    sql_table: table_a
    public: {{ tenant_id == 'Alice' }}

    measures:
      - name: count
        type: count

  - name: cube_b
    sql_table: table_b
    public: {{ tenant_id == 'Bob' }}

    measures:
      - name: count
        type: count

  - name: cube_x
    sql_table: table_x

    measures:
      - name: count
        type: count

      - name: count_a
        type: count
        public: {{ tenant_id == 'Alice' }}

      - name: count_b
        type: count
        public: {{ tenant_id == 'Bob' }}
For your convenience, Playground ignores the publicity configuration and marks data model entities that are not accessible for querying through APIs with a lock icon.
Here's what Alice sees:
And here's the perspective of Bob:
Customizing other parameters
Similarly to customizing publicity, you can set other parameters of data model entities for each tenant individually:
- By setting the sql or sql_table parameters of cubes, you can ensure that each tenant accesses data from its own tables or database schemas.
- By setting the data_source parameter, you can point each tenant to its own data source, which lets you switch between database names or even database servers.
- By setting the extends parameter, you can ensure that cubes of some tenants are enriched with custom measures, dimensions, or joins.
With the following data model, cube_x will read data from the Alice database schema for Alice and from the Bob database schema for Bob:
{% set tenant_id = COMPILE_CONTEXT['securityContext']['tenant_id'] %}

cubes:
  - name: cube_x
    sql_table: {{ tenant_id | safe }}.table_x

    measures:
      - name: count
        type: count
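To make the substitution concrete, the sketch below mimics in plain Python what the `sql_table` expression resolves to for each tenant. It is an illustration, not Cube code: `resolve_sql_table` and `KNOWN_TENANTS` are hypothetical names, and the allow-list check is an added assumption, motivated by the fact that the `safe` filter tells Jinja not to escape the value, so the tenant identifier ends up in the table name as-is.

```python
# Hypothetical allow-list of tenants; not part of Cube.
KNOWN_TENANTS = {"Alice", "Bob"}


def resolve_sql_table(security_context: dict, table: str = "table_x") -> str:
    """Return the schema-qualified table name for a tenant."""
    tenant_id = security_context["tenant_id"]
    # Reject identifiers we do not recognize before they reach the SQL.
    if tenant_id not in KNOWN_TENANTS:
        raise ValueError(f"unknown tenant: {tenant_id!r}")
    return f"{tenant_id}.{table}"


alice_table = resolve_sql_table({"tenant_id": "Alice"})
bob_table = resolve_sql_table({"tenant_id": "Bob"})
```

The only difference between the two tenants is the schema prefix of the table that cube_x reads from.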
Here's the generated SQL for Alice:
And here's the generated SQL for Bob:
Dynamic data modeling
A more advanced way to customize the data models is by using dynamic data models. This lets you create fully customized data models for each tenant programmatically.
With the following data model, cube_x will have the count_a measure for Alice and the count_b measure for Bob:
{% set tenant_id = COMPILE_CONTEXT['securityContext']['tenant_id'] %}

cubes:
  - name: cube_x
    sql_table: table_x

    measures:
      - name: count
        type: count

      {% if tenant_id == 'Alice' %}
      - name: count_a
        sql: column_a
        type: count
      {% endif %}

      {% if tenant_id == 'Bob' %}
      - name: count_b
        sql: column_b
        type: count
      {% endif %}
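The branching in this template can be read as the following plain-Python sketch. It only mirrors the logic of the `{% if %}` blocks; `measures_for` is a hypothetical helper, not a Cube API.

```python
def measures_for(tenant_id: str) -> list:
    """Build the list of cube_x measures that a given tenant ends up with."""
    # Every tenant gets the shared measure.
    measures = [{"name": "count", "type": "count"}]
    # Tenant-specific measures are appended only for the matching tenant.
    if tenant_id == "Alice":
        measures.append({"name": "count_a", "sql": "column_a", "type": "count"})
    if tenant_id == "Bob":
        measures.append({"name": "count_b", "sql": "column_b", "type": "count"})
    return measures


alice_measures = [m["name"] for m in measures_for("Alice")]
bob_measures = [m["name"] for m in measures_for("Bob")]
```

Unlike the publicity approach, a measure that does not match the tenant is absent from that tenant's compiled data model altogether rather than present but hidden.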
Here's the data model and the generated SQL for Alice:
And here's the data model and the generated SQL for Bob:
Loading from disk
You can also maintain independent data models for each tenant and load them from separate locations on disk. This lets you create fully customized data models for each tenant that are maintained mostly as static files.
By using the repository_factory option with the file_repository utility, you can load data model files for each tenant from a custom path.
With the following configuration, Alice will load the data model files from model/Alice while Bob will load the data model files from model/Bob:
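A minimal sketch of such a `cube.py` configuration, assuming the tenant identifier is stored under `tenant_id` in the security context. `tenant_model_path` is a hypothetical helper introduced here for readability. The registration is wrapped in `try`/`except` only so that the sketch can also be run outside a Cube deployment, where the `cube` package is not importable.

```python
def tenant_model_path(ctx: dict) -> str:
    """Hypothetical helper: map a security context to a per-tenant folder."""
    return f"model/{ctx['securityContext']['tenant_id']}"


try:
    # Only importable inside a Cube deployment.
    from cube import config, file_repository

    @config('repository_factory')
    def repository_factory(ctx: dict) -> list:
        # Load the data model files from the tenant's own folder.
        return file_repository(tenant_model_path(ctx))
except ImportError:
    # Outside Cube, only the helper above is usable.
    pass
```

Each tenant's folder then holds ordinary data model files, so the per-tenant models can be edited and reviewed like any other static files.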
Loading externally
Finally, you can maintain independent data models for each tenant and load them from an external location rather than from a folder on disk. Good examples of such locations are an S3 bucket, a database, or an external API. This lets you provide fully customized data models for each tenant that you have full control of.
It can be achieved by using the same repository_factory option. Instead of using the file_repository utility, you would have to write your own code that fetches data model files for each tenant.
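The sketch below shows one possible shape for such code. Everything in it is illustrative: `EXTERNAL_STORE` is an in-memory stand-in for an S3 bucket, database, or API, and `fetch_model_files` is a hypothetical helper. The `fileName` and `content` keys of each returned entry are an assumption about what Cube expects from a custom repository, so check the configuration reference before relying on them.

```python
# Hypothetical in-memory stand-in for an external store such as an S3 bucket.
EXTERNAL_STORE = {
    "Alice": {"cube_a.yml": "cubes:\n  - name: cube_a\n    sql_table: table_a\n"},
    "Bob": {"cube_b.yml": "cubes:\n  - name: cube_b\n    sql_table: table_b\n"},
}


def fetch_model_files(tenant_id: str) -> list:
    """Return the tenant's data model files as a list of name/content entries."""
    files = EXTERNAL_STORE.get(tenant_id, {})
    # Sort by file name so the result is deterministic.
    return [
        {"fileName": name, "content": body}
        for name, body in sorted(files.items())
    ]


try:
    from cube import config  # only importable inside a Cube deployment

    @config('repository_factory')
    def repository_factory(ctx: dict) -> list:
        return fetch_model_files(ctx['securityContext']['tenant_id'])
except ImportError:
    # Outside Cube, only the helper above is usable.
    pass
```

In a real setup, the body of `fetch_model_files` would be replaced with a call to your storage client, while the function registered as repository_factory stays the same.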