Multitenancy

Cube supports multitenancy out of the box, both on database and data model levels. Multiple drivers are also supported, meaning that you can have one customer’s data in MongoDB and others in Postgres with one Cube instance.

There are several configuration options you can leverage for your multitenancy setup. You can use all of them or just a couple, depending on your specific case. The options are:

context_to_app_id
schema_version
repository_factory
driver_factory
context_to_orchestrator_id
pre_aggregations_schema
query_rewrite
scheduled_refresh_contexts
scheduled_refresh_time_zones

All of the above options are functions, which you provide in the configuration file. The functions accept one argument: a context object, which has a securityContext property where you can provide all the necessary data to identify a user e.g., organization, app, etc. By default, the securityContext is defined by Cube API Token.

There are several multitenancy setup scenarios that can be achieved by using combinations of these configuration options.

See the following recipes:

If you'd like to provide a custom data source for each tenant.
If you'd like to provide a custom data model for each tenant.

Multitenancy vs Multiple Data Sources

In cases where your Cube data model is spread across multiple different data sources, consider using the data_source cube property instead of multitenancy. Multitenancy is designed for cases where you need to serve different datasets for multiple users, or tenants which aren't related to each other.

On the other hand, multitenancy can be used for scenarios where users need to access the same data but from different databases. The multitenancy and multiple data sources features aren't mutually exclusive and can be used together.

A default data source must exist and be configured. It is used to resolve target query data source for now. This behavior will be changed in future releases.

A simple configuration with two data sources might look like:

cube.js:

module.exports = {
  driverFactory: ({ dataSource } = {}) => {
    if (dataSource === "db1") {
      return {
        type: "postgres",
        database: process.env.DB1_NAME,
        host: process.env.DB1_HOST,
        user: process.env.DB1_USER,
        password: process.env.DB1_PASS,
        port: process.env.DB1_PORT,
      };
    } else {
      return {
        type: "postgres",
        database: process.env.DB2_NAME,
        host: process.env.DB2_HOST,
        user: process.env.DB2_USER,
        password: process.env.DB2_PASS,
        port: process.env.DB2_PORT,
      };
    }
  },
};

A more advanced example that uses multiple data sources could look like:

cube.js:

module.exports = {
  driverFactory: ({ dataSource } = {}) => {
    if (dataSource === "web") {
      return {
        type: "athena",
        database: dataSource,
 
        // ...
      };
    } else if (dataSource === "googleAnalytics") {
      return {
        type: "bigquery",
 
        // ...
      };
    } else if (dataSource === "financials") {
      return {
        type: "postgres",
        database: "financials",
        host: "financials-db.acme.com",
        user: process.env.FINANCIALS_DB_USER,
        password: process.env.FINANCIALS_DB_PASS,
      };
    } else {
      return {
        type: "postgres",
 
        // ...
      };
    }
  },
};

More information can be found on the Multiple Data Sources page.

queryRewrite vs Multitenant Compile Context

As a rule of thumb, the queryRewrite should be used in scenarios when you want to define row-level security within the same database for different users of such database. For example, to separate access of two e-commerce administrators who work on different product categories within the same e-commerce store, you could configure your project as follows.

Use the following cube.js configuration file:

module.exports = {
  queryRewrite: (query, { securityContext }) => {
    if (securityContext.categoryId) {
      query.filters.push({
        member: "products.category_id",
        operator: "equals",
        values: [securityContext.categoryId],
      });
    }
    return query;
  },
};

Also, you can use a data model like this:

YAML

JavaScript

cubes:
  - name: products
    sql_table: products

On the other hand, multi-tenant COMPILE_CONTEXT should be used when users need access to different databases. For example, if you provide SaaS ecommerce hosting and each of your customers have a separate database, then each e-commerce store should be modeled as a separate tenant.

YAML

JavaScript

cubes:
  - name: products
    sql_table: "{COMPILE_CONTEXT.security_context.userId}.products"

Running in Production

Each unique id generated by contextToAppId or contextToOrchestratorId will generate a dedicated set of resources, including data model compile cache, SQL compile cache, query queues, in-memory result caching, etc. Depending on your data model complexity and usage patterns, those resources can have a pretty sizable memory footprint ranging from single-digit MBs on the lower end and dozens of MBs on the higher end. So you should make sure Node VM has enough memory reserved for that.

There're multiple strategies in terms of memory resource utilization here. The first one is to bucket your actual tenants into variable-size buckets with assigned contextToAppId or contextToOrchestratorId by some bucketing rule. For example, you can bucket your biggest tenants in separate buckets and all the smaller ones into a single bucket. This way, you'll end up with a very small count of buckets that will easily fit a single node.

Another strategy is to split all your tenants between different Cube nodes and route traffic between them so that each Cube API node serves only its own set of tenants and never serves traffic for another node. In that case, memory usage is limited by the number of tenants served by each node. Cube Cloud utilizes precisely this approach for scaling. Please note that in this case, you should also split refresh workers and assign appropriate scheduledRefreshContexts to them.

Same DB Instance with per Tenant Row Level Security

Per tenant row-level security can be achieved by configuring queryRewrite, which adds a tenant identifier filter to the original query. It uses the securityContext to determine which tenant is requesting data. This way, every tenant starts to see their own data. However, resources such as query queue and pre-aggregations are shared between all tenants.

cube.js:

module.exports = {
  queryRewrite: (query, { securityContext }) => {
    const user = securityContext;
    if (user.id) {
      query.filters.push({
        member: "users.id",
        operator: "equals",
        values: [user.id],
      });
    }
    return query;
  },
};

Multiple DB Instances with Same Data Model

Let's consider an example where we store data for different users in different databases, but on the same Postgres host. The database name format is my_app_<APP_ID>_<USER_ID>, so my_app_1_2 is a valid database name.

To make it work with Cube, first we need to pass the appId and userId as context to every query. We should first ensure our JWTs contain those properties so we can access them through the security context.

const jwt = require("jsonwebtoken");
const CUBE_API_SECRET = "secret";
 
const cubeToken = jwt.sign({ appId: "1", userId: "2" }, CUBE_API_SECRET, {
  expiresIn: "30d",
});

Now, we can access them through the securityContext property inside the context object. Let's use contextToAppId and contextToOrchestratorId to create a dynamic Cube App ID and Orchestrator ID for every combination of appId and userId, as well as defining driverFactory to dynamically select the database, based on the appId and userId:

The App ID (the result of contextToAppId) is used as a caching key for various in-memory structures like data model compilation results, connection pool. The Orchestrator ID (the result of contextToOrchestratorId) is used as a caching key for database connections, execution queues and pre-aggregation table caches. Not declaring these properties will result in unexpected caching issues such as the data model or data of one tenant being used for another.

cube.js:

module.exports = {
  contextToAppId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.appId}_${securityContext.userId}`,
  contextToOrchestratorId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.appId}_${securityContext.userId}`,
  driverFactory: ({ securityContext }) => ({
    type: "postgres",
    database: `my_app_${securityContext.appId}_${securityContext.userId}`,
  }),
};

Same DB Instance with per Tenant Pre-Aggregations

To support per-tenant pre-aggregation of data within the same database instance, you should configure the preAggregationsSchema option in your cube.js configuration file. You should use also securityContext to determine which tenant is requesting data.

cube.js:

module.exports = {
  contextToAppId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.userId}`,
  preAggregationsSchema: ({ securityContext }) =>
    `pre_aggregations_${securityContext.userId}`,
};

Multiple Data Models and Drivers

What if for application with ID 3, the data is stored not in Postgres, but in MongoDB?

We can instruct Cube to connect to MongoDB in that case, instead of Postgres. To do this, we'll use the driverFactory option to dynamically set database type. We will also need to modify our securityContext to determine which tenant is requesting data. Finally, we want to have separate data models for every application. We can use the repositoryFactory option to dynamically set a repository with data model files depending on the appId:

cube.js:

const { FileRepository } = require("@cubejs-backend/server-core");
 
module.exports = {
  contextToAppId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.appId}_${securityContext.userId}`,
  contextToOrchestratorId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.appId}_${securityContext.userId}`,
  driverFactory: ({ securityContext }) => {
    if (securityContext.appId === 3) {
      return {
        type: "mongobi",
        database: `my_app_${securityContext.appId}_${securityContext.userId}`,
        port: 3307,
      };
    } else {
      return {
        type: "postgres",
        database: `my_app_${securityContext.appId}_${securityContext.userId}`,
      };
    }
  },
  repositoryFactory: ({ securityContext }) =>
    new FileRepository(`model/${securityContext.appId}`),
};

Scheduled Refreshes for Pre-Aggregations

If you need scheduled refreshes for your pre-aggregations in a multi-tenant deployment, ensure you have configured scheduled_refresh_contexts correctly. You may also need to configure scheduled_refresh_time_zones.

Leaving scheduled_refresh_contexts unconfigured will lead to issues where the security context will be undefined. This is because there is no way for Cube to know how to generate a context without the required input.

Concurrency Environment variables