A Logical Approach to Securing Data

The primary role of a semantic layer is to present a dimensional view of a physical data model. A dimensional view matters because efficient consumption of data is at odds with efficient storage of data. Users consume data by domain, querying the dimensions and metrics related to, for example, an invoice or a user. Databases, by contrast, spread the data for a single domain across multiple normalized tables to reduce redundant storage and updates. Cube’s universal semantic layer bridges the gap between the logical model consumers want and the physical model the database imposes.

Humans think about security from a domain perspective. If you have access to a system, it should follow that you have access to the minimum constituent pieces required to make that system function. Your car key lets you into your vehicle and lets you start the engine; you don't need separate keys for the door, the starter solenoid, and the battery. Your bank card gives you access to your bank account; you don't need separate access to all of the infrastructure the payment system runs on.

If the goal of a semantic layer is to reduce the complexity of consuming data, shouldn’t it also reduce the complexity of securing data?

Security Sprawl

Every new consumption point added to your data platform presents a new risk: how does it handle authentication and authorization, row-level security, column masking, and caching? If you use multiple BI tools without a universal semantic layer, you likely have most of this security model built into your data warehouse. However, this approach has gaps:

Q: How do you manage security for applications that use REST or GraphQL to fetch data? Do you have a plan to constrain the data that AI applications can access?

A: One of the primary benefits of Cube Cloud’s data model is that it is built once and can be accessed through many different APIs, such as SQL, REST, and GraphQL. A security model built in Cube applies to all downstream consumers.

Q: If your security is handled in the data warehouse, how do you secure data extracted by BI tools? Do you have a separate security model for users accessing SSAS cubes from Excel?

A: Cube’s pre-aggregations deliver extract-level speed and are governed by the same security model as the queries federated to your warehouse.

Cube’s universal semantic layer eliminates model chaos by centralizing your model. It can also eliminate duplicated security models, which can leak data when one copy isn’t kept up to date. Additionally, Cube’s code-first approach lets security models be version-controlled and approved through pull requests.

How Security Works in Cube

Cube can integrate with external authentication and authorization systems using our data access policies and/or the check_sql_auth function:

1. A consumer sends a query, along with a username (or a username and password), to Cube’s SQL API.
2. Cube authenticates the user against an external service.
3. Cube receives the user properties (such as role) required for authorization.
4. Cube transpiles and rewrites the query (for row-level security and masking) according to the user properties received in step three.
5. Cube authenticates with the database according to the user properties received in step three and runs the query rewritten in step four.
6. The database returns the data to Cube.
7. Cube returns the data to the consumer.
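Steps two and three, authenticating against an external service and turning the result into user properties, can be sketched in plain Python. This is not Cube’s API: in Cube the equivalent logic lives in check_sql_auth in cube.py or cube.js. The in-memory USER_DIRECTORY, AccessDenied, SecurityContext, and authenticate names are hypothetical stand-ins for a real identity provider.

```python
import hmac
from dataclasses import dataclass, field

# Hypothetical stand-in for an external identity provider (LDAP, an IdP, etc.).
# A real deployment would make a network call and never store plaintext passwords.
USER_DIRECTORY = {
    "alice": {"password": "s3cret", "role": "finance", "region": "EMEA"},
    "bob": {"password": "hunter2", "role": "sales", "region": "US"},
}


class AccessDenied(Exception):
    """Raised when the external service rejects a user's credentials."""


@dataclass
class SecurityContext:
    """User properties (step three) that drive query rewriting and DB access."""
    user: str
    role: str
    attributes: dict = field(default_factory=dict)


def authenticate(username: str, password: str) -> SecurityContext:
    """Steps two and three: verify credentials, then return user properties."""
    record = USER_DIRECTORY.get(username)
    # compare_digest compares the passwords in constant time.
    if record is None or not hmac.compare_digest(record["password"], password):
        raise AccessDenied(f"access denied for {username!r}")
    return SecurityContext(
        user=username,
        role=record["role"],
        attributes={"region": record["region"]},
    )
```

Calling `authenticate("alice", "s3cret")` yields a context with role `finance` and region `EMEA`. Raising on failure, rather than returning an empty context, means an unknown user never reaches the query-rewrite step.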

Advantages of Building Security in Cube

User-Centric Controls

In Cube, views are the endpoints presented to consumers. Views contain all of the semantics (metrics and conforming dimensions) relevant to a specific domain such as customer, invoice, or product. Views are the culmination of the logical model and often rely on a web of joins to underlying tables based on the physical model; the value of a semantic layer is that it hides the complex joins and business logic required to make sense of the data.

Consumers (users or applications) should have access to the views necessary to perform their role. Consumers don’t need access to individual tables in your database or even access to your database at all.

Each consumer role can map to a database role, service principal, or service account that has access to only the physical objects it needs. The driver_factory function can map a consumer to that specific set of authorizations in the database.
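The role-to-authorization mapping can be sketched as a lookup that fails closed. Cube’s real driver_factory produces a database driver configuration; this sketch stops at choosing credentials, and the role names, service-account names, and secret paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseCredentials:
    user: str
    # A reference into a secrets manager rather than a literal password.
    secret_ref: str


# Hypothetical mapping: each consumer role gets a read-only service account
# whose grants cover only the tables behind the views that role may query.
ROLE_TO_SERVICE_ACCOUNT = {
    "finance": DatabaseCredentials("svc_finance_ro", "secrets/db/svc_finance_ro"),
    "sales": DatabaseCredentials("svc_sales_ro", "secrets/db/svc_sales_ro"),
}


def credentials_for(security_context: dict) -> DatabaseCredentials:
    """Pick database credentials from the user's role; fail closed otherwise."""
    role = security_context.get("role")
    if role not in ROLE_TO_SERVICE_ACCOUNT:
        raise PermissionError(f"no database authorization for role {role!r}")
    return ROLE_TO_SERVICE_ACCOUNT[role]
```

Routing each role through its own narrowly granted account means the warehouse’s grants remain a second line of defense if a rewrite rule in the semantic layer is ever misconfigured.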

Column-level access, mandatory filters, and row-level security can also be configured in Cube.
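A minimal sketch of the query rewrite from step four, combining column-level access, masking, and a mandatory row-level filter. It assumes a query is a dictionary of measures, dimensions, and filters; the POLICIES structure, the member names, and rewrite_query are hypothetical and not taken from a real Cube data model.

```python
import copy

# Hypothetical per-role policy using "view.member" names invented for this sketch.
POLICIES = {
    "sales": {
        # Members the role may see after masking has been applied.
        "allowed": {
            "orders.count",
            "orders.total_amount",
            "orders.region",
            "customers.email_masked",
        },
        # Requests for these members are swapped for a masked counterpart.
        "masked": {"customers.email": "customers.email_masked"},
        # Mandatory row filter: (member to filter, context attribute to match).
        "row_filter": ("orders.region", "region"),
    },
}


def rewrite_query(query: dict, ctx: dict) -> dict:
    """Apply masking, column-level access, and row-level security to a query."""
    policy = POLICIES.get(ctx.get("role"))
    if policy is None:
        raise PermissionError(f"no policy for role {ctx.get('role')!r}")

    rewritten = copy.deepcopy(query)  # never mutate the consumer's query

    # Masking: replace sensitive members before checking permissions.
    for key in ("measures", "dimensions"):
        rewritten[key] = [policy["masked"].get(m, m) for m in rewritten.get(key, [])]

    # Fail closed on nested boolean filters, which this sketch does not walk.
    filters = rewritten.get("filters", [])
    if any("member" not in f for f in filters):
        raise PermissionError("nested filters are not supported in this sketch")

    # Column-level access: every requested or filtered member must be allowed.
    requested = (
        set(rewritten["measures"])
        | set(rewritten["dimensions"])
        | {f["member"] for f in filters}
    )
    denied = requested - policy["allowed"]
    if denied:
        raise PermissionError(f"members not permitted: {sorted(denied)}")

    # Row-level security: top-level filters are assumed to be ANDed, so the
    # consumer's own filters can only narrow the result further.
    member, attribute = policy["row_filter"]
    rewritten["filters"] = filters + [
        {"member": member, "operator": "equals", "values": [ctx[attribute]]}
    ]
    return rewritten
```

Masking happens before the permission check, so the raw `customers.email` member can only ever reach the database in its masked form, while a filter on the raw member is rejected outright.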

Logging

Cube provides logs, metrics, and auditing to support monitoring and alerting on security events. You can also combine the object representation of a query with custom logging in cube.py or cube.js to track column-level access by individual users.
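A sketch of column-level audit logging, assuming the query arrives as a dictionary of measures, dimensions, and filters and the user properties as a dictionary. In Cube this would be wired into cube.py or cube.js, for example from a hook that sees each query; the function and logger names here are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)  # demo setup; route to your log pipeline
audit_log = logging.getLogger("semantic_layer.audit")


def audit_member_access(query: dict, ctx: dict) -> dict:
    """Log one structured record naming every member a user's query touched."""
    filters = query.get("filters", [])
    members = sorted(
        set(query.get("measures", []))
        | set(query.get("dimensions", []))
        # Only top-level filters are inspected in this sketch.
        | {f["member"] for f in filters if "member" in f}
    )
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": ctx.get("user"),
        "role": ctx.get("role"),
        "members": members,
    }
    audit_log.info(json.dumps(record))
    return record
```

Emitting one JSON line per query keeps the records easy to ship to whatever alerting system already watches your other security events.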

Governance and Adaptability

A semantic layer is a representation of the physical layer with semantics applied. Providing access at this level of abstraction prevents consumers (or bad actors) from reaching the physical layer where sensitive data is stored, and it keeps them from performing create, update, or delete operations on the source of truth.

Building security into the semantic layer lets you grant access to data derived from sensitive data, such as masked values or aggregates at a specified level of granularity, without exposing the sensitive data itself.
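A sketch of access at a specified level of granularity: a role may query aggregates, but not below a minimum time granularity. The MIN_GRANULARITY policy, the role names, and check_granularity are hypothetical.

```python
# Time granularities ordered from finest to coarsest.
GRANULARITY_ORDER = ["day", "week", "month", "quarter", "year"]

# Hypothetical policy: the finest granularity each role may query.
MIN_GRANULARITY = {"analyst": "day", "partner": "month"}


def check_granularity(requested: str, role: str) -> str:
    """Allow a granularity only if it is at least as coarse as the role's floor."""
    if role not in MIN_GRANULARITY:
        raise PermissionError(f"no granularity policy for role {role!r}")
    if requested not in GRANULARITY_ORDER:
        raise ValueError(f"unknown granularity {requested!r}")
    floor = MIN_GRANULARITY[role]
    if GRANULARITY_ORDER.index(requested) < GRANULARITY_ORDER.index(floor):
        raise PermissionError(f"role {role!r} may not query below {floor!r}")
    return requested
```

Raising an error, rather than silently coarsening the request, keeps a consumer from mistaking a monthly total for a daily one.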

Because Cube is code-first, every change to the security model can be required to pass pull-request approval before it is deployed to your production environment.

Security for Extracted Data

Most BI tools have extraction capabilities that analytics teams rely on for speed and that data teams rely on to reduce the load on their data warehouse: Tableau data extracts, Power BI import mode, Metabase caching, and so on. Once data is extracted from a database, security measures applied at the database level no longer protect it.

Cube provides pre-aggregations as a faster, cheaper, and more secure alternative to extracting data into individual BI systems.

When security is built into the semantic layer instead of the database, the Cube universal semantic layer can secure extracted data with the same rules that apply to data queried from your database.
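A minimal sketch of why one rule can govern both paths, with plain Python lists standing in for the warehouse and for a pre-aggregation. The functions and the row layout are hypothetical, not Cube’s pre-aggregation API.

```python
from collections import defaultdict


def row_filter(ctx: dict):
    """Single row-level rule shared by the live path and the extract path."""
    region = ctx["region"]
    return lambda row: row["region"] == region


def aggregate(rows):
    """Sum amounts by (region, month), as a query or an extract build would."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["month"])] += row["amount"]
    return [
        {"region": region, "month": month, "amount": amount}
        for (region, month), amount in sorted(totals.items())
    ]


def query_live(raw_rows, ctx):
    """Path 1: filter raw rows, then aggregate (a query sent to the warehouse)."""
    keep = row_filter(ctx)
    return aggregate([r for r in raw_rows if keep(r)])


def query_pre_aggregation(rollup, ctx):
    """Path 2: filter an already-built rollup with the same rule."""
    keep = row_filter(ctx)
    return [r for r in rollup if keep(r)]
```

Both paths return the same rows for a given user only because the rollup keeps `region`, the column the rule filters on; an extract that dropped that column could not be secured this way.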

Rethinking Security with Semantics

The Cube universal semantic layer provides security in three ways: it separates consumers’ access to semantics from the physical data layer, it protects extracted data without compromising performance, and it organizes security logically, in a way that matches user expectations.

Semantic sprawl, the repetition of ungoverned business logic across tools, is caused by a plurality of analytic access points to your database, and it leads to model chaos. Cube is the solution to this problem.

Cube can also help secure and monitor data access across all consumers.