By definition, the semantic layer serves as a trusted source of truth for the whole business, standardizing metric definitions, governing data access, and providing optimal query performance. Query performance is the responsibility of the caching layer in Cube and its purpose-built storage, Cube Store. Both contribute equally to protecting the data.
Cube and Cube Store protect data both in motion and at rest:
- Data is loaded from upstream data sources to the semantic layer via encrypted connections and after proper authentication.
- Similarly, data is served from the semantic layer to end users through APIs & integrations via encrypted connections and upon user authentication.
- While stored in the caching layer, data rollups are secured at multiple levels using industry-standard encryption.
Today, we're announcing an additional layer of data-at-rest protection for Cube Cloud users: Parquet encryption with customer-managed keys in Cube Store. Read on to see how it contributes to the security of your data.
Data-at-rest encryption in Cube
Unlike some BI tools that extract and store raw data from upstream data sources, Cube is designed not to store any raw data. When serving a query, Cube retrieves the requested dataset from the data source on demand and sends it to the end user.
However, when pre-aggregations are configured to accelerate queries, Cube will build and store data rollups: transformed representations of data that don't include any raw facts but are good enough to fulfill certain subsets of queries. These rollups are managed by Cube Store, a purpose-built storage for Cube.
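To make the idea of a rollup concrete, here is a minimal Python sketch. It is an illustration of the concept only, not how Cube Store builds or stores rollups; the `raw_orders` rows, the `build_rollup` helper, and the field names are all hypothetical. It shows that the aggregated rows carry no order IDs or customer names, yet can still answer a query such as "completed revenue per day".

```python
from collections import defaultdict

# Hypothetical raw facts as they might sit in an upstream data source.
raw_orders = [
    {"order_id": 1, "customer": "alice", "status": "completed", "day": "2024-05-01", "amount": 40},
    {"order_id": 2, "customer": "bob", "status": "completed", "day": "2024-05-01", "amount": 60},
    {"order_id": 3, "customer": "alice", "status": "pending", "day": "2024-05-01", "amount": 25},
    {"order_id": 4, "customer": "carol", "status": "completed", "day": "2024-05-02", "amount": 10},
]


def build_rollup(rows, dimensions, measure):
    """Aggregate raw rows into one record per combination of dimension values."""
    totals = defaultdict(lambda: {"count": 0, "total": 0})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["total"] += row[measure]
    # Only the dimensions and the aggregates survive; per-order fields are gone.
    return [{**dict(zip(dimensions, key)), **agg} for key, agg in sorted(totals.items())]


def completed_total_by_day(rollup_rows):
    """Answer a query from the rollup alone, without touching the raw rows."""
    result = defaultdict(int)
    for r in rollup_rows:
        if r["status"] == "completed":
            result[r["day"]] += r["total"]
    return dict(result)


rollup = build_rollup(raw_orders, dimensions=("day", "status"), measure="amount")
print(rollup)
print(completed_total_by_day(rollup))
```

A query that needs a field outside the rollup's dimensions (for example, revenue per customer) cannot be answered from these rows, which is why a rollup fulfills only certain subsets of queries.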
Cube Store keeps data rollups in its persistent storage and loads necessary parts into its scratch storage when serving queries:
- Persistent storage is essentially distributed cloud object storage. In Cube Cloud, it can be Amazon S3, Google Cloud Storage, or Azure Blob Storage. All of them provide standard encryption out of the box.
- Scratch storage is the disk space of Cube Store workers. In Cube Cloud, the workers are connected to encrypted Amazon EBS volumes.
This ensures that data rollups are always encrypted at rest.
Parquet encryption in Cube Store
Data rollups are organized as files in the Parquet format (an industry-standard, open-source, column-oriented data file format designed for efficient data storage and retrieval). Parquet provides a modular encryption mechanism that encrypts and authenticates the file data and metadata while still allowing for columnar projection, predicate pushdown, encoding, and compression.
Now, Cube Cloud customers can add an optional layer of protection to their data rollups in the persistent storage. As with customer-managed keys in Databricks or Snowflake, customers can provide their own customer-managed keys (CMK) for Parquet encryption in Cube Store.
With Parquet encryption, rollup data is secured using the industry-standard AES cipher with 256-bit keys. Data encryption and decryption are completely transparent to Cube Store operations. Key management is performed via the UI in the settings of your Cube Cloud deployment.
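As a rough illustration of what a 256-bit key means in practice, the sketch below generates 32 random bytes from a cryptographically secure source and checks the length of an encoded key. The base64 encoding and the helper names (`generate_key`, `is_valid_aes256_key`) are assumptions made for this example; the format Cube Cloud actually accepts is described in the encryption keys management documentation.

```python
import base64
import secrets

AES_256_KEY_BYTES = 32  # 256 bits / 8 bits per byte


def generate_key():
    """Return 32 random bytes from the OS's cryptographically secure generator."""
    return secrets.token_bytes(AES_256_KEY_BYTES)


def is_valid_aes256_key(encoded):
    """Check that a base64 string (an assumed format) decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(encoded, validate=True)
    except (ValueError, TypeError):
        return False
    return len(raw) == AES_256_KEY_BYTES


encoded_key = base64.b64encode(generate_key()).decode("ascii")
print(len(encoded_key), is_valid_aes256_key(encoded_key))
```

Keys should come from a secure random source like this one rather than from a password or a general-purpose random number generator.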
Read more in the documentation: data-at-rest encryption, encryption keys management.
What's next?
Parquet encryption with customer-managed keys (CMK) in Cube Store is available to Cube Cloud customers on the Enterprise plan and above.
If you'd like to opt in for this additional layer of data protection or discuss the implementation of your semantic layer, please reach out to us.