A fundamental premise of semantic layers is that they centralize metric definitions and deliver consistently calculated metrics to data consumers. In theory, without a semantic layer, one might be resourceful and prudent enough to keep metric definitions in sync across several data tools over an indefinite period of time. In practice, entropy will likely prevail.
And what prevents disorder inside the semantic layer itself, that is, in its data model? At Cube, we believe the answer is the right tooling. Data engineers should have tools to observe, understand, and reason about the data model regardless of team size, historical context, or other circumstances.
But what is the data model? One can describe it as the set of entities supported by a semantic layer, such as cubes, views, measures, dimensions, and joins. However, I’m sure that when you hear “data model,” you picture source code, the engineering lingua franca, whether it’s YAML, LookML, or a more niche syntax. Representing the data model as code and thinking in code are so universal that many existing tools are built around code. Cube Cloud is no exception, with its data model editor, version control support, and development mode with code branches.
But is code the only, and the best, representation?
Humans vs. machines
Regardless of how good we are at reading and parsing code, it is fundamentally a machine-readable representation of the data model, designed to please machines rather than humans. (If you happen to disagree, please enjoy the “YAML document from hell.”)
However, source code is not the only machine-readable format. For instance, as part of its set of APIs, Cube provides the meta REST API endpoint that exposes meta-information about the data model. It renders data model entities in JSON and injects relevant insights, e.g., by grouping cubes into connected components according to their join relationships.
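To make “connected components” concrete, here is a minimal Python sketch of the idea. It is not Cube’s implementation. Cubes are treated as nodes, joins as undirected edges, and a union-find structure groups cubes that can reach each other through joins. The cube names and the (cube_a, cube_b) join pairs are hypothetical inputs. The sketch does not call the meta endpoint and assumes nothing about its response format.

```python
from collections import defaultdict


def connected_components(cubes, joins):
    """Group cube names into connected components.

    `joins` is an iterable of (cube_a, cube_b) pairs (hypothetical input format).
    Direction is ignored: any join makes two cubes reachable from each other.
    """
    parent = {cube: cube for cube in cubes}

    def find(cube):
        # Walk up to the root, halving the path as we go.
        while parent[cube] != cube:
            parent[cube] = parent[parent[cube]]
            cube = parent[cube]
        return cube

    for a, b in joins:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(list)
    for cube in cubes:
        groups[find(cube)].append(cube)
    # Sort cubes within each group, then the groups, for deterministic output.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data model: orders, customers, and products form one cluster,
# while page_views has no joins and stands alone.
cubes = ["orders", "customers", "products", "page_views"]
joins = [("orders", "customers"), ("orders", "products")]
print(connected_components(cubes, joins))
# [['customers', 'orders', 'products'], ['page_views']]
```

Join direction is ignored on purpose. When grouping, a join from orders to customers connects both cubes either way.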
With the proliferation of semantic layers, we believe there’s massive potential for vendor collaboration here, which may result in an open standard for semantic layer meta-information. Such a standard would enable integrations with other data tools and provide data engineers with next-level tooling.
What about human-readable formats? Of course, there are data catalogs and data reliability tools that focus on logical entities within data models and expose their hierarchies and dependencies. Integration and interoperability with these tools are the right path forward for Cube.
Cube Cloud should also provide a human-friendly, visual way to view the data model and reason about relationships between its entities. So, today, we’re happy to show you Data Graph.
Introducing Data Graph
Data Graph represents the data model as an entity relationship diagram and uses the crow’s-foot notation to mark join relationships between cubes. Data Graph is the best way to familiarize oneself with the contents of a semantic layer, take a bird’s-eye view of the data model, and find clusters of interconnected cubes.
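For readers unfamiliar with crow’s-foot notation, the sketch below shows one way join relationships could map onto it. It uses Mermaid’s erDiagram syntax as a stand-in renderer and is an illustration only, not how Data Graph draws its diagrams. It assumes Cube’s relationship names one_to_one, one_to_many, and many_to_one. The cubes and joins are hypothetical, and drawing every “many” end as “zero or more” is a simplifying choice.

```python
# Crow's-foot end markers in Mermaid erDiagram syntax, as (left end, right end).
# "||" means exactly one; "}o" (left) and "o{" (right) mean zero or more.
# Assumes Cube-style relationship names; other names raise a KeyError.
CROWS_FOOT = {
    "one_to_one": ("||", "||"),
    "one_to_many": ("||", "o{"),
    "many_to_one": ("}o", "||"),
}


def to_mermaid(joins):
    """Render (source_cube, target_cube, relationship) triples as Mermaid ER text."""
    lines = ["erDiagram"]
    for source, target, relationship in joins:
        left, right = CROWS_FOOT[relationship]
        # The relationship name doubles as the edge label.
        lines.append(f'    {source} {left}--{right} {target} : "{relationship}"')
    return "\n".join(lines)


# Hypothetical joins between three cubes.
print(to_mermaid([
    ("customers", "orders", "one_to_many"),
    ("orders", "products", "many_to_one"),
]))
# erDiagram
#     customers ||--o{ orders : "one_to_many"
#     orders }o--|| products : "many_to_one"
```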
You can drag and drop cubes, highlight join relationships by hovering over them, and click to navigate to the source code. For clarity, only dimensions involved in joins are displayed. If cubes are joined by an arbitrary SQL expression, the join is highlighted as such.
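As a rough illustration of the difference between a join on dimensions and a join on arbitrary SQL, the hypothetical helper below treats a join condition as a plain equality only when it has the form {cube_a.dimension_a} = {cube_b.dimension_b}. Everything else is flagged as arbitrary SQL. This is not Data Graph’s actual logic. It assumes Cube’s {cube.member} reference syntax, so other valid forms, such as the {CUBE} shorthand, are also reported as arbitrary SQL.

```python
import re

# Hypothetical heuristic: a join is a simple equality only if it matches
# "{cube_a.dim_a} = {cube_b.dim_b}" exactly (surrounding whitespace allowed).
EQUI_JOIN = re.compile(r"^\s*\{(\w+)\.(\w+)\}\s*=\s*\{(\w+)\.(\w+)\}\s*$")


def classify_join(sql):
    """Return ("equality", [(cube, dimension), ...]) or ("arbitrary_sql", [])."""
    match = EQUI_JOIN.match(sql)
    if match is None:
        # Anything else, e.g. BETWEEN or compound conditions, is arbitrary SQL.
        return ("arbitrary_sql", [])
    cube_a, dim_a, cube_b, dim_b = match.groups()
    # Only these two dimensions would be shown on the diagram.
    return ("equality", [(cube_a, dim_a), (cube_b, dim_b)])


print(classify_join("{orders.customer_id} = {customers.id}"))
# ('equality', [('orders', 'customer_id'), ('customers', 'id')])
print(classify_join("{orders.created_at} BETWEEN {promos.starts_at} AND {promos.ends_at}"))
# ('arbitrary_sql', [])
```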
We’re releasing Data Graph in preview; currently, it shows only cubes and the join relationships between them. We’d like to hear your feedback and look forward to adding support for visualizing views down the road.
Data Graph is available in Cube Cloud on all tiers, and you can try it today. Join our Slack community at slack.cube.dev to share your thoughts and opinions about Data Graph.