It’s been a wild year for the semantic layer. Years from now, when folks reflect on 2023, I’m convinced it will be dubbed the year it all began. While Cube and the semantic layer have been around for a long time, it was only in 2023 that the standalone semantic layer category turned from a cool idea into a necessary ingredient in any data stack.
A quick look back. The data stack has gone through many phases, or as Frank Bien, CEO of Looker, used to say, “three waves.” The first wave was the one-vendor monoliths: rigid but effective data stacks delivering Business Intelligence dashboards with crystallized data models that represented the business accurately and completely. Sounds good on paper, but the reality was that these one-vendor stacks were complicated and difficult to implement, and once they were live (a mere two years or so later) very few people could use them. That created the dreaded data bottleneck, where long queues of business people waited for the report or dashboard that would help them make a critical decision.
Wave two was defined by chaos: the business user taking hold of whatever data they could find, analyzing it themselves in Tableau or another data visualization tool, and then making decisions based on half-baked conclusions drawn from whatever pile of data they could scrape together. But what other choice did they have? They couldn’t wait for the monolith to get around to helping them; decisions needed to be made now.
But this wave of chaos also, behind the scenes, broke the semantic layer apart into tiny bits that now float around the data stack. A little bit in your cloud data warehouse, a little in your transformation layer, a data model in your BI tool: just enough to almost get you what you need.
Then dbt bought Transform, a semantic layer competitor to ours, and the data industry blinked.
At Cube we saw this as a huge win. It was as if dbt had signaled to the industry that a semantic layer mattered but didn’t yet have a product to sell. Was this an admission that their first stab at a semantic layer wasn’t good enough, so they bought another company? Whatever it was, Cube was in a position to catch all that interest and monetize it.
All our metrics jumped at the news: more companies asking us about semantic layers and how Cube could help them, more people on our website, more sign-ups for our free Cube Cloud product. Our Q1 was fantastic. We beat our number by 131% and closed the Linux Foundation, Drift, Nielsen, and AppDirect, among 24 new enterprise businesses now using Cube.
And in the subsequent months, we rode that momentum and launched Semantic Layer Sync, which keeps the data models in Tableau, Superset, Power BI, and Metabase automatically in sync with Cube. We advanced our SQL API, introduced the Orchestration API, and added a more human-friendly, visual way to explore the semantic layer with Cube Cloud Data Graphs.
GigaOm took the lead in defining the category with its Sonar Report for Semantic Layers and Metric Stores, outlining what is required: code-first data modeling, multiple APIs, and pre-aggregation capabilities. The report also named Cube the only Fast Mover and Leader in the bunch. Andrew Brust from GigaOm explains his research and thinking in detail in this webinar with Cube customers Drift and Breakthrough. Gartner followed with its Gartner® Hype Cycle™ for Data and Analytics Governance, which included Cube and a write-up of the semantic layer.
But then. AI.
ChatGPT exploded, OpenAI skyrocketed, and businesses suddenly recognized the potential of AI and LLMs to fundamentally rewrite (literally and figuratively) their operations. And if the semantic layer seemed prudent before, now it was a requirement.
AIs hallucinate, especially with data, because they don’t have the context that a semantic layer brings. Cube works beautifully, out of the box, with LLMs, delivering context and accuracy to whatever the AI application is doing.
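To make that concrete, here is a minimal sketch of the idea, not Cube’s official integration: pull measure and dimension definitions from the semantic layer’s metadata endpoint and fold them into the LLM prompt, so the model reasons over governed definitions instead of guessing at raw tables. The deployment URL, token, and exact field handling below are placeholder assumptions.

```python
# A minimal sketch (assumed endpoint and field names, not Cube's official integration):
# fetch measure/dimension metadata from the semantic layer and fold it into the prompt,
# so the LLM answers against governed definitions instead of guessing at raw tables.
import requests

CUBE_API_URL = "https://example.cubecloud.dev/cubejs-api/v1"  # placeholder deployment URL
CUBE_API_TOKEN = "YOUR_API_TOKEN"                             # placeholder auth token


def fetch_semantic_context() -> str:
    """Collect measure and dimension names (plus titles) from Cube's /meta endpoint."""
    meta = requests.get(
        f"{CUBE_API_URL}/meta",
        headers={"Authorization": CUBE_API_TOKEN},
        timeout=30,
    ).json()
    lines = []
    for cube in meta.get("cubes", []):
        for kind in ("measures", "dimensions"):
            for member in cube.get(kind, []):
                lines.append(f"- {member['name']} ({kind[:-1]}): {member.get('title', '')}")
    return "\n".join(lines)


def build_prompt(question: str) -> str:
    """Ground a natural-language question in the semantic layer's definitions."""
    return (
        "Answer using only the measures and dimensions defined below.\n"
        f"{fetch_semantic_context()}\n\n"
        f"Question: {question}"
    )
```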
We partnered to launch an integration with LangChain, one of the most important building blocks for AI developers, including a demo to show how it works. We also partnered with Delphi to deliver an AI-powered conversational interface on top of Cube’s semantic layer, and Patterson Consulting created a demo that they shared in this webinar and that we highlighted at our booth at Snowflake Summit 2023.
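For a sense of what the LangChain side can look like, here is a rough sketch under stated assumptions, not the packaged integration or demo: because Cube’s SQL API speaks the Postgres wire protocol, LangChain’s generic SQL utilities can connect to it directly, so an LLM-driven workflow queries governed cubes rather than raw warehouse tables. The connection string and the orders cube are hypothetical.

```python
# A rough sketch, not the packaged Cube + LangChain demo: point LangChain's generic
# SQL tooling at Cube's Postgres-compatible SQL API so queries hit governed cubes.
# The host, credentials, and the "orders" cube below are hypothetical placeholders.
from langchain_community.utilities import SQLDatabase

# Cube's SQL API exposes cubes as tables over the Postgres wire protocol.
db = SQLDatabase.from_uri(
    "postgresql://cube_user:cube_password@your-cube-deployment:5432/cube"
)

# Inspect what the semantic layer exposes, then run a query against it.
print(db.get_usable_table_names())
print(db.run("SELECT status, COUNT(*) FROM orders GROUP BY status"))
```

From there, the same database handle can back a LangChain SQL chain or agent, which is where the natural-language-to-metrics experience comes from.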
We saw an 8x increase in people asking our sales team about AI, and the business overall grew 3.5x. It makes me think that AI is now ushering in the next wave, the third wave, of the data stack: one where the semantic layer is a critical component, so that consistent data and metrics can be delivered to BI tools, data applications, and AI tools.
We’ve seen our customers expand into LLMs using their Cube semantic layer as the base. Spyne.app, a marketing analytics application that brings together campaign data from all channels into one place for analysis, expanded its offering to include Spyne AI: users can ask natural-language questions and get back charts and commentary on their marketing campaigns.
“Cube has been great. Not only has Cube improved query performance on BigQuery from a few seconds to milliseconds, but it has increased the quality of ChatGPT substantially thanks to proper definitions and calculated fields.”
And while Spyne AI delivers awesome value to Spyne’s customers, the meta of this story is that because Spyne already had Cube as its semantic layer, it could move incredibly quickly to deliver AI functionality to those customers.
As we head into the end of the year, when everyone slows down (even AI, it seems), it’s clear that the semantic layer is now a “Must-Have,” as Tomasz Tunguz states in his Top 10 Trends for Data in 2024. He writes:
“Semantic models unify a single definition across an organization for a particular metric. Looker did this within the context of a BI system. But organizations need this layer across the stack. In addition to the reusability of definitions, composability - creating complex analysis with simple building blocks - will define this layer, both for humans who find it easier to understand and for large language models that synthesize semantics.”
Businesses that have a semantic layer will not only have the benefit of a centralized source of truth for their metrics and data; they will also have the flexibility to spin up any new data technology or application in a matter of days, especially LLMs and other AI tools. Can we say we know which AI tool or LLM will be best for each business? Unlikely, given how new this innovation is. But the smartest business leaders will want the flexibility and power to test and iterate quickly, so that when the winning tool is discovered, they can deliver value in days.