Hasura GraphQL engine helps build modern apps & APIs 10x faster. In this article, we’ll see how you can build a data-intensive app with Hasura that would work 10x faster. To achieve that, we’ll use Hasura Remote Schemas with Cube and its GraphQL API to boost query performance.
You can find the complete source code for this application on GitHub.
What are we building?
In this tutorial, you’ll learn how to extend Hasura with a Remote Schema by using Cube as a caching layer. This will enable delivering data in milliseconds to end users of a web application built with React and Apollo GraphQL.
We’ll use an example dataset of fraudulent financial transactions. The dataset is collected from Kaggle and only has one table called fraud
. To read more, check out the full explanation of the dataset.
Let’s jump in!
What is Hasura?
Hasura is an open-source GraphQL engine that helps developers build modern APIs and apps faster. You can run Hasura locally or in the cloud, and connect it to your new or existing databases to instantly get a production-grade GraphQL API.
The Hasura GraphQL engine is open-source with ~28,000 GitHub stars!
Hasura recommends running Hasura Cloud for your production environment with the option to use Hasura Enterprise, with added security and performance features, if you need to deploy it on-premises.
Hasura currently supports four databases: PostgreSQL, Microsoft SQL Server, Google BigQuery, and AWS Aurora. And, four more are coming soon: Oracle, Mongo, MySQL, Elastic.
Hasura has built-in authorization and caching. The row-level security authorization engine allows you to conveniently specify authorization rules at a model level, and safely expose the GraphQL API to developers inside or outside your organization. Both Hasura Cloud and Enterprise provide support for caching query responses, to improve performance for queries that are executed frequently. This includes actions and queries against remote schemas as well!
Hasura support both GraphQL and REST APIs. You can create idiomatic REST endpoints based on GraphQL templates. With Hasura you don’t need to be a GraphQL expert upfront. It saves time and effort in shipping and maintaining the APIs for you.
Getting Started with Hasura
The easiest way to get started with Hasura is by using Hasura Cloud. Head over to hasura.io and create a new account if you already haven't done so.
Start by creating a new instance.
Let’s connect the database to Hasura. From your Hasura console, select the data icon and click the Connect Database button. Next, select connect to existing database options.
You will get a form to enter your database credentials. Here’s where you add the credentials to the database we provided and click “Connect Database”.
Once you connect the database, click the Track button to create a GraphQL API from the fraud
table.
Once tracking is enabled, you can modify columns and preview the data.
Let’s jump into Hasura and run a few analytical queries.
Running analytical queries in Hasura
Navigate to the API tab in the Hasura console. Open the GraphQLi IDE. Let’s use aggregate fields to fetch aggregated data. Hasura is great to expose this aggregation feature!
Notice that it takes about 3 seconds to load the data.
Hasura exposes a caching feature, but again, it’s difficult to cache queries that often have varying filters.
Caching alone can only cover a finite number of use cases. If I add a where clause, I get the same issue again even though caching is enabled.
We can also create aggregations with views in Hasura.
Once again go to the Data tab in Hasura and click the SQL tab in the left-hand navigation.
Here you can add raw SQL to create a view.
But, also enable tracking as well. This will let you query the data using the GraphQL API.
The query now takes around 2 seconds to return a response. Building front-end visualizations with charts that take this long to load are unacceptable.
Let’s try getting to the bottom of why running analytical queries is slow with Hasura and PostgreSQL.
Why are the analytical queries slow in Hasura?
Data, a lot of it. Postgres is a traditional row-oriented database. Row-oriented databases store information in the disk row by row.
Analytical queries require aggregating the data by column. Row-oriented databases don’t perform well in this case because all columns in every row need to be read from the disk instead of a few ones involved in a query. Also, certain optimizations like compression can only be effectively applied to columnar data.
You can learn more about how row-oriented databases work and their limitations in this blog post.
You need a caching mechanism that’s reliable and flexible. User experience research consistently shows a positive correlation between faster response times and higher customer satisfaction. Half of your users will immediately drop off if the response time is over 3 seconds if we believe Google.
What we’ll do to mitigate this is to add Cube into our data stack. It’ll help us model data and accelerate queries with caching and pre-aggregations.
What is Cube?
Cube is an open-source API-first headless business intelligence platform that connects to your data sources and makes queries fast, responsive, cost-effective, and consistent across your applications.
It enables data engineers and application developers to access and organize data to build performant data-intensive applications.
Cube’s API layer can efficiently aggregate your data and serve it to applications. Instead of querying complex, large datasets directly from your Postgres database, you can use Cube as a middleware layer. Cube performs caching, pre-aggregation, and much more, making your analytical queries faster and more efficient.
Moreover, Cube has a GraphQL API that easily connects to your Hasura engine as a Remote Schema. This way you can unify all your GraphQL endpoints in Hasura and get the added benefit of performant analytical queries.
Setting up Cube
The easiest way to get started with Cube is with Cube Cloud. It provides a fully managed, ready-to-use Cube cluster. However, if you prefer self-hosting, follow this guide in the docs.
Let’s move on and create a new Cube deployment in Cube Cloud. You can select a cloud platform of your choice.
Next, select + Create to get started with a fresh instance from scratch.
Next, provide the database connection information. Select PostgreSQL.
Now enter the same database credentials we used above when setting up Hasura, and select continue.
Cube auto-generates a data model from your SQL tables. Similar to Hasura’s Track feature mentioned above. It’s used to model raw data into meaningful business definitions.
Select the fraud
table for data model generation, and click Generate . It takes a few minutes for the Cube instance to get provisioned.
Now, we can move on to defining our data model and accelerating queries with pre-aggregations.
Centralized data modeling
In your Cube deployment, select Data Model in the left-hand navigation and click Enter Development Mode . Now let’s edit the Fraud.js
model definition to add a measure for the sum of transaction amounts.
Once Development Mode is enabled, go ahead and paste the code below into the Fraud.js
data model file.
Save the changes, and the data model will be updated. Next, commit and push these changes. Cube uses Git for version control. You can revert your changes anytime you like.
Moving over to the Playground, you can run the same analytical query we ran previously in Hasura.
The query will also be mapped to a GraphQL query through the GraphiQL IDE that’s exposed within Cube Cloud.
Now we have a centralized data model where we can reliably handle business definitions without splitting up logic between PostgreSQL and Hasura.
But, we’re only halfway there. Let’s add query acceleration with pre-aggregations as well.
Adding pre-aggregations to increase query performance
One of Cube’s most powerful features is pre-aggregations. They can reduce the execution time of a query drastically. In our case with this tutorial, we’ll be reducing the response time to well below 200 ms, or even less, for queries that took above 3 seconds.
In Cube, pre-aggregations are condensed versions of the source data. They are materialized ahead of time and persisted as tables separately from the raw data. To learn more about pre-aggregations, please follow this tutorial.
We also highly recommend you check these in-depth video workshops on pre-aggregations: Mastering Cube Pre-Aggregations and Advanced Pre-aggregations in Cube.
But now, let’s jump back into Development Mode*. Select the Fraud.js
data model file again. Update the preAggregations
section to add a pre-aggregation definition.
Save the changes, click Commit and push , and the pre-aggregation will be built for our analytical query. Here’s what the pre-aggregation should look like once the data model has been updated.
When you run the query next time in Cube, the data will be pre-aggregated and saved in Cube's caching layer inside of Cube Store.
Running this query again, you’ll see a massive performance increase.
The true power lies in still retaining query acceleration when using filters. That’s why pre-aggregations are so much more powerful than basic caching strategies.
At this point, your Cube instance is ready to be hooked up with Hasura as a remote graph.
Add Cube as Remote Schema in Hasura
Hasura can merge remote GraphQL schemas and provide a unified GraphQL API. Think of it like automated schema stitching. Any GraphQL service that lives outside of Hasura is considered a remote graph.
You can effortlessly merge Cube's GraphQL schema using the Remote Schema option in Hasura.
From the Hasura console, select Remote Schemas, then select Add.
To add a remote schema, first copy the GraphQL API and Authorization token from your Cube deployment. You can find the GraphQL API endpoint and Authorization token by clicking the How to connect your application button in your Cube deployment Overview tab.
Give the Remote Schema a name and paste the GraphQL API and Authorization token in the input fields in Hasura. Click Add when you’re done.
If you go back to the API tab in Hasura, you can now run Cube queries directly. Let’s run the same query we ran a moment ago in the Cube Cloud Playground.
You will notice a significant performance improvement in your query this time.
⚠️ Caution: You may not see a significant performance improvement for the first few queries if you use a free tier of Hasura and a free tier of Cube. After the first 2/3 queries, it should work as expected. The free tier for both services goes into sleep mode when not used for some time.
Building data visualization with Apollo and React
For the front-end app, we’ll use React and Apollo Client, and query the Hasura GraphQL server that’s connected to Cube as a remote schema. We’ll use the nivo charting library, a modern production-ready data visualization tool.
You can check the full source code on GitHub and instantly run it with yarn dev
. You'll get a copy of this demo application.
The entry point is src/index.js
, and it uses a LineChart.jsx
file to generate the nivo line chart.
We decided to showcase the power of pre-aggregations by generating queries that filter the steps into pages of 50 each, as well as choosing whether to show valid or fraudulent transactions.
Even though the query uses filters, it will still be accelerated due to using pre-aggregations in Cube!
Let’s walk through the contents of the React files. First the index.js
.
Let me explain the main points of the code above.
- We use
@apollo/client
and wrap the React<App />
in<ApolloProvider>...<ApolloProvider/>
.- This includes using
httpLink
andauthLink
to load the Hasura GraphQL API endpoint and secret token.
- This includes using
- A typical API interaction flow in a React app with React hooks looks like this:
- use
useState
to create a state variable (e.g.,fraudChartData
); - compose a GraphQL query (e.g.,
GET_FRAUD_AMOUNT_SUM_CUBE_REMOTE_SCHEMA
); - call
useQuery
to fetch the result set (e.g.,fraudDataCube
); - use
useEffect
to await for the data and to transform it intofraudChartDataCube
to be loaded intoLineChart
; - assign the data to the state variable (e.g., with
setFraudChartDataCube
).
- use
- We configure the
GET_FRAUD_AMOUNT_SUM_CUBE_REMOTE_SCHEMA
GraphQL query to load parameters dynamically from the two dropdown selectors. - Lastly, the data is rendered by using
DisplayFraudAmountSum
withLineChart
.
That's it!
Final thoughts
Cube comes packed with features that can help you build data-intensive apps fast. Cube supports features like multiple database connections, multi-tenancy, SQL API for connecting BI tools, and more. By using Hasura’s remote schema, you can keep using your Hasura GraphQL server even when faced with running time-consuming analytical queries!
You can sign up for Cube Cloud for free and try it for yourself. To learn more about how Cube can help you to build your project, head over to the official documentation page.
If you have questions or feedback, we would love to hear what you have to say! Come join our Slack community. Click here to join!
That’s all for today. Feel free to leave Cube a ⭐ on GitHub if you liked this article!