This article will teach you how to build fast GraphQL APIs with sub-second latency, reduce architecture complexity, and make engineering teams more efficient using Apollo GraphQL Federation and Cube’s GraphQL API.
Step-by-step, we'll build an Apollo supergraph with Apollo GraphOS, federate queries from external services, and boost query performance (reduce latency) for analytical queries more than tenfold.
You can play around with the demo application and find the complete code on GitHub.
What are we building?
Take a look at the architecture diagram below. We'll first create an Apollo Server to load data from PostgreSQL. Then, we'll build an Apollo supergraph with Apollo GraphOS that fetches data from both a caching layer that accesses the same PostgreSQL instance, built with Cube’s GraphQL API, and the Apollo Server we built previously.
This will enable delivering data in milliseconds to end users of a web application built with React and Apollo Client. You will see that Cube can make any analytical query to PostgreSQL run with sub-second latency, making your GraphQL queries just as fast.
We’ll use an example dataset of fraudulent financial transactions. The dataset comes from Kaggle and has only one table, called fraud. To read more, check out the description of the dataset.
Challenges of data-intensive applications
Data-intensive front-end apps consume data from multiple services with GraphQL endpoints. These endpoints can be accessed independently, but the best practice is to combine them into a supergraph.
This makes it easier for your front-end team to work with one endpoint and unified schema while the back-end team has modular graphs and separation of concerns among multiple services.
When building data-intensive applications, you’ll run complex queries with heavy joins and aggregations. Querying massive amounts of data and aggregating them in the application layer is time-consuming and causes sub-optimal performance.
Let's see how these issues are resolved with Apollo and Cube.
What is Apollo GraphQL?
Apollo GraphQL is the leading open-source GraphQL implementation with 17 million monthly downloads! Their toolset helps build, deliver, and observe modern GraphQL apps and APIs by leveraging the power of supergraphs.
Apollo Federation — the API Gateway. Apollo Federation is the industry-standard open architecture for building a distributed supergraph that scales across teams. The Apollo team recently released a new Apollo Router to compose a supergraph. However, their most recent addition, Apollo GraphOS uses the Apollo Router in the Cloud, so you don't have to go through the hassle of running it yourself. We'll use Apollo GraphOS to compose a supergraph from multiple subgraphs, determine a query plan, and route requests across your services. Learn more about Apollo Federation in this post.
Apollo Server — the back-end server. The best way to build a production-ready TypeScript GraphQL server that connects to any microservice, API, or database. Compatible with all popular JavaScript frameworks and deployable in serverless environments.
Apollo Client — the front-end framework. The industry-standard open-source GraphQL client for web, iOS and Android apps with everything you need to fetch, cache, and modify application data.
Apollo Studio — the supergraph manager. Studio manages your supergraph lifecycle. Apollo tracks your GraphQL schemas in a registry to create a central source of truth for everything in your supergraph. Studio allows developers to explore data, collaborate on queries, observe usage, and deliver schema changes with agility and confidence.
Getting started with Apollo GraphQL Server
Let’s start by creating an Apollo Server with Node.js, and connecting it to our PostgreSQL data source.
Create a fresh folder, and run npm init -y to initialize npm.
Next, let’s install a few modules we need.
Let’s move on to creating the database connection. Here are the credentials for the database we provide.
Create a folder called database and paste this code into an index.js file.
Once the database is configured, we can configure the GraphQL queries.
Create another folder called graphql and paste this code into a fraud.js file.
As you can see, the amountSumFraudsWithStep function queries PostgreSQL and aggregates the amount column.
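As a rough illustration, a resolver of this kind might look like the sketch below. It is not the code from the demo repository: the column names step and "isFraud", the argument names, and the result shape are assumptions. The pool object is injected and can be anything with a query(text, values) method that resolves to { rows }, as the Pool class from the pg module does.

```javascript
// Hypothetical sketch: column names and argument names are assumptions.
function buildAmountSumQuery({ isFraud, stepStart, stepEnd }) {
  return {
    // Aggregate `amount` per step. Values are passed separately as
    // parameters ($1, $2, $3) instead of being concatenated into the SQL.
    text: `SELECT step, SUM(amount) AS amountsum
           FROM fraud
           WHERE "isFraud" = $1 AND step BETWEEN $2 AND $3
           GROUP BY step
           ORDER BY step`,
    values: [isFraud, stepStart, stepEnd],
  };
}

// Resolver factory: `pool` is injected, so it can be a pg Pool or a test double.
function makeResolvers(pool) {
  return {
    Query: {
      amountSumFraudsWithStep: async (_parent, args) => {
        const { text, values } = buildAmountSumQuery(args);
        const { rows } = await pool.query(text, values);
        // Depending on the column type, the driver may return SUM() as a
        // string, so convert it to a number before returning it.
        return rows.map((row) => ({
          step: row.step,
          amountSum: Number(row.amountsum),
        }));
      },
    },
  };
}
```

Keeping the SQL builder separate from the resolver makes the query easy to inspect and test without a running database.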
Now create another folder called src and within it create a file called server.js.
Finally, add an index.js file in the root directory.
With all the code added, start the Node.js server. Jump into your terminal and run:
Boom! With that, you have an Apollo Server running. Let’s run some analytics queries!
Running analytical queries with Apollo GraphQL Server
With the Apollo Server running, open up http://localhost:4000/graphql in your browser.
Go ahead and click Query your server. This will take you to Apollo Studio sandbox. In the Operation field, go ahead and run the query below.
Don’t forget to add values for the variables.
The query will take around 2 seconds.
You can use caching in Apollo, but it’s difficult to cache queries that often have varying filters. Caching alone can only cover a finite number of use cases. If I add a where clause, I get the same issue again even though caching is enabled.
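To see why this happens, consider a deliberately simplified model of an exact-match response cache. This is not how Apollo implements caching; the variable names and the slowQuery stand-in are hypothetical. The point is only that a cache keyed by the full set of query variables misses whenever a filter value changes.

```javascript
// Minimal exact-match cache keyed by the serialized query variables.
const cache = new Map();
let databaseHits = 0;

function cachedQuery(variables, runQuery) {
  const key = JSON.stringify(variables);
  if (!cache.has(key)) {
    databaseHits += 1; // cache miss: the slow query has to run
    cache.set(key, runQuery(variables));
  }
  return cache.get(key);
}

// Stand-in for the slow analytical query.
const slowQuery = (vars) => `result for ${JSON.stringify(vars)}`;

cachedQuery({ isFraud: 1, stepStart: 1, stepEnd: 50 }, slowQuery); // miss
cachedQuery({ isFraud: 1, stepStart: 1, stepEnd: 50 }, slowQuery); // hit
cachedQuery({ isFraud: 1, stepStart: 51, stepEnd: 100 }, slowQuery); // miss: new filter

console.log(databaseHits); // 2
```

Every new combination of filter values pays the full query cost once, which is why response caching alone does not help much with exploratory analytical queries.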
Let’s try getting to the bottom of why running analytical queries is slow with GraphQL and PostgreSQL.
Why are analytical queries slow with GraphQL and PostgreSQL?
Analytical queries require aggregating the data by column. Postgres is a traditional row-oriented database, which means it stores data on disk row by row.
Row-oriented databases don’t perform well in this case because all columns in every row need to be read from the disk instead of the few used in a query. You can learn more about how row-oriented databases work and their limitations in this blog post.
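The difference can be sketched with a toy model that counts how many stored values have to be read to sum a single column. The sample rows are invented for illustration, and real databases read pages rather than individual values, so treat this as an intuition aid rather than a description of PostgreSQL internals.

```javascript
// Invented sample rows; not taken from the actual dataset.
const rows = [
  { step: 1, type: "PAYMENT", amount: 100, isFraud: 0 },
  { step: 1, type: "TRANSFER", amount: 250, isFraud: 1 },
  { step: 2, type: "CASH_OUT", amount: 75, isFraud: 0 },
];

// Row-oriented: each row is stored as one unit, so every field gets read.
function sumRowOriented(table, column) {
  let valuesRead = 0;
  let sum = 0;
  for (const row of table) {
    valuesRead += Object.keys(row).length;
    sum += row[column];
  }
  return { sum, valuesRead };
}

// Column-oriented: each column is stored together, so only one is read.
function sumColumnOriented(table, column) {
  const columns = {};
  for (const row of table) {
    for (const [name, value] of Object.entries(row)) {
      if (!columns[name]) columns[name] = [];
      columns[name].push(value);
    }
  }
  const values = columns[column];
  return { sum: values.reduce((a, b) => a + b, 0), valuesRead: values.length };
}

console.log(sumRowOriented(rows, "amount")); // { sum: 425, valuesRead: 12 }
console.log(sumColumnOriented(rows, "amount")); // { sum: 425, valuesRead: 3 }
```

Both layouts produce the same sum, but the row-oriented version touches four times as many values here, and the gap grows with the number of columns in the table.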
To mitigate this issue you need a caching mechanism that’s reliable and flexible.
User experience research consistently shows a positive correlation between faster response times and higher customer satisfaction. According to Google, half of your users will drop off if the response time exceeds 3 seconds.
What we’ll do to mitigate this is to add Cube into our data stack. It’ll help us accelerate queries with caching and pre-aggregations by adding Cube’s GraphQL API to our federated supergraph.
What is Cube?
Cube is an open-source API-first headless business intelligence platform that connects to your data sources and makes queries fast, responsive, cost-effective, and consistent across your applications.
It enables data engineers and application developers to access and organize data to build performant data-intensive applications.
Cube’s API layer can efficiently aggregate your data and serve it to applications. Instead of querying complex, large datasets directly in your PostgreSQL database, you can use Cube as a middleware layer. Cube performs caching, pre-aggregation, and much more, making your analytical queries faster and more efficient.
Cube has a GraphQL API that easily connects to your Apollo Federation supergraph as a subgraph. This is how you unify all your GraphQL endpoints with Apollo Federation and get the added benefit of performant analytical queries.
Getting started with Cube
The easiest way to get started with Cube is with Cube Cloud. It provides a fully managed, ready-to-use Cube cluster. However, if you prefer self-hosting, follow this guide in the docs.
Let’s move on and create a new Cube deployment in Cube Cloud. You can select a cloud platform of your choice.
Next, select + Create to get started with a fresh instance from scratch.
Next, provide the database connection information. Select PostgreSQL.
Now enter the same database credentials we used above when setting up Apollo Server, and select continue.
Cube auto-generates a Data Schema from your SQL tables. It’s used to model raw data into meaningful business definitions.
Select the fraud table for schema generation, and click Generate. It takes a few minutes for the Cube instance to get provisioned.
Now, we can move on to defining our data model and accelerating queries with pre-aggregations.
Centralized data modeling
In your Cube deployment, select Schema in the left-hand navigation and click Enter Development Mode. Now let’s edit the Fraud.js schema definition to add a measure for the sum of transaction amounts.
Once Development Mode is enabled, go ahead and paste the code below into the Fraud.js schema file.
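For orientation, a data model with a sum measure could look roughly like the sketch below. The column names, the public.fraud table path, and the dimension list are assumptions, so compare it with the schema Cube generated for you rather than copying it verbatim. Inside Cube, the cube function is provided globally to schema files; the small stand-in at the top exists only so the snippet runs on its own and should be left out of a real Fraud.js.

```javascript
// Stand-in for the global `cube` function that Cube provides to schema files.
// It only records the definition so this sketch can run outside Cube.
const definitions = {};
function cube(name, definition) {
  definitions[name] = definition;
}

// Hypothetical Fraud.js data model; table and column names are assumptions.
cube(`Fraud`, {
  sql: `SELECT * FROM public.fraud`,

  measures: {
    count: {
      type: `count`,
    },
    // Sum of transaction amounts
    amountSum: {
      sql: `amount`,
      type: `sum`,
    },
  },

  dimensions: {
    step: {
      sql: `step`,
      type: `number`,
    },
    isFraud: {
      sql: `"isFraud"`,
      type: `number`,
    },
  },
});
```

The measure describes how to aggregate, and the dimensions describe what a query can group and filter by. Cube generates the SQL from these definitions at query time.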
Save the changes, and the data model will be updated. Next, commit and push these changes. Cube uses Git for version control. You can revert your changes anytime you like.
Moving over to the Playground, you can run the same analytical query we ran previously with Apollo.
The query will also be mapped to a GraphQL query through the GraphiQL IDE that’s exposed within Cube Cloud.
Now we have a centralized data model where we can reliably handle business definitions without making any changes to your PostgreSQL database.
But, we’re only halfway there. Let’s add query acceleration with pre-aggregations as well.
Adding pre-aggregations to increase query performance
One of Cube’s most powerful features is pre-aggregations. They can reduce the execution time of a query drastically. In this tutorial, we’ll reduce the response time to well below 200 ms for queries that previously took more than 2 seconds.
In Cube, pre-aggregations are condensed versions of the source data. They are materialized ahead of time and persisted as tables separately from the raw data. To learn more about pre-aggregations, please follow this tutorial.
We also highly recommend you check these in-depth video workshops on pre-aggregations: Mastering Cube Pre-Aggregations and Advanced Pre-aggregations in Cube.
But now, let’s jump back into Development Mode. Select the Fraud.js schema file again. Update the preAggregations section to add a pre-aggregation definition.
Save the changes, click Commit and push, and the pre-aggregation will be built for our analytical query. Here’s what the pre-aggregation should look like once the schema has been updated.
When you run the query next time in Cube, the data will be pre-aggregated and saved in Cube's caching layer inside of Cube Store.
Running this query again, you’ll see a massive performance increase.
The true power lies in still retaining query acceleration when using filters. That’s why pre-aggregations are so much more powerful than basic caching strategies.
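The idea can be sketched in plain JavaScript. The example below is a conceptual model, not how Cube Store works internally, and the sample rows and field names are invented. It builds a condensed rollup once, grouped by the dimensions a query may filter on, and then answers filtered queries from that rollup instead of the raw rows.

```javascript
// Invented raw rows; in practice this would be millions of transactions.
const transactions = [
  { step: 1, isFraud: 0, amount: 100 },
  { step: 1, isFraud: 0, amount: 50 },
  { step: 1, isFraud: 1, amount: 250 },
  { step: 2, isFraud: 0, amount: 75 },
  { step: 2, isFraud: 1, amount: 300 },
];

// Build the rollup ahead of time: one entry per (step, isFraud) pair.
function buildRollup(rows) {
  const rollup = new Map();
  for (const { step, isFraud, amount } of rows) {
    const key = `${step}:${isFraud}`;
    const entry = rollup.get(key) || { step, isFraud, amountSum: 0 };
    entry.amountSum += amount;
    rollup.set(key, entry);
  }
  return [...rollup.values()];
}

// Answer a filtered query from the rollup without touching the raw rows.
function queryRollup(rollup, { isFraud, stepStart, stepEnd }) {
  return rollup
    .filter((r) => r.isFraud === isFraud && r.step >= stepStart && r.step <= stepEnd)
    .map(({ step, amountSum }) => ({ step, amountSum }));
}

const rollup = buildRollup(transactions);
console.log(rollup.length); // 4
console.log(queryRollup(rollup, { isFraud: 0, stepStart: 1, stepEnd: 2 }));
// [ { step: 1, amountSum: 150 }, { step: 2, amountSum: 75 } ]
```

Because the rollup keeps the filterable dimensions, any filter on step or isFraud can be served from the small table, which is something an exact-match response cache cannot do.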
At this point, your Cube instance is ready to be added to the Apollo Federation supergraph.
Creating an Apollo Federation supergraph with Apollo Studio and GraphOS
Head over to Apollo Studio. If you’re new, check out the Apollo Studio getting started guide here.
Go ahead and sign up and create an organization.
Studio automatically redirects you to your newly created organization, which is on the Serverless (Free) plan by default.
You now have an Apollo Studio organization, but it doesn't contain any graphs. Next we'll create a cloud supergraph, which will incorporate the existing GraphQL API we created above.
From your new organization in Apollo Studio, navigate to the Supergraphs tab. Click Connect your GraphQL API, and follow the steps below.
Go ahead and grab Cube’s GraphQL API endpoint and Authorization token.
Paste the URL in the Endpoint URL field, and add the token by clicking the Provide HTTP Headers button.
Make sure to add a name to your subgraph as well, and click Next.
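Before registering the subgraph, it can be handy to confirm that the endpoint and token work together. The helper below is a hypothetical sketch that assembles a standard GraphQL-over-HTTP POST request for use with fetch. The URL and token are placeholders, and the trivial { __typename } query is used only because any GraphQL server should answer it.

```javascript
// Hypothetical helper: builds the arguments for fetch(url, options).
function buildGraphQLRequest(endpoint, token, query, variables = {}) {
  return {
    url: endpoint,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Same header you provide in Apollo Studio's HTTP headers dialog.
        Authorization: token,
      },
      body: JSON.stringify({ query, variables }),
    },
  };
}

// Placeholder values; substitute the endpoint and token from your deployment.
const { url, options } = buildGraphQLRequest(
  "https://example.invalid/graphql",
  "<CUBE_API_TOKEN>",
  "{ __typename }"
);

// Uncomment to send the request in an environment where fetch is available:
// fetch(url, options).then((res) => res.json()).then(console.log);
```

If the request returns data rather than an authorization error, the same endpoint and header should work when entered in Apollo Studio.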
Now, we need to set up the Supergraph.
Give the Supergraph an ID and Name, and move on to the next step. Now, add a variant to your Supergraph. Leave it as main for now and click the Create Supergraph button.
This will take you to the Supergraph overview page.
Open up the main variant of your Supergraph, and go into Settings → Cloud Routing. Here's where you see the API Endpoint to access the Apollo Federation Router within GraphOS.
On this page you can also edit the Router config and add secrets.
Let's leave this as-is for now and move on to adding another subgraph. Navigate to the subgraphs page inside of the main variant. Here, click the Add a subgraph button.
Make sure to install Rover, Apollo’s CLI tool, then authenticate the Rover CLI by running this command:
This command will return a prompt. Copy the user token Apollo generated for you and paste it into the terminal prompt.
Next, create a file called apollo.graphql that will contain the SDL for the Apollo Server GraphQL API we created previously.
We'll use this SDL to add a subgraph. In this example the Supergraph is called cube-team@main.
Make sure to run the rover command in the same directory where you saved the apollo.graphql file.
This will run a check to make sure there are no breaking changes.
Finally, re-deploy your changes and publish your subgraph by running the following command.
After running the command above, you'll see another subgraph get added to your Supergraph variant.
Replace the API endpoints with your own. Or, use these to get a working sample.
Nice! Now you have Apollo GraphOS running. How about we run some analytics queries?
Running analytical queries with Apollo GraphOS
With your Supergraph in Apollo GraphOS running, open up the API Endpoint that gets generated for you in your browser. For this example it's https://main--cube-team.apollographos.net/graphql.
Open the Apollo Studio sandbox and you’ll see you can query both Cube’s GraphQL API and the Apollo Server.
Again, in the left-hand navigation you can see both query definitions. Go ahead and run the same GraphQL query against the Apollo Server.
Now, let’s run a GraphQL query against Cube.
With that, let’s move on to building the front-end app!
Building data visualization with Apollo Client and React
For the front-end app, we’ll use React and Apollo Client, and query the Apollo GraphOS federated GraphQL API. We’ll use the nivo charting library, a modern production-ready data visualization tool.
You can check the full source code on GitHub and instantly run it with yarn dev. You'll get a copy of this demo application.
The entry point is src/index.js, and it uses a LineChart.jsx file to generate the nivo line chart.
We decided to showcase the power of pre-aggregations by generating queries that filter the steps into pages of 50 each, as well as choosing whether to show valid or fraudulent transactions.
Even though the query uses filters, it will still be accelerated due to using pre-aggregations in Cube!
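The paging itself comes down to mapping a page number to a range of steps. A minimal sketch, assuming pages are 1-based and steps start at 1 (the helper name and return shape are hypothetical, not taken from the demo code):

```javascript
// Hypothetical helper: maps a 1-based page number to an inclusive step range.
function stepRangeForPage(page, pageSize = 50) {
  const stepStart = (page - 1) * pageSize + 1;
  const stepEnd = page * pageSize;
  return { stepStart, stepEnd };
}

console.log(stepRangeForPage(1)); // { stepStart: 1, stepEnd: 50 }
console.log(stepRangeForPage(3)); // { stepStart: 101, stepEnd: 150 }
```

The resulting range can then be passed as filter variables to the GraphQL query, together with the valid or fraudulent selection.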
Let’s walk through the contents of the React files. First, index.js.
Let me explain the main points of the code above.
- We use @apollo/client and wrap the React <App /> in <ApolloProvider>...</ApolloProvider>.
  - This includes using httpLink and authLink to load the Apollo GraphOS federated GraphQL API endpoint and Cube’s secret token.
- A typical API interaction flow in a React app with React hooks looks like this:
  - use useState to create a state variable (e.g., fraudChartDataCube);
  - compose a GraphQL query (e.g., GET_FRAUD_AMOUNT_SUM_CUBE);
  - call useQuery to fetch the result set (e.g., fraudDataCube);
  - use useEffect to wait for the data and transform it into fraudChartDataCube to be loaded into LineChart;
  - assign the data to the state variable (e.g., with setFraudChartDataCube).
- We configure the GET_FRAUD_AMOUNT_SUM_CUBE GraphQL query to load parameters dynamically from the two dropdown selectors.
- Lastly, the data is rendered by using DisplayFraudAmountSum with LineChart.
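The transformation step inside useEffect is mostly a matter of reshaping rows into the series format that nivo's line chart expects, which is an array of objects with an id and a data array of { x, y } points. The sketch below assumes the query result has already been flattened into rows of the form { step, amountSum }; the function name and the row shape are hypothetical.

```javascript
// Convert rows like { step, amountSum } into nivo line chart series:
// an array of { id, data: [{ x, y }] } objects.
function toLineChartSeries(rows, seriesId) {
  return [
    {
      id: seriesId,
      data: rows.map((row) => ({ x: row.step, y: row.amountSum })),
    },
  ];
}

const series = toLineChartSeries(
  [
    { step: 1, amountSum: 150 },
    { step: 2, amountSum: 75 },
  ],
  "Amount Sum"
);

console.log(JSON.stringify(series));
// [{"id":"Amount Sum","data":[{"x":1,"y":150},{"x":2,"y":75}]}]
```

The returned array is what would be stored with the state setter and passed to the chart component as its data.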
That's it! Now you know how to build the React app. Check out the live demo once again if you need some inspiration.
Summary
In the modern data stack, performance plays a big role. Cube's API layer makes it possible to model, cache, and optimize your data-intensive queries. Moreover, Cube's GraphQL API makes it effortless to integrate with your existing GraphQL architecture.
By using GraphQL federation with Apollo GraphOS, you can keep using your Apollo GraphQL server even when faced with running time-consuming analytical queries!
You can sign up for Cube Cloud for free and try it for yourself. To learn more about how Cube can help you to build your project, head over to the official documentation page.
If you have questions or feedback, we would love to hear what you have to say! Come join our Slack community. Click here to join!
That’s all for today. Feel free to leave Cube a ⭐ on GitHub if you liked this article!