Overview
There are many ways to visualize data, but for location-based (or geospatial) data, map-based visualizations are usually the easiest to read and the most vivid.
In this guide, we'll explore how to build a map data visualization with JavaScript (and React) using Mapbox, a very popular set of tools for working with maps, navigation, and location-based search.
We'll also learn how to make this map data visualization interactive (or dynamic), allowing users to control what data is being visualized on the map.
To make this guide even more interesting, we'll use the Stack Overflow open dataset, publicly available in Google BigQuery and on Kaggle. With this dataset, we'll be able to find answers to the following questions:
- Where do Stack Overflow users live?
- Is there any correlation between Stack Overflow users' locations and their ratings?
- What is the total and average Stack Overflow users' rating by country?
- Is there any difference between the locations of people who ask and answer questions?
Also, to host and serve this dataset via an API, we'll use PostgreSQL as a database and Cube as an analytical API platform, which lets you bootstrap a backend for an analytical app in minutes.
So, that's our plan — and let's get hacking! 🤘
Oh, wait! Here's what our result is going to look like! Amazing, huh?
If you can't wait, feel free to study the demo and the source code on GitHub.
Update from April 2023. This guide was authored more than 2 years ago and certain parts (e.g., generation of the front-end boilerplate code) are not relevant anymore. Please see up-to-date front-end guides in the blog.
Dataset and API
The original Stack Overflow dataset contains locations as strings of text. However, Mapbox works best with locations encoded as GeoJSON, an open standard for geographical features based (surprise!) on JSON.
That's why we've used the Mapbox Search API to perform geocoding. As the geocoding procedure has nothing to do with map data visualization, we're just providing the ready-to-use dataset with embedded GeoJSON data (the file size is about 600 MB).
We've also set up a public Postgres instance that we'll use throughout this tutorial so you don't need to set it up yourself.
Setting Up an API 📦
Let's use Cube, an open-source analytical API platform, to serve this dataset over an API. Run this command:
Cube uses environment variables for configuration. To set up the connection to our database, we need to specify the database type and name.
In the newly created stackoverflow__example folder, please replace the contents of the .env file with the following:
Now we're ready to start the API with this simple command:
To check if the API works, please navigate to http://localhost:4000 in your browser. You'll see the Cube Developer Playground, a powerful tool which greatly simplifies data exploration and query building.
The last thing left to make the API work is to define the data schema: it describes what kind of data we have in our dataset and what should be available in our application.
Let’s go to the data schema page and select all tables from our database. Then, please click on the plus icon and press the “generate schema” button. Voila! 🎉
Now you can spot a number of new *.js files in the schema folder.
So, our API is set up, and we're ready to create map data visualizations with Mapbox!
Frontend and Mapbox
Okay, now it's time to write some JavaScript and create the front-end part of our map data visualization. As with the data schema, we can easily scaffold it using Cube Developer Playground.
Navigate to the templates page and choose one of the predefined templates or click "Create your own". In this guide, we'll be using React, so choose accordingly.
After a few minutes spent installing all dependencies (oh, these node_modules) you'll have the new dashboard-app folder. Run this app with the following commands:
Great! Now we're ready to add Mapbox to our front-end app.
Setting Up Mapbox 🗺
We'll be using the react-map-gl wrapper to work with Mapbox. Actually, you can find some plugins for React, Angular, and other frameworks in the Mapbox documentation.
Let's install react-map-gl with this command:
To connect this package to our front-end app, replace the contents of src/App.jsx with the following:
You can see that MAPBOX_TOKEN needs to be obtained from Mapbox and put in this file.
Please see the Mapbox documentation or, if you already have a Mapbox account, just generate a token on the account page.
At this point we have an empty world map and can start to visualize data. Hurray!
Planning the Map Data Visualization 🔢
Here's how you can build any map data visualization using Mapbox and Cube:
- load data to the front-end with Cube
- transform data to GeoJSON format
- load data to Mapbox layers
- optionally, customize the map using the properties object to set up data-driven styling and manipulations
In this guide, we'll follow this path and create four independent map data visualizations:
- a heatmap layer based on users' location data
- a points layer with data-driven styling and dynamically updated data source
- a points layer with click events
- a choropleth layer based on different calculations and data-driven styling
Let's get hacking! 😎
Heatmap Visualization
Okay, let's create our first map data visualization! 1️⃣
A heatmap layer is a suitable way to show data distribution and density. That's why we'll use it to show where Stack Overflow users live.
Data Schema
This component needs quite a simple schema, because we need only one dimension (“users locations coordinates”) and one measure (“count”).
However, some Stack Overflow users have amazing locations like "in the cloud", "Interstellar Transport Station", or "on a server far far away". Surprisingly, we can't translate all these fancy locations to GeoJSON, so we're using the SQL WHERE clause to select only users from the Earth. 🌎
Here's what the schema/Users.js file should look like:
Web Component
Also, we'll need the dashboard-app/src/components/Heatmap.js component with the following source code. Let's break down its contents!
First, we're loading data to the front-end with a convenient Cube hook:
To make map rendering faster, with this query we're grouping users by their locations.
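As a rough illustration of the grouping described above, the query could be a plain object like the one below. This is only a sketch: the member names Users.count and Users.geometry are assumptions about the data schema and are not confirmed by the dataset itself.

```javascript
// Sketch of a Cube query for the heatmap.
// Assumption: the Users cube exposes a `count` measure and a `geometry` dimension.
const heatmapQuery = {
  measures: ['Users.count'],      // number of users at each location
  dimensions: ['Users.geometry'], // grouping by location keeps the result set small
};

// With @cubejs-client/react, this object would be passed to the hook, e.g.:
// const { resultSet } = useCubeQuery(heatmapQuery);
```

Grouping on the API side means the browser receives one row per location instead of one row per user.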
Then, we transform query results to GeoJSON format:
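A minimal sketch of such a transformation is shown below. It assumes that each result row stores the location as a GeoJSON geometry string under Users.geometry and the count under Users.count; both key names and the sample row are hypothetical.

```javascript
// Convert Cube result rows into a GeoJSON FeatureCollection.
// Assumption: row['Users.geometry'] is a GeoJSON geometry serialized as a string,
// and row['Users.count'] is a number or a numeric string.
function rowsToFeatureCollection(rows) {
  return {
    type: 'FeatureCollection',
    features: rows.map((row) => ({
      type: 'Feature',
      // Mapbox expressions can read this later via ['get', 'value']
      properties: { value: Number(row['Users.count']) },
      geometry: JSON.parse(row['Users.geometry']),
    })),
  };
}

// Made-up sample row, not a real dataset value
const sampleRows = [
  { 'Users.geometry': '{"type":"Point","coordinates":[13.4,52.5]}', 'Users.count': '3' },
];
const data = rowsToFeatureCollection(sampleRows);
console.log(data.features[0].properties.value); // 3
```

In a component, the rows would typically come from resultSet.tablePivot().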
After that, we feed this data to Mapbox. With react-map-gl, we can do it this way:
Note that here we use Mapbox data-driven styling: we defined the heatmap-weight property as an expression and it depends on the "properties.value":
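For illustration, a heatmap layer definition with a data-driven weight might look like the sketch below. The layer id and all numeric stops are placeholder assumptions, not the values used in the demo.

```javascript
// Sketch of a Mapbox heatmap layer with data-driven styling.
// Assumption: each GeoJSON feature has a numeric `value` in its properties.
const heatmapLayer = {
  id: 'users-heatmap', // hypothetical layer id
  type: 'heatmap',
  paint: {
    // Weight grows linearly with properties.value: 0 -> 0, 6 -> 2
    'heatmap-weight': ['interpolate', ['linear'], ['get', 'value'], 0, 0, 6, 2],
    'heatmap-intensity': 1,
    // Radius grows with the zoom level: zoom 0 -> 2px, zoom 9 -> 20px
    'heatmap-radius': ['interpolate', ['linear'], ['zoom'], 0, 2, 9, 20],
  },
};

// With react-map-gl, the layer would be rendered roughly like this:
// <Source type="geojson" data={data}><Layer {...heatmapLayer} /></Source>
```

The ['get', 'value'] part is what ties the style to the feature's properties object.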
You can find more information about expressions in Mapbox docs.
Here's the heatmap we've built:
Useful links
- Heatmap layer example at Mapbox documentation
- Heatmap layers params descriptions
- Some theory about heatmap layers settings, palettes
Dynamic Points Visualization
The next question was: is there any correlation between Stack Overflow users' locations and their ratings? 2️⃣
Spoiler alert: no, there isn't 😜. But it's a good question to understand how dynamic data loading works and to dive deep into Cube filters.
Data Schema
We need to tweak the schema/Users.js data schema to look like this:
Web Component
Also, we'll need the dashboard-app/src/components/Points.js component with the following source code. Let's break down its contents!
First, we need to query the API to find out the initial range of users' reputations:
Then, we create a Slider component from Ant Design, a great open source UI toolkit. On every change to this Slider's value, the front-end will make a request to the database:
To make map rendering faster, with this query we're grouping users by their locations and showing only the user with the maximum rating.
Then, like in the previous example, we transform query results to GeoJSON format:
Please note that we've also applied data-driven styling in the layer properties, so the points' radius now depends on the rating value.
When the data volume is moderate, it's also possible to use only Mapbox filters and still achieve the desired performance. We can load data with Cube once and then filter rendered data with these layer settings:
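A client-side variant could look like the sketch below: the layer keeps all loaded points and a Mapbox filter expression hides those outside the selected range. The layer id, radius values, and the value property name are assumptions.

```javascript
// Build a circle layer that filters and sizes points on the client.
// Assumptions: features carry a numeric `value` property, and min < max
// (interpolate stops must be in strictly ascending order).
function buildPointsLayer([min, max]) {
  return {
    id: 'users-points', // hypothetical layer id
    type: 'circle',
    // Show only features whose value lies within [min, max]
    filter: ['all', ['>=', ['get', 'value'], min], ['<=', ['get', 'value'], max]],
    paint: {
      // Radius scales linearly from 3px at min to 12px at max
      'circle-radius': ['interpolate', ['linear'], ['get', 'value'], min, 3, max, 12],
      'circle-opacity': 0.6,
    },
  };
}

const pointsLayer = buildPointsLayer([10, 100]);
```

Because only the layer settings change, moving the slider does not trigger a new API request.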
Here's the visualization we've built:
Points and Events Visualization
Here we wanted to show the distribution of answers and questions by country, so we rendered the most viewed Stack Overflow questions and the top-rated answers. 3️⃣
When a point is clicked, we render a popup with information about a question.
Data Schema
Due to the dataset structure, we don't have the user geometry info in the Questions table.
That's why we need to use joins in our data schema. It's a one-to-many relationship which means that one user can ask many questions.
We need to add the following code to the schema/Questions.js file:
Web Component
Then, we need to have the dashboard-app/src/components/ClickEvents.js component to contain the following source code. Here are the most important highlights!
The query to get questions data:
Then we use some pretty straightforward code to transform the data into GeoJSON:
The next step is to catch the click event and load the point data. The following code is specific to the react-map-gl wrapper, but the logic is just to listen to map clicks and filter by layer id:
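The filtering logic itself can be sketched as a small helper that does not depend on React. It assumes the click event carries a features array, as react-map-gl provides when interactive layers are configured; the layer id used here is hypothetical.

```javascript
// Find the clicked feature that belongs to a given layer.
// Assumption: `event.features` lists the rendered features under the cursor,
// each with a `layer.id` field.
function findClickedFeature(event, layerId) {
  const features = event.features || [];
  return features.find((f) => f.layer && f.layer.id === layerId) || null;
}

// Made-up event object for illustration
const fakeEvent = {
  features: [
    { layer: { id: 'background' }, properties: {} },
    { layer: { id: 'questions-points' }, properties: { id: 1 } },
  ],
};
const clicked = findClickedFeature(fakeEvent, 'questions-points');
console.log(clicked.properties.id); // 1
```

The helper returns null when the click misses the layer, so the popup can simply be hidden in that case.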
When we catch a click event on some point, we request questions data filtered by point location and update the popup.
So, here's our glorious result:
Choropleth Visualization
Finally, choropleth. This type of map chart is suitable for regional statistics, so we're going to use it to visualize total and average users’ rankings by country. 4️⃣
Data Schema
To accomplish this, we'll need to complicate our schema a bit with a few transitive joins.
First, let's update the schema/Users.js file:
The next file is schema/Mapbox.js, which contains country codes and names:
Then comes schema/MapboxCoords.js which, obviously, holds polygon coordinates for map rendering:
Please note that we have a join in schema/Mapbox.js:
And another one in schema/Users.js:
With the Stack Overflow dataset, our most suitable column in the Mapbox table is geounit, but in other cases, postal codes, or iso_a3/iso_a2 could work better.
That's all in regard to the data schema. You don't need to join the Users cube with the MapboxCoords cube directly. Cube will make all the joins for you.
Web Component
The source code is contained in the dashboard-app/src/components/Choropleth.js component. Breaking it down for the last time:
The query is quite simple: we have a measure that calculates the sum of users’ rankings.
Then we need to transform the result to GeoJSON:
After that we define a few data-driven styles to render the choropleth layer with a chosen color palette:
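As an illustration of such styles, the sketch below builds a fill layer whose color is interpolated across a palette. The layer id, palette, opacity, and the value property name are assumptions rather than the demo's actual settings.

```javascript
// Build a choropleth fill layer from a maximum value and a color palette.
// Assumptions: features carry a numeric `value` property, maxValue > 0,
// and the palette has at least two colors.
function buildChoroplethLayer(maxValue, palette) {
  // Spread the palette colors evenly between 0 and maxValue
  const stops = palette.flatMap((color, i) => [
    (maxValue * i) / (palette.length - 1),
    color,
  ]);
  return {
    id: 'countries-fill', // hypothetical layer id
    type: 'fill',
    paint: {
      'fill-color': ['interpolate', ['linear'], ['get', 'value'], ...stops],
      'fill-opacity': 0.7,
    },
  };
}

const choroplethLayer = buildChoroplethLayer(100, ['#ffffff', '#ff8800', '#ff0000']);
```

With this palette, a country with a value of 50 is drawn in the middle color, and values in between are blended.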
And that's basically it!
Here's what we're going to behold once we're done:
Looks beautiful, right?
The glorious end
So, here our attempt to build a map data visualization comes to its end.
We hope that you liked this guide. If you have any feedback or questions, feel free to join the Cube community on Slack; we'll be happy to assist you.
Also, if you liked the way the data was queried via the Cube API, visit the Cube website and give it a shot. Cheers! 🎉