/ 7 min read / Using Holistics

A Complete Guide to Build Reporting Analytics on MongoDB using Holistics

by Anthony Thong Do

A Complete Guide to Build Reporting Analytics on MongoDB using Holistics

NoSQL databases like MongoDB offer a flexible, scalable and fast data storage for applications. However, the lack of analytical functions and JOIN operation makes it hard to analyze data in NoSQL databases.

In this tutorial, I will walk you through 5 simple steps to not only build a real-time dashboard but also a scalable data analytics stack out of your MongoDB using Holistics:

  1. Setup a SQL Database as a Data Warehouse
  2. Modeling Data, Import Data, Transform Data
  3. Build and Explore a Dataset to create meaningful visualizations
  4. Build Interactive Dashboards with Filters
  5. Schedule Your Dashboard to Slack

Data used in this guide is the sample_mflix database available when you register a Mongo Atlas account. This database contains data on movies, sessions, theaters, users and comments.

What is Holistics?

Holistics is a BI platform that helps you set up and run a full data analytics stack: from consolidating multiple data sources, modeling your data to generating meaningful insights. It now offers a free tier, so you can sign-up now and start building your dashboards following this guide.

1. Prepare a Data Warehouse

Data Warehouse is a central database to store data from different sources (BigQuery, MySQL, MongoDB, Google Analytics, Google Sheets...) and is optimized for analytical purposes.

Since MongoDB's interface is not friendly for analysts (who are trained mostly in SQL) and it does not have JOINs, you need to pull data from MongoDB to a SQL database (PostgreSQL, BigQuery, MySQL..) to make use of its analytical functions.

Check out these guides if you don't have a Data Warehouse yet

After preparing your data warehouse, you can connect it to Holistics, here I'm connecting to my PostgreSQL Database, and naming it demodw

Connect to your SQL Database
Note: For the SQLs we use to create columns, measures, etc. below, it follows PostgreSQL syntax, so it might be different if you are using a different database from Postgres as your data warehouse.

2. Model your Data

To put it simply, data modeling is the process where you map business logic to physical data so you can make sense of it. In Holistics, data modeling process aims to:

  • Consolidate data from multiple data sources.
  • Transform data to produce user-friendly data units that are easier to query and interpret.
  • Give business meaning to raw data generated from your system with metadata
  • Enable data self-service and collaboration

The end result of this step is a logical layer (a "data modeling layer") that allows further data manipulation operations.

2.1 Import MongoDB

In this step, I will import MongoDB data into tables in the data warehouse demodw I connected above, and set a schedule to refresh this import.

In Data Modeling View, create Data Model from Data Import, choose MongoDB. At this step, you are also prompted to select a default schema to write imported data into.

Import MongoDB

Next, Add my MongoDB connection:

Add MongoDB Connection

Select the MongoDB tables you want to import. Here I'm selecting all the tables (comments, movies, sessions, theaters, users)

Select MongoDB tables to import

After importing successfully, you can open each Data Model to see its structure or preview actual data sitting in your data warehouse.

2.2 Extract MongoDB fields

If your columns contain array or nested documents, you need to use SQL to extract them into separate columns.

For example, Movies's records have key imdb with structure like below. I will extract the rating values to imdb_rating field to be used later.

"imdb": { 
        "rating": { "$numberDouble": "7.3" }, 
        "votes": { "$numberInt": "5043" }, 
        "id": { "$numberInt": "12" } 
}

Open Movies Model, select Add / Calculated Column, give it a name, and use this query to extract imdb_rating {{ #THIS.imdb }}::jsonb ->> 'rating' (This is PostgreSQL syntax, if you DW is not PostgreSQL please use your SQL Database syntax instead). It will create a new field called IMDB Rating in your Movies Model that contains rating values.

Add a custom dimension to a data model

2.3 Add a Measure

Measures are created from SQL's aggregate functions and operations such as COUNT(), SUM(), AVG(), SUM(field_a) + SUM(field_b)... By combining Measures and Dimensions, you can ensure your end-users can produce the correct calculations and generate insights without writing the formula themselves.

Here I'm creating a simple count_movies measure with SQL formula count({{ #THIS.id }}). The syntax follows your database's SQL flavor, so you can create more complex Measures like "Count of movies of rating 8 and above": count(case when  {{#THIS.imdb_rating}} >= 8.0 then {{#THIS.id}} else null end)

Drag in other dimensions and you can break count_movies by released_year, genres, countries...

Add a measure

2.4 Add Relationships between data models

Relationships help you explore data from different tables without explicitly write joining SQL.

For example, a movie can have multiple comments, so I will link comments data model to movies data model with a many-to-one relationship (n-1) on comments.movie_id and movies.id.

Add relationships between data models

You can also click on Relationships tab to view the relationship diagram.

2.5. Transform your data and Import from other data sources

The steps above only concerns one MongoDB data source and involve a light transformation (extracting values from a nested fields). However, you can also integrate data from other sources (like CSV files, Google Sheets, Facebook Ads...) in your analysis, and write more complex SQL transformations. To keep this guide straightforward, I will leave those operations to another post.

You can refer to these docs if you are curious about what else you can do:

3. Build and Explore Datasets

Dataset is a collection of Data Models that share the same interest and have relationships with one another.

You can explore a dataset by mixing dimensions & measures from different data models. Holistics's engine will base on those combinations to generate SQL to be run against your data warehouse.

3.1 Create a dataset

Here I'm creating a dataset by combining comments and movies data models. All fields and measures from the two models will be available in the dataset, and they can be combined thanks to the relationship that I set up earlier.

Dataset Creation

3.2 Explore a dataset

After creating a dataset, you can start combining dimensions and measures in a familiar drag-and-drop interface, and select appropriate visualizations.

In this example, I will create a table to "list movies with their IMDB ratings and comments count" by combining movies.title, movies.imdb_rating, count(comments.id)

Movies by Comments Count, IMDB Rating

A chart of total movies released by years is only a few clicks away:

Movies over years

4. Build Interactive Dashboards with Filters

Finally, you can create a dashboard, collect all your explorations as charts and tables that you can check anytime or share with your teams.

Create a dashboard

Add some filters

What's the use of a dashboard if you don't have any filters?

Simply click on Add filter icon and choose from five types of filters (Field, Date, Text, Number, True/False) to filter your data. For more information, please visit our docs about Holistics's Filters.

For example, I'm creating a filter from field movies_dataset.movies.released_year and map it to movies.released_year fields of two widgets.

When a viewer change the released_year, a filtering condition will be applied to both mapped widgets and change their values.

Add filters

5. Deliver Your Dashboards to Slack

Finally, schedule your fresh dashboards to your mailboxes or Slack everyday, so everyone can keep in touch with latest data change and conveniently use data in internal discussions.

Click on Schedule icon on your dashboard, select New Slack Schedule. Here you can select the channel to post your dashboard to, schedule when to post and overwrite default filter values.

Setup Slack Schedule

Tada, every 7:00 AM, your Slack channel will receive a fresh dashboard like the one below

Slack schedule

Conclusion

I hope this guide serves as a fun, hands-on way to learn how to build your own MongoDB analytics foundation from scratch.

Of course, we’re just scratching the surface here. There are still more interesting things to do like combining data from other sources, using SQL to transform your data, optimizing query performance... which I will talk in other posts.

Now let's jump in and figure out how to apply this guide in your specific use cases! If you face any problem, do not hesitate to comment or contact me!

Anthony Thong Do

Anthony Thong Do

Entrepreneurship, Product, Data, Design | holistics.io, dbidagram.io | anthonytd.com | 200% on Products

Read More

From SQL Queries To Beautiful Charts

Connect to your database and build beautiful charts with Holistics BI

Learn More
Grab Logo

"Holistics is the solution to the increasingly many and complex data requests from the operational teams"


Tang Yee Jie - Senior Data Analyst, Grab

Tang Yee Jie

Senior Data Analyst, Grab