How To Set Up Reporting Analytics For MongoDB

For a complete MongoDB guide, please refer to this link https://www.holistics.io/blog/build-reporting-analytics-mongodb-using-holistics-for-free/

MongoDB is an extremely popular database for developers across a variety of use cases. Companies typically see faster development times, along with the ability to work with large data volumes and many different data types. Growth has been tremendous for MongoDB, which recently had its IPO. However, as great as it is for development, many Mongo users quickly run into trouble getting data out of Mongo for reporting, especially when it comes to driving decisions at rapidly growing startups. We see many companies that know they need analytics for their Mongo data but have no clear idea of how to get there:

Many Mongo users get stuck figuring out how to go from MongoDB data to analytics.

The place that they really need to get to is illustrated below:

MongoDB data is able to play nice with your other datasets inside a SQL reporting database.

This might seem complicated, but the process is actually straightforward. The best way to work with data from your various sources is to get the data out of Mongo and into a relational database, then build your analytics and reporting workflow there.

MongoDB stores data as JSON-like documents (BSON), which don’t sit well alongside data stored in more conventional formats such as CSV files and relational SQL databases. Remember that MongoDB was made to help developers work faster, and analytics was more of an afterthought. Luckily, there are solutions (like Holistics) that have tackled this problem.
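To make the mismatch concrete, here is a minimal sketch of flattening one nested document into a single flat row of the kind a SQL table expects. The `flatten` helper and the sample `order` document are hypothetical, made up for illustration. Arrays are left untouched here; in practice they usually end up in a separate child table.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the parent key as a prefix.
            flat.update(flatten(value, prefix=f"{column}_"))
        else:
            flat[column] = value
    return flat


# Hypothetical order document, shaped like something an app might store in Mongo.
order = {
    "_id": "o-1001",
    "customer": {"name": "Ada", "address": {"city": "Singapore"}},
    "total": 42.5,
}

row = flatten(order)
print(row)
```

Each key of the resulting dictionary maps to one column, so `customer.address.city` in the document becomes a `customer_address_city` column in the reporting table.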

If you need a database to get started, Postgres works really well and in our view is better for analytics than MySQL. This blog post also does a pretty good job addressing startup analytics.

There are a few options when selecting a provider for Postgres. Google Cloud is really easy to work with, and so is ElephantSQL. Many enterprises do well with EnterpriseDB’s fully managed database-as-a-service (DBaaS) on AWS.

That said, if your team has a more roll-up-the-sleeves approach, they could definitely select Amazon Web Services (AWS), Microsoft Azure, DigitalOcean or Alibaba Cloud. Keep in mind this will require more configuration on your part!

Once you’ve got your Postgres relational database set up for reporting, you have a few options for moving your MongoDB data into it:

  1. Have your engineers build and maintain a connector to a relational database.
  2. Use a pre-built tool, from companies in this space like StitchData, Fivetran, Segment, or Holistics (You can check our page on how to run this in Holistics here https://www.holistics.io/features/mongodb/).
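For a sense of what the first option involves, here is a rough sketch of the load step of a hand-built connector. It rests on several assumptions: an in-memory SQLite database stands in for Postgres so the example runs without a server, the `documents` list stands in for the result of a MongoDB query, and the `users` table and `load_users` function are hypothetical. A real connector would also need incremental syncs, schema changes and error handling, which is where most of the maintenance effort goes.

```python
import sqlite3

# Hypothetical documents, standing in for the result of a MongoDB query.
documents = [
    {"_id": "u1", "email": "a@example.com", "plan": "free"},
    {"_id": "u2", "email": "b@example.com", "plan": "pro"},
    {"_id": "u3", "email": "c@example.com"},  # missing field, common in Mongo
]

COLUMNS = ("_id", "email", "plan")


def load_users(conn: sqlite3.Connection, docs: list[dict]) -> int:
    """Upsert documents into a users table and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(_id TEXT PRIMARY KEY, email TEXT, plan TEXT)"
    )
    # Fields absent from a document become NULL in the table.
    rows = [tuple(doc.get(col) for col in COLUMNS) for doc in docs]
    # INSERT OR REPLACE keeps the load idempotent when the job is re-run.
    conn.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the Postgres reporting database
count = load_users(conn, documents)
count_after_rerun = load_users(conn, documents)  # re-running adds no duplicates
print(count, count_after_rerun)
```

Keying the table on `_id` means a re-run overwrites existing rows instead of duplicating them, which is the minimum a scheduled sync job needs.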

You can now connect a business intelligence tool to your reporting database, to build the charts and reports you need. Holistics lets you both move data from MongoDB into your database, and run SQL queries and reports off your database, so you can use a single tool instead of several.

As data volumes scale up, you'll probably need to move to Google BigQuery, Snowflake or Amazon Redshift, but this is likely unnecessary right now (though if things go really well, that might be necessary in less than 2 years!).

As an aside, if you are reading this article, it means you’re likely tracking everything in Mongo and not in an application-specific tool, as such tools tend to scale poorly and force you into proprietary ETL scripts to make the data workable.

Additionally, reporting directly against the database that your business application is running on is a bad idea for a few reasons:

  1. Additional database load from reporting could make your main app run slowly
  2. You might not want to expose sensitive data for reporting purposes
  3. If someone writes a poor query that runs for hours, it could interfere with the production workload and severely degrade database performance.

Reporting against Mongo ALSO isn't a good idea, as you can't leverage SQL to build out reports (SQL of course being the language of data!). There's a reason why SQL is still the go-to tried and tested solution after decades, and this becomes clear when you're trying to work with data from various sources, software services and departments.
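As an illustration of working across sources in SQL, the sketch below joins a table loaded from MongoDB with a table from a separate billing system. Everything in it is assumed for the example: the `orders` and `invoices` tables, their columns and the sample rows are hypothetical, and in-memory SQLite again stands in for the Postgres reporting database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id TEXT, customer_id TEXT, total REAL);  -- from MongoDB
    CREATE TABLE invoices (customer_id TEXT, amount_paid REAL);         -- from billing
    INSERT INTO orders VALUES ('o1', 'c1', 100.0), ('o2', 'c1', 50.0), ('o3', 'c2', 75.0);
    INSERT INTO invoices VALUES ('c1', 120.0), ('c2', 75.0);
    """
)

# Aggregate each source per customer first, then join, so that several
# orders for one customer do not multiply that customer's invoice rows.
report = conn.execute(
    """
    SELECT o.customer_id,
           o.ordered,
           COALESCE(i.paid, 0) AS paid,
           o.ordered - COALESCE(i.paid, 0) AS outstanding
    FROM (SELECT customer_id, SUM(total) AS ordered
          FROM orders GROUP BY customer_id) AS o
    LEFT JOIN (SELECT customer_id, SUM(amount_paid) AS paid
               FROM invoices GROUP BY customer_id) AS i
      ON i.customer_id = o.customer_id
    ORDER BY o.customer_id
    """
).fetchall()

for customer_id, ordered, paid, outstanding in report:
    print(customer_id, ordered, paid, outstanding)
```

Once both datasets live in the same relational database, a question like "how much has each customer ordered versus paid" is a single query, with no application code needed to stitch the sources together.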

Hope this helps! Feel free to leave a comment or reach out to me directly at matt@holistics.io if there’s anything else I can help answer or anything you think I missed.