In the previous chapter, we spoke about a typical analytics setup in an organization.
After reading it, you might wonder: three big steps (load, transform, visualize) and investment in a data-warehouse seems overkill to me at the start, can I start with a simpler setup?
Yes you can.
In fact, if you are in a very early stage and fits certain criteria below, you can probably skips some steps.
In short, your initial analytics setup can be very simple: just hook a BI tool to the production database of your application.
When you interact with dashboards in your BI tool, the data will be queried live from your application database. This way, in theory, you will get the data in real-time.
The simple setup above, of course, is not all sunshine and roses.
The highest risk you will face with the above setup is performance. Besides your normal production workload, your database now will also take on an additional analytics workloads, and may degrade your users’ experience.
The easiest solution here is to set up a replication database (replica for short) of your production database.
Check with your dev team to see if they have a replica that you can connect to. There’s a good chance that your dev team has already set something up.
Then, instead of connecting to production, point your BI tool to your replica instead.
This way, your production database will not be burdened by your analytical workload, while the data you receive is still relatively fresh (depending, of course, on how your dev team configures the replication interval)
If you run on NoSQL database (for instance, MongoDB or Cassandra), the above setup will not work for two reasons:
Limited choice of reporting tool.
Since SQL has become the defacto standard for analytics, most BI tools are designed to work with SQL, so it limits the choice of BI tools you may pick.
Also, there is no standardized analytics interface across different NoSQL databases, so you will end up looking for a specialized solution that is designed specifically for your brand of NoSQL database.
Limited analytics capability.
Most NoSQL databases do not have strong support for analytics, both in terms of processing engine and querying interface. Try and write an aggregation query on a big MongoDB collection, and you will quickly understand why we say this. The query will be difficult to write, and the wait for results would be horrific.
You might even get a surprise visit from your dev team for hogging up their production database.
At this stage, the recommended approach is again getting an SQL data warehouse, and then loading data from your NoSQL app into this data warehouse. This moves us towards the direction that we laid out in What does a typical analytics setup look like?.