First, it is important for you to get a high-level picture of an analytics setup. In this chapter, we talk about the most common setup for an analytics system. In certain places, you will see others doing things differently, but if you look closely, all analytics systems boil down to the same approach.
Let’s get started.
Most basic analytics setups will be broken into three steps:
Before you can analyze your organization’s data, data from multiple sources must be pulled into a central analytics database. This database is usually called a data warehouse, and the process by which this transfer happens is commonly called ETL (Extract, Transform, Load).
Chapter 2 of the book will go into more detail about this step.
Since we’re talking about a big picture view in this chapter, there are only two key components you need to understand.
The ETL Process
As mentioned earlier, ETL stands for ‘Extract, Transform, and Load’. This refers to the fact that you extract data from sources, transform that data into a queryable form for analysis, and then load that data into your data warehouse.
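To make the three stages concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database standing in for the data warehouse and a hard-coded list standing in for a source system. The table name, column names, and the `extract`, `transform`, and `load` helpers are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from a production system.
SOURCE_ROWS = [
    {"order_id": 1, "amount_cents": 1250, "created_at": "2024-01-05T10:00:00"},
    {"order_id": 2, "amount_cents": 990, "created_at": "2024-01-06T14:30:00"},
]


def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: reshape rows into a queryable form before loading."""
    return [
        (r["order_id"], r["amount_cents"] / 100, r["created_at"][:10])
        for r in rows
    ]


def load(conn, rows):
    """Load: write the already-transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database plays the role of the warehouse (an assumption).
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))

etl_result = warehouse.execute(
    "SELECT order_id, amount, order_date FROM orders ORDER BY order_id"
).fetchall()
print(etl_result)
```

Note that the transformation happens in application code before anything reaches the warehouse. That ordering is what distinguishes ETL.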
In Chapter 2, we will explore:
- How do you set up an ETL process? What ETL technology should you choose?
- Why and how is the industry moving from ETL to a process called ELT (Extract, Load, Transform)?
- How is ELT different from ETL, and how do the two compare?
As ELT is now a more prominent approach than ETL, we will speak more about the ELT approach in this book.
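As a rough illustration of the ELT ordering, the sketch below loads source rows into the warehouse unchanged and only then transforms them with SQL inside the warehouse. It assumes an in-memory SQLite database as the warehouse. The `raw_orders` and `orders` tables and their columns are hypothetical names used only for this example.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system, untouched.
RAW_ROWS = [
    (1, 1250, "2024-01-05T10:00:00"),
    (2, 990, "2024-01-06T14:30:00"),
]

# An in-memory SQLite database plays the role of the warehouse (an assumption).
elt_warehouse = sqlite3.connect(":memory:")

# Extract + Load: copy the source rows into a raw table without reshaping them.
elt_warehouse.execute(
    "CREATE TABLE raw_orders "
    "(order_id INTEGER, amount_cents INTEGER, created_at TEXT)"
)
elt_warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)

# Transform: performed afterwards, inside the warehouse, using SQL.
elt_warehouse.execute(
    """
    CREATE TABLE orders AS
    SELECT
        order_id,
        amount_cents / 100.0 AS amount,
        substr(created_at, 1, 10) AS order_date
    FROM raw_orders
    """
)

elt_result = elt_warehouse.execute(
    "SELECT order_id, amount, order_date FROM orders ORDER BY order_id"
).fetchall()
raw_count = elt_warehouse.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(elt_result, raw_count)
```

One consequence of this ordering is that the raw table stays available in the warehouse, so a transformation can be rewritten and rerun without going back to the source.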
The central analytics database, or “data warehouse”
This is the place where most of your analytics activities will take place. In this book we’ll talk about:
- Why do you need a data warehouse?
- How do you build one?
- What data warehouse technologies should you choose?
So what do you get at the end of this step?
Once you have these two pieces set up, the next step is to turn raw data into meaningful data for analytics.
In this guide, we assume that you extract data from your data sources and load that data into your warehouse as a first step. The second step is about transforming that data. The process of transforming the data into a form that is useful for business analysis is commonly called ‘data modeling’.
This step is necessary because raw data is often not ready to be used for reporting. Raw data will often contain extraneous information — e.g. duplicated records, test records, or metadata that is only meaningful to the production system — which is bad for analytics.
Therefore, you usually need to apply a “processing step” to such data. You’ll have to clean, transform and shape the data to match the logic necessary for your business’s reporting.
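A small sketch of such a processing step, assuming an in-memory SQLite database as the warehouse: it drops duplicated records and test records from a raw table by defining a cleaned view on top of it. The `raw_users` table, the `is_test` flag, and the `clean_users` view are hypothetical. Real source data will mark test records in its own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table, loaded as-is from a production system.
conn.execute("CREATE TABLE raw_users (user_id INTEGER, email TEXT, is_test INTEGER)")
conn.executemany(
    "INSERT INTO raw_users VALUES (?, ?, ?)",
    [
        (1, "ann@example.com", 0),
        (1, "ann@example.com", 0),  # duplicated record
        (2, "qa@example.com", 1),   # test record
        (3, "bob@example.com", 0),
    ],
)

# The cleaning logic lives in a view: filter out test records, then
# collapse exact duplicates, and expose only the columns reporting needs.
conn.execute(
    """
    CREATE VIEW clean_users AS
    SELECT DISTINCT user_id, email
    FROM raw_users
    WHERE is_test = 0
    """
)

clean_rows = conn.execute(
    "SELECT user_id, email FROM clean_users ORDER BY user_id"
).fetchall()
print(clean_rows)
```

Keeping the cleaning rules in a view (or a derived table) rather than editing the raw data means the raw records remain intact and the rules can be adjusted as reporting needs change.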
This step usually involves two kinds of operations:
Chapter 3 goes into more detail about these two operations, and compares a modern approach (which we prefer) to a more traditional approach that was developed in the 90s.
Beginner readers take note: usually, this is where you’ll find most of the fun — and complexity! — of doing data analytics.
At the end of this step, you’ll have a clean set of data that’s ready for analysis and reporting to end users.
Now that your data is properly transformed for analytics, it’s time to make use of the data to help grow your business. This is where people hook up a “reporting/visualization tool” to your data warehouse, and begin making those sweet, sweet charts.
Chapter 4 will focus on this aspect of data analytics.
Most people think of this step as just being about dashboarding and visualization, but it involves quite a bit more than that. In this book we’ll touch on a few applications of using data:
Since this step involves the use of a BI/visualization tool, we will also discuss: