Introduction

Data is more accessible than ever, and is an integral part of every new business. However, when starting out in the data field, it is easy to get caught up in the hype, pursue the latest buzzwords and get lost before delivering real value to your business.

This book intends to provide you with a high-level view of the whole data analytics pipeline. You can think of it as an index of important topics that you should know, with pointers to more resources if you want to dig deeper into a particular problem.

After finishing this book, you will walk away with a just-about-enough level of understanding of each topic, and see how they link together in the big picture.

This book is also written in the style of a reference manual for the whole analytics pipeline, so feel free to jump to whichever chapter is most relevant to you, instead of reading it from start to finish.

First, who are you and why should I trust you?

We are Holistics. We’ve been making data analytics tools for over five years, and helped more than a hundred companies build their business intelligence capabilities, sometimes from scratch.

A huge chunk of our time is spent educating and assisting companies as they migrate to this new world of cloud-based business intelligence tools. For the first time ever, we’re putting that experience up for the world to read.

Who is this book written for?

This book is written for people who want to know what goes into a full data stack. If you’re feeling lost due to the sheer amount of terminology and the wide range of software in data analytics, this may be the book for you.

The book will be especially beneficial to those who have some technical knowledge, but who don’t particularly understand the entire data landscape. In other words, you may be:

  • A junior data analyst with knowledge of SQL. However, you do not yet have a full picture of your company’s data stack. You find it hard to talk to data engineers when you need them to help with a particular pipeline problem.
  • A software engineer working in product who is assigned to set up a data stack from scratch. You think you know what tools you need, but you’re not sure if the stack you’ve chosen is the best for your company, going forward. This book will give you a lay-of-the-land overview of the entire data analytics world, so you’ll be able to pick the right components for your company.

So read on.

Who is this guidebook NOT for?

This guide is not written for non-technical business practitioners.

If you are a CEO or a project manager or a business team leader initiating a data analytics project for your company, it is best that you have a technical team member to help you go through the content presented in this book.

This guidebook is also not written for experienced data engineers who manage large-scale analytics systems and want deeper knowledge about one particular problem. That said, you might still find some parts of the book useful, especially the part on Data Modeling and Transformation.

What you will get out of this book

In this book, you will learn:

  • Basic concepts that you will most likely encounter when working in the data field
  • The main components of an analytics setup, and how these components fit together
  • Which components are relevant to your current business

What you won’t get from this book

And here’s what you won’t get from reading this book:

  • In-depth knowledge of how to implement certain components (for example, how to write and maintain a robust ETL/ELT pipeline, or how to implement a recommendations system)
  • Domain-specific data knowledge (what is the best modeling practice for eCommerce data)
  • Language-specific knowledge (how to optimize queries, how to use different Python visualization packages …)
  • Data analysis techniques (how do you identify causal relationships, what different ways are there to verify a hypothesis?)

Let’s start

Are you ready to read the book? If so, let’s begin. Head over to High-level: What a typical analytics setup looks like and let’s start digging in.