How To Choose the Right Charts for Your Data

Starting a new project but confused by the endless choice of charts, graphs, and tools?

Where do you start?

How many variables should you plot?

How do you even choose which variables to plot?

As a data analyst, choosing the proper visualization for your data can be daunting. There are many tools and charts to choose from, and that abundance of choice can easily lead to analysis paralysis.

Don’t worry, we’re here to help!

In this blog, we provide a guide to help you choose the right chart for your data. We'll also point you to analytics tools so you can start your project with confidence.

Let’s begin!


Goals for Visualizing the Data

First things first — we need to ask the right questions about the data.

You cannot visualize your data unless you ask it the right questions. What are your goals for this project? What are you looking for in your data?

To get you started, we suggest considering the following goals for your data:

  • To compare values
  • To show proportion
  • To describe relationship
  • To see the distribution
  • To show change over time

For each goal, we recommend the best chart for analysis, so let’s get into it!


To Compare Values

Comparing values is probably the most common goal in data analysis. You might be interested in finding out how each variable stacks up against the others. You might also want to compare how a variable is distributed or how it evolves over time.

To compare values, we recommend you use the following charts:

1. Bar chart - A bar chart makes it easy for your audience to compare the values in your data. From the length of the vertical or horizontal bars, one can quickly tell which of two or more variables is greater or smaller.

Here you can see how a simple bar chart provides information at a glance: the highest value belongs to D and the lowest to G.

Bar chart

2. Line chart - A line chart should be used when comparing continuous data, such as variables that are measured over time. A line chart lets you compare each value against the time it was measured. For this kind of data, it gives you a clearer picture of how the values change than a bar chart does.

The line chart below shows the movement of the continuous variable value through time. The ease of comparing the values in each period is similar to using bar charts, with the added sense of continuity.

Line chart

3. Histogram - A histogram provides a good way to compare the distribution of two or more variables. You can compare the variables not only by the height of the distribution but also by its spread around the mean.

As an example, notice the left-skewed distribution portrayed in the histogram below.

We can see that the count of Values is highest at 6.5. However, the mean of the distribution is lower than this because the long left tail pulls it down. Thus the histogram provides more insight than a single-value comparison using bar charts.

Histogram
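To make the three comparison charts concrete, here is a minimal sketch in Python. All of the numbers are made up for illustration (they are not the data behind the figures above), and the plotting function assumes the third-party matplotlib package is installed. The values at the top reproduce the two observations from the text: D is highest and G is lowest, and the mean of a left-skewed sample sits below its most common value.

```python
from statistics import mean, mode

# Hypothetical data, invented for illustration only.
category_values = {"A": 12, "B": 18, "C": 9, "D": 25, "E": 14, "F": 11, "G": 4}
monthly_values = {"Jan": 10, "Feb": 13, "Mar": 12, "Apr": 17, "May": 21}
# Most observations sit near 6.5, with a long tail to the left.
skewed_sample = [2.5, 3.5, 4.5, 4.5, 5.5, 5.5, 5.5,
                 6.5, 6.5, 6.5, 6.5, 6.5, 7.5, 7.5]

# What a reader would take away from the bar chart.
highest = max(category_values, key=category_values.get)
lowest = min(category_values, key=category_values.get)

# What a reader would take away from the histogram.
sample_mean = mean(skewed_sample)
sample_mode = mode(skewed_sample)


def plot_comparison_charts():
    """Draw a bar chart, a line chart, and a histogram side by side."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    fig, (ax_bar, ax_line, ax_hist) = plt.subplots(1, 3, figsize=(12, 3.5))

    ax_bar.bar(list(category_values), list(category_values.values()))
    ax_bar.set_title("Bar chart")

    ax_line.plot(list(monthly_values), list(monthly_values.values()), marker="o")
    ax_line.set_title("Line chart")

    # Bin edges at whole numbers, so every sample value falls mid-bin.
    ax_hist.hist(skewed_sample, bins=list(range(2, 9)))
    ax_hist.axvline(sample_mean, linestyle="--", label="mean")
    ax_hist.set_title("Histogram")
    ax_hist.legend()

    fig.tight_layout()
    return fig
```

Calling `plot_comparison_charts()` returns a matplotlib figure that you can display or save. The dashed line in the histogram panel marks the mean, which makes the gap between the mean and the peak visible.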

To Show Proportion

In some cases, you might want to compare the proportion of one variable against the other. Instead of comparing the absolute values, we are looking for the fraction of one variable against the others.

For this goal, we suggest using these charts:

1. Stacked bar chart - Stacked bar charts allow us to compare categories in a variable. Each category or grouping is stacked on top of the other to form the bar chart for a specific variable.

In the chart below, stacking the amounts of the different types for each location gives a good picture of the proportions. We notice that each location has roughly the same total. The differences therefore come mainly from the type variable.

Stacked bar chart

2. Treemap - Compared with stacked bar charts, treemaps can be easier to interpret when comparing proportions of variables. Each grouping is drawn as a rectangle inside a larger rectangle that represents the whole variable.

As an example, look at how we can effortlessly observe the areas of each section below, in proportion to each other. We easily see that A is roughly half of the dataset. Interpreting areas using rectangles of the treemap is thus straightforward.
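The arithmetic behind both charts is the same: divide each part by its total. Below is a minimal sketch with made-up totals per location and type (the names `North`, `X`, and so on are hypothetical). The plotting function assumes matplotlib is installed and draws only the stacked bar chart. A treemap of the same shares would need an additional charting package, so it is left out here.

```python
# Hypothetical totals per location, split by type (illustration only).
totals = {
    "North": {"X": 30, "Y": 50, "Z": 20},
    "South": {"X": 25, "Y": 55, "Z": 20},
    "East": {"X": 35, "Y": 45, "Z": 20},
}


def shares(parts):
    """Return each part as a fraction of the whole."""
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


# Proportion of each type within each location.
location_shares = {loc: shares(parts) for loc, parts in totals.items()}


def plot_stacked_bar():
    """Stack the types on top of each other, one bar per location."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    locations = list(totals)
    types = list(totals[locations[0]])

    fig, ax = plt.subplots()
    bottoms = [0] * len(locations)
    for type_name in types:
        heights = [totals[loc][type_name] for loc in locations]
        # Each layer starts where the previous layer ended.
        ax.bar(locations, heights, bottom=bottoms, label=type_name)
        bottoms = [b + h for b, h in zip(bottoms, heights)]

    ax.set_ylabel("Total")
    ax.legend(title="Type")
    return fig
```

In this invented data every location adds up to the same total, so any difference between the bars comes from how the types are split.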


To Describe Relationship

An important goal to consider in an analytics project is finding the relationship between variables. However, before investigating a relationship in depth, it is useful to create visualizations that highlight possible connections between the variables.

1. Scatter plots - If we want to check for a relationship between two variables, whether or not one actually exists, we can use scatter plots. The first variable is plotted on the x-axis and the second variable on the y-axis. Plotting the two variables together lets us see whether there is a strong relationship or correlation between them.

The scatter plot example below shows the relationship between variables a and b. The negative, roughly exponential relationship is easy to see: the higher the value of b, the lower the value of a. For additional information, we can also color the points according to a third variable, type.

Scatter plot

2. Correlation plots - If we want to see the relationship between multiple variables, then we should use a correlation plot instead. Correlation plots compute the correlation coefficient between pairs of variables in the dataset. The coefficients are then color-coded according to the value to give the correlation plot.

We use the correlation plot to see if there are significant relationships between variables. From the correlation plot below, notice the negative correlation of variable B with variables C and D.

Correlation plot
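A correlation plot is essentially a color-coded table of correlation coefficients, one per pair of variables. The sketch below computes the Pearson coefficient by hand, which is one common choice (the text above does not say which coefficient is used). The four variables are invented so that B moves against C and D. The plotting function assumes matplotlib is installed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements (illustration only).
data = {
    "A": [2, 4, 3, 6, 5, 7],
    "B": [9, 7, 8, 4, 3, 1],
    "C": [1, 3, 2, 5, 8, 9],
    "D": [2, 2, 4, 5, 6, 9],
}

names = list(data)
# One coefficient for every pair of variables.
corr = {row: {col: pearson(data[row], data[col]) for col in names}
        for row in names}


def plot_relationships():
    """Scatter plot of B against C next to the full correlation plot."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    fig, (ax_scatter, ax_corr) = plt.subplots(1, 2, figsize=(9, 4))

    ax_scatter.scatter(data["B"], data["C"])
    ax_scatter.set_xlabel("B")
    ax_scatter.set_ylabel("C")

    matrix = [[corr[row][col] for col in names] for row in names]
    # Fix the color scale to the full range of a correlation coefficient.
    image = ax_corr.imshow(matrix, cmap="coolwarm", vmin=-1, vmax=1)
    ax_corr.set_xticks(range(len(names)))
    ax_corr.set_xticklabels(names)
    ax_corr.set_yticks(range(len(names)))
    ax_corr.set_yticklabels(names)
    fig.colorbar(image, ax=ax_corr, label="Pearson r")
    return fig
```

Keep in mind that the Pearson coefficient only measures linear association, so a curved relationship like the one in the scatter plot example can score lower than it looks.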

To See the Distribution

Numerical data is often aggregated to a single value, usually the average or mean value. However, we lose a lot of information if we are only looking at the aggregated values. Thus it is important to look at the distribution of the data to obtain more insights.

We suggest two ways to visualize the distribution of data:

1. Box plots - A box plot gives a lot of information about the distribution of the data. The plot shows five key statistics for each variable in one diagram: minimum, maximum, lower quartile, median, and upper quartile. Box plots are also good at showing outliers, since points that fall beyond the minimum and maximum whiskers are drawn individually, much like points in a scatter plot.

The plot below shows a box plot of the counts variable for each value of the groups variable. We easily notice that the median of counts decreases across the groups. The outliers among the higher-valued groups also stand out as individual points.

Box plot

2. Density plots - Density plots show the distribution of a continuous variable as a smooth curve, in contrast to the histogram, which groups the values into discrete bins. You can think of a density plot as a smoothed version of the histogram.

As an example, let us look at the density plot below. Of the two measurement types, B looks closer to a normal distribution than A. Clearly, distribution plots of continuous variables reveal more information than plots that use aggregated values.
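Here is a minimal sketch of the numbers behind both plots, using only the Python standard library for the calculations. The sample is made up, and two choices are assumptions rather than anything stated above: outliers are flagged with the common 1.5 × IQR rule, and the density curve uses a Gaussian kernel with a hand-picked bandwidth of 2.0. The plotting function assumes matplotlib is installed.

```python
from math import exp, pi, sqrt
from statistics import quantiles

# Hypothetical sample with one unusually large value (illustration only).
counts = [12, 15, 14, 10, 18, 16, 13, 15, 17, 14, 40]

# Quartiles for the box; "inclusive" treats the sample as the whole population.
q1, median, q3 = quantiles(counts, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points more than 1.5 * IQR outside the box are outliers.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in counts if v < lower_fence or v > upper_fence]
whisker_low = min(v for v in counts if v >= lower_fence)
whisker_high = max(v for v in counts if v <= upper_fence)


def gaussian_kde(sample, bandwidth):
    """Return a density function built from one Gaussian bump per point."""
    norm = 1 / (len(sample) * bandwidth * sqrt(2 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


density = gaussian_kde(counts, bandwidth=2.0)  # bandwidth is a hand-picked guess


def plot_distribution():
    """Box plot and density curve of the same sample, side by side."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    fig, (ax_box, ax_density) = plt.subplots(1, 2, figsize=(9, 4))
    ax_box.boxplot(counts)  # matplotlib's default whiskers also use 1.5 * IQR
    ax_box.set_title("Box plot")

    xs = [x / 10 for x in range(0, 501)]
    ax_density.plot(xs, [density(x) for x in xs])
    ax_density.set_title("Density plot")
    return fig
```

A smaller bandwidth makes the density curve follow individual points more closely, while a larger one smooths them away, so it is worth trying a few values before settling on one.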


Summary

We summarize the appropriate chart according to the analysis goals in the table below.

Analysis Goal    Chart
Comparison       Bar, Line, Histogram
Proportion       Stacked bar, Treemap
Relationship     Scatter plot, Correlation plot
Distribution     Box plot, Density plot


Organizing Your Data

By the way, choosing the right visualization for your data assumes that you already have a solid structure for your data.

Gathering your data from many sources, creating data models that show the relationships between the variables, and building a centralized data warehouse are essential steps for any data project.

As data analysts, we should also have a grasp of the underlying structures of the data so that we can create analyses and visualizations suited to the type of data.

If you need help in organizing your data, check out the following resources to get you started:


Final Tips on Creating Visualizations

After choosing the appropriate visualization for your data, be sure to also choose the right design and color schemes for your charts.

Aesthetics in presenting your visualization can make or break the information that you want to present.

If you want a more in-depth guide on creating better charts, check out the classic texts below:

Choosing the best chart for your data can sometimes be frustrating. There are many charts to choose from, but only a few will meet your goals, provided you ask the right questions.

However, if you follow the tips above, you should be on your way to creating a compelling data visualization story!