Statistics: The Big Picture

Sushil Singh
5 min read · Jun 10, 2020

Solving the statistics jigsaw puzzle

This post contains my notes from the Stanford Online Statistics & Probability course, together with the associated comments.

Unfortunately, this course has been removed from Stanford Online; they moved to the e-learning platform edX to host their free courses.

Since the course has been removed, I thought I would turn some of my notes into an article so that others can read them and enjoy the subject of statistics.

Enough about the course’s story. Let’s solve the statistics jigsaw with the four pieces on our table: Population, Exploratory Data Analysis (EDA), Probability, and Inference.

So, let’s start our statistics journey with a famous quote:

“Most people use statistics like a drunk man uses a lamppost; more for support than illumination” ― Andrew Lang

Cheat sheet: all the important concepts of statistics in one diagram
Source: Stanford Online Learning

The Big Picture

In a nutshell, statistics is all about converting data into useful information. Statistics is, therefore, a process in which we:

  • Collect data
  • Summarize data
  • Interpret data

The process of statistics starts when we identify the group we want to study or learn something about. We call this group the population. Note that the word population here does not refer only to people; it is used in the broader statistical sense to cover animals, objects, and so on as well. For example, we might be interested in

  • The number of people who have recovered from COVID-19 in a certain country.
  • How the population of mice reacts to a certain chemical.
  • The opinions of the population of Indian adults about the death penalty.

The population, then, is the entire group that is the target of our interest.

The entire group of interest, the so-called `Population`
Source: Stanford Online

In most cases, the population is so large that, as much as we might want to, there is absolutely no way we can study all of it (imagine trying to get the opinions of all Indians about my blog). A more practical approach is to examine and collect data only from a subgroup of the population, which we call a sample.

We call this first step, which involves choosing a sample and collecting data from it, producing data.

A quality sample of data taken from the population (the entire target)
Source: Stanford

Since, for practical reasons, we must compromise and examine only a subgroup of the population rather than the whole population, we should make an effort to choose a sample that represents the population well.

For example, if we choose a sample from the population of Indian adults and ask their opinions about the death penalty, we don’t want our sample to consist only of people from one particular state or region.
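To make the idea of a representative sample a bit more concrete, here is a minimal Python sketch of a simple random sample, where every member of the population has the same chance of being picked. The population, its size, and the four region names are all made up for illustration; they are not from the course.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 adults, each tagged with a made-up region.
regions = ["North", "South", "East", "West"]
population = [{"id": i, "region": regions[i % 4]} for i in range(100_000)]

# Simple random sample: every adult has the same chance of being chosen.
sample = random.sample(population, k=2_000)

# Count how many sampled adults come from each region.
counts = {region: 0 for region in regions}
for person in sample:
    counts[person["region"]] += 1

print(counts)
```

Because the draw is random, each region should end up with roughly a quarter of the sample, which is what we would hope for when the regions are equally sized in the population. Real surveys often use more elaborate designs, so treat this only as the simplest case.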

Once the data has been collected, what we have is a long list of answers to questions, or numbers. To explore and make sense of the data, we need to summarize that list in a meaningful way. This second step, which consists of summarizing the collected data, is called Exploratory Data Analysis.

Source: Stanford Online
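As a small illustration of the summarizing step, the sketch below turns a raw list of answers into percentages. The ten answers are invented for the example, and real EDA would usually also involve plots and other summaries.

```python
from collections import Counter

# Hypothetical raw answers from a tiny sample (made up for illustration).
answers = ["favour", "oppose", "favour", "favour", "oppose",
           "favour", "favour", "oppose", "favour", "favour"]

# Tally how often each answer occurs.
counts = Counter(answers)
total = len(answers)

# Convert the raw counts into percentages of the sample.
percentages = {answer: 100 * n / total for answer, n in counts.items()}

print(percentages)
```

The long list of individual answers is reduced to two numbers that are much easier to read and compare.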

Now, we’ve obtained the sample results and summarized them, but we are not done. Remember that our goal is to study the population, so what we want is to be able to draw conclusions about the population based on the sample results. Before we can do so, we need to look at how the sample we’re using may differ from the population as a whole, so that we can factor that into our analysis. To examine this difference, we use Probability.

In essence, probability is the “machinery” that allows us to draw conclusions about the population based on the data collected from the sample.

Source: Stanford Online
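One way to get a feel for how a sample can differ from the population is to simulate it. The sketch below assumes a population in which exactly 70% of people would answer "favour" (a hypothetical number chosen for illustration), draws many samples from it, and looks at how much the sample percentage moves around from one sample to the next.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_SHARE = 0.70     # assumed population proportion (hypothetical)
SAMPLE_SIZE = 2_000   # people per sample (hypothetical)
N_SAMPLES = 500       # how many samples we simulate

def sample_share():
    """Draw one random sample and return the share answering 'favour'."""
    hits = sum(random.random() < TRUE_SHARE for _ in range(SAMPLE_SIZE))
    return hits / SAMPLE_SIZE

shares = [sample_share() for _ in range(N_SAMPLES)]

# The mean of the sample shares sits near the true share, and the
# standard deviation shows how far a single sample typically strays.
print(round(statistics.mean(shares), 3), round(statistics.stdev(shares), 4))
```

No single sample hits 70% exactly, but the results cluster tightly around it. Probability theory describes that clustering without our having to run the simulation, and that is what lets us reason from one sample back to the population.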

Finally, we can use what we’ve discovered about our sample to draw conclusions about our population. We call this step Inference.

And this is the Big Picture of Statistics.

Hopefully, you now know how to solve the statistics jigsaw.

To make the puzzle fully clear, let me give you an example.

Example:

At the end of May 2020, a poll was conducted (by ABC News and the Economic Times) for the purpose of learning the opinions of Indian adults about lifting the lockdown.

  1. Producing Data: A (representative) sample of 2,000 Indian adults was chosen, and each adult was asked whether he or she favoured or opposed lifting the lockdown.
  2. Exploratory Data Analysis (EDA): The collected data were summarized, and it was found that 70% of the sampled adults favoured keeping the lockdown rather than lifting it.
  3. and 4. Probability and Inference: Based on the sample result (of 70% favouring the lockdown) and our knowledge of probability, it was concluded (with 95% confidence) that the percentage of those who favour the lockdown in the population is within 3 percentage points of what was obtained in the sample (i.e., between 67% and 73%).
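For the curious, here is a rough sketch of how a margin like the one in steps 3 and 4 can be computed. It uses the standard normal-approximation formula for a proportion and assumes a simple random sample; the example does not say which method the poll actually used, so this is my assumption and not the poll's calculation.

```python
import math

p_hat = 0.70   # sample proportion from the example
n = 2_000      # sample size from the example
z = 1.96       # approximate normal quantile for 95% confidence

# Standard error of a sample proportion under simple random sampling.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

# Margin of error and the resulting confidence interval.
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"margin = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the margin comes out at roughly 2 percentage points, so the interval sits comfortably inside the 67% to 73% range quoted above. The slightly wider 3 points in the example may reflect rounding or a more conservative method.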

The following figure summarizes the example:

EndNote:

The four-step process that encompasses statistics:

  1. Producing Data: Choosing a sample from the population of interest and collecting data.
  2. Exploratory Data Analysis (EDA): Summarizing the data we’ve collected.
  3. and 4. Probability and Inference: Drawing conclusions about the entire population based on the data collected from the sample.
