Q-Q Plots: A View from a Statistical Perspective

Thiruthuvaraj Rajasekhar
2 min read · Mar 9, 2022


If you have worked on a regression problem, you have probably heard of Q-Q plots at least once. In this article, I want to answer a few questions that commonly come up:

  1. What is a Q-Q plot?
  2. Why do we use Q-Q plots?
  3. How are Q-Q plots created?
  4. How do we interpret Q-Q plots?

  1. Q-Q plots, also called quantile-quantile plots, are used to assess whether a set of data comes from a theoretical distribution such as the normal distribution. The theoretical quantiles go on the x-axis and the sorted data on the y-axis.
  2. To answer the second question, recall the assumptions of linear regression. One of them is that the residuals follow a normal distribution with mean 0 and constant variance.
    So we use a Q-Q plot to check whether the residuals come from a normal distribution.
  3. How Q-Q plots are created is best shown with an example:

In the example dataset, residual is the column whose normality we are checking. First, sort its values in ascending order.
Then compute the percentile rank of each value. These percentile ranks are converted to z-scores using the inverse of the standard normal CDF, and those z-scores are the theoretical quantiles.

Finally, draw a scatter plot with the theoretical quantiles on the x-axis and the sorted residuals on the y-axis.
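The steps above (sort, percentile rank, z-score, pair up) can be sketched in a few lines of Python. This is only a minimal sketch: the function name `qq_points`, the sample residuals, and the percentile-rank convention `(i - 0.5) / n` are my assumptions for illustration, and other conventions for the percentile rank are also in use.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical_quantile, sample_value) pairs for a normal Q-Q plot."""
    ordered = sorted(values)               # step 1: sort in ascending order
    n = len(ordered)
    std_normal = NormalDist()              # standard normal: mean 0, std dev 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n                  # step 2: percentile rank (assumed convention)
        z = std_normal.inv_cdf(p)          # step 3: z-score, i.e. the theoretical quantile
        points.append((z, value))
    return points

# Made-up residuals, for illustration only
residuals = [0.4, -1.2, 0.1, 2.3, -0.6, 0.9, -0.3]
points = qq_points(residuals)
for z, r in points:
    print(f"{z:7.3f}  {r:5.1f}")
```

Each pair is one point of the plot: the first value goes on the x-axis and the second on the y-axis. Any plotting library can then draw the scatter plot from these pairs.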

4. If the data come from a normal distribution, the points fall along a straight line. This is the 45-degree line when the residuals are standardized to the same scale as the theoretical quantiles. If the data are skewed relative to the theoretical distribution, some points, typically at the ends, fall away from that line. This is how Q-Q plots are used to check the normality assumption in regression.
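To make "close to a straight line" a bit more concrete, one rough option is to fit a least-squares line through the Q-Q points and look at how strongly they correlate. The sketch below is my own illustration, not a formal normality test: the function name `qq_line_fit`, the simulated samples, and the `(i - 0.5) / n` percentile-rank convention are all assumptions. It also shows why the line is at 45 degrees only for standardized data: for unstandardized normal data the slope is roughly the standard deviation.

```python
import random
from statistics import NormalDist, mean

def qq_line_fit(values):
    """Fit a least-squares line through the Q-Q points.

    Returns (slope, intercept, r), where r is the Pearson correlation
    between the theoretical quantiles and the sorted data.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(theo), mean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theo, ordered))
    sxx = sum((x - mx) ** 2 for x in theo)
    syy = sum((y - my) ** 2 for y in ordered)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

rng = random.Random(0)  # fixed seed so the simulated data are reproducible
normal_sample = [rng.gauss(0, 2) for _ in range(500)]        # normal, std dev 2
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]   # right-skewed

slope_n, intercept_n, r_normal = qq_line_fit(normal_sample)
slope_s, intercept_s, r_skewed = qq_line_fit(skewed_sample)
print(f"normal: slope={slope_n:.2f}, r={r_normal:.3f}")
print(f"skewed: slope={slope_s:.2f}, r={r_skewed:.3f}")
```

For the normal sample the correlation should be very close to 1 and the slope near 2, the standard deviation used to simulate it. The skewed sample should give a visibly lower correlation, which matches what we see on the plot as points bending away from the line.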

Hope you liked the article! Support it with a thumbs up or a clap! 😊
