When students learn that the least-squares line (regression line) drawn through standardized versions of two variables has a slope equal to the correlation between those variables, they sometimes wonder how this slope can be the same no matter which variable is treated as the predictor (plotted on the x-axis) and which as the outcome (plotted on the y-axis). After all, the unstandardized slope depends on which variable is the predictor and which is the outcome, so why would the correlation (the standardized slope) be the same regardless? The images on this page attempt to make this concept more intuitive.
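
For readers who like to see the algebra alongside the pictures, the standard slope formulas can be written as follows. The notation is an assumption made here, not something the page itself uses: b_{y|x} is the slope when x predicts y, b_{x|y} is the slope when y predicts x, r is the correlation, and s_x and s_y are the standard deviations.

```latex
\[
  b_{y \mid x} = r \, \frac{s_y}{s_x},
  \qquad
  b_{x \mid y} = r \, \frac{s_x}{s_y}
\]
% After standardizing, both standard deviations equal 1:
\[
  s_x = s_y = 1
  \quad \Longrightarrow \quad
  b_{y \mid x} = b_{x \mid y} = r
\]
```

The two unstandardized slopes differ only through the ratio of standard deviations, so once that ratio is forced to 1, nothing is left to tell the two directions apart.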

The first image shows four plots of x and y, two variables with fifty observations each. Each plot includes the least-squares regression line drawn through the points. In the first plot, each point falls exactly on the line, one unit above or below it, or two units above or below it. The slope of the line is .5. The correlation is .713. The second plot shows what would happen if x and y were transposed. The least-squares line is not completely transposed with them, because the residuals are defined by how far above or below the line each point falls, not by how far to the left or right of it each point falls. The slope in the second plot is 1.015, but the correlation is still .713. The third plot is the same as the first, but with both variables standardized. (In other words, each variable was Z-transformed by taking the raw scores, subtracting their mean, and dividing by their standard deviation.) Because x has a larger standard deviation than y, the points got squished together more horizontally than they did vertically, resulting in a slope of .713, exactly the same as the correlation. The fourth plot is the same as the second, but with both variables standardized. The slope is once again .713.
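
The four plots described above can be reproduced numerically with a short sketch. The page does not list the actual data, so the layout here is an assumption: x takes the values 1 through 10, and at each x there are five points with residuals of -2, -1, 0, 1, and 2 around the line y = .5x. That layout gives fifty observations and matches the reported figures. The helper names `slope` and `standardize` are hypothetical, chosen for illustration.

```python
from statistics import mean, pstdev


def slope(predictor, outcome):
    """Least-squares slope for predicting `outcome` from `predictor`."""
    mp, mo = mean(predictor), mean(outcome)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predictor, outcome))
    sxx = sum((p - mp) ** 2 for p in predictor)
    return sxy / sxx


def standardize(values):
    """Z-transform: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# ASSUMED data layout (not given in the text): x = 1..10, five points per x.
residuals = [-2, -1, 0, 1, 2]
x = [xi for xi in range(1, 11) for _ in residuals]
y = [0.5 * xi + e for xi in range(1, 11) for e in residuals]

b_yx = slope(x, y)            # plot 1: x predicts y
b_xy = slope(y, x)            # plot 2: axes swapped, y predicts x
zx, zy = standardize(x), standardize(y)
r = slope(zx, zy)             # plot 3: standardized, x predicts y
r_swapped = slope(zy, zx)     # plot 4: standardized, y predicts x

print(round(b_yx, 3), round(b_xy, 3), round(r, 3), round(r_swapped, 3))
# prints: 0.5 1.015 0.713 0.713
```

The raw slopes disagree (.5 versus 1.015), while the two standardized slopes agree with each other and with the correlation.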

The next image shows four similar plots. In this case, x and y have only 40 observations each, and they are clustered more closely around the regression line. Because of the way the y-values are defined, the slope in the first plot is still exactly .5. The slope in the second plot is now 1.245, still not the reciprocal of .5 (which is 2). The correlation, as shown in the third and fourth plots, is .789.

In the next image, x and y have only 30 observations each, and all the observations are on the regression line or one unit above or below it. The slope in the first plot is still exactly .5. The slope in the second plot is now 1.511. The correlation is .869.

In the next image, x and y have only 20 observations each. The slope in the first plot is still exactly .5. The slope in the second plot is now 1.784. The correlation is .944, close to perfect but not quite there.

In the next image, x and y have 10 observations each, and they all fall exactly on a line. The slope in the first plot is exactly .5, and the slope in the second plot is exactly 2 (the reciprocal of .5), because there is no longer any error (the residuals are all exactly 0). The correlation, of course, is now exactly 1.
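
One way to tie the five images together is a standard identity: multiplying the slope from the first plot by the slope from the second plot gives the squared correlation. The sketch below checks this using only the rounded figures quoted on this page, so small gaps from rounding are expected. The variable names and the 0.002 tolerance are illustrative choices, not part of the original page.

```python
# (observations, reported slope with y as predictor, reported correlation)
# All values are copied from the text and are rounded to three decimals.
reported = [
    (50, 1.015, 0.713),
    (40, 1.245, 0.789),
    (30, 1.511, 0.869),
    (20, 1.784, 0.944),
    (10, 2.0, 1.0),
]

b_yx = 0.5  # slope with x as predictor, identical in every image

gaps = []
for n, b_xy, r in reported:
    product = b_yx * b_xy          # product of the two unstandardized slopes
    gap = abs(product - r ** 2)    # should be near zero, up to rounding
    gaps.append(gap)
    print(f"n={n}: slope product={product:.4f}, r squared={r ** 2:.4f}")

# Every image agrees with the identity to within rounding of the inputs.
print("all within tolerance:", max(gaps) < 0.002)
```

This is why the second slope only reaches 2, the reciprocal of .5, in the last image: the product of the slopes equals 1 only when the correlation is exactly 1.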