Statistical & Financial Consulting by Stanford PhD
SPEARMAN'S RHO

Spearman's rho (Spearman's rank correlation coefficient) is a measure of monotonic, possibly highly nonlinear, dependence between two random variables. If random variables $X$ and $Y$ have joint distribution $H(x,y)$ and the random vectors $(X_1,Y_1),$ $(X_2,Y_2)$ and $(X_3,Y_3)$ are independent draws from that distribution, then Spearman's rho of $X$ and $Y$ equals

$\rho^S(X,Y) = 3 \bigl(\textrm{P}\bigl((X_1 - X_2)(Y_1 - Y_3) > 0\bigr) - \textrm{P}\bigl((X_1 - X_2)(Y_1 - Y_3) < 0\bigr)\bigr).\ \textbf{(1)}$
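Definition (1) can be checked by simulation: draw three independent pairs, keep $X_1, Y_1, X_2, Y_3$, and count the signs of $(X_1 - X_2)(Y_1 - Y_3)$. The Python sketch below assumes we are able to sample from $H(x,y)$; the function names and the three example distributions are illustrative choices, not part of any library.

```python
import random


def spearman_rho_mc(sample_xy, n_draws=100_000, seed=0):
    """Monte Carlo estimate of definition (1).

    sample_xy(rng) must return one (x, y) draw from the joint distribution H.
    """
    rng = random.Random(seed)
    concordant = discordant = 0
    for _ in range(n_draws):
        x1, y1 = sample_xy(rng)   # (X1, Y1)
        x2, _ = sample_xy(rng)    # only X2 is used
        _, y3 = sample_xy(rng)    # only Y3 is used
        s = (x1 - x2) * (y1 - y3)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return 3 * (concordant - discordant) / n_draws


def increasing(rng):
    x = rng.gauss(0.0, 1.0)
    return x, x ** 3              # Y is an increasing function of X


def decreasing(rng):
    x = rng.gauss(0.0, 1.0)
    return x, -x                  # Y is a decreasing function of X


def independent(rng):
    return rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)


for name, dist in [("increasing", increasing), ("decreasing", decreasing), ("independent", independent)]:
    print(name, round(spearman_rho_mc(dist), 3))  # roughly 1, -1 and 0
```

For a perfect increasing relationship, $(X_1 - X_2)(Y_1 - Y_3) > 0$ exactly when $X_1$ is the largest or smallest of the three draws, which happens with probability 2/3, so (1) gives $3(2/3 - 1/3) = 1$.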

If $X$ and $Y$ have continuous marginal distributions, then $\rho^S(X,Y)$ is on the same scale as Pearson's correlation. Just like Pearson's correlation, it covers the whole interval $[-1,1]$, but now $-1$ corresponds to a perfect negative relationship ($Y$ is a decreasing deterministic function of $X$) and $1$ corresponds to a perfect positive relationship ($Y$ is an increasing deterministic function of $X$). When $X$ or $Y$ has a point mass, the interval $[-1,1]$ is not covered fully. For example, if $X$ takes a given value with positive probability $p$, then with probability at least $p^2$ there is a tie, $X_1 = X_2$, and so $\rho^S(X,Y)$ falls into the interval $[-1 + p^2, 1 - p^2]$ no matter what the bivariate relationship is. There are several proposals on how to adjust for ties, the most obvious one being to divide formula (1) by

$1 - \textrm{P}\bigl((X_1 - X_2)(Y_1 - Y_3) = 0\bigr).$

Still, no single generalization has been widely accepted.
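For a discrete joint distribution, the probabilities in (1) can be computed exactly by enumerating all triples of draws. The sketch below (a hypothetical helper `spearman_rho_discrete`, with exact arithmetic via `fractions.Fraction`) takes $X$ Bernoulli with $p = 1/2$ and $Y = X$. Even this perfect increasing relationship gives $\rho^S = 3/4 = 1 - p^2$, in line with the bound above.

```python
from fractions import Fraction
from itertools import product


def spearman_rho_discrete(pmf):
    """Evaluate definition (1) exactly for a discrete joint pmf {(x, y): probability}.

    Returns (rho, p_zero), where p_zero = P((X1 - X2)(Y1 - Y3) = 0).
    """
    p_pos = p_neg = p_zero = Fraction(0)
    # Enumerate every combination of three independent draws from the pmf.
    for ((x1, y1), p1), ((x2, _), p2), ((_, y3), p3) in product(pmf.items(), repeat=3):
        weight = p1 * p2 * p3
        s = (x1 - x2) * (y1 - y3)
        if s > 0:
            p_pos += weight
        elif s < 0:
            p_neg += weight
        else:
            p_zero += weight
    return 3 * (p_pos - p_neg), p_zero


half = Fraction(1, 2)
# X ~ Bernoulli(1/2) and Y = X: a perfect increasing relationship with ties.
rho, p_zero = spearman_rho_discrete({(0, 0): half, (1, 1): half})
print(rho, p_zero)  # 3/4 3/4
```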

Note that definition (1) depends on ranks only: we only care whether $X_1$ is bigger than $X_2$, the actual values being irrelevant. So Spearman's rho is invariant to any strictly increasing transformations of $X$ and $Y$, linear or nonlinear. If we raise $X$ to the third power, Spearman's rho stays the same. This is very important: Spearman's rho is naturally built to capture the strength of monotonic but highly nonlinear relationships, where linear association measures such as Pearson's correlation fail. The following graph illustrates the fact.
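A small numerical check of the same point, as a Python sketch assuming continuous data without ties: the ranks of $X$ and of $X^3$ coincide, so the rank correlation is unchanged while Pearson's correlation is not. The helper names and the data values are made up for the illustration.

```python
import math


def simple_ranks(values):
    """Ranks 1..n, assuming no ties (continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    # Rank correlation: Pearson correlation applied to the ranks.
    return pearson(simple_ranks(a), simple_ranks(b))


x = [0.3, -1.2, 2.5, 0.8, -0.4, 1.9, -2.2, 0.1]
y = [0.5, -0.7, 1.6, 1.1, -0.9, 1.2, -1.5, 0.6]
x_cubed = [v ** 3 for v in x]
print(spearman(x, y), spearman(x_cubed, y))  # identical: cubing keeps the order
print(pearson(x, y), pearson(x_cubed, y))    # generally different

# A perfectly monotone but strongly nonlinear relationship.
t = list(range(10))
e = [math.exp(v) for v in t]
print(spearman(t, e), pearson(t, e))  # 1.0 versus roughly 0.72
```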

Spearman's rho has a direct relation to the copula function $C(u,v)$ generated by random variables $X$ and $Y$:
$\rho^S(X,Y) = 12 \iint_{[0,1]^2} u v \, dC(u,v) - 3.$

When the marginal distributions are continuous, the copula function does not depend on them: it describes the joint behavior of $X$ and $Y$ after each is transformed by its own distribution function into a random variable uniformly distributed on [0,1]. The formula above signals once again that Spearman's rho does not depend on the marginal distributions of $X$ and $Y$ and is invariant to any strictly increasing transformations of $X$ and $Y.$
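Since $\iint_{[0,1]^2} uv \, dC(u,v) = \textrm{E}[UV]$ for $(U,V)$ distributed according to $C$, the copula formula can be evaluated by simulation. The Python sketch below uses a Gaussian copula with correlation parameter $r$ (an example choice), for which the closed form $\rho^S = \frac{6}{\pi}\arcsin(r/2)$ is known, so the Monte Carlo value can be compared against it. Function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def rho_s_copula_mc(r, n=100_000, seed=0):
    """Estimate 12 * E[UV] - 3, with (U, V) drawn from a Gaussian copula with correlation r."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        # U = Phi(Z1) and V = Phi(Z2) are uniform on [0, 1] with copula C.
        total += phi(z1) * phi(z2)
    return 12.0 * total / n - 3.0


def rho_s_gaussian_exact(r):
    """Closed form of Spearman's rho for the Gaussian copula."""
    return 6.0 / math.pi * math.asin(r / 2.0)


for r in (-0.8, 0.0, 0.5):
    print(r, round(rho_s_copula_mc(r), 3), round(rho_s_gaussian_exact(r), 3))
```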

When the joint distribution of $X$ and $Y$ is unknown, Spearman's rho can be estimated from the data as the correlation of ranks. Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ denote the data and let

$r_{xi} =$ rank of $X_i$ in $\{X_1, ..., X_n\},$

$r_{yi} =$ rank of $Y_i$ in $\{Y_1, ..., Y_n\},$

$\bar{r_x} = \frac{\sum_{i=1,...,n} r_{xi}}{n},$

$\bar{r_y} = \frac{\sum_{i=1,...,n} r_{yi}}{n}.$

The estimator of Spearman's rho is given by

$\hat{\rho}^S = \frac{\sum_{i=1,...,n} (r_{xi} - \bar{r_x}) (r_{yi} - \bar{r_y})}{\sqrt{\sum_{i=1,...,n} (r_{xi} - \bar{r_x})^2 \, \sum_{i=1,...,n} (r_{yi} - \bar{r_y})^2}}.$
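The estimator translates directly into code. In this Python sketch (function names are our own), tied values receive the average of the positions they occupy, and the estimate is the Pearson correlation of the two rank vectors.

```python
import math


def midranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1   # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman_rho_hat(xs, ys):
    """Sample Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = midranks(xs), midranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


print(midranks([10, 20, 20, 30]))                          # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho_hat([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```

For data without ties this agrees with the familiar shortcut $1 - \frac{6\sum_i d_i^2}{n(n^2-1)}$ with $d_i = r_{xi} - r_{yi}$ (here $1 - 6 \cdot 4/120 = 0.8$); with ties the shortcut is no longer exact, while the rank-correlation form still applies.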

Tied values are assigned the same fractional rank, equal to the average of the positions they occupy in the ascending ordering of the values. For that reason the estimator can be applied to both discrete and continuous distributions. As the sample size grows to infinity, the estimator converges to the true Spearman's rho, and it can be used to test whether the true Spearman's rho equals 0.
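One common large-sample version of this test, sketched below, uses the fact that under independence with continuous marginals $\hat{\rho}^S$ has mean 0 and variance $1/(n-1)$, and $\sqrt{n-1}\,\hat{\rho}^S$ is approximately standard normal. The function name is hypothetical; with heavy ties or small $n$, a permutation test is a safer choice.

```python
import math
from statistics import NormalDist


def spearman_zero_test(rho_hat, n):
    """Approximate two-sided p-value for H0: Spearman's rho = 0.

    Assumes continuous data (no ties) and a reasonably large sample size n,
    so that sqrt(n - 1) * rho_hat is close to N(0, 1) under H0.
    """
    z = math.sqrt(n - 1) * rho_hat
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = spearman_zero_test(0.5, 101)
print(z, p)  # z = 5.0, p-value well below 0.001
```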

The sample rank correlation coefficient is a nonparametric estimator in the sense that no assumptions are made about the joint distribution of $X$ and $Y.$ In particular, no functional form is postulated for the conditional expectation of $Y$ given $X$ or for the conditional expectation of $X$ given $Y.$

