Correlation Test for One Discrete and One Continuous Variable

Spearman's correlation in statistics is a nonparametric alternative to Pearson's correlation. Use Spearman's correlation for data that follow curvilinear, monotonic relationships and for ordinal data. Statisticians also refer to Spearman's rank order correlation coefficient as Spearman's ρ (rho).

In this post, I'll cover what all that means so you know when and why you should use Spearman's correlation instead of the more common Pearson's correlation.

To learn more about correlation in general, and Pearson's correlation in particular, read my post about Interpreting Correlation Coefficients.

Throughout this post, I graph the data. Graphing is crucial for understanding the type of relationship between variables. Seeing how variables are related helps you choose the correct analysis!

Related post: Nonparametric versus Parametric Analyses

Choosing Between Spearman's and Pearson's Correlation

Let's start by determining when you should use Pearson's correlation, which is the more common form. Pearson's is an excellent choice when you have continuous data for a pair of variables and the relationship follows a straight line. If your data do not meet both of those requirements, it's time to find a different correlation measure!

The data in the graph have a correlation of 0.8. Pearson's correlation is valid for these data because the relationship follows a straight line.

Consider Spearman's rank order correlation when you have pairs of continuous variables and the relationships between them don't follow a straight line, or you have pairs of ordinal data. I'll examine those two conditions below.

Why Pearson's Correlation Is Not Valid for Curvilinear Relationships

The graph below shows why Pearson's correlation for curvilinear relationships is not valid.

On the graph, the red line is actually a dense series of data points, not a line, and the green line is the linear fit. You don't usually think of Pearson's correlation as modeling the data, but it uses a linear fit. Consequently, the green line illustrates how Pearson's correlation models these data. Clearly, the model doesn't fit the data adequately. There are systematic (i.e., non-random) departures between the red data points and the green model fit. Right there, you know that Pearson's correlation is invalid for these data.

The Pearson's correlation is about 0.92, which is pretty high. However, the graph emphasizes how it does not capture the whole relationship. The real strength of the relationship is even higher. Later in this post, we'll work through a similar example using scientific data.

Determining When to Use Spearman's Correlation

Spearman's correlation is appropriate for more types of relationships, but it too has requirements your data must satisfy to be valid. Specifically, Spearman's correlation requires either continuous data that follow a monotonic relationship or ordinal data.

When you have continuous data that do not follow a line, you must determine whether they exhibit a monotonic relationship. In a monotonic relationship, as one variable increases, the other variable tends to either increase or decrease, but not necessarily in a straight line. This aspect of Spearman's correlation allows you to fit curvilinear relationships. However, there must be a tendency to change in a particular direction, as illustrated in the graphs below.

Spearman's rho is an excellent choice when you have ordinal data because Pearson's is not appropriate. Ordinal data have at least three categories and the categories have a natural order. For example, first, second, and third in a race are ordinal data.

For example, imagine the same contestants participate in two spelling competitions. Suppose you have the finishing ranks for all contestants in both matches and want to calculate the correlation between contests. Spearman's rank order correlation is appropriate for these data.

Spearman's rho is also a great way to report correlations between Likert scale items!

How to Calculate Spearman's Rho

Spearman's correlation is simply the Pearson's correlation of the rankings of the raw data. If your data are already ordinal, you don't need to change anything. However, if your data are continuous, you'll need to convert the continuous data into ranks. Of course, many statistical software packages will do that preprocessing for you and simply calculate the answer!
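The definition above can be written compactly. The notation here is my own shorthand rather than anything from the original post: R(x_i) and R(y_i) are the ranks of the i-th observation on each variable, d_i is their difference, and n is the number of pairs. The second form is the well-known shortcut that holds only when there are no tied values.

```latex
\rho = \frac{\operatorname{cov}\bigl(R(X),\, R(Y)\bigr)}{\sigma_{R(X)}\,\sigma_{R(Y)}}
\qquad\text{and, with no ties,}\qquad
\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)},
\quad d_i = R(x_i) - R(y_i)
```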

The example dataset below shows data ranks for two continuous variables. The data are ranked such that a value of 1 indicates the highest, 2 the second highest, and so on.

Dataset that shows ranks for continuous data to calculate Spearman's correlation.

To determine Spearman's correlation, simply calculate the Pearson's correlation for the two rank order columns instead of the raw data. We'll analyze these data later in the post!
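As a rough illustration of the rank-then-correlate procedure, here is a minimal Python sketch. The function names and the five-row dataset are made up for this example and are not the data from the post. The sketch ranks from smallest to largest and gives tied values the average of their ranks. Ranking from largest to smallest instead gives the same coefficient, as long as both variables are ranked in the same direction.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    # Mean 1-based position of each distinct value in the sorted list.
    mean_pos = {v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(values)}
    return [mean_pos[v] for v in values]

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman(x, y)
print(rho)  # 0.8
```

In practice, a statistics package will do the ranking and the correlation in one step. The point of the sketch is only to show that nothing more than ranking plus Pearson's is involved.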

Interpreting Spearman's Correlation Coefficient

Spearman's correlation coefficients range from -1 to +1. The sign of the coefficient indicates whether it is a positive or negative monotonic relationship. A positive correlation means that as one variable increases, the other variable also tends to increase. A negative correlation signifies that as one variable increases, the other tends to decrease. Values close to -1 or +1 represent stronger relationships than values closer to zero.

Comparing Spearman's and Pearson's Coefficients

If the Pearson's coefficient is a perfect -1 or +1, Spearman's correlation coefficient will be the same perfect value unless there are repeating data values.

When there is no tendency for two variables to change in tandem, both Spearman's and Pearson's will be close to zero, indicating no relationship.

This scatterplot displays a correlation of 0 where there is no relationship between the variables.

If there is a curvilinear but non-monotonic relationship, such as a symmetric U-shape, both Spearman's and Pearson's correlations can be close to zero even though the variables are clearly related.

Graph of non-monotonic relationship.
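To make the non-monotonic case concrete, here is a small Python sketch using made-up data: y is the square of x over a range that is symmetric around zero. The helper functions are my own illustration, not part of the original analysis. Both coefficients come out at essentially zero even though y is completely determined by x.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    mean_pos = {v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(values)}
    return [mean_pos[v] for v in values]

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical U-shaped data: y falls and then rises as x increases.
x = list(range(-5, 6))
y = [v ** 2 for v in x]

pearson_r = pearson(x, y)
spearman_rho = pearson(average_ranks(x), average_ranks(y))
print(pearson_r, spearman_rho)
```

A near-zero coefficient here does not mean the variables are unrelated. It means neither measure can describe a relationship that changes direction, which is another reason to graph the data first.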

However, when you have two variables with a curvilinear, monotonic relationship, you'll find that Spearman's correlation indicates a stronger relationship (rho has a higher absolute value) than Pearson's. In those cases, the curvilinear nature "confuses" Pearson's, and it underestimates the relationship's strength. The upcoming example illustrates this aspect in action.

Example of a positive monotonic relationship
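A quick way to see this effect is to try it on data that are perfectly monotonic but curved. The Python sketch below uses invented data (y is the cube of x) and helper functions written for this illustration. Because the ordering of y matches the ordering of x exactly, Spearman's rho is 1, while Pearson's comes out lower because the points do not fall on a straight line.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    mean_pos = {v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(values)}
    return [mean_pos[v] for v in values]

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical curvilinear, monotonic data: y always increases with x, but not linearly.
x = list(range(1, 11))
y = [v ** 3 for v in x]

pearson_r = pearson(x, y)
spearman_rho = pearson(average_ranks(x), average_ranks(y))
print(f"Pearson's r:    {pearson_r:.3f}")
print(f"Spearman's rho: {spearman_rho:.3f}")
```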

Spearman's Correlations for Likert Items and Other Ordinal Data

Statisticians report correlations of ordinal data, such as ranks and Likert scale items, using Spearman's rho. Strongly positive Spearman's correlations indicate that high ranks of one variable tend to coincide with high ranks of the other variable. Negative correlations signify that high ranks of one variable frequently occur with low ranks of the other variable.

For Likert items using the Strongly Agree to Strongly Disagree scale, Spearman's correlations mean the following:

  • Strongly positive coefficients: Strongly Agree values tend to occur together.
  • Strongly negative coefficients: Strongly Agree for one item is apt to coincide with Strongly Disagree on the other item.
  • Near zero coefficients: The value of one Likert item does not predict the other Likert item's value. There is no relationship between them.
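Here is a small Python sketch of how two Likert items could be correlated. The responses, the 1-to-5 coding, and the short labels (SD through SA) are assumptions made up for this example. Likert data contain many repeated values, so the sketch gives tied responses the average of their ranks before applying Pearson's correlation to the ranks.

```python
from math import sqrt

# Assumed coding for a five-point agreement scale (hypothetical labels).
LIKERT = {"SD": 1, "D": 2, "N": 3, "A": 4, "SA": 5}

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    mean_pos = {v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(values)}
    return [mean_pos[v] for v in values]

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up responses from eight respondents to two survey items.
item_a = ["SA", "A", "A", "N", "D", "SD", "SA", "D"]
item_b = ["A", "SA", "A", "N", "SD", "D", "SA", "D"]

codes_a = [LIKERT[r] for r in item_a]
codes_b = [LIKERT[r] for r in item_b]
rho = pearson(average_ranks(codes_a), average_ranks(codes_b))
print(f"Spearman's rho: {rho:.2f}")
```

In this invented sample, respondents who agree with one item mostly agree with the other, so the coefficient is strongly positive.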

Related post: Analyzing Likert Scale Data

Example of Spearman's Rank Order Correlation for a Monotonic Relationship

The graph below displays the relationship between density and electron mobility. The relationship is nonlinear. In fact, I fit a nonlinear regression model to these data. However, instead of fitting a regression model, let's calculate the correlation between these two variables. These data are a good candidate for Spearman's correlation because they follow a nonlinear relationship that is monotonic. As density increases, electron mobility also increases, but not in a linear fashion.

Scatterplot displays a positive, monotonic relationship between electron data for the Spearman's correlation.

These data are freely available from the NIST and pertain to the relationship between density and electron mobility. Download the Excel data file to try it yourself: ElectronCorrelations.

I've done the calculations in Excel so you can see how they compare. Excel's Data Analysis ToolPak performs Pearson's correlation. It doesn't explicitly calculate Spearman's correlation. However, by using Excel's rank function to rank both variables, I can then use Pearson's correlation on those ranks to derive Spearman's rho.

First, I'll calculate Pearson's correlation.

Pearson's correlation output for the electron data.

The correlation is a very strong ~+0.96. Even though the relationship is nonlinear, Pearson's indicates that it is strongly positive. However, as high as that correlation is, we know that it underestimates the strength of the relationship because Pearson's can't model nonlinear relationships.

Now, let's calculate Spearman's rho. In the Excel spreadsheet, I used the rank function to convert the raw scores for both variables to ranks. Then, I calculated the correlation for the pair of ranked values to produce Spearman's rho.

Related post: Using Excel to Calculate Correlation

Spearman's correlation output for the electron data.

For the electron mobility data, Spearman's rho is a near perfect correlation of +0.99. It's nearly perfect because these data represent a physical process and the lab collected extremely precise measurements.

Spearman's correlation is a great addition to your statistical toolbox! It allows you to calculate correlations for data where Pearson's is invalid.


Source: https://statisticsbyjim.com/basics/spearmans-correlation/
