## Warming or Walking? – Stochastic Processes and Temperature Trends

Of all of the statistics that are cited to support the notion of “global warming,” the one that bothers me the most is the statistic claiming that \(n\) of the last \(m\) years have been the hottest years on record. For example, it’s not uncommon to hear that “10 of the last 12 years” are the warmest years in a temperature record that goes all the way back to 1880. This is often used as “irrefutable evidence” that mankind is driving up the Earth’s temperature and destroying the planet.

It is understandable that an activist would try to exploit this statistic. Most obviously, it emphasizes that recent global temperatures have been relatively high (where “high” corresponds to an increase of less than one degree Celsius over a 100-year period). The real purpose of repeating this factoid, however, is that it confuses and charms the numerically unsophisticated, leading them to assume that such a concentration of unprecedented, elevated temperatures in recent times is highly unlikely—unless some underlying cause is responsible.

This is quite misleading, however. In fact, it is not difficult to demonstrate that a relatively simple statistical model can account for this result, without requiring any bias toward warming.

The basic mistake that most people make when hearing about the recent number of warmest years is to assume that, unless some underlying trend is causing the warming, all years in the record are alike, and each is equally likely to be one of the warmest. To have 10 of the last 12 years at the top of a list of 134 temperatures intuitively seems extremely unlikely, and under those assumptions it is. Under these (incorrect) assumptions, the probability that 10 of the 12 hottest years fall in the last 12 years is given by the hypergeometric distribution:\[ P(X=10)=\frac{{12\choose 10}{122\choose 2}}{{134\choose 12}} \]where the parentheses denote the binomial coefficient\[ {n\choose k}\equiv\frac{n!}{k!(n-k)!} \]and the 122 counts the \(134-12\) years that are not among the 12 hottest. The probability of 10 or more hottest years is about 1 in 86 billion: very unlikely indeed!
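As a rough check on that figure, here is a minimal Python sketch (not the R code behind the original results) that evaluates the hypergeometric tail with exact integer binomial coefficients. It assumes a population of 134 years, of which 12 count as "hottest," and a window of the 12 most recent years; the function names are purely illustrative.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k): exactly k of the `successes` items land among the `draws` items."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


def prob_at_least(k_min, population=134, successes=12, draws=12):
    """P(X >= k_min) under the independent, equally-likely-years assumption."""
    k_max = min(successes, draws)
    return sum(
        hypergeom_pmf(k, population, successes, draws)
        for k in range(k_min, k_max + 1)
    )


p = prob_at_least(10)
print(f"P(X >= 10) = {p:.3e}, i.e. about 1 in {1 / p:.3g}")
```

The result should land close to the 1-in-86-billion figure quoted above. Almost all of the tail probability comes from the \(X=10\) term, since the \(X=11\) and \(X=12\) terms are smaller by several orders of magnitude.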

The problem, however, is that each year’s temperature is not an independent random variable. This is clear from looking at the temperature record. For example, consider the annual global land and ocean surface temperature anomalies, which have been obtained from NOAA’s National Climatic Data Center. This temperature record is shown below. The points give the yearly global average temperature. The smooth curve is the five-year running average.

Although the temperatures in the record bounce up and down from year to year, each year’s temperature clearly depends on the temperature of the years immediately before it. That is, this time series is *autocorrelated*. In fact, the series clearly resembles a random walk.

This is even clearer when considering the change in temperature anomaly from year to year, which is shown below.

Unlike the temperature record itself, the differences show no obvious trend. They appear to be distributed as (mutually independent) white noise. A Q-Q plot, shown below, appears to indicate that these differences are normally distributed, which is confirmed by a Shapiro-Wilk test for normality (\(p=0.43\)).

Thus, the differences constitute a normally distributed set with a sample mean of \(0.0058\) and sample variance of \(0.0096\). The slightly positive mean corresponds to the upward trend that is observed in the temperature record. Over the entire record, it results in a \(0.0058\) °C/year average increase in temperature. Nevertheless, it should be kept in mind that this does not imply that the mean of the underlying distribution is this value or even that it is positive. The estimate for the population mean, including the standard error, is \(\bar x\pm s/\sqrt{n}\), where \(\bar x\) is the sample mean, \(s\) is the sample standard deviation, and \(n\) is the size of the sample. In this case, the estimate for the mean of the underlying distribution is \(0.0058\pm 0.0085\), a range that includes zero at the 1-\(\sigma\) level. Thus, we cannot reasonably conclude from these data that there is a positive bias in the random walk process describing this temperature record.
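The standard-error arithmetic can be reproduced in a few lines of Python. This sketch uses the sample mean and variance quoted above; the sample size of 133 is an assumption (a 134-year record yields 133 year-to-year differences) and is not stated in the text.

```python
from math import sqrt

sample_mean = 0.0058      # mean annual change quoted in the text, in °C/year
sample_variance = 0.0096  # sample variance of the annual changes quoted in the text
n_differences = 133       # ASSUMPTION: 134 annual values give 133 differences

# Standard error of the mean: s / sqrt(n)
std_error = sqrt(sample_variance / n_differences)
lower, upper = sample_mean - std_error, sample_mean + std_error

print(f"standard error = {std_error:.4f}")
print(f"1-sigma interval = [{lower:.4f}, {upper:.4f}]")
print("interval includes zero:", lower < 0.0 < upper)
```

Under that assumption, the standard error rounds to the \(0.0085\) quoted above, and the one-sigma interval straddles zero.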

So let’s consider the possibility that the temperature record is a result of a random walk process with no bias. The statistical model that describes this process is\[ T_n = T_{n-1} + \epsilon^{(n)} \]where \(T_n\) is the temperature anomaly of the \(n\)-th year in the series, and \(\epsilon^{(1)},\ldots,\epsilon^{(n)}\) is a series of normally distributed random variables with \(\epsilon\sim\mathcal N(0,\sigma^2)\). Although this model can be studied analytically, it lends itself very well to Monte Carlo simulation. For the results discussed here, I use a model in which the variance of the random term is \(\sigma^2 = 0.0096\), the sample variance of the NOAA temperature anomaly data.

When a model of red noise—the random walk described above—is used to generate a simulated 134-year temperature record, 10 or more of the last 12 years in the record end up being the “hottest” years on record about 7% of the time. Thus, while it is still unlikely with this model to have such a high concentration of warm years in recent times, it is far more likely (by nine orders of magnitude) than what the intuitive, but naive, assumption of completely independent yearly temperatures indicates.
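The experiment described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the R code behind the quoted result: the counting rule (at least 10 of the 12 highest values fall within the final 12 steps) is one reading of "10 of the last 12 years," and the trial count, seed, and function names are arbitrary choices, so the estimate need not match the 7% figure exactly.

```python
import random


def simulate_walk(n_years, sigma, rng):
    """Unbiased Gaussian random walk: T_n = T_{n-1} + eps, eps ~ N(0, sigma^2)."""
    level = 0.0
    series = []
    for _ in range(n_years):
        level += rng.gauss(0.0, sigma)
        series.append(level)
    return series


def recent_hot_count(series, window=12, top=12):
    """Count how many of the `top` largest values fall in the last `window` steps."""
    n = len(series)
    hottest = sorted(range(n), key=series.__getitem__, reverse=True)[:top]
    return sum(1 for i in hottest if i >= n - window)


def estimate_probability(trials=2000, n_years=134, sigma=0.0096 ** 0.5,
                         threshold=10, seed=1):
    """Monte Carlo estimate of P(at least `threshold` recent record-warm years)."""
    rng = random.Random(seed)
    hits = sum(
        recent_hot_count(simulate_walk(n_years, sigma, rng)) >= threshold
        for _ in range(trials)
    )
    return hits / trials


print(f"estimated probability: {estimate_probability():.3f}")
```

Because the ranking of values does not change when every step is rescaled, the estimate does not depend on the particular value of \(\sigma\). What matters is only that the steps are independent, symmetric, and accumulated.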

It is common, when applying the techniques of statistical inference, to judge a set of empirical data by calculating the probability that such a set of data could appear purely by chance under the assumption that no real relationship exists. The conventional point at which something is considered, not necessarily true, but merely “significant” and worthy of further investigation, is when the probability of chance producing the observed result is less than 5% (i.e., a one-in-twenty chance). Since the roughly 7% found above exceeds that threshold, the claim that 10 of the last 12 years are the hottest on record doesn’t qualify as statistically interesting by these standards.

A more reasonable stochastic model, however, is one that combines red noise and white noise. In other words, it is a random walk with some additional “error” added to the final result. The statistical model can be described as\[ T_n = a r_n + b \epsilon_{\text w}^{(n)} \]where \(\epsilon_{\text w}\sim\mathcal N(0,\sigma^2)\) is the white-noise random variable, \(r_n\) is the red-noise term,\[ r_n = r_{n-1} + \epsilon_{\text r}^{(n)} \]with \(\epsilon_{\text r}\sim\mathcal N(0,\sigma^2)\) as the red-noise random variable, and \(a\) and \(b\) are the coefficients that determine the relative importance of each term. With this model, the annual change in temperature is\[ \Delta T_n = T_n - T_{n-1} = a \epsilon_{\text r}^{(n)} + b\bigl[\epsilon_{\text w}^{(n)} - \epsilon_{\text w}^{(n-1)}\bigr] \] Since this is a linear combination of three independent, normally distributed random variables, and the sign of a term does not affect its variance, \(\Delta T_n\sim\mathcal N\bigl(0,(a^2+2b^2)\sigma^2\bigr)\). So, for the variance of these temperature differences to be \(\sigma^2\), the coefficients \(a\) and \(b\) must satisfy the following relation:\[ a^2 + 2b^2 = 1 \]
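The variance constraint can be checked numerically. The following Python sketch generates a long red-plus-white series with \(b\) chosen so that \(a^2+2b^2=1\), then compares the sample variance of its year-to-year differences with \(\sigma^2\). The series length, seed, and function names are illustrative assumptions; a series far longer than 134 points is used only so that the sample variance settles down.

```python
import math
import random
from statistics import variance


def mixed_noise_series(n, a, sigma, rng):
    """T_n = a * r_n + b * eps_w, with r_n a random walk and a^2 + 2 b^2 = 1."""
    b = math.sqrt((1.0 - a * a) / 2.0)
    red = 0.0
    series = []
    for _ in range(n):
        red += rng.gauss(0.0, sigma)      # red-noise (random walk) term
        white = rng.gauss(0.0, sigma)     # independent white-noise term
        series.append(a * red + b * white)
    return series


def difference_variance(series):
    """Sample variance of the successive differences T_n - T_{n-1}."""
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    return variance(diffs)


sigma = math.sqrt(0.0096)
series = mixed_noise_series(50_000, a=0.4, sigma=sigma, rng=random.Random(0))
ratio = difference_variance(series) / sigma ** 2
print(f"variance of differences / sigma^2 = {ratio:.3f}")
```

A ratio close to 1 indicates that the chosen \(a\) and \(b\) leave the variance of the differences at \(\sigma^2\), as the relation requires.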

Numerical experiments indicate that the combination of red and white noise that is most likely to produce a shape that is similar to NOAA’s temperature anomaly record has a value of \(a\) that is between \(0.2\) and \(0.4\). For example, not very many tries were required to produce the following series (shown in red), which was generated with \(a=0.4\):

This series of randomly generated points is rather similar to the temperature series, which is shown in black.

## Conclusion

Above, I have demonstrated that claims such as “10 of the last 12 years are the warmest on record” are not very impressive. Such a situation has a reasonable probability of resulting from a simple, unbiased random walk. Therefore, although such claims serve as a reminder that recent years have been warmer than the slightly less recent past, they indicate almost nothing about the recent *trends* in global temperatures or what is to be expected in the future. An unbiased random walk is equally likely to trend down as it is to trend up, regardless of what it has done over the past 134 steps.

Naturally, it is quite possible that some steady upward trend does exist in the temperature record and biases the random walk toward higher temperatures. Such a bias would be consistent with the observed temperature record and would make a run of recent record-warm years more likely in the statistical models. Nevertheless, such a trend would not be surprising, since it is well known that temperatures in the modern era have been steadily trending upward from a minimum that occurred sometime in the seventeenth century, an era commonly referred to as the “little ice age.” (Whether this phenomenon was regional or global is a matter of debate.) In fact, there could be several naturally occurring cyclic trends that could be affecting the record, but there is no way to tell definitively from the series itself.

One thing is certain, however. A dozen or so warm years in recent memory is very weak evidence of a deterministic warming trend. Without additional supporting evidence to say otherwise, such results could simply be the luck of the draw.

(Note: The R code used to generate the results discussed above is available for the reader’s amusement and edification.)