The Azimuth Project
Experiments with varieties of link strength for El Niño prediction

Idea

Ludescher et al use a link strength defined in Experiments in El Niño analysis and prediction. The aim of this page is to explore related notions, and especially to find more comprehensible definitions with similar behaviour.

Peculiarities of the Ludescher definition

If independent gaussian noise is added to the temperatures for each day and each grid point, the link strength as calculated by Ludescher et al increases. Roughly speaking, adding noise with a standard deviation of 0.05 °C adds about 0.05 to the average link strength. This is very counter-intuitive behaviour for a “link strength” which is supposed to measure correlation.
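
Here is a minimal sketch of why this is plausible, assuming only the (max - mean)/sd form of the statistic; the function link.strength.noise and the window sizes are illustrative choices of mine, not Ludescher et al's actual pipeline. Even for two completely independent gaussian series, the maximum of the windowed cross-correlations over many delays is systematically larger than their mean, so the statistic sits well above zero.

# Sketch only: apply a (max - mean)/sd statistic to |cross-correlation| over a
# range of delays, for a pair of *independent* gaussian series.  The window
# lengths (365 days, delays up to 200 days) follow Ludescher et al; everything
# else is simplified.
link.strength.noise <- function(n.days = 565, c.period = 365, tau.max = 200) {
  t1 <- rnorm(n.days)
  t2 <- rnorm(n.days)
  C <- sapply(-tau.max:tau.max, function(tau) {
    if (tau <= 0) abs(cor(t1[1:c.period], t2[(-tau+1):(-tau+c.period)]))
    else          abs(cor(t1[(tau+1):(tau+c.period)], t2[1:c.period]))
  })
  (max(C) - mean(C)) / sd(C)
}

set.seed(1)
summary(replicate(50, link.strength.noise()))   # values well above zero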

The following graphs are all based on simulated data. There are two time series of length 565, called “signal 1” and “signal 2” in the graphs, which consist of quadratics $q_1$ and $q_2$ plus independent gaussian noise. The noise has the same amplitude (standard deviation) in all cases, but $q_1$ and $q_2$ are multiplied by 1000 (leftmost column), 9 (second column), 3 (third column) and 1 (fourth column).

Examples of the signals themselves are shown in the top two rows, the value of $c^{(t)}_{i,j}(\tau)$ is in the third row, and the fourth row shows an estimated density of the link strength derived from 100 replicates (different samplings of noise).

In the first column, $q_1$ and $q_2$ overwhelm the gaussian noise, so you can see their shapes. In particular, note that they have positive correlation for all delays: it varies between about 0.87 and 0.97. The other three columns are intended to be more realistic signals which roughly resemble climate data. One would expect that as the multiplier for $q_1$ and $q_2$ decreases, the link strength would also decrease, but the opposite is the case.

(Figure: simulated link strengths.)

The code is below.

signal1 <- function(period) {
  # Normalise the day index to (0,1] and add an increasing quadratic bump.
  x <- period / length(period)
  x + (x-.5)^2
}


signal2 <- function(period) {
  # As signal1, but with the quadratic term subtracted.
  x <- period / length(period)
  x - (x-.5)^2
}


period <- 1:565           # 365 days plus up to 200 days of delay
tau.max <- 200
tau.range <- -tau.max:tau.max
cperiod <- 365            # length of the correlation window

make.Csamples <- function(nreps, scale) {
  LSs <- rep(0, nreps)
  C <- rep(0, length(tau.range))
  for (r in 1:nreps) {
    # Two independent realisations: scaled quadratic plus unit gaussian noise.
    t1 <- scale * signal1(period) + rnorm(length(period))
    t2 <- scale * signal2(period) + rnorm(length(period))
    # Absolute cross-correlation over a 365-day window for each delay tau.
    for (tau in tau.range) {
      if (tau <= 0) {
        x <- t1[1:cperiod]
        y <- t2[(-tau+1):(-tau+cperiod)]
      } else {
        x <- t1[(tau+1):(tau+cperiod)]
        y <- t2[1:cperiod]
      }
      C[tau.max+1+tau] <- abs(cor(x, y))
    }
    # Link strength in the style of Ludescher et al: (max - mean) / sd over delays.
    LSs[r] <- (max(C) - mean(C)) / sd(C)
  }
  quantiles <- quantile(LSs, probs=c(.05,.25,.5,.75,.95))
  # C, t1 and t2 come from the last replicate only; LSs collects all replicates.
  list(C=C, LSs=LSs, t1=t1, t2=t2, quantiles=round(quantiles, digits=2))
}


# One column per scaling factor: the two signals, C(tau) from the last
# replicate, and the density of the link strength over the 100 replicates.
op <- par(mfcol=c(4,4), mar=c(4,5,2,1))
for (s in 1:4) {
  scaling <- c(1000,9,3,1)[s]
  dsmps <- make.Csamples(100, scaling)
  maintxt <- paste0("signal scaled by ", scaling)

  plot(period, dsmps$t1, type='l', main=maintxt, ylab="signal 1", xlab="days")
  plot(period, dsmps$t2, type='l', ylab="signal 2", xlab="days")
  plot(tau.range, dsmps$C, type='l', ylab="C(tau)", xlab="tau")

  dens <- density(dsmps$LSs)
  plot(dens$x, dens$y, type='l', xlab="link strength", ylab="density")
}
par(op)

An alternative

My basic idea is to define some statistic $f(i, j, d; \sigma)$, where $i$ and $j$ are grid points, $d$ is time (in days) and $\sigma$ is a time scale (also in days), which measures the agreement of the air temperatures between $i$ and $j$ on a time scale of $\sigma$. (“Agreement” and “on a time scale of $\sigma$” are to be made precise later.) As far as I understand the behaviour of Ludescher et al’s link strength $S(i,j,d)$, it responds positively to correlations on a small time scale: if two grid points have similar fluctuations from day to day, possibly with a delay, the signal goes up. And it responds negatively to correlations on a large time scale, so that if the temperatures of two points do similar things over several months, the signal goes down. So, loosely, I expect

$$ S(i,j,d) \approx f(i, j, d; \text{small}) - f(i, j, d; \text{big}) $$
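
As a rough illustration of what this difference is meant to capture, here is a toy sketch in which $f$ is taken to be the correlation of the two series after gaussian smoothing at scale $\sigma$. The helper gauss.smooth, the scales of 2 and 60 days, and the toy series are all assumptions of mine for illustration, not part of any existing definition.

# Toy illustration of f(small) - f(big), with f taken to be the correlation
# of the two series after gaussian smoothing at scale sigma.
gauss.smooth <- function(x, sigma) {
  # Gaussian kernel truncated at 3 sigma, reflection padding at both ends.
  pad <- 3 * sigma
  k <- dnorm(-pad:pad, sd = sigma); k <- k / sum(k)
  n <- length(x)
  xp <- c(rev(x[1:pad]), x, rev(x[(n - pad + 1):n]))
  as.numeric(stats::filter(xp, k, sides = 2))[(pad + 1):(pad + n)]
}

f.sigma <- function(a, b, sigma) cor(gauss.smooth(a, sigma), gauss.smooth(b, sigma))

set.seed(2)
days <- 1:365
fast <- rnorm(365)                          # shared day-to-day fluctuations
slow <- sin(2 * pi * days / 365)            # shared slow component

# Case 1: shared fast component, different slow trends.
a1 <- fast + days/365;   b1 <- fast - days/365
# Case 2: shared slow component only.
a2 <- rnorm(365) + slow; b2 <- rnorm(365) + slow

f.sigma(a1, b1, 2) - f.sigma(a1, b1, 60)    # clearly positive
f.sigma(a2, b2, 2) - f.sigma(a2, b2, 60)    # near zero or negative

In the first case the shared fast fluctuations dominate the small-scale correlation while the opposed trends dominate the large-scale one, so the difference is clearly positive; in the second case only the slow component is shared, so the difference is small or negative.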

A recipe

Like Ludescher et al, the value of $f(i, j, d; \sigma)$ will be based on the 365+200 days leading up to $d$. (A rough code sketch of the whole recipe appears after the list.)

  1. Seasonally adjust the temperatures by subtracting a climatological average for each day and grid point. Call these $A(i,d)$.

  2. For any day $x$, define $Y(i, x, \sigma)$ to be the vector of numbers $(A(i, x-364), \dots, A(i, x))$, smoothed by a gaussian with scale constant $\sigma$, using reflection to pad the ends. (Other options are available.)

  3. $U(i, x; \sigma)$ is the standard deviation of $Y(i, x, \sigma)$. $V(i; \sigma)$ is a climatological average of $U(i, x; \sigma)$ over many values of $x$. It is intended to capture the spatial variation of the standard deviation over 365 days; it will tend to be higher over land than over sea, for example. It is used to normalise values, and this is the only normalisation in this recipe.

  4. For pairs of points $i$ in the basin and $j$ outside the basin, and $\tau$ in $\{0, 1, 2, \dots, 200\}$, define

    $$ W(i, j, d, \tau; \sigma) = \frac{\mathrm{cov}\big(Y(i, d-\tau, \sigma),\; Y(j, d, \sigma)\big)}{V(i; \sigma)\, V(j; \sigma)} $$

    and for $\tau$ in $\{1, 2, \dots, 200\}$, define

    $$ W(i, j, d, -\tau; \sigma) = W(j, i, d, \tau; \sigma) $$

  5. Define $f(i, j, d; \sigma)$ to be the maximum of $W(i, j, d, \tau; \sigma)$ over $\tau$ in some window $(-w, \dots, -1, 0, 1, \dots, w)$. $w$ should probably depend on $\sigma$ and on the distance from $i$ to $j$, and is intended to cover the plausible range of delays over which two grid points could be physically connected.
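
Here is a rough sketch of steps 2–5 in R. It assumes the seasonally adjusted temperatures $A(i,d)$ are held in a matrix anoms with one row per grid point and one column per day, uses a truncated gaussian kernel with reflection padding as the smoother, and takes the set of reference days for the climatological average $V$ as a parameter; all of these are illustrative choices of mine, not part of Ludescher et al's definition.

# Sketch of steps 2-5.  `anoms` is assumed to be a matrix of seasonally
# adjusted temperatures A(i,d): one row per grid point, one column per day.

gauss.smooth <- function(x, sigma) {
  # Gaussian kernel truncated at 3 sigma, reflection padding at both ends.
  pad <- 3 * sigma
  k <- dnorm(-pad:pad, sd = sigma); k <- k / sum(k)
  n <- length(x)
  xp <- c(rev(x[1:pad]), x, rev(x[(n - pad + 1):n]))
  as.numeric(stats::filter(xp, k, sides = 2))[(pad + 1):(pad + n)]
}

# Step 2: the 365-day window ending at day x, smoothed at scale sigma.
Y <- function(anoms, i, x, sigma) gauss.smooth(anoms[i, (x - 364):x], sigma)

# Step 3: U is the sd of Y; V averages U over a set of reference days.
U <- function(anoms, i, x, sigma) sd(Y(anoms, i, x, sigma))
V <- function(anoms, i, sigma, ref.days)
  mean(sapply(ref.days, function(x) U(anoms, i, x, sigma)))

# Step 4: normalised covariance for tau = 0..200; negative delays by symmetry.
W <- function(anoms, i, j, d, tau, sigma, ref.days) {
  if (tau < 0) return(W(anoms, j, i, d, -tau, sigma, ref.days))
  cov(Y(anoms, i, d - tau, sigma), Y(anoms, j, d, sigma)) /
    (V(anoms, i, sigma, ref.days) * V(anoms, j, sigma, ref.days))
}

# Step 5: f is the maximum of W over a window of delays -w..w.
f <- function(anoms, i, j, d, sigma, w, ref.days)
  max(sapply(-w:w, function(tau) W(anoms, i, j, d, tau, sigma, ref.days)))

Given two chosen scales, the statistic suggested above would then be f(anoms, i, j, d, sigma.small, w, ref.days) - f(anoms, i, j, d, sigma.big, w, ref.days).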

category: experiments