Sunday, January 1, 2012

Epidemiological Significance vs. Statistical Significance



In Fricker (2011a), the author asks whether statistical methods are useful for early event detection, and his answer is that he really does not know yet. Why? First, because early detection is sequential, such fundamental concepts as significance level, power, specificity, and sensitivity cannot be used directly without nontrivial modification; they are useful only for a fixed sample (Fricker, 2011b). Second, biosurveillance data are usually autocorrelated. Even if this autocorrelation is removed by modeling, the signaling statistics of early detection methods that use historical data in a moving baseline are still strongly autocorrelated. As a result, specificity and sensitivity are again difficult to interpret.

Our approach to early detection is fundamentally different from the conventional ones. The mainstream approaches remove autocorrelation from time series of daily counts using ad hoc regression methods and then apply Statistical Process Control (SPC) charts to the regression residuals. Note that SPC charts were originally designed for uncorrelated data. In fact, the mainstream biosurveillance community treats autocorrelation as a nuisance. In our approach, by contrast, autocorrelation is the major player. Our only key parameter is the first-order autocorrelation coefficient, which is related in a very simple way to the major epidemiological parameters, such as the infection and recovery rates and the basic reproduction ratio R0 (see [3] and also our previous post “Epidemiological Surveillance: How It Works”).
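Here is a minimal Python sketch of why a moving-baseline signaling statistic stays autocorrelated even when the residuals behind it are not. For illustration only, it assumes the signaling statistic is simply the mean of the previous 7 days of residuals, and that the residuals are independent Gaussian noise. The function names are hypothetical. Consecutive 7-day baselines share 6 of their 7 values, so the theoretical lag-1 autocorrelation of this statistic is 6/7 ≈ 0.857, even though the residuals themselves are uncorrelated.

```python
import random
import statistics


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(xs)
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


def moving_baseline_means(xs, window):
    """For each day t >= window, the mean of the preceding `window` values xs[t-window:t]."""
    means = []
    running = sum(xs[:window])
    for t in range(window, len(xs)):
        means.append(running / window)
        running += xs[t] - xs[t - window]  # slide the baseline forward by one day
    return means


# Hypothetical regression residuals: independent, i.e. autocorrelation already "removed".
rng = random.Random(1)
residuals = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
baseline = moving_baseline_means(residuals, window=7)

print(round(lag1_autocorrelation(residuals), 3))  # close to 0
print(round(lag1_autocorrelation(baseline), 3))   # close to 6/7
```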
Statistical inference methods for AR(1), including parameter estimation, confidence interval construction, and hypothesis testing, are well developed and readily available. It would therefore seem that they could be applied successfully to early detection and early situational awareness. This, however, is not the case.
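As an illustration, here is a minimal sketch of standard AR(1) inference on a short baseline. It makes two assumptions:

- The model x[t] = c + a·x[t−1] + e[t] is fitted by ordinary least squares.
- The 95% interval uses the large-sample normal approximation a ± 1.96·SE, which is, if anything, optimistic for such short series.

The simulation settings (a = 0.5, c = 10, noise SD 2) and all function names are hypothetical. Averaged over many simulated series, the interval for a is very wide for 7- and 14-day baselines and narrows appreciably only for much longer ones.

```python
import math
import random


def fit_ar1(xs):
    """OLS fit of x[t] = c + a*x[t-1] + e[t]; returns (a_hat, standard error of a_hat)."""
    prev, nxt = xs[:-1], xs[1:]
    m = len(prev)
    mean_p, mean_n = sum(prev) / m, sum(nxt) / m
    sxx = sum((p - mean_p) ** 2 for p in prev)
    sxy = sum((p - mean_p) * (q - mean_n) for p, q in zip(prev, nxt))
    a_hat = sxy / sxx
    c_hat = mean_n - a_hat * mean_p
    rss = sum((q - c_hat - a_hat * p) ** 2 for p, q in zip(prev, nxt))
    return a_hat, math.sqrt(rss / (m - 2) / sxx)


def simulate_ar1(a, c, sigma, n, rng):
    """Simulate n values of a stationary AR(1) process started at its mean c / (1 - a)."""
    xs = [c / (1 - a)]
    for _ in range(n - 1):
        xs.append(c + a * xs[-1] + rng.gauss(0.0, sigma))
    return xs


def mean_ci_halfwidth(window, reps=2000, seed=0):
    """Average half-width of the approximate 95% CI for a over `reps` simulated baselines."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        _, se = fit_ar1(simulate_ar1(a=0.5, c=10.0, sigma=2.0, n=window, rng=rng))
        total += 1.96 * se
    return total / reps


for window in (7, 14, 140):
    print(window, round(mean_ci_halfwidth(window), 3))
```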

Note also that in mainstream approaches, early detection and situational awareness are largely disconnected from each other: they are treated as separate problems. Even after an outbreak has been detected, situational awareness has to start from scratch, because detection alone tells us nothing about how the outbreak will develop. In our approach, we estimate only one parameter, the first-order autoregression coefficient in the AR(1) approximation of the SIR model. This single estimate lets us decide whether an outbreak has already started. It also gives us preliminary estimates of what we need for effective response and consequence management.

To the criticism expressed in [1], [2] and [4] of the usefulness of such fundamental statistical concepts as statistical significance, p-values, sensitivity, and specificity for early detection, we can add some skepticism of our own. It is shown in [3] that both confidence intervals and hypothesis tests at the 0.05 or 0.10 significance level are impractical for early detection if we work with a typical sample size (baseline) of 7–14 days. For example, a hypothetical influenza epidemic as strong as the Spanish flu cannot be detected from 7–14 days of data at the 0.05 or 0.10 significance level. This is not surprising, because statistical significance depends heavily on sample size: in very large samples even very small effects are significant, whereas in very small samples even very large effects cannot be declared significant. See, for instance, the following data borrowed from Table 13 of a classic book of statistical tables [5], with some linear interpolation:
     
Critical Values of Correlation Coefficient r
for Rejecting the Null Hypothesis (r = 0)
at the .05 Level (Two-Sided), Given Sample Size n

             n        critical r
        ------        ----------
             5        0.878
             7        0.754
            10        0.632
            15        0.514
            20        0.444
            50        0.279
           ...        ...
        10,000        0.0196

According to a rule of thumb (see [6]), r = 0.5 is considered a large effect. Yet with a sample size of n = 15 it still cannot be distinguished from the null hypothesis r = 0 at the 0.05 significance level, since the critical value is 0.514. At the same time, a negligible correlation of r = 0.02 is statistically significant with n = 10,000.
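The entries of the table can be checked without printed tables. Assume bivariate normal data and the null hypothesis of zero correlation. Then the sample correlation r from n pairs has density (1 − r²)^((n − 4)/2) / B(1/2, (n − 2)/2). The sketch below assumes n ≥ 5 and uses only the Python standard library. It integrates the upper tail with Simpson's rule, then bisects for the r at which the two-sided p-value equals 0.05. The function names are our own, and numerical integration is our choice of method, not the way the printed tables were produced.

```python
import math


def two_sided_p_value(r, n, steps=4000):
    """P(|R| >= r) under H0: rho = 0 for n >= 5 bivariate-normal pairs, via Simpson's rule."""
    # Normalizing constant of the null density (1 - x^2)^((n-4)/2) / B(1/2, (n-2)/2).
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)

    def density(x):
        if x >= 1.0:
            return 0.0
        return math.exp((n - 4) / 2 * math.log(1.0 - x * x) - log_beta)

    h = (1.0 - r) / steps  # `steps` must be even for Simpson's rule
    total = density(r) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(r + i * h)
    return 2.0 * total * h / 3.0  # factor 2 covers both tails


def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha (two-sided), found by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if two_sided_p_value(mid, n) > alpha:
            lo = mid  # not yet significant: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2


for n in (5, 7, 10, 15, 20, 50, 10_000):
    print(n, round(critical_r(n), 4))
```

The same function makes the sample-size effect in the text concrete: the critical value shrinks roughly like 1.96/√n as n grows.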

Thus, the early detection goal cannot be achieved with a sample size as small as 7–14 days at any acceptable significance level. Instead, we propose to use the concept of practical, or epidemiological, significance. What really matters is estimating the magnitude of effects, not testing whether they are zero. In our case, the effect is measured by R0, the basic reproduction ratio of the SIR model. It is also measured by the related first-order autoregression coefficient in the AR(1) approximation of the SIR model. In [3] we proposed the following combined early detection and early situational awareness strategy:

(1)   Every day we estimate the first-order autoregression coefficient from the moving baseline (7 to 14 days);
(2)   Using a very simple relationship between the autoregression coefficient and R0, we obtain an estimate of R0 (below we use the same notation for the parameter R0 and its estimate);
(3)   We then compare this estimate with the known ranges for seasonal influenza (1.5 ≤ R0 ≤ 3.0) and for the Spanish flu pandemic (3.0 ≤ R0 ≤ 4.0);
(4)   Even R0 ≈ 1 warrants some field investigation;
      if R0 ≥ 1.5, it is epidemiologically reasonable to report our findings as a significant risk of an epidemic;
      if R0 ≥ 3.0, it is epidemiologically reasonable to report a severe risk.
(5)   Knowledge of R0 provides preliminary estimates of the number of infected at the epidemic peak and the total number of infected over the course of the outbreak.
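Below is a minimal Python sketch of steps (1), (2) and (5). It rests on assumptions that are ours, not necessarily those of [3]:

- The autoregression coefficient a is estimated by least squares through the origin over the moving baseline.
- a is mapped to R0 through the early-phase linearization of a discrete-time SIR model, I[t+1] ≈ (1 + β − γ)·I[t]. This gives a = 1 + β − γ and hence R0 = β/γ = 1 + (a − 1)·D, where D = 1/γ is a mean infectious period in days that must be supplied.
- The peak and final-size estimates use the classical closed-population SIR formulas, with an initially fully susceptible population.

All function names, the synthetic counts, and the value D = 3 days in the usage lines are hypothetical.

```python
import math


def ar1_coefficient(window):
    """Least-squares AR(1) coefficient through the origin: sum x[t]x[t-1] / sum x[t-1]^2."""
    num = sum(x_prev * x for x_prev, x in zip(window, window[1:]))
    den = sum(x_prev * x_prev for x_prev in window[:-1])
    return num / den


def daily_ar1_estimates(counts, baseline=14):
    """Step (1): re-estimate a each day from the most recent `baseline` days of counts."""
    return [ar1_coefficient(counts[t - baseline:t]) for t in range(baseline, len(counts) + 1)]


def r0_from_ar1(a, infectious_period_days):
    """Step (2), ASSUMED mapping from the linearized discrete SIR model: R0 = 1 + (a - 1) * D."""
    return 1.0 + (a - 1.0) * infectious_period_days


def peak_infected_fraction(r0):
    """Step (5): classical SIR peak prevalence 1 - (1 + ln R0) / R0, assuming S(0) ≈ N."""
    return 0.0 if r0 <= 1.0 else 1.0 - (1.0 + math.log(r0)) / r0


def final_size_fraction(r0, tol=1e-12):
    """Step (5): attack rate z solving z = 1 - exp(-R0 * z), by fixed-point iteration from z = 1."""
    if r0 <= 1.0:
        return 0.0
    z = 1.0
    for _ in range(100_000):
        z_new = 1.0 - math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


# Hypothetical daily counts growing by about 20% per day.
counts = [round(10 * 1.2 ** t) for t in range(20)]
a_today = daily_ar1_estimates(counts)[-1]
r0_today = r0_from_ar1(a_today, infectious_period_days=3.0)  # D = 3 days is an assumed value
print(round(a_today, 3), round(r0_today, 2))
print(round(peak_infected_fraction(r0_today), 3), round(final_size_fraction(r0_today), 3))
```

Because R0 depends on a only through (a − 1)·D, any uncertainty in the assumed infectious period D scales the R0 estimate directly. It is therefore worth reporting alongside the estimate.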

Our critical levels (thresholds) have a clear epidemiological meaning, as opposed to the rather arbitrary thresholds used in mainstream biosurveillance.
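The decision rules in steps (3) and (4) translate directly into code. In this minimal sketch, the 1.5 and 3.0 thresholds come from the post. Reading "R0 ≈ 1" as R0 ≥ 1.0 is our assumption, since the post does not say how close to 1 is close enough, so that cutoff is exposed as a parameter. The function name and alert labels are hypothetical.

```python
SIGNIFICANT_RISK_R0 = 1.5  # lower end of the seasonal-influenza range 1.5 <= R0 <= 3.0
SEVERE_RISK_R0 = 3.0       # lower end of the Spanish-flu range 3.0 <= R0 <= 4.0


def classify_r0(r0, investigate_from=1.0):
    """Map an estimated R0 to an alert level; `investigate_from` is an assumed reading of 'R0 ≈ 1'."""
    if r0 >= SEVERE_RISK_R0:
        return "severe risk"
    if r0 >= SIGNIFICANT_RISK_R0:
        return "significant risk"
    if r0 >= investigate_from:
        return "field investigation"
    return "no alert"


for r0 in (0.8, 1.1, 2.0, 3.4):
    print(r0, classify_r0(r0))
```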

References  

[1] Fricker, R. D. (2011a). Some methodological issues in biosurveillance. Statistics in Medicine.
[2] Fricker, R. D. (2011b). Rejoinder: Some methodological issues in biosurveillance. Statistics in Medicine.
[3] Shtatland, E. and Shtatland, T. (2011). Statistical approach to biosurveillance in crisis: what is next. NESUG Proceedings.
[4] Shmueli, G. and Burkom, H. S. (2010). Statistical challenges facing early outbreak detection in biosurveillance. Technometrics, 52(1), pp. 39-51.
[5] Pearson, E. S. and Hartley, H. O. (Eds.). (1962). Biometrika tables for statisticians (2nd ed.). Cambridge: Cambridge University Press.
[6] Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
