This post is more technical than previous ones: it contains equations. Those who don’t like equations can skip the formulas and read only the text. In  we have used a Susceptible-Infected-Recovered (SIR) model, mentioned in our post “Fusion of Biosurveillance with Epidemiological Modeling” (December 11, 2011). Here we discuss the SIR model in more detail. Its graphical description is simple:
Mathematically, the SIR model is described by the following first-order nonlinear system of difference equations:
S(n+1) = S(n) – (β/N)S(n)I(n),
I(n+1) = I(n) + (β/N)S(n)I(n) - δI(n), (1)
R(n+1) = R(n) + δI(n),
where S(n), I(n) and R(n) are the numbers of susceptible, infected and recovered individuals, respectively, on day n; N is the total population (assumed constant); β is the infection transmission rate; and δ is the average rate of recovery from infection. Note that d = 1/δ is the mean duration of infectivity (in days). Both rates β and δ are assumed to be constant. Unfortunately, the variables S(n) and R(n) are not observed or measured systematically in the real-time biosurveillance context. Only I(n) can be estimated, and only indirectly, under the assumption that the overall number of infected on each day can be approximated by the sum of the numbers of patient visits to a hospital emergency department, a clinic or a physician's office during the past d days. Therefore, we will work only with the second equation in (1). At the very beginning of an emerging disease or a new pandemic, which is the most interesting moment for early detection and early situational awareness, we can assume that there is no immunity in the population to the new disease, i.e., S(0) ≈ N. Hence, in the early phase of the epidemic, S(n)/N ≈ 1 and (β/N)S(n)I(n) ≈ βI(n), so the second equation in (1) reduces to a closed linear equation that contains only the number of infected:
I(n+1) ≈ I(n) + (β – δ)I(n). (2)
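To see how close approximation (2) stays to the full system (1) in the early phase, here is a minimal Python sketch. The parameter values (β = 0.3, d = 7 days, a population of one million, 10 initial infected) and the function names are illustrative assumptions, not values taken from this post or from real data.

```python
def simulate_sir(beta, delta, population, infected0, days):
    """Iterate system (1) day by day; returns the lists S, I, R."""
    s, i, r = [population - infected0], [infected0], [0.0]
    for n in range(days):
        new_infections = (beta / population) * s[n] * i[n]
        recoveries = delta * i[n]
        s.append(s[n] - new_infections)
        i.append(i[n] + new_infections - recoveries)
        r.append(r[n] + recoveries)
    return s, i, r


def linear_infected(beta, delta, infected0, days):
    """Iterate approximation (2): I(n+1) = I(n) + (beta - delta) * I(n)."""
    i = [infected0]
    for n in range(days):
        i.append(i[n] + (beta - delta) * i[n])
    return i


# Illustrative values only (assumed): d = 7 days, so delta = 1/7.
beta, delta = 0.3, 1.0 / 7.0
population, infected0, days = 1_000_000, 10.0, 20

s, i, r = simulate_sir(beta, delta, population, infected0, days)
i_lin = linear_infected(beta, delta, infected0, days)

for n in (0, 5, 10, 20):
    print(f"day {n:2d}: SIR I(n) = {i[n]:10.2f}   linear I(n) = {i_lin[n]:10.2f}")
```

Because S(n) ≤ N, the linear recursion can only overestimate I(n); the gap stays small as long as the number of people already infected is negligible compared with N.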
Equation (2) is approximate (we have ≈ instead of =). Taking the approximation errors into account, we arrive at the stochastic equation
I(n+1) = I(n)(1 + (β – δ)) + w(n), (3)
where w(n) is an error or noise term (for more detail, see ). Formula (3) is the well-known equation of a first-order autoregressive process, AR(1). Equations (2) and (3) describe exponential growth if β – δ > 0 and exponential decay if β – δ < 0. Thus, the difference β – δ is a threshold parameter that serves as an alternative to R0, mentioned in our previous post “Fusion of Biosurveillance with Epidemiological Modeling” (December 11, 2011). Here R0 can be expressed as R0 = β/δ. Since δ > 0, the threshold parameters R0 and β – δ are equivalent:
(R0 > 1) <=> (β – δ > 0) <=> Epidemic
(R0 < 1) <=> (β – δ < 0) <=> Non-Epidemic
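The equivalence of the two thresholds can be checked numerically. The sketch below also uses the closed-form solution of equation (2), I(n) = I(0)(1 + β – δ)^n, which follows by iterating (2). The function names and the parameter values are assumptions chosen only for illustration.

```python
def r0(beta, delta):
    """Basic reproduction number R0 = beta / delta."""
    return beta / delta


def growth_rate(beta, delta):
    """Linear threshold parameter beta - delta."""
    return beta - delta


def infected_closed_form(infected0, beta, delta, n):
    """Solution of equation (2): I(n) = I(0) * (1 + beta - delta) ** n."""
    return infected0 * (1.0 + beta - delta) ** n


delta = 1.0 / 7.0  # assumed mean infectivity period d = 7 days
for beta in (0.10, 0.30):  # illustrative transmission rates
    regime = "Epidemic" if growth_rate(beta, delta) > 0 else "Non-Epidemic"
    print(
        f"beta = {beta:.2f}: R0 = {r0(beta, delta):.2f}, "
        f"beta - delta = {growth_rate(beta, delta):+.3f}, "
        f"I(10) from I(0) = 100: {infected_closed_form(100.0, beta, delta, 10):.1f} "
        f"-> {regime}"
    )
```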
In the early detection context, the advantage of the threshold parameter β – δ over R0 is clear: we have a linear parameter in linear equations (2) and (3), as opposed to the nonlinear parameter R0 = β/δ in the nonlinear setting of system (1). Our approach to computing R0 is as follows: first, we estimate the initial growth rate β – δ through autoregression model (3); then we evaluate R0 through the identity R0 = β/δ = 1 + (β – δ)/δ = 1 + d(β – δ), which gives R0 = 1 + 7(β – δ) (recall that in the case of influenza we assume that the mean duration of infectivity d is equal to 7 days). Thus, by making statistical inferences about the single parameter β – δ in AR(1) model (3), we actually make inferences about R0. As a result, we are able to decide whether or not the epidemic has started (early detection task). If the answer is “yes”, we can get preliminary estimates of the number of infected at the epidemic peak, the total number of infected over the course of the outbreak, the critical vaccination threshold, etc., which are needed to develop measures for timely response and consequence management (early situational awareness task). See  and previous post “Fusion of Biosurveillance with Epidemiological Modeling” (December 11, 2011). Note that statistical inference methods for AR(1), including parameter estimation, confidence interval construction and hypothesis testing, are well developed and easily available (see ). However, the situation is not as rosy as it looks. What exactly can be used from this statistical toolkit will be discussed in our next post.
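The two-step computation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the series is synthetic, generated from equation (3) with Gaussian noise, and the growth rate is estimated by ordinary least squares through the origin. Both the noise model and the estimator are our choices for this sketch; which inference tools are actually appropriate is the subject of the next post. The function names and parameter values are hypothetical.

```python
import random


def simulate_ar1(growth, infected0, days, noise_sd, seed=0):
    """Generate I(n) from equation (3) with Gaussian noise w(n) (assumed)."""
    rng = random.Random(seed)
    series = [infected0]
    for _ in range(days):
        series.append(series[-1] * (1.0 + growth) + rng.gauss(0.0, noise_sd))
    return series


def estimate_growth(series):
    """Least-squares slope through the origin of I(n+1) on I(n), minus 1."""
    num = sum(x * y for x, y in zip(series[:-1], series[1:]))
    den = sum(x * x for x in series[:-1])
    return num / den - 1.0


def r0_from_growth(growth, infectivity_days=7):
    """R0 = 1 + d * (beta - delta), with d = 7 days assumed for influenza."""
    return 1.0 + infectivity_days * growth


# Synthetic example (assumed values): beta = 0.3, delta = 1/7.
true_growth = 0.3 - 1.0 / 7.0
series = simulate_ar1(true_growth, infected0=50.0, days=30, noise_sd=5.0)

growth_hat = estimate_growth(series)
r0_hat = r0_from_growth(growth_hat)
print(f"estimated beta - delta = {growth_hat:.4f}, estimated R0 = {r0_hat:.3f}")
print("Epidemic" if growth_hat > 0 else "Non-Epidemic")
```

With real surveillance data, `series` would be replaced by the daily estimates of I(n) built from visit counts, and a point estimate alone would not be enough to declare an epidemic; a confidence interval or a test for β – δ > 0 would be needed.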
A final remark: our approach could be called “Epidemiological Surveillance” (the term used in the title of this post). Indeed, on the one hand, the approach is based on syndromic surveillance data (numbers of visits); on the other hand, it uses epidemiological models and their approximations for analysis and decision making. The problem is that this term (epidemiological surveillance) has already been used to mean biosurveillance related to human health only (animals excluded!) (see ). We believe that our understanding of the term “epidemiological surveillance” is more to the point.
Shtatland, E. and Shtatland, T. (2011). Statistical approach to biosurveillance in crisis: what is next. NESUG Proceedings.
Fricker, R. D. (2011a). Some methodological issues in biosurveillance. Statistics in Medicine.