This post is more technical than previous ones: it contains equations. Those who don’t like equations can skip the formulas and read the text only. In [1] we used the Susceptible-Infected-Recovered (SIR) model mentioned in our post “Fusion of Biosurveillance with Epidemiological Modeling” (December 11, 2011). Here we discuss the SIR model in more detail. Its graphical description is simple:
Mathematically, the SIR model is described by the following first-order nonlinear system of difference equations:
S(n+1) = S(n) – (β/N)S(n)I(n),
I(n+1) = I(n) + (β/N)S(n)I(n) – δI(n), (1)
R(n+1) = R(n) + δI(n),
where S(n), I(n) and R(n) represent the numbers of susceptible, infected and recovered individuals, respectively, on day n; N is the total population (assumed constant); β is the infection transmission rate and δ is the average rate of recovery from infection. Note that d = 1/δ is the mean duration of infectivity (in days). Both rates β and δ are assumed to be constant.
Unfortunately, the variables S(n) and R(n) are not observed or measured systematically in the real-time biosurveillance context. Only I(n) can be estimated, and only indirectly, by assuming that the overall number of infected on each day can be approximated by the total number of patient visits to a hospital emergency department, a clinic or a physician's office during the past d days. Therefore, we will work only with the second equation in (1).
At the very beginning of an emerging disease or a new pandemic, which is the most interesting moment for early detection and early situational awareness, we can assume that there is no immunity in the population to the new disease, i.e., S(0) ≈ N. Hence, in the early phase of the epidemic, the second equation in (1) reduces to a closed linear equation that contains only the number of infected:
I(n+1) ≈ I(n) + (β – δ)I(n). (2)
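How close the linear approximation (2) stays to the full system (1) in the early phase can be checked numerically. The sketch below iterates both and compares the number of infected over the first two weeks. All numeric values (β = 0.3, δ = 1/7, a population of one million, ten initial cases) are illustrative assumptions, not values from this post or from [1], and the function names are hypothetical.

```python
def simulate_sir(beta, delta, population, infected0, days):
    """Iterate the SIR difference equations (1); return lists S, I, R."""
    s, i, r = [population - infected0], [infected0], [0.0]
    for _ in range(days):
        new_infections = (beta / population) * s[-1] * i[-1]
        recoveries = delta * i[-1]
        s.append(s[-1] - new_infections)
        i.append(i[-1] + new_infections - recoveries)
        r.append(r[-1] + recoveries)
    return s, i, r


def linear_infected(beta, delta, infected0, days):
    """Iterate approximation (2): I(n+1) = (1 + beta - delta) * I(n)."""
    return [infected0 * (1 + beta - delta) ** n for n in range(days + 1)]


# Illustrative (assumed) values, not taken from the post or from [1].
beta, delta = 0.3, 1 / 7
population, infected0, days = 1_000_000, 10, 14

s, i_full, r = simulate_sir(beta, delta, population, infected0, days)
i_lin = linear_infected(beta, delta, infected0, days)

for n in (0, 7, 14):
    print(f"day {n:2d}: full model I = {i_full[n]:8.2f}, "
          f"linear approximation I = {i_lin[n]:8.2f}")
```

While S(n) remains close to N, the two series are nearly indistinguishable; the linear version always runs slightly ahead because it ignores the depletion of susceptibles.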
Equation (2) is approximate (instead of = we have ≈). Taking the approximation errors into account, we arrive at the stochastic equation
I(n+1)= I(n)(1 + (β – δ)) + w(n), (3)
where w(n) is an error or noise term (for more detail see [1]). Formula (3) is the well-known equation of a first-order autoregressive process, AR(1). Equations (2) and (3) describe exponential growth if β – δ > 0 and exponential decay if β – δ < 0. Thus, the difference β – δ is a threshold parameter alternative to R0, mentioned in our previous post “Fusion of Biosurveillance with Epidemiological Modeling” (December 11, 2011). Here R0 can be expressed as R0 = β/δ.
Obviously, the threshold parameters R0 and β – δ are equivalent:
(R0 > 1) <=> (β – δ > 0) <=> Epidemic
(R0 < 1) <=> (β – δ < 0) <=> Non-Epidemic
In the early detection context, the advantage of the threshold parameter β – δ over R0 is clear: it enters the linear equations (2) and (3) linearly, whereas R0 = β/δ is a nonlinear parameter in the nonlinear setting of system (1). Our approach to computing R0 is as follows: first, we estimate the initial growth rate β – δ through the autoregression model (3); then we evaluate R0 through the formula R0 = 1 + 7(β – δ). This formula follows from R0 = β/δ = 1 + (β – δ)/δ = 1 + d(β – δ) with d = 7 (recall that for influenza we assume the mean duration of infectivity d to be 7 days). Thus, by making statistical inferences about the single parameter β – δ in the AR(1) model (3), we actually make inferences about R0. As a result, we are able to decide whether or not the epidemic has started (the early detection task). If the answer is “yes”, we can get preliminary estimates of the number of infected at the epidemic peak, the total number of infected over the course of the outbreak, the critical vaccination threshold, etc., which are needed to develop measures for timely response and consequence management (the early situational awareness task). See [1] and the previous post “Fusion of Biosurveillance with Epidemiological Modeling” (December 11, 2011).
Note that statistical inference methods for AR(1), including parameter estimation, confidence interval construction and hypothesis testing, are well developed and readily available (see [1]). However, the situation is not as rosy as it looks. What exactly can be used from this statistical toolkit will be discussed in our next post.
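The two-step procedure (estimate β – δ from the AR(1) model (3), then convert to R0 with R0 = 1 + d(β – δ)) can be sketched in a few lines. The sketch makes several assumptions that are not from this post or from [1]: the estimator is plain least squares without an intercept, which is only one possible choice; the function names are hypothetical; and the visit counts are synthetic and noise-free, growing by 10% per day, so that the result is easy to verify.

```python
def infected_from_visits(visits, d=7):
    """Approximate I(n) by the sum of visits over the past d days (n-d+1 .. n)."""
    return [sum(visits[n - d + 1 : n + 1]) for n in range(d - 1, len(visits))]


def estimate_growth_rate(infected):
    """Least-squares estimate (no intercept) of beta - delta in model (3).

    Fits I(n+1) = phi * I(n) + w(n) and returns phi - 1.
    """
    prev, nxt = infected[:-1], infected[1:]
    phi = sum(x * y for x, y in zip(prev, nxt)) / sum(x * x for x in prev)
    return phi - 1.0


def r0_from_growth(growth, d=7):
    """Convert the growth rate beta - delta to R0 = 1 + d * (beta - delta)."""
    return 1.0 + d * growth


# Synthetic, noise-free daily visit counts (an assumption for illustration).
visits = [5 * 1.1 ** n for n in range(21)]

infected = infected_from_visits(visits, d=7)
growth = estimate_growth_rate(infected)
r0 = r0_from_growth(growth, d=7)

print(f"estimated beta - delta = {growth:.3f}, R0 = {r0:.2f}")
print("epidemic" if growth > 0 else "non-epidemic")
```

With real data the noise term w(n) matters, so a point estimate like this would have to be accompanied by a confidence interval or a hypothesis test before deciding that an epidemic has started.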
A final remark: our approach could be called “Epidemiological Surveillance” (the term used in the title of this post). Indeed, on the one hand, the approach is based on syndromic surveillance data (numbers of visits); on the other hand, it uses epidemiological models and their approximations for analysis and decision making. The problem is that this term (epidemiological surveillance) has already been used to mean biosurveillance related to human health only (animals excluded!) (see [2]). We believe that our understanding of the term “epidemiological surveillance” is more to the point.
References
[1] Shtatland, E. and Shtatland, T. (2011). Statistical approach to biosurveillance in crisis: what is next. NESUG Proceedings.
[2] Fricker, R. D. (2011a). Some methodological issues in biosurveillance. Statistics in Medicine.