Uncertainty intervals

Fayette Klaassen

2021/09/07

Covidestim presents estimates for several measures of the COVID-19 epidemic. Uncertainty intervals around point estimates indicate how certain these estimates are. For some states, but not all, the statistical model we use cannot produce uncertainty intervals for all variables within the set time limit. For these states, we are still able to provide point estimates of the outcomes of interest. Recently, we released a method that allows us to present uncertainty intervals around the point estimates for all states.

Why are not all intervals available?

To generate state-level results, we have been using a Bayesian sampling method, which produces a large set of epidemic trajectories that are consistent with the available data. We use this set of trajectories to produce a best estimate (the median), and an uncertainty interval (the 2.5th and 97.5th percentiles) for each outcome on any given date. To produce up-to-date daily estimates on our website, we set a time limit for the sampling procedure. Currently, this time limit is set to 10 hours. For some states, however, the sampling method is not able to consistently identify a best-fitting range for the model parameters within that time limit. For those states, we produce point estimates using an optimization algorithm, which identifies the best-fitting parameters but does not produce uncertainty intervals.
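As a rough illustration of the summary step described above, the sketch below computes a per-date median and 2.5th/97.5th percentiles from a set of trajectories. It is a minimal sketch, not the Covidestim code: the function names `percentile` and `summarize`, the linear-interpolation rule for percentiles, and the simulated draws are all assumptions made for the example.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize(trajectories):
    """Per-date (lower, median, upper) across trajectories.

    trajectories: list of equal-length lists, one list per sampled trajectory.
    """
    summary = []
    for values_on_date in zip(*trajectories):  # one tuple of draws per date
        ordered = sorted(values_on_date)
        summary.append((percentile(ordered, 2.5),
                        percentile(ordered, 50.0),
                        percentile(ordered, 97.5)))
    return summary


# Hypothetical draws: 1000 noisy trajectories over 5 dates (made-up numbers).
rng = random.Random(0)
draws = [[100.0 * 1.05 ** t * rng.lognormvariate(0.0, 0.2) for t in range(5)]
         for _ in range(1000)]

for date_index, (low, mid, high) in enumerate(summarize(draws)):
    print(f"date {date_index}: {mid:.1f} ({low:.1f}, {high:.1f})")
```

The optimizer, by contrast, returns a single trajectory, so there is no set of draws to take percentiles over.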

A new method for estimating uncertainty intervals

We recently developed and introduced a method that uses the uncertainty intervals of the states that did sample to predict the intervals for the states that did not sample. A spline regression is fit to the 2.5th and 97.5th percentiles of the states that did sample. This yields a prediction equation for the ratio between the point estimate and each bound of the interval. This equation is then used to compute intervals for the states run using the optimizer.

The regression equations for the ratio between the point estimate $\tilde{y}$ and the lower bound $y^{2.5}$ and upper bound $y^{97.5}$ of the uncertainty interval, respectively, are:

$$\frac{\tilde{y}}{y^{2.5}} = \exp\left(a_{2.5} + \mathbf{b}_{2.5} \cdot \text{ns}(\text{time})\right)$$

$$\frac{\tilde{y}}{y^{97.5}} = \exp\left(a_{97.5} + \mathbf{b}_{97.5} \cdot \text{ns}(\text{time})\right)$$

Fitting the regression models to the states that sampled yields a predicted ratio $\frac{\tilde{y}}{y^{(\cdot)}}$ for each bound. For the states that used the optimizer, the optimized estimate of the variable of interest is plugged in as the point estimate $\tilde{y}$, and the equation is solved for $y^{2.5}$ and $y^{97.5}$ to obtain the lower and upper uncertainty bounds. Note that a separate regression model is fit for each variable (e.g., $R_t$, number of infections).
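The fit-then-predict steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the model used on the site: a single linear term in time stands in for the spline basis in the regression equations, the fit is ordinary least squares on the log of the ratio, and the function names and all numbers are hypothetical. Taking logs of the regression equation gives a linear model, and rearranging it gives the bound as the point estimate divided by the predicted ratio.

```python
import math


def fit_log_ratio(times, point, bound):
    """Least-squares fit of log(point / bound) = a + b * time.

    Assumption: one linear term replaces the spline basis of the
    original equations to keep the sketch short.
    """
    z = [math.log(p / q) for p, q in zip(point, bound)]
    n = len(times)
    t_mean = sum(times) / n
    z_mean = sum(z) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxz = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(times, z))
    b = sxz / sxx
    a = z_mean - b * t_mean
    return a, b


def predict_bound(a, b, time, point_estimate):
    """Solve point / bound = exp(a + b * time) for the bound."""
    ratio = math.exp(a + b * time)
    return point_estimate / ratio


# Hypothetical data pooled from states that sampled (made-up numbers).
times = list(range(20))
median = [1000.0 + 50.0 * t for t in times]
lower = [m / math.exp(0.30 + 0.01 * t) for m, t in zip(median, times)]
upper = [m / math.exp(-0.40 - 0.02 * t) for m, t in zip(median, times)]

# One regression per bound (and, in practice, per variable).
a_lo, b_lo = fit_log_ratio(times, median, lower)
a_hi, b_hi = fit_log_ratio(times, median, upper)

# Hypothetical optimizer point estimate for a state that did not sample.
optimized = 1200.0
lo = predict_bound(a_lo, b_lo, 10, optimized)
hi = predict_bound(a_hi, b_hi, 10, optimized)
print(f"{optimized:.0f} ({lo:.0f}, {hi:.0f})")
```

Modelling the ratio on the log scale keeps both predicted bounds positive and lets the interval width scale with the size of the point estimate.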

Illustration

The figure below shows point estimates and uncertainty intervals for the number of infections and for $R_t$ in Alaska, as estimated on August 26, 2021. The sampled uncertainty intervals are presented in blue. A prediction model was built using the uncertainty intervals for Alaska and the 26 other states that sampled on August 26. The uncertainty interval computed using this prediction equation is shown in red. As this example illustrates, both approaches to uncertainty produce similar results.

Figure: Sampled and computed uncertainty intervals for the number of infections and $R_t$ in Alaska
Prediction model based on the uncertainty intervals for the 27 states that sampled on August 26.

For our latest manuscript, we ran the sampler without a time constraint, so sampled uncertainty intervals are available for all states until February 2021. Using these data, we built one prediction model from the sampled uncertainty intervals of all states, and another from a subset of 27 states that generally finish estimation with the sampler. The predicted uncertainty intervals were similar and did not indicate a systematic difference between states that finish with the sampler and states that do not.