Scientists Model “True Prevalence” of COVID-19 Throughout Pandemic

Published 27 July 2021

Scientists have developed a statistical framework that incorporates key COVID-19 data — such as case counts and deaths due to COVID-19 — to model the true prevalence of this disease in the United States and individual states. Their approach projects that in the U.S. as many as 60 percent of COVID-19 cases went undetected as of 7 March 2021, the last date for which the dataset they employed is available.

Government officials and policymakers have tried to use numbers to grasp COVID-19’s impact. Figures like the number of hospitalizations or deaths reflect part of this burden, but each data point tells only part of the story. No single figure describes the true pervasiveness of the novel coronavirus by revealing the number of people actually infected at a given time, a number that would help scientists understand whether herd immunity can be reached, even with vaccinations.

Now, two University of Washington scientists have developed a statistical framework that incorporates key COVID-19 data — such as case counts and deaths due to COVID-19 — to model the true prevalence of this disease in the United States and individual states. Their approach, published the week of July 26 in the Proceedings of the National Academy of Sciences, projects that in the U.S. as many as 60 percent of COVID-19 cases went undetected as of 7 March 2021, the last date for which the dataset they employed is available.
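To put the 60 percent figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not the researchers' statistical framework: it assumes the undetected share applies to all infections, and the function name and the reported-case count below are hypothetical placeholders rather than figures from the study.

```python
def implied_total_infections(reported_cases: float, undetected_fraction: float) -> float:
    """Total infections implied if `undetected_fraction` of all infections went unreported.

    Assumption for illustration: reported cases are the detected share,
    so reported = total * (1 - undetected_fraction).
    """
    if not 0 <= undetected_fraction < 1:
        raise ValueError("undetected_fraction must be in [0, 1)")
    return reported_cases / (1.0 - undetected_fraction)


# Hypothetical reported count, for illustration only (not a figure from the study).
reported = 1_000_000
total = implied_total_infections(reported, 0.60)
multiplier = total / reported
print(f"{total:,.0f} implied infections, about {multiplier:.1f}x the reported count")
```

Under that assumption, an undetected share at the upper bound of 60 percent would mean total infections were roughly two and a half times the reported count.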

This framework could help officials determine the true burden of disease in their region — both diagnosed and undiagnosed — and direct resources accordingly, said the researchers.

“There are all sorts of different data sources we can draw on to understand the COVID-19 pandemic — the number of hospitalizations in a state, or the number of tests that come back positive. But each source of data has its own flaws that would give a biased picture of what’s really going on,” said senior author Adrian Raftery, a UW professor of sociology and of statistics. “What we wanted to do is to develop a framework that corrects the flaws in multiple data sources and draws on their strengths to give us an idea of COVID-19’s prevalence in a region, a state or the country as a whole.”

Data sources can be biased in different ways. For example, one widely cited COVID-19 statistic is the proportion of test results in a region or state that come back positive. But since access to tests, and a willingness to be tested, vary by location, that figure alone cannot provide a clear picture of COVID-19’s prevalence, said Raftery.