Better Data, More Lives Saved


Daniel Gillis

School of Computer Science, University of Guelph

Associate Professor & Statistician

Kurtis Sobkowich

Department of Population Medicine, University of Guelph

PhD Student - Epidemiology

Theresa Bernardo

Department of Population Medicine, University of Guelph

IDEXX Chair in Emerging Technologies and Preventive Healthcare

Authors: Daniel Gillis, Kurtis Sobkowich, Theresa Bernardo

On March 11, 2020, the World Health Organization declared the COVID-19 outbreak a pandemic [1]. Governments around the world have been working to develop measures to prevent its spread and to identify and care for those infected. In Canada, we have seen a concerted effort from all levels of government to work together to ensure the health and safety of Canadians.

In times of crisis, the quality and timeliness of the data used to inform decisions can mean the difference between life and death, yet the availability and accessibility of data are often limited. The COVID-19 pandemic has highlighted the need to improve Canada’s data collection, management, and sharing protocols.

In particular, the pandemic has identified significant issues that limit our ability to respond appropriately and to keep the general public informed. These issues include data that are incomplete, inaccessible, or inconsistent. Several of these issues have been exacerbated by backlogs and changes to how testing was being performed and evaluated across Canada, but even with this in mind, there are opportunities for improvement.

Early in the pandemic, provincial and federal governments provided daily updates on the number of confirmed Canadian cases of COVID-19, including the number of patients who recovered or died. However, depending on the source, one would either find new daily cases, or a cumulative summary. The data were presented in aggregate, with no information on age, gender, race, socioeconomic status, or other variables typically included in epidemiological studies. No information was provided about the status of the case (e.g. asymptomatic, self-quarantining, intensive care, intensive care with ventilator, deceased).

Data Limitations

Data limitations led scholars to scour media reports to extract information that might help frame our understanding of the disease [2]. Age and gender data, along with disease status, have been extracted from these stories, a time-consuming process that is susceptible to errors. As the number of confirmed cases grew, it is easy to see how this process became intractable.

As reports from around the world provided new information about the disease (e.g. subpopulations at higher risk), it became clear that some data vital to curtailing the spread of the disease and treating the infected were not being collected. At the time of drafting this document, there was at least one petition requesting that the Ontario Government consider adding racial data to the list of data being collected from patients [3]. Failing to collect these data puts already marginalized communities at greater risk because we can’t know how the virus is affecting them, nor how to best support them.

Lack of Contact Tracing Data

Beyond these specific data needs, there is a lack of contact tracing data available. This may be due to a lack of resources to collect these data, or there may be unknown issues preventing them from being shared and/or centrally stored. These data may also be withheld from the public due to privacy issues, but researchers are accustomed to working with data of this nature. Methods for contact tracing without compromising privacy are the subject of active research.

As the pandemic has progressed, the type and format of data made available have also changed. Using the Wayback Machine [4], we know that as of March 20th, the federal government’s dedicated page for the novel coronavirus [5] listed summary data per province and territory along with an epidemiological report. It wasn’t until March 30th that some of these data were made available to download in CSV format. Before this, those wishing to track the pandemic were reliant on manually transcribing the numbers, using website scraping tools, or using the Wayback Machine to generate a complete data set. Still, much of the additional information introduced remains incomplete or inconsistently updated, and is thus not viable for meaningful analysis. Similarly, Ontario changed how it presented its data in late March; prior to this, data were presented first as aggregated summaries, then as a list of the daily new cases. While these data included columns for age, gender, likely source of transmission, current status, and other information, much was listed as pending.

These changes were clearly made to better serve Canadians. Canada did develop a plan with data in mind, but it hasn’t been updated since 2015 [6]. Going forward, the Canadian pandemic data plan needs to include standards for how data are to be presented, including specific naming conventions. It should ensure that data are easy to find and can be accessed in common file formats (e.g. CSV). Lastly, we need to develop better data sharing processes between the various levels of government so that the data presented are consistent, complete, and timely. This will be even more essential for gauging progress as we gradually move away from reliance on physical distancing to control the pandemic.

We must learn from COVID-19 and improve our Canadian pandemic data plan to fully capitalize on the scientific expertise in this country. Canadian lives are at risk.