Legal and Regulatory Framework Assessment
Legal and regulatory documents were assessed, and relevant legislation was extracted to identify areas where both public health surveillance and data sharing regulations were embedded in existing legal frameworks.
Surveillance System Survey
Data were exported from KoboToolbox as CSV files and analyzed using Stata version 14.0 (College Station, TX: StataCorp LLC). Frequencies and percentages were calculated for categorical variables (questions with response categories). For continuous variables (questions with numerical responses), means and standard deviations were calculated. Because different groups of participants completed different sets of questions, the denominators were specific to the number of respondents for a given section.
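The section-specific denominators described above can be illustrated with a minimal Python sketch. It is not the Stata code used in the analysis. The function names are hypothetical, and it assumes that a question a participant did not receive or did not answer is recorded as `None`.

```python
from collections import Counter
from statistics import mean, stdev


def categorical_summary(responses):
    """Return {category: (frequency, percentage)} for one categorical question.

    The denominator is the number of respondents who answered this question,
    not the total number of survey participants (assumption: None = no answer).
    """
    answered = [r for r in responses if r is not None]
    n = len(answered)
    if n == 0:
        return {}
    counts = Counter(answered)
    return {category: (k, 100.0 * k / n) for category, k in counts.items()}


def continuous_summary(values):
    """Return n, mean, and sample standard deviation for one numeric question."""
    answered = [v for v in values if v is not None]
    n = len(answered)
    if n == 0:
        return {"n": 0, "mean": None, "sd": None}
    # The sample standard deviation is undefined for a single observation.
    sd = stdev(answered) if n > 1 else None
    return {"n": n, "mean": mean(answered), "sd": sd}
```

For example, `categorical_summary(["yes", "no", None, "yes"])` uses a denominator of 3 rather than 4, because one participant did not answer the question.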
The audio of the qualitative survey question responses was transcribed and exported into a Word document. Quirkos software was used for coding and analysis using a grounded theory approach. The qualitative data analysis team comprised two coders, who coded an initial survey question as part of the open-coding process to develop the initial codebook, which evolved into a list of hierarchical axial codes. These codes were used to identify categories and concepts that emerged from the text. Those categories and concepts were then linked into substantive themes that emerged in the data. Once themes were developed, quotes from the surveys were identified to illuminate each theme. Quotes were not edited, but identifiers were removed. In quotes where a person was named or their identity could be inferred, the identifying data were removed and replaced with [ ], for example [MINSA] or [person].
Surveillance Data Assessment
Following co-development of a data analysis plan by the INSIGHT project team and CDC-Peru, CDC-Peru data analysts analyzed the non-open-access databases for non-COVID-19 conditions using RStudio. Indicator estimates were stratified by year, region, and disease. Frequencies and proportions were calculated for categorical variables. As timeliness indicators were continuous (numeric) variables, means, medians, and interquartile ranges were calculated. In addition, trend graphs of the indicators were constructed.
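As an illustration of the stratified timeliness summaries, the following Python sketch computes the mean, median, and quartiles of a delay variable within each year, region, and disease stratum. It is not the R code used by the analysts. The field names (`year`, `region`, `disease`, `delay_days`) are hypothetical, and the quartile method (`inclusive`) is an assumption, since the source does not state which definition was used.

```python
from collections import defaultdict
from statistics import mean, median, quantiles


def stratified_timeliness(records, value_key="delay_days",
                          strata=("year", "region", "disease")):
    """Summarize a continuous timeliness indicator within each stratum.

    `records` is a list of dicts; records with a missing value are skipped.
    """
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(value_key)
        if value is None:
            continue
        groups[tuple(rec[k] for k in strata)].append(value)

    summary = {}
    for key, values in groups.items():
        if len(values) >= 2:
            # Quartile definition is an assumption, not taken from the source.
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]
        summary[key] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "q1": q1,
            "q3": q3,
        }
    return summary
```

The interquartile range for a stratum is then reported as `q1` to `q3`, or as the difference `q3 - q1`, depending on the reporting convention.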
While preparing the COVID-19 databases, we encountered missing data and inconsistencies that required cleaning. For data integration, we identified common identifiers by creating entity-relationship graphs. Datasets such as hospitalization, positives, and triage were then merged, which was particularly important for evaluating timeliness indicators.
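The merging step can be sketched as a join on a shared identifier followed by a date difference. This is a simplified illustration rather than the procedure actually used: the identifier `person_id` and the date fields `positive_date` and `admission_date` are hypothetical names, and the source does not specify which join type or which timeliness definitions were applied.

```python
from datetime import date


def merge_on_id(left, right, key="person_id"):
    """Inner join of two lists of dicts on a common identifier.

    Records with a missing identifier cannot be linked and are dropped.
    On overlapping field names, values from `right` take precedence.
    """
    index = {}
    for row in right:
        if row.get(key) is not None:
            index.setdefault(row[key], []).append(row)

    merged = []
    for row in left:
        for match in index.get(row.get(key), []):
            merged.append({**row, **match})
    return merged


def days_between(row, start_key, end_key):
    """Return the delay in days, or None if either date is missing."""
    start, end = row.get(start_key), row.get(end_key)
    if start is None or end is None:
        return None
    return (end - start).days
```

A usage example with made-up records:

```python
positives = [{"person_id": 1, "positive_date": date(2021, 3, 1)}]
hospitalizations = [{"person_id": 1, "admission_date": date(2021, 3, 4)}]
linked = merge_on_id(positives, hospitalizations)
delay = days_between(linked[0], "positive_date", "admission_date")
```

Returning `None` for incomplete records keeps missing dates visible instead of silently treating them as zero delay.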
We also introduced additional dimensions that enabled comparison between datasets. The data were aggregated by time and location, allowing us to compare indicators from different datasets across various temporal and spatial scopes.
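One way to make datasets comparable along shared time and location dimensions is sketched below. The choice of ISO week as the time unit and `region` as the location unit is an assumption made for illustration, as are the function and field names. The source does not state which temporal or spatial units were used.

```python
from collections import Counter
from datetime import date


def aggregate_counts(records, date_key, region_key="region"):
    """Count records per (ISO year, ISO week, region).

    Records missing the date or the region are skipped.
    """
    counts = Counter()
    for rec in records:
        day = rec.get(date_key)
        region = rec.get(region_key)
        if day is None or region is None:
            continue
        iso_year, iso_week, _ = day.isocalendar()
        counts[(iso_year, iso_week, region)] += 1
    return counts


def compare_datasets(counts_a, counts_b):
    """Align two aggregated datasets on their shared dimensions.

    Returns {(year, week, region): (count_a, count_b)}, using 0 where a
    dataset has no records for that week and region.
    """
    keys = sorted(set(counts_a) | set(counts_b))
    return {key: (counts_a.get(key, 0), counts_b.get(key, 0)) for key in keys}
```

Once both datasets are keyed by the same tuple, coarser scopes (for example, month or national level) can be obtained by summing over the corresponding key components.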
Interoperability Assessment
Audio from survey responses was transcribed for questions related to informatics and data flow. Informatics experts triangulated data from survey responses with existing regulations, technical briefs, and guidelines to develop a data flow map using Draw.io (Figure 1). Additional data were extracted from interviews to describe the systems.
Resources for Data Analysis
- Analysis of Survey Responses
- Secondary Analysis of Seven Pathogens of Interest
- Secondary Analysis of COVID-19