In response to the fast-growing number of COVID-19 research projects, new and enhanced research data tools and data sets are being made available to support these endeavors. The goal is to support the rapid turnaround of analytics that help researchers understand the signs and symptoms, comorbidities, and treatments of COVID-19. Researchers can leverage any combination of the following tools to support COVID-19 research:
- COVID-19 Data Mart housed in the Data Enclave
- COVID-19 Detailed Data Files housed in the Data Enclave
- COVID-19 Summary Table
- COVID-19 External Data Sets
- NEW: DICOM Chest X-Ray Images with eUnity Viewer
- RPDR (Research Patient Data Registry) COVID-19 patient data updated daily
- Biobank Portal COVID-19 defined patients with genomic information
COVID-19 Research support email: MGBCOVIDResearchRequest@partners.org
The Data Enclave allows the analysis of multiple types of healthcare data in a secure environment. It provides direct access to data tables as well as multiple analytic tools (RStudio, Jupyter Notebook, and Spark/Hadoop), making it a one-stop analysis option that removes the need to pull data into outside repositories.
COVID-19 Data Mart
To help researchers understand the signs and symptoms, comorbidities, and treatments of COVID-19, the new data mart was assembled from patients across Mass General Brigham. The patient cohort includes COVID-positive patients along with a control group randomly selected from the COVID-negative population. Please see the Research COVID Data Mart Dashboard for additional details, and review the data dictionary for a complete list of metadata for the items included in the data mart. The data is available in SQL Server database tables and can be accessed directly using SQL Server Management Studio, Aqua Data Studio, or an i2b2 client.
How to Access
Before applying for access, please review the RISC Dashboard: COVID-19 Mart in Enclave and available Data Dictionaries, as well as the requirements for requesting access. When ready, please follow the link to Enter your access request. Once your access request is submitted, the turnaround for granting access is 1-7 business days. If you are a Project Staff Member, your Project Leader will be requested to approve your access.
If you have any questions, please contact MGBCOVIDResearchRequest@partners.org.
How to Use
To get started, view the basic video tutorials illustrating how to access the Enclave and use the COVID-19 Data Mart here.
The COVID-19 detailed data files are flat files that are derived from the COVID-19 Data Mart tables and provide a quick and clear delineation of specific data without complex querying of multiple tables or a large database. The data was denormalized and pre-processed to facilitate data input for machine learning tools and data science analysis. The COVID-19 detailed data files reside in a high-security Enclave platform.
Current categories of data in the files include Demographics, Diagnosis, Labs, Medications, Procedures, Vitals, Encounter, Providers, RPDR concept, and Patient MRN. These files are derived from the related tables in Analytics and Dimension schemas in the COVID-19 data mart.
Detailed Data Files
- The data is refreshed every Sunday and available in the flat file table format on the following Wednesday. For example, on Wednesday, December 9, 2020, the files will be updated with the latest information available as of Sunday, December 6, 2020.
- The files are tab-delimited, an easy-to-read format that can be opened with R, Python, or SAS.
- A data dictionary describing each table’s data values can be found in the same location as the detailed data flat files.
- Review the ReadMe file in the same directory for additional information.
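As a minimal sketch of working with these files in Python, the example below reads a tab-delimited file with the standard `csv` module and encodes the Sunday-to-Wednesday refresh schedule as date arithmetic. The helper names `data_as_of` and `read_tab_delimited`, the sample column names, and the assumption that every release falls on a Wednesday are illustrative only; check the ReadMe and data dictionary for the actual file and column names.

```python
import csv
import io
from datetime import date, timedelta


def data_as_of(release: date) -> date:
    """Return the Sunday refresh date behind a Wednesday file release.

    Assumes the weekly schedule described above: data refreshed on Sunday,
    files published three days later on Wednesday.
    """
    if release.weekday() != 2:  # date.weekday(): Monday == 0, Wednesday == 2
        raise ValueError("detailed data files are published on Wednesdays")
    return release - timedelta(days=3)


def read_tab_delimited(handle) -> list:
    """Read a tab-delimited text file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    print(data_as_of(date(2020, 12, 9)))  # 2020-12-06

    # Made-up sample standing in for a real file; column names are hypothetical.
    sample = "PatientID\tGender\tAge\n1001\tF\t54\n1002\tM\t61\n"
    rows = read_tab_delimited(io.StringIO(sample))
    print(len(rows), rows[0]["Gender"])  # 2 F
```

For an actual file, pass an open handle, for example `with open(path, newline="", encoding="utf-8") as f: rows = read_tab_delimited(f)`. The files' encoding is not stated here, so UTF-8 is an assumption.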
How to Access
Access to the COVID-19 detailed data files is dependent on provisioned access to the COVID-19 data mart housed in the Enclave. Before requesting access to the COVID-19 data mart (and consequently the Detailed Data Files), please review the RISC Dashboard: COVID-19 Mart in Enclave, as well as the requirements for requesting access. When ready, please follow the link to Enter your access request. Once your access request is submitted, the turnaround for granting access is 1-7 business days. If you are a Project Staff Member, your Project Leader will be requested to approve your access.
The COVID-19 Summary Table contains discrete data elements for quick identification and analysis of the MGB COVID-19-positive patient cohort. These include patient demographics, EPIC infection flags, COVID-19 PCR and antibody laboratory tests, inpatient admission information, and phenotype data.
Research opportunities for using this data include:
- Evaluating the trajectory of infection by analyzing the interval from the first identification of COVID-19 (by EPIC infection flag or PCR lab test) to the most recent negative antibody laboratory test.
- Determining clusters of infection based on geographic location, age, or gender.
- Calculating the mortality rate of the cohort.
- Understanding disparities based on race or ethnicity.
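Two of these analyses reduce to simple arithmetic. As a minimal sketch, assuming one record per patient with hypothetical fields (the real Summary Table column names are listed in its data dictionary), the crude mortality rate is deaths divided by cohort size, and the trajectory interval is the number of days between first identification and the most recent negative antibody test. The `PatientSummary` class, both functions, and the four records are made-up illustrations.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PatientSummary:
    # Hypothetical fields; map them to real Summary Table columns.
    patient_id: str
    first_positive: date                    # first EPIC flag or positive PCR
    last_negative_antibody: Optional[date]  # most recent negative antibody test
    deceased: bool


def crude_mortality_rate(cohort: List[PatientSummary]) -> float:
    """Deaths divided by cohort size (crude: not age- or time-adjusted)."""
    if not cohort:
        raise ValueError("cohort is empty")
    deaths = sum(1 for p in cohort if p.deceased)
    return deaths / len(cohort)


def infection_trajectory_days(patient: PatientSummary) -> Optional[int]:
    """Days from first identification to the most recent negative antibody test."""
    if patient.last_negative_antibody is None:
        return None  # no negative antibody test on record
    return (patient.last_negative_antibody - patient.first_positive).days


# Made-up toy cohort for illustration only.
cohort = [
    PatientSummary("A", date(2020, 4, 1), date(2020, 6, 15), False),
    PatientSummary("B", date(2020, 4, 10), None, True),
    PatientSummary("C", date(2020, 5, 2), date(2020, 7, 1), False),
    PatientSummary("D", date(2020, 5, 20), None, False),
]

print(crude_mortality_rate(cohort))                        # 0.25
print([infection_trajectory_days(p) for p in cohort])      # [75, None, 60, None]
```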
The COVID-19 Summary Table data originates from the MGB COVID-19 Data Mart in the Enclave, which contains large volumes of patient data originating from the RPDR and EDW.
How to Access
Before applying for access, please review the RISC Dashboard: COVID-19 Summary Table, as well as the requirements for requesting access. When ready, please follow the link to Enter your access request. Once your access request is submitted, the turnaround for granting access is 1-2 business days. If you are a Project Staff Member, your Project Leader will be requested to approve your access.
If you have any questions, please contact MGBCOVIDResearchRequest@partners.org.
External data sets are stored in either the Data Enclave or Shared File Areas.
Available in the COVID-19 Data Mart housed in the Data Enclave
Pulmonary X-Ray Severity (PXS) Score developed by the Center for Clinical Data Science
The PXS data consists of a numerical PXS score for over 8,600 COVID-tested patients. It is accessed directly in the COVID-19 Data Mart within the Radiology schema. Current users will automatically see this data in the Radiology.PulmonarySeverity table starting with the August 12 data refresh.
Available in the Shared File Area (SFA)
MGH COVID Registry validated by the MGH Department of Medicine. The MGH COVID Registry is currently available only to MGH researchers.
The MGH COVID-19 Registry contains manually reviewed data of over 500 data elements for 1200+ COVID-19 positive patients seen at MGH. Access must be requested separately to view and download these SAS files.
New as of March 3, 2021
From March through August, chest X-rays from COVID-tested patients at MGH were run through a deep learning algorithm to calculate a Pulmonary X-Ray Severity (PXS) Score.
The chest X-ray images (in DICOM format) are now available to all researchers to conduct their own analysis or review. Images can be viewed by looking up specific patients and accession numbers in the eUnity Viewer, which has been configured for access to them. The images can also be downloaded for additional analyses, including the creation of other algorithms, following MGB guidelines for securing identifiable patient data.
In addition to the PA (posteroanterior) and AP (anteroposterior) views used to generate the PXS scores (70,000 images), additional images from the exams are also included, for a total of approximately 125,000 images. The studies are chest X-rays (CXR) from MGH and BWH; however, PXS scores are currently available only for MGH.
Acknowledgement: Access to the images and eUnity Viewer is provided in partnership with CCDS and EMI.
How to Access
Let us know you want access! During this pilot phase, access is handled by individual request. Access permits download of the DICOM files and use of the specially configured eUnity Viewer, an application from Mach7 Technologies.
To request this additional access, please email: MGBCOVIDResearchRequest@partners.org
Note: As a prerequisite, a user must also have access to the COVID-19 Data Mart. If you do not yet have access to the COVID-19 Data Mart, please follow this link to Enter your access request.
The RPDR Daily Query Tool includes the following COVID-19 laboratory tests, updated daily. Patients tested for COVID-19 on the previous day can be identified and their data requested.
What Data is Available?
In the Query Tool hierarchy of terms, here is what is available under Laboratory tests\COVID-19:
- SARS coronavirus RNA/PCR lab tests;
- COVID-19 test comments;
- COVID-19 lab order result statuses;
- COVID-19 IgG and IgM antibody tests; and
- COVID-19 specimen sources.
Also available in the hierarchy of terms under Infection Control Flags\COVID-19 Flags:
- CoV-Exposed (INFG:67) – manually added/resolved to a patient’s chart by infection control staff
- CoV-Positive (INFG:65) – manually added/resolved to a patient’s chart by infection control staff
- CoV-Presumed (INFG:69) – manually added/resolved to a patient’s chart by infection control staff; assigned when a patient has recovered from symptoms consistent with COVID-19 but currently has a negative test. These patients may have been positive but were not tested during the period in which they were presumed to be positive.
- CoV-Risk (INFG:66) – added automatically via a silent BPA when an outpatient Normal-status COVID-19 lab order or an outpatient COVID referral order is placed; added automatically when a COVID-19 lab order is placed as an inpatient order or as an outpatient future or standing order; added automatically when an inpatient Risk for COVID-19 nursing order is placed; or added manually by infection control staff if the patient is at risk
- CoV-Clearance (INFG:68) – retired as of 5/11/20; previously assigned to asymptomatic patients at low risk for COVID-19 who were admitted to the hospital
Researchers can query for COVID-19 patients and request identified data.
Researchers may also navigate to the Previous Queries tab, click "Shared Queries" and run the query for "[RPDR] COVID Notes Search" to obtain COVID-19 related information from patient notes.
Match Control Functionality
The RPDR Daily Query Tool now allows researchers to identify a control population based on an existing patient population. After building a query in the Daily Query Tool and obtaining an aggregate patient count, researchers go to the Match Control tab to obtain an aggregate count for their new control population. When 'Use exact matches only' is selected in the right-side panel, the tool applies 'AND' logic: all selected items in that section must be true in the control population. When 'Use exact matches only' is NOT selected, it applies 'OR' logic: at least one selected item must be true in the control population.
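The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration of the selection logic only, assuming each candidate control patient is represented as a dictionary of attributes; the function `matches_control` and the criterion names (`gender`, `age_group`) are hypothetical and say nothing about how the RPDR tool is actually implemented.

```python
def matches_control(candidate: dict, selected: dict, exact_only: bool) -> bool:
    """Check a hypothetical candidate control patient against selected criteria.

    exact_only=True mirrors 'Use exact matches only' (AND: every item must hold);
    exact_only=False mirrors the unchecked option (OR: at least one item must hold).
    """
    if not selected:
        raise ValueError("select at least one matching criterion")
    checks = [candidate.get(name) == value for name, value in selected.items()]
    return all(checks) if exact_only else any(checks)


# Hypothetical criteria drawn from an existing patient population.
selected = {"gender": "F", "age_group": "50-59"}
candidate = {"gender": "F", "age_group": "60-69"}

print(matches_control(candidate, selected, exact_only=True))   # False: age_group differs
print(matches_control(candidate, selected, exact_only=False))  # True: gender matches
```

The OR mode yields a larger, looser control population; the AND mode yields a smaller one that resembles the original cohort on every selected item.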
After researchers click 'Submit request to find set of Control patients,' they will see a new Matched set query appear under the Previous Queries tab.
Once the Matched set query is ready, researchers may select the Matched set query, go to the Request Detailed Data tab and click 'Using a query' to request identified patient information on their newly established control population.
How to Access the Tool
Instructions on how to query for COVID-19 patients in the RPDR Daily Query Tool can be found in the COVID How To document.
The Biobank Portal is a web-based query tool that allows Partners investigators to query and download data about consented Biobank subjects. You can also make Biobank sample requests for plasma, serum, DNA, and genomics data directly from the portal.
What Data is Available
Investigators can define a COVID-19 cohort of Biobank consented patients, access pre-disease samples, gather genomic data on a subset of patients who have been genotyped and request identified patient information (structured and unstructured).
To find patients who have tested positive for COVID-19, navigate to the lab tests folder under Healthcare Data, then to the Infectious Disease folder, and then to COVID-19. Drag over the SARS coronavirus 2 RNA folder and select the values ‘detected’, ‘positive’, and ‘presumed positive’. If you are interested in patients who have genomic data, drag the ‘All people with genomic data’ folder into a second panel, then run the query. A genomic data request can then be made by selecting the Make Request tab and completing the Request Genomics form.
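The two-panel query behaves like a set intersection: the selected result values within the first panel are combined with OR, and the second panel narrows that result with AND. The sketch below assumes i2b2-style panel semantics and a made-up input shape (pairs of patient ID and result value); the function `covid_positive_with_genomics` and the sample IDs are hypothetical.

```python
# Result values selected in the first panel (from the steps above).
POSITIVE_VALUES = {"detected", "positive", "presumed positive"}


def covid_positive_with_genomics(lab_results, genomic_ids):
    """Hypothetical model of the two-panel query.

    lab_results: iterable of (patient_id, result_value) pairs for the
        SARS coronavirus 2 RNA test.
    genomic_ids: patient IDs in 'All people with genomic data'.
    """
    # Panel 1: any selected value qualifies a patient (OR within the panel).
    panel1 = {
        pid for pid, value in lab_results
        if value.strip().lower() in POSITIVE_VALUES
    }
    # Panel 2: patients must also appear here (AND across panels).
    return panel1 & set(genomic_ids)


results = [
    ("p1", "Detected"),
    ("p2", "Not detected"),
    ("p3", "Presumed positive"),
    ("p4", "Positive"),
]
print(sorted(covid_positive_with_genomics(results, {"p1", "p2", "p4"})))  # ['p1', 'p4']
```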
How to Access the Tool