In response to the fast-growing number of COVID-19 research projects, new and enhanced research data tools and data sets are being made available. The goal is to enable rapid turnaround of analytics that help researchers understand the signs and symptoms, comorbidities, and treatments of COVID-19. Researchers can use any combination of these tools in their COVID-19 research.
- 'NEW' COVID-19 Data Mart housed in the Data Enclave
- COVID-19 Summary Table
- 'NEW' COVID-19 External Data Sets
- RPDR (Research Patient Data Registry) COVID-19 patient data updated daily
- Biobank Portal COVID-19 defined patients with genomic information
COVID-19 Research support email: MGBCOVIDResearchRequest@partners.org
The Data Enclave allows the analysis of multiple types of healthcare data in a secure environment. It provides direct access to data tables as well as multiple analytic tools (RStudio, Jupyter Notebook, and SPARK Hadoop) for a complete one-stop analysis option, removing the need to pull data into outside repositories. The Data Enclave enables data science teams to develop algorithms and leverage different types of data coming from diverse data sources, to transform patient care and medical research.
COVID-19 Data Mart Patient Cohort
The COVID-19 patient cohort in the Data Mart consists of patients with at least one of the following:
1. Any positive or negative PCR test
2. Any infection flag that indicates COVID-19 infection: COVID-19 Positive (INFG:65), COVID-19 Presumed (INFG:69)
3. Any ICD-10 Diagnosis of U07.1
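The three criteria above combine with a logical OR. As a minimal sketch of that rule, assuming a simplified, hypothetical `PatientRecord` structure (its field names are illustrative and are not the Data Mart schema), cohort membership could be checked like this:

```python
from dataclasses import dataclass, field

# Flag codes and the diagnosis code come from the cohort definition above.
COVID_INFECTION_FLAGS = {"INFG:65", "INFG:69"}  # COVID-19 Positive, COVID-19 Presumed
COVID_ICD10 = "U07.1"


@dataclass
class PatientRecord:
    """Hypothetical, simplified record; not the actual Data Mart layout."""
    pcr_results: list = field(default_factory=list)    # e.g. ["negative", "positive"]
    infection_flags: set = field(default_factory=set)  # e.g. {"INFG:65"}
    icd10_codes: set = field(default_factory=set)      # e.g. {"U07.1"}


def in_covid_cohort(patient: PatientRecord) -> bool:
    """Return True if the patient meets at least one cohort criterion."""
    has_pcr_test = len(patient.pcr_results) > 0  # any result, positive or negative
    has_covid_flag = bool(patient.infection_flags & COVID_INFECTION_FLAGS)
    has_diagnosis = COVID_ICD10 in patient.icd10_codes
    return has_pcr_test or has_covid_flag or has_diagnosis
```

Note that a patient with only a negative PCR test still qualifies, so the cohort is broader than the set of COVID-19 positive patients.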
What Data is Available
The Data Enclave contains a COVID-19 data mart comprising structured and unstructured patient data from the following research sources:
- RPDR (Research Patient Data Registry). See the COVID-19 Data Mart User Guide for more information.
- EDW (Enterprise Data Warehouse). See the COVID-19 EDW Data Mart Document for more information.
- Pulmonary X-Ray Severity (PXS) Score external data set. See the Pulmonary X-Ray Severity (PXS) Score site for more information.
This data is refreshed every Sunday and available in the mart on the following Wednesday. The last 4 versions of the COVID-19 data mart will be stored in the Enclave and older versions will move to storage each week. The mart can be accessed directly using SQL Server Management Studio, Aqua Data Studio, or an i2b2 client.
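The refresh schedule above can be expressed as simple date arithmetic. The following is an illustrative sketch only: the function names are hypothetical, and it assumes each version is identified by the Sunday on which it was refreshed.

```python
from datetime import date, timedelta

VERSIONS_KEPT = 4  # "the last 4 versions" stay in the Enclave


def availability_date(refresh_sunday: date) -> date:
    """Return the Wednesday on which a Sunday refresh becomes available."""
    if refresh_sunday.weekday() != 6:  # Monday is 0, Sunday is 6
        raise ValueError("refresh date must be a Sunday")
    return refresh_sunday + timedelta(days=3)


def versions_in_enclave(refresh_dates):
    """Return the most recent refresh dates still held in the Enclave, newest first."""
    return sorted(refresh_dates, reverse=True)[:VERSIONS_KEPT]
```

Anything older than the dates returned by `versions_in_enclave` would, under this assumption, already have moved to storage.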
How to Access the Tool
Before applying for access, you will need to be prepared with the following information:
- Your User Details
- If you’re a Contractor (POI), make sure to have completed your PHS HIPAA Training. Gather your MGB supervising manager's name, ID and email.
- When submitting your access request, ensure this information is correctly reflected in the Manager details section on the form.
- Know Your Project Details
- Currently, only research projects can be approved for access to the COVID-19 Data Mart. You will need to provide the associated IRB Protocol ID. Make sure your IRB protocol covers COVID-19 research.
- Access can be granted for a specific period, up to the end of your project. Your access expires on the earlier of two dates: the Access Requested Until Date or the IRB Protocol Expiration Date.
- You will need to provide the Project Name, Project Description, and Project Leader Name.
- If you plan to share data outside of the Partners network, prepare to provide the details.
- Prepare an explanation to justify the Reason for access.
Once your access request is submitted, the turnaround for granting access is 1-4 business days. If you are a Project Staff Member, your Project Leader will be asked to approve your access.
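The expiration rule in the project details above (access ends on whichever of the two dates comes first) reduces to taking a minimum. A minimal sketch, with a hypothetical function name and parameter names chosen to mirror the form fields:

```python
from datetime import date


def access_expiration(requested_until: date, irb_expiration: date) -> date:
    """Return the date access ends: whichever of the two dates comes first."""
    return min(requested_until, irb_expiration)
```

For example, a request running through the end of June under a protocol that expires at the end of March would, under this rule, lose access at the end of March unless the protocol is renewed.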
The COVID-19 Summary Table contains discrete data elements for quick identification and analysis of the MGB COVID-19 positive patient cohort. These include patient demographics, EPIC infection flags, and COVID-19 PCR and antibody laboratory tests.
Research opportunities for using this data include:
- Evaluating the trajectory of infection by analyzing the first identification of COVID-19 (by EPIC infection flag or PCR lab test) to the most recent negative antibody laboratory test.
- Determining clusters of infection based on demographic location, age, or gender.
- Calculating the mortality rate of the cohort.
- Understanding disparities based on race or ethnicity.
The COVID-19 Summary Table data originates from the MGB COVID-19 Data Mart in the Enclave, which contains large volumes of patient data drawn from the RPDR and EDW.
How to access it?
Access to the Shared File Area (SFA) must be requested in order to view the COVID-19 Summary Table. For the request to be reviewed and provisioned appropriately, you need an IRB Protocol that covers research on all patients tested or flagged for COVID-19 and the associated data elements (including PHI).
If you have any questions, please contact MGBCOVIDResearchRequest@partners.org.
External data sets are stored in either the Data Enclave or Shared File Areas.
Available in COVID-19 Data Mart housed in Data Enclave
Pulmonary X-Ray Severity (PXS) Score developed by the Center for Clinical Data Science
The PXS data consists of a numerical PXS score for over 4,000 COVID-tested patients. It is accessed directly in the COVID-19 Data Mart within the Radiology schema. Current users will automatically see this data in the Radiology.PulmonarySeverity table starting with the Aug 12th data refresh.
Available in Shared File Area (SFA)
MGH COVID Registry validated by MGH Dept of Medicine. The MGH COVID Registry is only available to MGH researchers at this time.
The MGH COVID-19 Registry contains manually reviewed data covering over 500 data elements for more than 800 COVID-19 positive patients seen at MGH. Access must be requested separately to view and download these SAS files.
The RPDR Daily Query Tool includes the following COVID-19 laboratory tests, updated DAILY. Patients tested for COVID-19 on the previous day can be identified and their data requested.
What Data is Available?
In the Query Tool hierarchy of terms, here is what is available under Laboratory tests\COVID-19:
- SARS coronavirus RNA/PCR lab tests;
- COVID-19 test comments;
- COVID-19 lab order result statuses;
- COVID-19 IgG and IgM antibody tests; and
- COVID-19 specimen sources.
Also available in the hierarchy of terms under Infection Control Flags\COVID-19 Flags:
- CoV-Exposed (INFG:67) – manually added/resolved to a patient’s chart by infection control staff
- CoV-Positive (INFG:65) – manually added/resolved to a patient’s chart by infection control staff
- CoV-Presumed (INFG:69) – manually added/resolved to a patient’s chart by infection control staff; assigned when a patient has recovered from symptoms that match COVID-19 but currently has a negative test. These patients may have been positive, but were not tested within the timeframe in which they were presumed to have been positive.
- CoV-Risk (INFG:66) – added automatically via a silent BPA when an outpatient Normal status COVID-19 lab order or outpatient COVID referral order is placed; automatically on inpatient ordering, or outpatient future or standing ordering, of a COVID-19 lab order; automatically on inpatient ordering of a Risk for COVID-19 nursing order; or manually by infection control staff if the patient is at risk
- CoV-Clearance (INFG:68) – retired as of 5/11/20; was assigned to asymptomatic patients at low risk for COVID-19 who were admitted to the hospital
Researchers can query for COVID-19 patients and request identified data.
Researchers may also navigate to the Previous Queries tab, click "Shared Queries" and run the query for "[RPDR] COVID Notes Search" to obtain COVID-19 related information from patient notes.
Match Control Functionality
The RPDR Daily Query Tool now allows researchers to identify a control population based on an existing patient population. After researchers have built their query using the Daily Query Tool and obtained an aggregate patient count, they will go to the Match Control tab to obtain an aggregate count for their new control population. When 'Use exact matches only' is selected in the right side panel, it will apply an 'AND' logic where all the selected items in that section must be true in the control population. On the other hand, when 'Use exact matches only' is NOT selected, it will apply an 'OR' logic where at least one item that is selected must be true in the control population.
After researchers click 'Submit request to find set of Control patients,' they will see a new Matched set query appear under the Previous Queries tab.
Once the Matched set query is ready, researchers may select the Matched set query, go to the Request Detailed Data tab and click 'Using a query' to request identified patient information on their newly established control population.
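The difference between the two 'Use exact matches only' settings can be illustrated with a small sketch. This is one reading of the AND/OR behavior described above, not the tool's actual implementation; the function names, the dictionary-based patient representation, and the example item names (`age_band`, `gender`) are all hypothetical.

```python
def is_control_match(case, candidate, selected_items, exact_only):
    """Compare a candidate control to a case on the selected matching items.

    exact_only=True  -> 'AND' logic: every selected item must match.
    exact_only=False -> 'OR' logic: at least one selected item must match.
    """
    if not selected_items:
        raise ValueError("select at least one matching item")
    hits = [case[item] == candidate[item] for item in selected_items]
    return all(hits) if exact_only else any(hits)


def find_controls(case, candidates, selected_items, exact_only):
    """Return the candidates that qualify as controls for the given case."""
    return [c for c in candidates
            if is_control_match(case, c, selected_items, exact_only)]
```

With `selected_items=["age_band", "gender"]`, the exact setting keeps only candidates that agree with the case on both items, while the non-exact setting also keeps candidates that agree on just one, so it typically yields a larger control population.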
How to Access the Tool:
Instructions on how to query for COVID-19 patients in the RPDR Daily Query Tool can be found in the COVID How To document.
The Biobank Portal is a web-based query tool that allows Partners investigators to query and download data about consented Biobank subjects. You can also make Biobank sample requests for plasma, serum, DNA, and genomics data directly from the portal.
What Data is Available
Investigators can define a COVID-19 cohort of Biobank consented patients, access pre-disease samples, gather genomic data on the subset of patients who have been genotyped, and request identified patient information (structured and unstructured).
To find patients who have tested positive for COVID-19, navigate to the lab tests folder under Healthcare Data and then to the Infectious Disease folder, and then to COVID-19. Drag over the SARS coronavirus 2 RNA folder and select all values of ‘detected’, ‘positive’ or ‘presumed positive’. If you are interested in those who have genomic data, drag the ‘All people with genomic data’ folder into a second panel and run the query. A genomic data request can then be made by selecting the Make Request tab and completing the Request Genomics form.
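Conceptually, the two-panel query above selects patients whose SARS coronavirus 2 RNA result is one of the three listed values and then intersects them with the patients who have genomic data. The sketch below shows that logic on plain Python data; it assumes the two panels combine as an intersection, and the function name and the `(patient_id, result_value)` input format are hypothetical rather than anything exposed by the portal.

```python
# Result values named in the query instructions above.
POSITIVE_VALUES = {"detected", "positive", "presumed positive"}


def positive_with_genomics(lab_results, genomic_ids):
    """Return IDs of patients with a positive result who also have genomic data.

    lab_results: iterable of (patient_id, result_value) pairs.
    genomic_ids: iterable of patient IDs with genomic data.
    """
    positive_ids = {
        patient_id
        for patient_id, value in lab_results
        if value.strip().lower() in POSITIVE_VALUES  # exact match, case-insensitive
    }
    return positive_ids & set(genomic_ids)
```

Matching on the whole normalized value, rather than on a substring, keeps results such as "not detected" out of the positive set.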
How to Access the Tool