In response to the fast-growing number of COVID-19 research projects, new and enhanced research data tools and data sets are being made available. The goal is to speed the turnaround of analyses that help researchers understand the signs and symptoms, comorbidities, and treatments of COVID-19. Researchers can use any combination of these tools in their COVID-19 research.
- 'NEW' COVID-19 Data Mart housed in the Data Enclave
- COVID-19 Summary Table
- RPDR (Research Patient Data Registry) COVID-19 patient data updated daily
- Biobank Portal COVID-19 defined patients with genomic information
COVID-19 Research support email: MGBCOVIDResearchRequest@partners.org
The Data Enclave allows the analysis of multiple types of healthcare data in a secure environment. It provides direct access to data tables as well as multiple analytic tools (RStudio, Jupyter Notebook, and SPARK Hadoop) for a complete one-stop analysis option, removing the need to pull data into outside repositories. The Data Enclave enables data science teams to develop algorithms and leverage different types of data coming from diverse data sources, to transform patient care and medical research.
COVID-19 Data Mart Patient Cohort
The COVID-19 patient cohort in the Data Mart is defined as:
1. Any positive or negative PCR test, with custom SQL code used to identify positive, detected, or negative test results (tval); these tests are found in the RPDR Query Tool under Laboratory Tests\Infectious Disease\COVID-19\SARS coronavirus 2 RNA, COVID-19;
2. Any infection flag of COVID-19 Positive (INFG:65) or COVID-19 Presumed (INFG:69), regardless of flag status; and
3. Any ICD-10 Diagnosis of U07.1
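As a rough illustration of the cohort definition above, the following Python sketch treats the three criteria as alternatives, so a patient who meets any one of them is included. That reading, the `Patient` record layout, and all field names are assumptions made for illustration; the Data Mart's actual SQL is not shown in this document.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Values taken from the cohort definition above.
COHORT_PCR_RESULTS = {"positive", "detected", "negative"}
COHORT_FLAGS = {"INFG:65", "INFG:69"}  # COVID-19 Positive, COVID-19 Presumed
COHORT_ICD10 = "U07.1"


@dataclass
class Patient:
    """Hypothetical in-memory patient record (not the Data Mart schema)."""
    patient_id: str
    pcr_results: list[str] = field(default_factory=list)      # result text per PCR test
    infection_flags: list[str] = field(default_factory=list)  # flag codes, any status
    icd10_codes: list[str] = field(default_factory=list)


def in_covid_cohort(p: Patient) -> bool:
    """Return True if the patient meets at least one of the three criteria."""
    has_pcr = any(r.strip().lower() in COHORT_PCR_RESULTS for r in p.pcr_results)
    has_flag = any(f in COHORT_FLAGS for f in p.infection_flags)
    has_dx = COHORT_ICD10 in p.icd10_codes
    return has_pcr or has_flag or has_dx
```

Under this reading, a patient with only a negative PCR result is still part of the cohort, while a patient with only a CoV-Exposed flag (INFG:67) is not.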
What Data is Available
The Data Enclave contains a COVID-19 data mart comprising structured and unstructured patient data from the following research sources:
- RPDR (Research Patient Data Registry). See the COVID-19 Data Mart User Guide for more information.
- EDW (Enterprise Data Warehouse). See the COVID-19 EDW Data Mart Document for more information.
This data is refreshed every Sunday and becomes available in the mart on the following Wednesday. The mart can be accessed directly using SQL Server Management Studio, Aqua Data Studio, or an i2b2 client.
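Because of the lag between refresh and availability, it can help to work out which Sunday snapshot is visible in the mart on a given day. The sketch below is a minimal example, assuming that data becomes available at the start of each Wednesday and stays current until the next Wednesday; the function name is hypothetical.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday is 0, Sunday is 6


def visible_snapshot_date(today: date) -> date:
    """Return the Sunday refresh date of the data visible in the mart on `today`."""
    # Step back to the most recent Wednesday (today counts if it is a Wednesday).
    days_since_wednesday = (today.weekday() - WEDNESDAY) % 7
    last_release = today - timedelta(days=days_since_wednesday)
    # Each Wednesday release carries the refresh from three days earlier (Sunday).
    return last_release - timedelta(days=3)
```

For example, on a Tuesday the visible data would still come from the Sunday nine days earlier, because that week's refresh has not yet been released.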
How to Access the Tool
Before applying for access, you will need to be prepared with the following information:
- Your Project Role Details
- If you’re a Contractor (POI), make sure to have completed your PHS HIPAA Training. Gather your MGB supervising manager's name, ID and email.
- When submitting your access request, ensure this information is correctly reflected in the Manager details section on the form.
- Know Your Project Details
- Currently, only research projects can be approved for access to the COVID-19 Data Mart. You will need to provide the associated IRB Protocol ID. Make sure your IRB protocol covers COVID-19 research.
- Access is granted for a fixed period, up to the end of your project. Your Access End Date cannot exceed the IRB Protocol Expiration Date.
- You will need to provide the Project Name, Project Description, and Project Leader Name.
- If you plan to share data outside of the Partners network, prepare to provide the details.
- Your Enclave Data Mart Needs
- Do you plan to bring additional Data to the Enclave? If yes, how much (GB)?
- Do you require additional storage beyond the Enclave default?
- Which tools do you plan to use to access the data?
- Prepare an explanation to justify the Reason for access.
Once your access request is submitted, the turnaround for granting access is 1-4 business days. If you are a Project Staff Member, your Project Leader will be asked to approve your access.
The COVID-19 Summary Table is a conveniently accessible spreadsheet that contains summary information on the defined COVID-19 patient cohort. This document gives researchers a glance into the COVID-19 data mart in a format that is easy to read, filter, and manipulate with no analysis tools necessary. The pre-defined concepts allow for a quick review of important patient elements for COVID-19 research. It is recommended that the summary data be validated with detailed patient information found in the Data Enclave.
What data is available?
The COVID-19 Summary Table contains one patient per row with information on over 100 patient data elements, including recent COVID-19 and routine laboratory tests, vital signs, oxygen therapy, ICU admission, risk factors, flag statuses, recent medications and much more.
The summary file is updated weekly to correspond with the current Enclave data update schedule. Previous files will be moved to an archive directory.
How to access it?
To view the COVID-19 Summary Table, you must request access to the Shared File Area (SFA). An IRB Protocol covering research on all patients tested or flagged for COVID-19 and the associated data elements (including PHI) is necessary for the request to be reviewed and provisioned appropriately.
If you have any questions, please contact MGBCOVIDResearchRequest@partners.org.
The RPDR Daily Query Tool includes the following COVID-19 laboratory tests, updated DAILY. Patients tested for COVID-19 on the previous day can be identified and their data requested.
What Data is Available?
In the Query Tool hierarchy of terms, here is what is available under Laboratory tests\COVID-19:
- SARS coronavirus RNA/PCR lab tests;
- COVID-19 test comments;
- COVID-19 lab order result statuses;
- COVID-19 IgG and IgM antibody tests; and
- COVID-19 specimen sources.
Also available in the hierarchy of terms under Infection Control Flags\COVID-19 Flags:
- CoV-Exposed (INFG:67) – manually added/resolved to a patient’s chart by infection control staff
- CoV-Positive (INFG:65) – manually added/resolved to a patient’s chart by infection control staff
- CoV-Presumed (INFG:69) – manually added/resolved to a patient’s chart by infection control staff; assigned when a patient has recovered from symptoms that match COVID-19 but currently has a negative test. Those patients may have been positive, but were not tested within the timeframe in which they were presumed to have been positive.
- CoV-Risk (INFG:66) – added automatically via a silent BPA based on the ordering of an outpatient Normal status COVID-19 lab order or outpatient COVID referral order; added automatically based on inpatient ordering, or outpatient future or standing ordering, of a COVID-19 lab order; added automatically based on inpatient ordering of a Risk for COVID-19 nursing order; or added manually by infection control staff if the patient is at risk
- CoV-Clearance (INFG:68) – retired as of 5/11/20; it was assigned to asymptomatic patients at low risk for COVID-19 who were admitted to the hospital
Researchers can query for COVID-19 patients and request identified data.
Researchers may also navigate to the Previous Queries tab, click "Shared Queries" and run the query for "[RPDR] COVID Notes Search" to obtain COVID-19 related information from patient notes.
Match Control Functionality
The RPDR Daily Query Tool now allows researchers to identify a control population based on an existing patient population. After building a query in the Daily Query Tool and obtaining an aggregate patient count, researchers go to the Match Control tab to obtain an aggregate count for the new control population. When 'Use exact matches only' is selected in the right side panel, 'AND' logic is applied: all of the selected items in that section must be true in the control population. When 'Use exact matches only' is NOT selected, 'OR' logic is applied: at least one of the selected items must be true in the control population.
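The difference between the two settings can be sketched in a few lines of Python. This only illustrates the 'AND' versus 'OR' behavior described above; the function, the dictionary layout, and the item names (`gender`, `age_group`) are hypothetical, and the tool's actual matching procedure is not documented here.

```python
def is_control_match(case: dict, candidate: dict,
                     selected_items: list, exact_only: bool) -> bool:
    """Decide whether `candidate` qualifies as a control for `case`.

    exact_only=True  -> 'AND' logic: every selected item must agree.
    exact_only=False -> 'OR' logic: at least one selected item must agree.
    """
    agreements = [case.get(item) == candidate.get(item) for item in selected_items]
    return all(agreements) if exact_only else any(agreements)


# Hypothetical example: same gender, different age group.
case = {"gender": "F", "age_group": "60-69"}
candidate = {"gender": "F", "age_group": "50-59"}
items = ["gender", "age_group"]

strict = is_control_match(case, candidate, items, exact_only=True)
loose = is_control_match(case, candidate, items, exact_only=False)
```

Here `strict` is `False` because the age groups differ, while `loose` is `True` because the genders agree, so the 'OR' setting will generally yield a larger control population.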
After researchers click 'Submit request to find set of Control patients,' they will see a new Matched set query appear under the Previous Queries tab.
Once the Matched set query is ready, researchers may select the Matched set query, go to the Request Detailed Data tab and click 'Using a query' to request identified patient information on their newly established control population.
How to Access the Tool:
Instructions on how to query for COVID-19 patients in the RPDR Daily Query Tool can be found in the COVID How To document.
The Biobank Portal is a web-based query tool that allows Partners investigators to query and download data about consented Biobank subjects. You can also make Biobank sample requests for plasma, serum, DNA, and genomics data directly from the portal.
What Data is Available
Investigators can define a COVID-19 cohort of Biobank consented patients, access pre-disease samples, gather genomic data on a subset of patients who have been genotyped and request identified patient information (structured and unstructured).
To find patients who have tested positive for COVID-19, navigate to the lab tests folder under Healthcare Data and then to the Infectious Disease folder, and then to COVID-19. Drag over the SARS coronavirus 2 RNA folder and select all values of ‘detected’, ‘positive’ or ‘presumed positive’. If you are interested in those who have genomic data, drag the ‘All people with genomic data’ folder into a second panel and run the query. A genomic data request can then be made by selecting the Make Request tab and completing the Request Genomics form.
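Conceptually, the two-panel query described above is a set intersection: the first panel collects patients with a qualifying result value, and the second restricts them to patients with genomic data. The Python sketch below illustrates that idea under the assumption that panels are combined with 'AND', as is conventional in i2b2-style query tools; the function name and input shapes are hypothetical and do not reflect the Portal's internals.

```python
# Result values named in the instructions above.
POSITIVE_VALUES = {"detected", "positive", "presumed positive"}


def positive_with_genomic_data(lab_results, genomic_ids):
    """Return IDs of patients with a positive result who also have genomic data.

    lab_results: iterable of (patient_id, result_value) pairs (hypothetical shape)
    genomic_ids: iterable of patient IDs with genomic data (hypothetical shape)
    """
    # Panel 1: any SARS coronavirus 2 RNA result with a qualifying value.
    panel_1 = {
        patient_id
        for patient_id, value in lab_results
        if value.strip().lower() in POSITIVE_VALUES
    }
    # Panel 2: all people with genomic data; panels are intersected.
    panel_2 = set(genomic_ids)
    return panel_1 & panel_2


labs = [("p1", "Detected"), ("p2", "Negative"), ("p3", "Presumed Positive")]
genomic = ["p2", "p3", "p4"]
cohort = positive_with_genomic_data(labs, genomic)
```

In this toy data only `p3` ends up in `cohort`: `p1` tested positive but has no genomic data, and `p2` has genomic data but tested negative.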
How to Access the Tool