November 14, 2023
This document details best practices and instructions for working with a User Work Database and Patient Data Science Repository (PDSR) Curated data in your Enclave project workspace. Note: This article only applies to projects with access to the PDSR Curated Data Set and an associated User Work Database.
For projects with access to the PDSR Curated Data Set, an empty User Work Database may be requested to save results from queries to the PDSR Curated data. Since the User Work Database and PDSR Curated reside on separate servers, a linked server setup and PDSR Curated table synonyms is implemented on the local User Work Database side to simplify queries. Linked servers present a performance risk - it is very important researchers understand these risks and follow best practices when querying the PDSR Curated data from your local User Work Database server.
- Ensure that you have the Mass General Brigham (MGB) credentials (username and password) before trying to connect to the Enclave platform.
- Ensure that you have established a MGB VPN connection. | KB0023967
- The requested User Work Database or Datamart has been provisioned. | KB0040551
- If accessing a User Work Patient Cohort Datamart, your access to the User Work Datamart has been submitted, approved, and provisioned. | KB0039860
- Please allow 2-3 hours after provisioning before accessing the User Work Database or Datamart.
- You're logged into your Enclave Project Workspace VDI and User Work Database via SQL Server Managed Studio (SSMS). | KB0040678
- You must have intermediate to advanced knowledge of working with SQL and clinical databases.
General Key Points & Best Practices
1. Run queries to the remote PDSR Curated server from your local User Work Database server. You are not required to be logged-in to the PDSR Curated server to execute a query.
2. Avoid using Distributed queries: Don't write queries that join in your local User Work Database to tables on the PDSR Curated linked server (remote server).
3. Ensure your query is optimized by using appropriate indexes, compatible data types, limiting the data returned, and avoiding unnecessary calculations or joins.
• Indexing: Ensure both your local and remote tables have appropriate indexes to support your query. Indexes can significantly improve query performance by speeding up data retrieval.
• Data Types: Be aware of data type conversions between your local and remote tables. Converting data types on-the-fly adds overhead. Use compatible data types.
• Data Size: Be mindful of the amount of data you are retrieving from the remote server. Fetching large datasets from the remote server leads to performance bottlenecks and increased network traffic. Select the columns you need and filter to retrieve only relevant rows.
• Filtering and Joins: Use filtering conditions to restrict the data retrieved from the remote server to only what is required. Minimize the use of joins involving tables from both the local and remote servers.
On your local User Work Database, synonym aliases are provided to each PDSR Curated table in the following format:
To see the full list of synonyms in SQL Server Management Studio, select the Synonyms node from your local User Work Database as pictured below:
SELECT TOP 100 * FROM PDSR_Curated_Analytics_Diagnoses
KB0014892 | About the Complete PDSR Curated Data Set — Structure, Data Details, & SQL Query Examples