How to Access Shared Data Sets on IDEA

Data Sets Index

  • ClinicalTrials.gov: Two versions of the database are currently available:
    • September 2014 - Use schema clintrialsgov_201409
    • March 2015 - Use schema clintrialsgov_201503
  • dbSNP: Human, Mouse, Fruit Fly. 
    • Name conventions: 
      • dbsnp_(genome data set)_(build number)_(major genome version)_(minor genome version)
      • dbsnp_main_(build number)
    • Current versions:
      • dbsnp_main_145
      • dbsnp_human_9606_144_38_2
      • dbsnp_fruitfly_7227_130_0_0
      • dbsnp_mouse_10090_142_38_3
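
The naming conventions above can be applied mechanically, for example to pick out the build number of a schema. The following Python sketch splits a dbSNP schema name into its parts. The function name parse_dbsnp_schema and the field names are illustrative only, and the sketch assumes that the "genome data set" part is the organism name plus its numeric ID (for example human_9606), as in the current versions listed above.

```python
import re
from typing import NamedTuple, Optional


class DbsnpSchema(NamedTuple):
    dataset: str                 # "main", or organism plus ID, e.g. "human_9606"
    build: int                   # dbSNP build number
    genome_major: Optional[int]  # None for dbsnp_main_* schemas
    genome_minor: Optional[int]  # None for dbsnp_main_* schemas


# dbsnp_main_(build number)
_MAIN = re.compile(r"dbsnp_main_(\d+)")
# dbsnp_(genome data set)_(build number)_(major version)_(minor version)
_GENOME = re.compile(r"dbsnp_([a-z]+_\d+)_(\d+)_(\d+)_(\d+)")


def parse_dbsnp_schema(name: str) -> DbsnpSchema:
    """Split a dbSNP schema name into the parts of the naming convention."""
    m = _MAIN.fullmatch(name)
    if m:
        return DbsnpSchema("main", int(m.group(1)), None, None)
    m = _GENOME.fullmatch(name)
    if m:
        dataset, build, major, minor = m.groups()
        return DbsnpSchema(dataset, int(build), int(major), int(minor))
    raise ValueError(f"not a recognised dbSNP schema name: {name!r}")


if __name__ == "__main__":
    for schema in ("dbsnp_main_145", "dbsnp_human_9606_144_38_2"):
        print(parse_dbsnp_schema(schema))
```

Names that do not follow either convention raise a ValueError rather than being guessed at.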

Connecting to the Database

Investigators may request access to our public datasets hosted on the IDEA platform by completing the Public Data Service Request form.  Once access is granted, use your regular Partners ID as the username, together with your password, to log in.

The data sources are hosted on HAWQ, a Postgres-compatible relational database. pgAdmin III or a similar Postgres-compatible tool may be used to connect to the database. Because HAWQ is built on a fork of PostgreSQL 8.2.14, please use pgAdmin III v1.20.0 to avoid compatibility issues.

A screen shot showing the server connection setup for pgAdmin III is shown below.

 

[Screenshot: Connect to HAWQ]

 

For questions about the public data sets, contact the IDEA Support Team at ideasupport@partners.org.

Querying the data in IDEA 

Once connected to HAWQ, you can query the data inside the database.  The publicdatasets folder shows the available data sets, each in its own schema. Access is granted only to the data sets selected in the Public Data Service Request form.

[Screenshot: HAWQ Interface]

Use the SQL button to open the SQL editor. All commands are Postgres-like.
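
As an illustration of the Postgres-like SQL that can be run, the sketch below lists the tables in one schema from Python rather than from pgAdmin III. It assumes a DB-API connection such as one created with the psycopg2 driver, which this page does not mention and which has not been checked against HAWQ here. The function name list_tables is illustrative, and the host, port, and database name in the usage comment are placeholders, not the real IDEA settings. The query reads information_schema.tables, a standard catalog view in PostgreSQL.

```python
def list_tables(conn, schema):
    """Return the table names in `schema`, sorted alphabetically.

    `conn` is any DB-API 2.0 connection whose driver uses the %s
    parameter style (psycopg2 does).
    """
    sql = (
        "SELECT table_name "
        "FROM information_schema.tables "
        "WHERE table_schema = %s "
        "ORDER BY table_name"
    )
    cur = conn.cursor()
    try:
        # The schema name is passed as a parameter, not pasted into the SQL.
        cur.execute(sql, (schema,))
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


# Usage sketch (placeholders, not the real IDEA connection settings):
#
#   import psycopg2
#   conn = psycopg2.connect(host="HAWQ_HOST", port=5432, dbname="DB_NAME",
#                           user="PARTNERS_ID", password="YOUR_PASSWORD")
#   print(list_tables(conn, "clintrialsgov_201503"))
```

Only schemas you were granted access to in the Public Data Service Request form can be expected to return results.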

If you need selected data outside the platform, please extract it using one of the following query methods (we strongly recommend against making duplicates of the data sets):