September 21, 2022
Data Sets Index
- ClinicalTrials.gov: Two versions of the database are currently available:
  - September 2014 - Use schema clintrialsgov_201409
  - March 2015 - Use schema clintrialsgov_201503
- dbSNP: Human, Mouse, Fruit Fly.
  - Name conventions:
    - dbsnp_(genome data set)_(build number)_(major genome version)_(minor genome version)
    - dbsnp_main_(build number)
  - Current versions:
    - dbsnp_main_145
    - dbsnp_human_9606_144_38_2
    - dbsnp_fruitfly_7227_130_0_0
    - dbsnp_mouse_10090_142_38_3
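The dbSNP naming conventions can be checked mechanically, which may help when choosing a schema. The sketch below is illustrative only: the function name `parse_dbsnp_schema` is hypothetical, and it assumes that "(genome data set)" is an organism name followed by a numeric ID, as in the listed schema dbsnp_human_9606_144_38_2.

```python
import re

# Patterns inferred from the two naming conventions listed above.
# Assumption: "(genome data set)" is an organism name plus a numeric ID,
# as in the listed example "dbsnp_human_9606_144_38_2".
MAIN_RE = re.compile(r"^dbsnp_main_(?P<build>\d+)$")
GENOME_RE = re.compile(
    r"^dbsnp_(?P<dataset>[a-z]+_\d+)_(?P<build>\d+)"
    r"_(?P<major>\d+)_(?P<minor>\d+)$"
)


def parse_dbsnp_schema(name):
    """Split a dbSNP schema name into its components (hypothetical helper)."""
    match = MAIN_RE.match(name)
    if match:
        return {"dataset": "main", "build": int(match.group("build"))}
    match = GENOME_RE.match(name)
    if match:
        return {
            "dataset": match.group("dataset"),
            "build": int(match.group("build")),
            "genome_major": int(match.group("major")),
            "genome_minor": int(match.group("minor")),
        }
    raise ValueError("not a recognized dbSNP schema name: %r" % name)


if __name__ == "__main__":
    for schema in ("dbsnp_main_145", "dbsnp_human_9606_144_38_2"):
        print(schema, parse_dbsnp_schema(schema))
```

For example, dbsnp_mouse_10090_142_38_3 splits into data set mouse_10090, build 142, and genome version 38.3.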
Connecting to the Database
Investigators may request access to our public datasets hosted on the IDEA platform by completing the Public Data Service Request form. Once access is granted, log in with your regular Partners ID as the username and your usual password.
The data sources are hosted on HAWQ, a Postgres-compatible relational database. pgAdmin III or a similar Postgres-compatible tool may be used to connect to the database. Because HAWQ is built on a fork of PostgreSQL 8.2.14, please use pgAdmin III v1.20.0 to avoid compatibility issues.
A screenshot of the server connection setup for pgAdmin III is shown below.
Contact the IDEA Support Team with questions about the public data sets at: @email.
Querying the data in IDEA
Once connected to HAWQ, you can query the data inside the database. The publicdatasets folder lists the available data sets, organized by schema. Access is granted only to the data sets you requested in the Public Data Service Request form.
Use the SQL button to open the SQL editor. All commands follow PostgreSQL syntax.
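As an illustration of a Postgres-style command, the sketch below builds a statement that lists the tables in one schema. It assumes the standard `information_schema.tables` view is exposed on the HAWQ server, which has not been verified here. The helper name `list_tables_sql` is hypothetical. The printed statement can be pasted into the SQL editor.

```python
import re

# Accept only plain lowercase identifiers so the schema name can be
# embedded in the statement text without quoting problems.
IDENTIFIER_RE = re.compile(r"^[a-z_][a-z0-9_]*$")


def list_tables_sql(schema):
    """Return a statement listing the tables in `schema` (hypothetical helper)."""
    if not IDENTIFIER_RE.match(schema):
        raise ValueError("unexpected schema name: %r" % schema)
    # Assumption: information_schema.tables is available on the server.
    return (
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = '{}' ORDER BY table_name;".format(schema)
    )


if __name__ == "__main__":
    print(list_tables_sql("clintrialsgov_201503"))
```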
If you need selected data outside the platform, extract it with one of the methods below. We strongly recommend against making duplicate copies of the data sets.
- From any other Postgres/SQL database: Use postgres_fdw.
- From Python: Use PyGreSQL.
- From R: Use RPostgreSQL.
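A minimal sketch of the Python route, using the classic `pg` interface of PyGreSQL. The host name, the database name, and the table and column names are placeholders rather than values from this page: use the settings from your pgAdmin III server setup and a table from the schema you were granted. The helpers `build_select` and `fetch_rows` are hypothetical. Selecting only the needed columns with a row limit keeps the extract small, in line with the recommendation not to duplicate whole data sets.

```python
import re

# Placeholder connection settings (assumptions, not documented values).
HOST = "hawq.example.org"
PORT = 5432
DBNAME = "publicdatasets"  # assumed from the folder name shown in pgAdmin

IDENTIFIER_RE = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_select(schema, table, columns, limit):
    """Build a SELECT for specific columns with a row limit."""
    for name in [schema, table] + list(columns):
        if not IDENTIFIER_RE.match(name):
            raise ValueError("unexpected identifier: %r" % name)
    return "SELECT {} FROM {}.{} LIMIT {}".format(
        ", ".join(columns), schema, table, int(limit)
    )


def fetch_rows(sql, user, password):
    """Run `sql` on the server and return the rows as a list of dicts."""
    # Imported here so build_select works without the driver installed.
    import pg  # PyGreSQL classic interface

    db = pg.DB(dbname=DBNAME, host=HOST, port=PORT, user=user, passwd=password)
    try:
        return db.query(sql).dictresult()
    finally:
        db.close()


if __name__ == "__main__":
    # "clinical_study" and "nct_id" are hypothetical names for illustration.
    print(build_select("clintrialsgov_201503", "clinical_study", ["nct_id"], 10))
```

Calling `fetch_rows(sql, user, password)` with your Partners ID and password would then return the selected rows. That call needs PyGreSQL installed and network access to the server.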