The IDEA Analytics platform is ideal for analytics across large and diverse datasets, using the Hadoop distributed computing system. In addition to Hadoop and related open-source tools for machine learning and natural language processing, IDEA includes tools such as Spark for high-performance parallel operations. Typical use cases include:
- Association studies combining genomic information with medical records
- Applying natural language processing to textual datasets
- Predicting outcomes with predictive modeling algorithms on large datasets
SUPPORTED METHODS OF CONNECTING TO THE CLUSTER
- SSH command-line terminal access to the Hadoop workspace
- Web portals and applications
Getting an account?
- Use the registration form
All IDEA documentation is hosted on the IDEA Confluence space. The most popular links are listed below.
- Getting Started on IDEA
- Beginners Guides
- External access to HDFS folders
- Jupyter Notebooks
- How to use RStudio with Spark on IDEA
- Request PAS group for IDEA
- IDEA Hostnames and URLs Index