Updated on September 2, 2024
The IDEA Analytics platform is designed for analytics across large and diverse datasets and is built on the Hadoop distributed computing system. In addition to Hadoop and related open-source tools for machine learning and natural language processing, IDEA includes tools such as Spark for high-performance parallel processing.
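As a rough illustration of the map/shuffle/reduce model that Hadoop MapReduce implements and Spark generalizes, the sketch below counts words in a few text records. It runs entirely on one machine in plain Python and does not use any IDEA, Hadoop, or Spark API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only to label each phase.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, map and reduce calls could run in parallel on different nodes;
    # in this sketch every phase runs sequentially in a single process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    records = ["Spark and Hadoop", "Hadoop stores data in HDFS"]
    print(word_count(records))
    # prints {'spark': 1, 'and': 1, 'hadoop': 2, 'stores': 1, 'data': 1, 'in': 1, 'hdfs': 1}
```

Because each map call depends only on its own record and each reduce call only on one key's values, both phases can be split across workers; that independence is what distributed frameworks exploit.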
TYPICAL USES
- Association studies combining genomic information with medical records
- Applying natural language processing to textual datasets
- Predicting outcomes by applying predictive modeling algorithms to large datasets
SUPPORTED METHODS OF CONNECTING TO THE CLUSTER
- SSH command-line terminal access to the Hadoop workspace
- Web portals and applications
GETTING AN ACCOUNT
- Use the registration form
DOCUMENTATION
All IDEA documentation is hosted on the IDEA Confluence space. The most popular pages are linked below.
- Getting Started on IDEA
- Beginner's Guides
- External access to HDFS folders
- Jupyter Notebooks
- How to use RStudio with Spark on IDEA
- Request PAS group for IDEA
- IDEA Hostnames and URLs Index