Sr. Big Data Solutions Engineer (Digital Health)

Visit the Partners careers page to apply online. Job ID is 3118943

General Summary/Overview Statement

Through investments in big data and analytics, Partners HealthCare System (PHS) is innovating and transforming the delivery of health care, research, and discovery. PHS’ Enterprise Research IS (ERIS) team is tasked with providing the enterprise platform for raw data and self-service analytics needed to derive insights and build applications through machine learning (ML) and artificial intelligence (AI) capabilities.

ERIS is seeking, for immediate hire, an experienced Big Data Systems Architect with demonstrated experience in transformative technology to develop, integrate, and operationalize services for big data repositories and self-service analytics. This position will work in coordination with the program management office and interface with the multidisciplinary engineering team that architects, builds, maintains, and provides the Data Enclave platform for Partners HealthCare.

The Data Enclave is a high-profile project that will allow multiple types of healthcare data to be analyzed in a secure environment. It will enable data science teams to develop algorithms that leverage data from diverse sources, such as wearable physiological sensors and EHR systems, to transform patient care and medical research. The enclave environment will also enable Partners HealthCare to work directly alongside our industry partner, maximizing our compute capability while minimizing the risk to our patients’ privacy.

The role is challenging and varied, requiring technical, interpersonal, technology leadership, and problem-solving abilities. We are seeking a candidate who will understand the data, its flow, and the underlying infrastructure and platform technologies, and who will help Partners process these data and build the underlying repositories. In the course of developing and delivering the service capability, you will interface with commercial hardware, software, and cloud vendors; select and deploy new technologies and PaaS offerings; and provide training for platform users as well as internal team members.

We are seeking someone who is eager to learn and apply new technology and who brings versatility and breadth in both technical and customer service skills.

Principal Duties and Responsibilities:

  • Select, use, and develop best practices for core big data technologies, including Apache Spark, Apache Kafka, and Apache HBase, to architect a scalable, high-performance system that handles time-series data
  • Provide input on infrastructure sizing (storage, compute, memory) for new and existing solutions to ensure efficient processing, storage, and retrieval of data
  • Design, develop, and deploy/operationalize high-volume data processing pipelines to clean and curate large amounts of data from disparate sources
  • Leverage hybrid cloud computing methodologies to integrate enterprise (on-premises) systems and edge services with applications in the cloud
  • Design modules and interfaces that reduce coupling and dependencies between components
  • Orchestrate and automate workflows
  • Identify security vulnerabilities and eliminate them with strategic solutions that increase data security
  • Collaborate with departmental managers to create and oversee budgets.
  • Communicate with the technology team and other departments to maintain healthy information flow and collaboration
  • Keep abreast of new and evolving technology trends by attending educational events and identifying training and certification opportunities
  • Maintain quality service by establishing and enforcing SOPs, SLAs, and security policies that uphold solid processing and operating standards
  • Use the Partners HealthCare values to govern decisions, actions, and behaviors. These values guide how we get our work done: Patients, Affordability, Accountability & Service Commitment, Decisiveness, Innovation & Thoughtful Risk; and how we treat each other: Diversity & Inclusion, Integrity & Respect, Learning, Continuous Improvement & Personal Growth, Teamwork & Collaboration.
  • Perform other duties as assigned or required by the situation and circumstances

Qualifications:

  • Bachelor’s degree in Computer Science or a related field, or equivalent experience, required
  • 15+ years of overall IT experience, including demonstrated experience architecting, building, and implementing large-scale parallel, clustered/distributed computing systems
  • 10+ years of hands-on experience with technical implementation and management of infrastructure components (clustered/distributed systems, storage), batch and real-time processing, streaming, and auditing and monitoring of big data systems (on-premises and cloud) using Hadoop, NoSQL, bare-metal, or similar platforms
  • 7+ years of hands-on experience integrating on-premises systems with public and private clouds
  • 5+ years of experience working with Microsoft Azure (PaaS/SaaS) or equivalent
  • Experience working with time-series data and tools such as InfluxDB and OpenTSDB is a plus
  • Previous experience in the healthcare industry preferred
  • Experience with IT management frameworks such as ITIL (e.g., ITIL Foundation) desired

Skills/Abilities/Competencies Required:

  • The ideal candidate for this position has proven experience developing and deploying enterprise data systems that process large amounts of data using parallel computing and cluster technology
  • Hands-on experience developing data applications using Java, Python, Scala, and PySpark
  • Experience with data modeling (NoSQL/relational) of structured and unstructured data
  • Hands-on experience with Apache Spark, Apache Kafka, Apache Flume, and Apache HBase
  • Hands-on experience with file and data formats such as Apache Parquet, Avro, and ORC
  • Hands-on experience with ETL and ELT processes and tools
  • Hands-on experience with performance tuning and data serialization/deserialization techniques
  • Hands-on experience with storage technologies and file systems, including object storage and HDFS
  • Ability to create prototypes for proofs of concept
  • General knowledge of data science tools such as R, H2O, Anaconda, and Jupyter Notebook preferred
  • Prior experience with Agile methodologies such as Scrum and Kanban preferred
  • Experience with monitoring software such as New Relic, Prometheus, Zabbix, Nagios, etc.
  • Deep understanding of TCP/IP networking principles and protocols such as DHCP, NFS, SMB, and HTTP
  • Working understanding of Active Directory including Kerberos, GPO, LDAP, and Forest/Domain design/deployment.
  • Deep understanding of High Availability and BCP concepts (Backup & Recovery, Server Clustering, Recovery Time Objectives, etc.). 
  • Solid experience working in Linux and Windows operating systems
  • Experience or deep understanding of Infrastructure-as-Code tools (Terraform, CloudFormation, etc.) and configuration management tools (Ansible or Puppet preferred) 
  • Experience with Azure, GCP, or other cloud providers
  • Familiarity with serverless computing, Lambda architecture, and containerization
  • Hands-on experience with, or direct exposure to, software engineering principles and best practices: version control using Git, CI/CD, etc.
  • Exposure to the entire application life cycle, supporting planned releases to different environments, such as QA, pre-production and production environments.
  • High level of initiative and eagerness to learn new technologies.
  • Familiarity with information technology security and data privacy considerations applicable to a healthcare environment is advantageous. 

Working Conditions:

  • Standard office environment with travel to hospital locations in the Boston metro area, including the data centers
  • As projects and priorities dictate, flexible and off-hours work is required, including evening, night, and weekend hours, to cover events, rollouts, and special projects
  • Occasionally lift and carry supplies and equipment weighing up to 25 pounds.
  • Expected to work during core business hours; however, this position may often require working beyond them to meet project deliverables and operating conditions.

EEO Statement 
Partners HealthCare is an Equal Opportunity Employer & by embracing diverse skills, perspectives and ideas, we choose to lead. All qualified applicants will receive consideration for employment without regard to race, color, religious creed, national origin, sex, age, gender identity, disability, sexual orientation, military service, genetic information, and/or other status protected under law. 

Primary Location: MA-Somerville-PHS Assembly Row
Work Locations: 
PHS Assembly Row 
399 Revolution Drive   
Somerville 02145
Job: Systems/Network Administration
Organization: Partners HealthCare (PHS)
Schedule: Full-time
Standard Hours: 40
Shift: Day Job
Employee Status: Regular
Recruiting Department: PHS Enterprise Data & Digital Health
Job Posting: Feb. 7, 2020