Service Alerts

green - services online

RISC Website Maintenance 17 Jul 2019

July 17, 2019 7:13 am to 7:20 am

Research IS & Computing website maintenance completed on Wednesday, July 17 from 7-7:30a.m. EDT. The https://rc.partners.org website was offline from 7:13-7:20a.m. EDT for a database migration. Please see Mysql3 Shutdown and Retirement for details.

green - services online

Mysql3 Shutdown and Retirement

July 19, 2019 6:00 pm to July 21, 2019 6:00 pm

We have an operational event that requires the shutdown of the MySQL Service database server mysql3.research.partners.org. With this shutdown, the ERIS Infrastructure team will migrate all MySQL databases that reside on mysql3.research.partners.org to mysql4.research.partners.org. The team will start this process on Friday evening, July 19th, at 6:00 pm, and the target completion time is Sunday, July 21, 2019, at 6:00 pm. During this period, each database on mysql3.research.partners.org will be down until it has been migrated to mysql4.research.partners.org.

This change will require that all database owners update the MySQL datasource definition on their applications and change the reference from mysql3.research.partners.org to mysql4.research.partners.org.
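As a rough illustration of the datasource change described above, the following Python sketch rewrites every reference to the old host in a block of configuration text. The JDBC-style connection string and the database name `mydb` are hypothetical; your application may store its datasource definition in a different format, so treat this as a starting point only.

```python
from typing import Tuple

OLD_HOST = "mysql3.research.partners.org"
NEW_HOST = "mysql4.research.partners.org"


def update_datasource(config_text: str) -> Tuple[str, int]:
    """Return the config text pointed at the new host, plus the number of references changed."""
    count = config_text.count(OLD_HOST)
    return config_text.replace(OLD_HOST, NEW_HOST), count


# Hypothetical datasource line; real applications will differ.
sample = "jdbc:mysql://mysql3.research.partners.org:3306/mydb"
updated, changed = update_datasource(sample)
print(updated, changed)
```

Returning the count makes it easy to confirm that a file actually contained a mysql3 reference before it is saved back.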

Please forward all questions to rcc@partners.org.

green - services online

Emergency Reboot of rfanfs.research.partners.org

July 13, 2019 10:01 pm to 11:01 pm

The ERIS Infrastructure team had to reboot rfanfs.research.partners.org on Saturday morning, July 13th. The team started at 10:00am, and the reboot and maintenance were completed by 11:00am. Owners of applications and systems that use a storage device from rfanfs.research.partners.org are asked to double-check their storage device and, should there be a problem, send a note to rcc@partners.org.

green - services online

Partners HealthCare REDCap: Scheduled Downtime

July 12, 2019 3:00 pm

REDCap (https://redcap.partners.org/redcap) upgrades to v8.10.20 scheduled for Tuesday July 16th, 2019 at 6:30AM EST are complete. REDCap was offline from 6:30-6:47 AM EST. 

This release includes a number of bug fixes. 

Release notes can be found here.


IMPORTANT NOTICE: REDCap Database Migration

In order to meet the increased demand, usage, and amount of data stored within the REDCap system, it is necessary to upgrade the database infrastructure.

As a result, a REDCap database migration is scheduled for Saturday July 20th, 2019 at 7:00PM EST. This will require an extended amount of downtime, with REDCap being offline for approximately 14 hours.

This will help optimize our REDCap infrastructure, resulting in faster overall performance, and prevent extended downtime for maintenance in the future. 

Additional alerts will be sent next week. 

If you have any concerns about REDCap being offline at either of these times, please email edcsupport@partners.org

green - services online

GitLab: Follow-Up to the Outage on July 11, 2019

July 11, 2019 8:15 am

On July 11, 2019 the GitLab service experienced a significant outage. The service was unavailable between 7:15AM and 12:20PM.

 

What led to the outage:
An upgrade to GitLab 11.11.5 was scheduled. As part of the upgrade tasks, we apply OS-level patches before beginning the application upgrade process. Some OS-level patches require a restart. In this case, a restart was required. After the patches were installed and reported as having completed successfully, the server was restarted. After the restart, the server was never able to come back up fully and on-screen messages indicated a drive failure/issue.

 

NOTE: The drive failure did not affect the storage location where the GitLab/Git repositories are stored!

 

How we recovered:
As part of our recovery procedure, we created a new Virtual Machine, applied all relevant configuration and re-attached the storage where all GitLab/Git repositories are hosted.

 

We used the nightly backup of the GitLab database and supporting services to recover the additional configuration needed to bring the service back up.

 

What you should look out for:
Our backup process runs at midnight. Therefore, any data entered within the User Interface of GitLab (issues, Wikis, etc.) between midnight and 7:15 AM may have been lost. This does NOT apply to code committed to the remote repositories. Any code that has been committed to GitLab should not have been affected and no data should have been lost.

 

Since we are operating on a new Virtual Server, those of you who use SSH to interact with the server will have to update your "known_hosts" file. In many cases, you will be prompted with a message along the lines of "Remote Host Identification Has Changed" with some additional text regarding a possible man-in-the-middle attack. This is an expected side effect of the migration to a new host, so please do not be alarmed. To resolve this issue, refer to the message on your screen and take note of which "known_hosts" file it names. These files are typically located in your home folder under the ".ssh" hidden folder, but they can be stored elsewhere. Once you know where the known_hosts file (or known_hosts2 on some Macs) is located, open it with a text editor, find any line that contains "gitlab.partners.org" or "gitlab.dipr.partners.org", and remove the whole line of text. Then save the file. NOTE: by modifying this file you do NOT risk damaging any configurations you may have for connections to other servers. This file simply contains the fingerprints of servers you have connected to previously.
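For those who would rather script the edit than make it by hand, here is a minimal Python sketch of the same steps. The function names are illustrative, and the sketch assumes your known_hosts file stores hostnames in plain text. If your SSH client hashes hostnames, the GitLab lines will not be matched; in that case OpenSSH's `ssh-keygen -R <hostname>` command is the usual way to remove an entry.

```python
from pathlib import Path
from typing import List

GITLAB_HOSTS = ("gitlab.partners.org", "gitlab.dipr.partners.org")


def drop_gitlab_entries(lines: List[str]) -> List[str]:
    """Keep every known_hosts line that does not mention a GitLab hostname."""
    return [line for line in lines if not any(host in line for host in GITLAB_HOSTS)]


def clean_known_hosts(path: Path) -> int:
    """Rewrite the file without GitLab entries and return how many lines were removed."""
    original = path.read_text().splitlines(keepends=True)
    kept = drop_gitlab_entries(original)
    # Save a backup copy next to the original before overwriting it.
    path.with_name(path.name + ".bak").write_text("".join(original))
    path.write_text("".join(kept))
    return len(original) - len(kept)


# Example call (adjust the path to the file named in your SSH warning):
# clean_known_hosts(Path.home() / ".ssh" / "known_hosts")
```

The next SSH connection to GitLab will then ask you to accept the new host key, as it would for any server you connect to for the first time.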

 

Please check any automated processes and runners to ensure that they are working as expected. If you experience any issues, please feel free to contact us for assistance at rcc@partners.org with the word GitLab somewhere in the subject line.

 

 

---------------------

Additional Notes:
Q: Why did it take so long?
A: The outage was prolonged for a number of reasons. Chief among them was that we had not previously performed a full-scale disaster recovery of the service and had to take careful steps to ensure that no data was lost or corrupted.

 

Q: Are there any up-sides to this crash?
A: Actually, yes. We are now running on a newer operating system and have more resources provisioned to the machine. We can also now use ED25519 keys for SSH communication with GitLab (previously only RSA keys were supported).

 

Q: Will the upgrade to GitLab 11.11.5 take place soon?
A: Yes. The upgrade to 11.11.5 will be rescheduled. A separate notification will be sent out once we are sure that the new system has been fully validated by you, the users.

green - services online

GitLab Upgrade Completed on June 28, 2019

June 28, 2019 7:00 am to 8:00 am

The GitLab service (https://gitlab.partners.org/) upgrade is scheduled for Friday, June 28th, 2019 from 7 to 8 AM. We are upgrading from v11.7.11 to v11.11.3 and the upgrade is expected to take up to 1 hour to complete. During this time, the service will be unavailable. This upgrade includes critical security patches, bug fixes, and minor new features.

For a complete list of changes, please refer to the GitLab project release notes:

Version 11.8 -  https://about.gitlab.com/2019/02/22/gitlab-11-8-released/
Version 11.9 -  https://about.gitlab.com/2019/03/22/gitlab-11-9-released/
Version 11.10 -  https://about.gitlab.com/2019/04/22/gitlab-11-10-released/
Version 11.11 - https://about.gitlab.com/2019/05/22/gitlab-11-11-released/

 

For any questions or concerns regarding this upgrade, please contact rcc@partners.org.

red - services offline, issue, active work

All mysql3 database passwords will expire on June 26, 2019

June 26, 2019 10:30 am to 12:00 pm

All database passwords on mysql3.research.partners.org will expire on June 26, 2019 at 10:30am, which will impact all applications that use mysql3.research.partners.org as a database server. Please send a note to rcc@partners.org identifying your database, the database owner, and the MySQL user IDs needed, and we will coordinate an updated password for your team.

Please send all comments and questions to rcc@partners.org

yellow - pre-downtime, calls to action

Partners HealthCare REDCap: Unscheduled Maintenance Required

June 25, 2019 4:31 pm

REDCap (https://redcap.partners.org/redcap) requires server maintenance. REDCap will be offline tonight Tuesday June 25th, 2019 from approximately 9:00-9:30PM EST. 

There will be a short interruption of service during the maintenance window.

If you have any concerns, please email edcsupport@partners.org

green - services online

RISC Website Maintenance 26 Jun 2019

June 26, 2019 7:15 am to 7:45 am

Research IS & Computing website maintenance completed on Wednesday, June 26. The https://rc.partners.org website was offline from 7:15-7:45a.m. EDT to apply security patches and bug fixes.

yellow - pre-downtime, calls to action

Action Required: Linux Updates

June 19, 2019 3:10 pm

Applies To: Linux & FreeBSD (SACK Panic)

A new vulnerability has been discovered in the Linux kernel's handling of TCP Selective Acknowledgments (SACKs). A remote attacker could use this to mount a denial-of-service attack, interrupting system operations.

This affects many distributions currently in use, such as RHEL 4, 6, 7 and 8 and Ubuntu 12.xx to 19.xx (or kernels 2.6.29 and above). This is being tracked as CVE-2019-11477 and is considered important and high impact.

Please note that this might also impact appliances or IoT devices built based on those versions of the Linux kernel.

Some patches are already available, and some vendors are still issuing software packages to fix the kernel vulnerability. If you manage Linux or FreeBSD systems, please patch as soon as possible and reboot accordingly. If you can’t patch a system, please use compensating controls (such as sysctl settings that disable TCP probing) as appropriate.
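As a first-pass triage aid, the Python sketch below checks whether a kernel release string falls in the range named in this alert (2.6.29 and above). The function names are illustrative. Note the important limitation: the version number alone does not tell you whether your vendor has already shipped a patched kernel, so a positive result only means the system should be checked against the vendor advisories listed below.

```python
import platform
import re
from typing import Tuple

FIRST_AFFECTED = (2, 6, 29)  # lower bound stated in this alert


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '3.10.0-957.el7.x86_64'."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor or 0), int(patch or 0)


def needs_review(release: str) -> bool:
    """True when the kernel version is in the range the alert describes as affected."""
    return parse_kernel_release(release) >= FIRST_AFFECTED


if __name__ == "__main__" and platform.system() == "Linux":
    release = platform.release()
    status = "check vendor advisory" if needs_review(release) else "older than 2.6.29"
    print(f"{release}: {status}")
```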

References:
http://packetstormsecurity.com/files/153346/Kernel-Live-Patch-Security-Notice-LSN-0052-1.html              
https://access.redhat.com/security/vulnerabilities/tcpsack  
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3b4929f65b0d8249f19a50245cd88ed1a2f78cff
https://github.com/Netflix/security-bulletins/blob/master/advisories/third-party/2019-001.md  
https://support.f5.com/csp/article/K78234183          
https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/SACKPanic

 

yellow - pre-downtime, calls to action

Firefox Web Browser Update Needed

June 18, 2019 9:00 am

Security vulnerabilities fixed in Firefox 67.0.3 and Firefox ESR 60.7.1

Please update your Firefox web browser.

  1. Open Firefox
  2. Select About Firefox
  3. Select Update

Announced: June 18, 2019
Impact: critical
Products
Firefox, Firefox ESR

Fixed in
Firefox 67.0.3
Firefox ESR 60.7.1

Learn More

green - services online

List Manager (Lyris) Alert: Downtime on 23 June 2019

June 23, 2019 8:00 am to June 25, 2019 9:00 am

The database was successfully migrated to the new server, and all systems are green. If you have any questions or concerns relating to this change, please email us at rcc@partners.org.

On June 23rd at 8 AM we will be migrating our backend database for List Manager (Lyris) from Microsoft SQL 2008 to Microsoft SQL 2016. During this migration, the List Manager web application (https://researchlist.partners.org and https://researchlistadmin.partners.org) will not be available. We will be moving the database from one system to another (CHG0147108). Any scheduled emails will be sent after the system comes back online. If you have any questions or concerns relating to this change, please email us at rcc@partners.org.

green - services online

REDCap Service Alert: Scheduled Downtime

June 12, 2019 2:55 pm

REDCap (https://redcap.partners.org/redcap) upgrades to v8.10.18 scheduled for Tuesday June 18th, 6:30AM EST are complete. REDCap was offline from 6:28 to 6:44 AM EST. 

This upgrade will include a number of new features and bug fixes. These include:

  • New feature: REDCap Messenger
    • REDCap Messenger is a communication platform built directly into REDCap. It allows REDCap users to communicate easily and efficiently with each other in a secure manner. At its core, REDCap Messenger is a chat application that enables REDCap users to send one-on-one direct messages or to organize group conversations with other REDCap users. REDCap Messenger is also the best and easiest way to share documents with other REDCap users, in which you can upload documents and embed pictures inside any given conversation.
    • Watch 10-minute video on REDCap Messenger
  • Improvement: Performance boost – Certain pages in projects with thousands or more records should now load much faster in most cases. This includes the Record Status Dashboard, various pages utilizing Data Access Groups, and certain reports. Reports A and B should especially see significantly faster loading (excluding when viewing “all” pages in report A or B).
  • Change: Changed the text "Manage Survey Participants" to "Survey Distribution Tools," which more clearly describes the pages in that section.
  • New feature: Report Folders - Reports can now be organized into folders in any given project. If a user has "Add/Edit Reports" privileges, they will see an "Organize" link on the left-hand project menu above the project’s reports. They will be able to create folders and then assign their reports to a folder, after which the project's reports will be displayed in collapsible groups on the left-hand menu.
  • New feature: “Edit Access” for reports - In addition to setting "View Access" when creating or editing a report, users can now set the report's "Edit Access" (under Step 1) to control who in the project can edit, copy, or delete the report. This setting will be very useful if one wishes to prevent certain users from modifying or deleting particular reports.
  • Improvement: A project's Record ID field can now be used as a Live Filter in any given report, thus allowing users to easily view the report for a single record.
  • Improvement: New optional parameters added to the API Export Records method to filter data returned based on when a record was created or modified
    • dateRangeBegin – To return only records that have been created or modified *after* a given date/time, provide a timestamp in the format YYYY-MM-DD HH:MM:SS (e.g., '2017-01-01 00:00:00' for January 1, 2017 at midnight server time). If not specified, it will assume no begin time.
    • dateRangeEnd – To return only records that have been created or modified *before* a given date/time, provide a timestamp in the format YYYY-MM-DD HH:MM:SS (e.g., '2017-01-01 00:00:00' for January 1, 2017 at midnight server time). If not specified, it will use the current server time.
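To make the two new parameters concrete, here is a small Python sketch that builds the form fields for an API Export Records request with an optional date range. The function name and the `YOUR_API_TOKEN` placeholder are illustrative, and the sketch stops short of sending the request; you would POST the encoded fields to your REDCap API endpoint using your own token.

```python
from datetime import datetime
from typing import Dict, Optional
from urllib.parse import urlencode

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, as the release notes require


def build_export_payload(
    token: str,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Dict[str, str]:
    """Build form fields for Export Records, adding the date filters only when given."""
    payload = {"token": token, "content": "record", "format": "json"}
    if begin is not None:
        payload["dateRangeBegin"] = begin.strftime(TIMESTAMP_FORMAT)
    if end is not None:
        payload["dateRangeEnd"] = end.strftime(TIMESTAMP_FORMAT)
    return payload


# Records created or modified after January 1, 2017 at midnight server time.
payload = build_export_payload("YOUR_API_TOKEN", begin=datetime(2017, 1, 1))
print(urlencode(payload))
```

Leaving out `end` matches the documented default, in which REDCap uses the current server time as the upper bound.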

Full REDCap 8.10.18 release notes

New Features and Major Bug Fixes

If you have any concerns about REDCap being offline at this time or any questions about these new features, please email edcsupport@partners.org

green - services online

Freezerworks Service Alert: Maintenance

June 6, 2019 4:00 pm to 6:30 pm

Partners Healthcare Freezerworks: Scheduled Downtime

On Thursday June 6th from approximately 6 pm to 6:30 pm (EST), the Freezerworks application will be taken offline for maintenance. During this timeframe, we ask users to log out of the system so we can complete this necessary work. We apologize for any inconvenience during this interruption. 

 

If you have any questions or concerns, please email edcsupport@partners.org

 
green - services online

CrashPlan Access Resolved

June 5, 2019 3:53 pm to June 6, 2019 6:53 pm

The authentication server backup01.partners.org is up and running.  All services are green.  If you are having any issues email us at rcc@partners.org.

Currently, our CrashPlan authentication system (backup01.partners.org) is not functioning correctly. We are in the process of rebuilding the system. During this time you will not be able to back up or restore files. Your backup client might report that we are out of space, but that is a false message. We will update this alert when everything is resolved.

red - services offline, issue, active work

MySQL and DIPR VM Problems

June 1, 2019 1:34 pm to June 3, 2019 5:00 pm

The ERIS MySQL Service and parts of the DIPR Virtual Machine service experienced problems over the weekend and had to be rebooted. The outages were related to the Compellent patching on Saturday, June 1st; however, some work extended into Sunday, June 2nd and Monday, June 3rd. If you experience a problem with a DIPR VM, please send a note to rcc@partners.org and the ERIS Infrastructure team will investigate.

green - services online

Compellent Outage 1 Jun 2019

June 1, 2019 1:45 pm to 2:45 pm
On Saturday, June 1st, 2019 around 1:45 pm ET, during a scheduled firmware upgrade of our backend Storage Area Network (CHG0145888), we experienced an issue that impacted some hosted servers that use this storage. To resolve the issue, the affected servers either had to be rebooted or required a file system consistency check run by our engineering team.
 
The Compellent Storage Area Network (SAN) service was fully restored and the maintenance completed on Saturday, June 1 around 2:45p.m. EDT.
 
Many of our services rely on Compellent Storage Area Network (SAN) for underlying storage. Please check the Service Alerts for service-specific instructions and impact. Please make sure you subscribe to receive appropriate email notifications for the services you subscribe to: https://rc.partners.org/subscribe.
yellow - pre-downtime, calls to action

GitLab Unexpected Downtime

June 1, 2019 2:00 pm to June 3, 2019 9:30 am
Partners' GitLab: Service was intermittently unavailable between 2 PM on Saturday, June 1, 2019 and 9:30 AM on Monday, June 3, 2019.
green - services online

JIRA is now available

June 3, 2019 8:49 am to 11:49 am

Edit 10:38am: JIRA is now back online. We apologize for the inconvenience.

JIRA is currently down and unavailable.  An update to this alert will be posted shortly

yellow - pre-downtime, calls to action

Research Computing Ticketing System 29 May 2019

May 29, 2019 12:00 pm

Research Computing Ticketing System Now Online

The email component of the internal ticketing system (Kayako) used by ERIS has not been working properly. The issues started Wednesday, May 29 around 12p.m. EDT and are not yet fully resolved. 

The system may be delivering several copies of the same email or none at all. At times, ERIS was not receiving your tickets at all, so you may experience slightly longer response times than usual.

Affected Services
Emails and requests sent to the following addresses were not accessed during this service disruption:
edcsupport@partners.org - REDCap-generated requests to approve, copy or move projects to production have not been received.
hpcsupport@partners.org
rcc@partners.org 
ideasupport@partners.org

Please allow our staff time to catch up on the tickets submitted while the system was offline.

yellow - pre-downtime, calls to action

ACTION REQUIRED: Urgent Microsoft Patching

May 23, 2019 8:08 am to 11:08 am

On Tuesday, May 14, Microsoft took the unusual step of releasing sweeping system patches for older or unsupported but widely used Windows operating systems. These include one critical patch that should be applied immediately. Details Here

 

 

yellow - pre-downtime, calls to action

Oracle PPM Database Work Extending Past Noontime

May 11, 2019 10:00 am to 1:00 pm

The Oracle patching scheduled for the Partners Personalized Medicine team on Saturday morning is taking longer than anticipated and will extend past noontime. We anticipate another hour of patching and rebooting, and expect to complete our efforts by 1:00pm.

Please forward any questions or concerns to rcc@partners.org

green - services online

Completed the ERIS PostgreSQL Service Maintenance on Saturday

April 27, 2019 9:00 am to 12:00 pm

The ERIS Infrastructure team will perform maintenance on the PostgreSQL Service on Saturday, April 27, from 9:00 a.m. - 12 p.m. EST.

Impact: All databases hosted on the ERIS PostgreSQL service will be impacted by one or two short reboots during this maintenance window.

Action Required: If you are the owner of a database on the PostgreSQL Service, we encourage you to check your applications after the maintenance window.

Completed

green - services online

Problem on Mysql.dipr.partners.org

April 22, 2019 11:04 am to 11:20 pm

Hi, We have a problem on mysql.dipr.partners.org and need to restart the node to get it online again. We are restarting mysql.dipr.partners.org

Completed Maintenance 

green - services online

Completed Emergency Postgres Maintenance

April 20, 2019 9:00 am to 10:15 am

The ERIS Infrastructure Team completed the emergency Postgres maintenance and all Postgres databases are online and available.
