mysql3 Shutdown and Retirement
We have an operational event that requires the shutdown of the MySQL Service database server mysql3.research.partners.org. As part of this shutdown, the ERIS Infrastructure team will migrate all MySQL databases that reside on mysql3.research.partners.org to mysql4.research.partners.org. The team will begin this process on Friday evening, July 19, at 6:00 pm, with a target completion time of Sunday, July 21, 2019, at 6:00 pm. During this period, all databases on mysql3.research.partners.org will be unavailable until they have been migrated to mysql4.research.partners.org.
This change requires all database owners to update the MySQL datasource definitions in their applications, changing the host reference from mysql3.research.partners.org to mysql4.research.partners.org.
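For applications that keep the database hostname in configuration files, one way to find every datasource definition that still points at the old server is to scan those files for it. The Python sketch below is a hypothetical helper, not an ERIS tool: it assumes plain-text config files, and the set of file suffixes it inspects (CONFIG_SUFFIXES) is an illustrative guess to adjust for your application.

```python
from pathlib import Path

OLD_HOST = "mysql3.research.partners.org"
NEW_HOST = "mysql4.research.partners.org"

# Hypothetical set of config-file suffixes to inspect; adjust for your application.
CONFIG_SUFFIXES = {".properties", ".xml", ".yml", ".yaml", ".ini", ".cnf", ".conf", ".json", ".php", ".py"}


def scan_text(text):
    """Return (line_number, line) pairs that still reference the old MySQL host."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_HOST in line
    ]


def find_references(root):
    """Walk a directory tree and report config lines that mention OLD_HOST."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in CONFIG_SUFFIXES:
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            hits.extend((path, number, line) for number, line in scan_text(text))
    return hits


def rewrite_host(line):
    """Point a datasource line at the new server; the rest of the line is unchanged."""
    return line.replace(OLD_HOST, NEW_HOST)
```

For example, `find_references("/path/to/your/app")` (a placeholder path) lists each file and line that still names mysql3, and `rewrite_host` shows the corresponding replacement. Review each change by hand before saving it.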
Please forward all questions to firstname.lastname@example.org.
Emergency Reboot of rfanfs.research.partners.org
The ERIS Infrastructure team had to reboot rfanfs.research.partners.org on Saturday morning, July 13. The team started at 10:00 am, and the reboot and maintenance were completed by 11:00 am. If your applications or systems use a storage device from rfanfs.research.partners.org, please double-check that device; should there be a problem, please send a note to email@example.com.
Partners HealthCare REDCap: Scheduled Downtime
The REDCap (https://redcap.partners.org/redcap) upgrade to v8.10.20, scheduled for Tuesday, July 16, 2019, at 6:30 AM EST, is complete. REDCap was offline from 6:30 to 6:47 AM EST.
This release includes a number of bug fixes.
Release notes can be found here.
IMPORTANT NOTICE: REDCap Database Migration
In order to meet the increased demand, usage, and amount of data stored within the REDCap system, it is necessary to upgrade the database infrastructure.
As a result, a REDCap database migration is scheduled for Saturday July 20th, 2019 at 7:00PM EST. This will require an extended amount of downtime with REDCap being offline for approximately 14 hours.
This will help optimize our REDCap infrastructure, resulting in faster overall performance, and prevent extended downtime for maintenance in the future.
Additional alerts will be sent next week.
If you have any concerns about REDCap being offline at either of these times, please email firstname.lastname@example.org
GitLab: Follow-Up to the Outage on July 11, 2019
On July 11, 2019 the GitLab service experienced a significant outage. The service was unavailable between 7:15AM and 12:20PM.
What led to the outage:
An upgrade to GitLab 11.11.5 was scheduled. As part of the upgrade tasks, we apply OS-level patches before beginning the application upgrade process. Some OS-level patches require a restart. In this case, a restart was required. After the patches were installed and reported as having completed successfully, the server was restarted. After the restart, the server was never able to come back up fully and on-screen messages indicated a drive failure/issue.
NOTE: The drive failure did not affect the storage location where the GitLab/Git repositories are stored!
How we recovered:
As part of our recovery procedure, we created a new Virtual Machine, applied all relevant configuration and re-attached the storage where all GitLab/Git repositories are hosted.
We used the nightly backup of the GitLab database and supporting services to recover the additional configuration needed to bring the service back up.
What you should look out for:
Our backup process runs at midnight. Therefore, any data entered within the GitLab user interface (issues, wikis, etc.) between midnight and 7:15 AM may have been lost. This does NOT apply to code committed to the remote repositories. Any code that has been committed to GitLab should not have been affected, and no data should have been lost.
Since we are operating on a new virtual server, those of you who use SSH to interact with the server will have to update your "known_hosts" file. In many cases, you will be prompted with a message along the lines of "Remote Host Identification Has Changed", with some additional text about a possible man-in-the-middle attack. This is an expected side effect of the migration to a new host; please do not be alarmed. To resolve this issue, refer to the message on your screen and note which "known_hosts" file it names. These files are typically located in your home folder under the hidden ".ssh" folder, but they can be stored elsewhere. Once you know where the known_hosts file (or known_hosts2 on some Macs) is located, open it with a text editor, find every line that contains "gitlab.partners.org" or "gitlab.dipr.partners.org", and remove the whole line. Then save the file. NOTE: Removing these lines does NOT risk damaging your configuration for connections to other servers. This file simply contains the fingerprints of servers you have connected to previously.
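OpenSSH can do this cleanup itself: `ssh-keygen -R gitlab.partners.org` removes all keys for that host from your known_hosts file, including hashed entries. If you prefer to script the manual procedure instead, the Python sketch below mirrors it under a few assumptions: entries are stored unhashed, only the two host names from this notice are removed, and a `.bak` copy is written before the file is rewritten. The function names are hypothetical.

```python
from pathlib import Path

# Host names named in this notice; other entries in the file are left untouched.
GITLAB_HOSTS = {"gitlab.partners.org", "gitlab.dipr.partners.org"}


def host_names(field):
    """Split a known_hosts host field like 'name,ip' or '[name]:2222' into bare names."""
    names = set()
    for entry in field.split(","):
        if entry.startswith("[") and "]" in entry:
            entry = entry[1:entry.index("]")]  # drop the [ ]:port wrapper
        names.add(entry)
    return names


def strip_hosts(text, hosts=GITLAB_HOSTS):
    """Return known_hosts text with every line for the given hosts removed."""
    kept = []
    for line in text.splitlines(keepends=True):
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            # A line may start with a marker such as @cert-authority before the host field.
            field = fields[1] if fields[0].startswith("@") and len(fields) > 1 else fields[0]
            if host_names(field) & hosts:
                continue  # drop the whole line, as the notice instructs
        kept.append(line)
    return "".join(kept)


def clean_known_hosts(path=None):
    """Rewrite a known_hosts file in place after saving a .bak copy next to it."""
    path = Path(path) if path is not None else Path.home() / ".ssh" / "known_hosts"
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(strip_hosts(original))
```

Calling `clean_known_hosts()` processes the default `~/.ssh/known_hosts`; pass the path named in the SSH warning (for example a `known_hosts2` file) if it differs. Hashed entries (lines beginning with `|1|`) cannot be matched by name this way, which is why `ssh-keygen -R` is the more robust option.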
Please check any automated processes and runners to ensure that they are working as expected. If you experience any issues, please contact us for assistance at email@example.com with the word GitLab somewhere in the subject line.
Q: Why did it take so long?
A: The outage was prolonged for a number of reasons, chief among them that we had not previously performed a full-scale disaster recovery of the service and had to take careful steps to ensure that no data was lost or corrupted.
Q: Are there any up-sides to this crash?
A: Actually, yes. We are now running on a newer operating system and have more resources provisioned to the machine. We can also now use ED25519 keys for SSH communication with GitLab (previously only RSA keys were supported).
Q: Will the upgrade to GitLab 11.11.5 still take place soon?
A: Yes. The upgrade to 11.11.5 will be rescheduled. We will send a separate notification once the new system has been fully validated by you, our users.
GitLab Upgrade Completed on June 28, 2019
The GitLab service (https://gitlab.partners.org/) upgrade is scheduled for Friday, June 28, 2019, from 7 to 8 AM. We are upgrading from v11.7.11 to v11.11.3, and the upgrade is expected to take up to 1 hour. During this time, the service will be unavailable. This upgrade includes critical security patches, bug fixes, and minor new features.
For a complete list of changes, please refer to the GitLab project release notes:
Version 11.8 - https://about.gitlab.com/2019/02/22/gitlab-11-8-released/
Version 11.9 - https://about.gitlab.com/2019/03/22/gitlab-11-9-released/
Version 11.10 - https://about.gitlab.com/2019/04/22/gitlab-11-10-released/
Version 11.11 - https://about.gitlab.com/2019/05/22/gitlab-11-11-released/
For any questions or concerns regarding this upgrade, please contact firstname.lastname@example.org.
All mysql3 database passwords will expire on June 26, 2019
All database passwords on mysql3.research.partners.org will expire on June 26, 2019, at 10:30 am, which will impact all applications that use mysql3.research.partners.org as a database server. Please send a note to email@example.com identifying your database, the database owner, and the MySQL user IDs needed, and we will coordinate an updated password for your team.
Please send all comments and questions to firstname.lastname@example.org
Partners HealthCare REDCap: Unscheduled Maintenance Required
REDCap (https://redcap.partners.org/redcap) requires server maintenance. REDCap will be offline tonight Tuesday June 25th, 2019 from approximately 9:00-9:30PM EST.
There will be a short interruption of service during the maintenance window.
If you have any concerns, please email email@example.com
RISC Website Maintenance 26 Jun 2019
Research IS & Computing website maintenance was completed on Wednesday, June 26. The https://rc.partners.org website was offline from 7:15 to 7:45 a.m. EDT to apply security patches and bug fixes.
Action Required: Linux Updates
Applies To: Linux & FreeBSD (SACK Panic)
A new vulnerability has been discovered in the Linux kernel's handling of TCP Selective Acknowledgments (SACKs). A remote attacker could exploit it to cause a denial of service (a kernel panic), interrupting system operations.
This affects many distributions in current use, such as RHEL 4, 6, 7, and 8 and Ubuntu 12.xx through 19.xx (kernels 2.6.29 and later). It is tracked as CVE-2019-11477 and is considered important and high impact.
Please note that this might also impact appliances or IoT devices built based on those versions of the Linux kernel.
Some patches are already available, and some vendors are still issuing software packages that fix the kernel vulnerability. If you manage Linux or FreeBSD systems, please patch as soon as possible and reboot. If you can’t patch a system, please apply compensating controls as appropriate, such as disabling TCP SACK processing with sysctl, or filtering connections with a very low MSS at the firewall while TCP MTU probing is disabled.
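The two sysctl settings involved in those workarounds are net.ipv4.tcp_sack (0 disables SACK processing) and net.ipv4.tcp_mtu_probing (which must be 0 for a low-MSS firewall filter to be effective). The read-only Python sketch below reports whether each is in place on a Linux host. It assumes the standard /proc/sys/net/ipv4 layout, the function names are hypothetical, and it is no substitute for installing a patched kernel.

```python
from pathlib import Path

# Standard location of IPv4 sysctls on Linux (net.ipv4.* maps to files here).
SYSCTL_DIR = Path("/proc/sys/net/ipv4")


def read_sysctls(names=("tcp_sack", "tcp_mtu_probing"), base=SYSCTL_DIR):
    """Read integer sysctl values; entries that cannot be read are omitted."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            pass  # missing or non-integer entry: leave it out of the result
    return values


def assess(values):
    """Summarize the CVE-2019-11477 workaround state from sysctl values."""
    return {
        # Workaround: SACK processing disabled entirely.
        "sack_disabled": values.get("tcp_sack") == 0,
        # Precondition for the low-MSS firewall filter workaround.
        "mtu_probing_off": values.get("tcp_mtu_probing") == 0,
    }
```

On a host, `assess(read_sysctls())` returns the two flags. Changing the values (for example with `sysctl -w net.ipv4.tcp_sack=0`) requires root, and disabling SACK can reduce TCP performance on lossy links, so treat it as a stopgap until the kernel is patched.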
Firefox Web Browser Update Needed
Security vulnerabilities fixed in Firefox 67.0.3 and Firefox ESR 60.7.1
Please update your Firefox web browser.
- Open Firefox
- Select About Firefox
- Select Update
Announced: June 18, 2019
Products: Firefox, Firefox ESR
Fixed in: Firefox 67.0.3, Firefox ESR 60.7.1
List Manager (Lyris) Alert: Downtime on 23 June 2019
The database was successfully migrated to the new server, and all systems are green. If you have any questions or concerns relating to this change, please email us at firstname.lastname@example.org.
On June 23 at 8 AM we will be migrating our backend database for List Manager (Lyris) from Microsoft SQL 2008 to Microsoft SQL 2016. During this migration, the List Manager web application (https://researchlist.partners.org and https://researchlistadmin.partners.org) will not be available. We will be moving the database from one system to another (CHG0147108). Any scheduled emails will be sent after the system comes back online. If you have any questions or concerns relating to this change, please email us at email@example.com.
REDCap Service Alert: Scheduled Downtime
The REDCap (https://redcap.partners.org/redcap) upgrade to v8.10.18, scheduled for Tuesday, June 18, at 6:30 AM EST, is complete. REDCap was offline from 6:28 to 6:44 AM EST.
This upgrade includes a number of new features and bug fixes, including:
- New feature: REDCap Messenger
- REDCap Messenger is a communication platform built directly into REDCap. It allows REDCap users to communicate easily and efficiently with each other in a secure manner. At its core, REDCap Messenger is a chat application that enables REDCap users to send one-on-one direct messages or to organize group conversations with other REDCap users. REDCap Messenger is also the best and easiest way to share documents with other REDCap users: you can upload documents and embed pictures inside any given conversation.
- Watch the 10-minute video on REDCap Messenger
- Improvement: Performance boost – Certain pages in projects with thousands or more records should now load much faster in most cases. This includes the Record Status Dashboard, various pages utilizing Data Access Groups, and certain reports. Reports A and B should especially see significantly faster loading (excluding when viewing “all” pages in report A or B).
- Change: Changed the text "Manage Survey Participants" to "Survey Distribution Tools," which more clearly describes the pages in that section.
- New feature: Report Folders - Reports can now be organized into folders in any given project. If a user has "Add/Edit Reports" privileges, they will see an "Organize" link on the left-hand project menu above the project’s reports. They will be able to create folders and then assign their reports to a folder, after which the project's reports will be displayed in collapsible groups on the left-hand menu.
- New feature: “Edit Access” for reports - In addition to setting "View Access" when creating or editing a report, users can now set the report's "Edit Access" (under Step 1) to control who in the project can edit, copy, or delete the report. This setting will be very useful if one wishes to prevent certain users from modifying or deleting particular reports.
- Improvement: A project's Record ID field can now be used as a Live Filter in any given report, thus allowing users to easily view the report for a single record.
- Improvement: New optional parameters added to the API Export Records method to filter data returned based on when a record was created or modified
- dateRangeBegin – To return only records that have been created or modified *after* a given date/time, provide a timestamp in the format YYYY-MM-DD HH:MM:SS (e.g., '2017-01-01 00:00:00' for January 1, 2017 at midnight server time). If not specified, it will assume no begin time.
- dateRangeEnd – To return only records that have been created or modified *before* a given date/time, provide a timestamp in the format YYYY-MM-DD HH:MM:SS (e.g., '2017-01-01 00:00:00' for January 1, 2017 at midnight server time). If not specified, it will use the current server time.
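As an illustration of the new parameters, the Python sketch below builds an Export Records request that returns only records created or modified within a given window. The endpoint URL (https://redcap.partners.org/redcap/api/) and the helper names are assumptions; confirm the API URL on your project's API page. The `content`, `format`, and `type` fields are the standard Export Records parameters, and the timestamps use the YYYY-MM-DD HH:MM:SS format described above.

```python
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint; confirm it on your project's API page before use.
API_URL = "https://redcap.partners.org/redcap/api/"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, as the API expects


def export_records_params(token, date_range_begin=None, date_range_end=None):
    """Build the POST fields for Export Records, adding the optional date filters."""
    params = {"token": token, "content": "record", "format": "json", "type": "flat"}
    for name, value in (("dateRangeBegin", date_range_begin), ("dateRangeEnd", date_range_end)):
        if value is not None:
            datetime.strptime(value, TIMESTAMP_FORMAT)  # raises ValueError on a bad format
            params[name] = value
    return params


def export_records(token, date_range_begin=None, date_range_end=None, url=API_URL):
    """POST the request (a Request with data is sent as POST) and return the raw JSON text."""
    body = urlencode(export_records_params(token, date_range_begin, date_range_end)).encode()
    with urlopen(Request(url, data=body)) as response:
        return response.read().decode("utf-8")
```

For example, `export_records(token, date_range_begin="2017-01-01 00:00:00")` requests only records created or modified after January 1, 2017; omitting `date_range_end` leaves the upper bound at the current server time. Keep your API token out of source code.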
If you have any concerns about REDCap being offline at this time or any questions about these new features, please email firstname.lastname@example.org
Freezerworks Service Alert: Maintenance
Partners Healthcare Freezerworks: Scheduled Downtime
On Thursday June 6th from approximately 6 pm to 6:30 pm (EST), the Freezerworks application will be taken offline for maintenance. During this timeframe, we ask users to log out of the system so we can complete this necessary work. We apologize for any inconvenience during this interruption.
If you have any questions or concerns, please email email@example.com.
CrashPlan Access Resolved
The authentication server backup01.partners.org is up and running. All services are green. If you are having any issues email us at firstname.lastname@example.org.
Currently, our CrashPlan authentication system (backup01.partners.org) is not functioning correctly. We are in the process of rebuilding the system. During this time you will not be able to back up or restore files. Your backup client might report that we are out of space, but that message is false. We will update this alert when everything is resolved.
MySQL and DIPR VM Problems
The ERIS MySQL Service and parts of the DIPR Virtual Machine service experienced problems over the weekend and had to be rebooted. The outages were related to the Compellent patching on Saturday, June 1; however, some work extended into Sunday, June 2, and Monday, June 3. If you experience a problem with a DIPR VM, please send a note to email@example.com and the ERIS Infrastructure team will investigate.
Research Computing Ticketing System 29 May 2019
Research Computing Ticketing System Now Online
The email component of the internal ticketing system (Kayako) used by ERIS has not been working properly. The issues started Wednesday, May 29 around 12p.m. EDT and are not yet fully resolved.
The system may deliver several copies of the same email, or none at all. At times, ERIS was not receiving your tickets at all, so response times may be slightly longer than usual.
Emails and requests sent to the following addresses were not accessed during this service disruption:
firstname.lastname@example.org - REDCap-generated requests to approve, copy or move projects to production have not been received.
Please allow our staff time to catch up on the tickets submitted while the system was offline.
ACTION REQUIRED: Urgent Microsoft Patching
On Tuesday, May 14, Microsoft took the unusual step of releasing sweeping system patches for older or unsupported but widely used Windows operating systems, including one critical patch that should be applied immediately. Details here.
Oracle PPM Database Work Extending Past Noontime
The Oracle patches scheduled for the Partners Personalized Medicine team on Saturday morning are taking longer than anticipated and will extend past noon. We anticipate another hour of patching and rebooting and expect to complete our work by 1:00 pm.
Please forward any questions or concerns to email@example.com
Completed the ERIS PostgreSQL Service Maintenance on Saturday
The ERIS Infrastructure team will perform maintenance on the PostgreSQL Service on Saturday, April 27, from 9:00 a.m. - 12 p.m. EST.
Impact: All databases hosted on the ERIS PostgreSQL service will be impacted by one or two short reboots during this maintenance window.
Action Required: If you are the owner of a database on the PostgreSQL Service, we encourage you to check your applications after the maintenance window.