Internet Service Provider (ISP) Outage
Incident Report for SCSU System Status
Postmortem

REASON FOR OUTAGE (RFO)

Date: 2024-01-26
Incident Number(s): INC0163736
Member(s): Various
RFO Number: 2024-01-25-01
Service Affected: DIA, DNS, FW, filtering
Incident Start Time: 2024-01-25 11:47 PM
Incident Clear Time: 2024-01-26 12:52 AM

CEN is committed to member satisfaction and understands the significance of an impact on service. In the event of a service disruption, CEN works diligently to provide a resolution and to communicate helpful updates in a timely manner. We love the network as much as you do and sincerely regret any disruption this may have caused.

CAUSE
Unexpected behavior and loss of SSH connectivity on the West Haven (WH) core router while performing Scheduled Maintenance: Configuration, Remaining RRs, Route-Reflecting Internet IPv4 Prefixes, CHG0075150.

RESOLUTION
Remote console-based reboot of the WH router and rollback of related configs on adjacent core routers.

SUMMARY
The scheduled work was to optimize routing between core routers as part of a clean-up effort ahead of our Next Generation Infrastructure project implementation. The change would have halved the memory consumption on our core internet routers, an optimization that would carry into the target-state hardware and simplify the upcoming migrations. This maintenance to revise route-policies was applied to the Storrs and West Haven core Junipers, following a recent, successful effort in Hartford.

While the Storrs Juniper MX change completed within the window, impacts occurred in West Haven, where the route-policies did not process as expected, impairing the default-route (DR) advertisement on our route-reflectors (RRs). Simultaneously, the WH-MX became unresponsive, which hindered attempts to roll back the change. The focus shifted to recovering DIA service by modifying the route-policies for the RRs on the Hartford and Storrs MXs to resume DR advertisements, and to recovering the WH-MX by reloading both routing-engines simultaneously.

Even after the reboot, CEN engineers observed the BGP topology table size continuing to grow well past expectations; backing out the changes in both West Haven and Storrs eventually stabilized the environment. CEN will investigate and analyze the outputs captured yesterday evening for a more successful reattempt in the coming weeks.

ADDITIONAL INFORMATION
None.

This letter is for informational purposes only and is not meant to be an admission of liability (or otherwise) on the part of CEN. This letter does not amend or otherwise alter your rights or those of CEN as specified in the Network Access Service Agreement. We hope that the information provided has been helpful.

Posted Jan 26, 2024 - 14:20 EST

Resolved
Our ISP, CEN, conducted infrastructure maintenance yesterday evening to optimize routing between core routers. There were unexpected impacts when the route-policies did not process as expected and one of the routers became unresponsive.
The focus became recovering service by backing out of the change and rebooting the router.
Posted Jan 25, 2024 - 22:00 EST