Urgency: Unscheduled, Incident, 19-02-2024 03.00
Impacted services: Tuxis network
Monitoring notified us of high latency on several nodes in our datacenter in Amsterdam. Investigation revealed that one of the routers in Amsterdam was running at 100% CPU usage. As we could not identify any process as the source of this high CPU usage, we rebooted the router.
The secondary router has taken over most of the traffic. Customers may have experienced higher latency on their internet traffic between 02.00 and 03.15.
[Update 04.00]
The router is still misbehaving after the reboot. We are investigating further.
[Update 05.30]
An engineer is on location physically powering down the router.
[Update 06.00]
It seems a partially failing power supply was causing the router hardware to get confused, which kept the router running but extremely busy. In the process, we have upgraded the firmware of the BMC, the BIOS and the NICs in the router. We also replaced the power supply with a spare.