Partial Tiger Kubernetes outage

Incident Report for OSG Consortium

Resolved

No further issues have been reported; closing incident.
Posted Mar 03, 2025 - 17:51 UTC

Monitoring

Issues with the UW-Madison campus network interrupted the operation of the Tiger Kubernetes cluster. Campus IT staff have implemented fixes to address these issues and allow Tiger operations to resume.
We are monitoring the situation to ensure that the fixes work as intended.
Posted Feb 28, 2025 - 19:49 UTC

Update

Correction: this is an outage of the Tiger Kubernetes cluster, not the Tempest Kubernetes cluster.
Posted Feb 28, 2025 - 16:06 UTC

Investigating

Issues with the *Tiger* Kubernetes cluster are affecting several systems managed by the OSG, including (but not necessarily limited to) OSDF central services and glide-in factories.
This outage has a negative impact on data transfers and the size of collaboration pools.
We are working to resolve the underlying cause.
Posted Feb 28, 2025 - 16:03 UTC
This incident affected: Kubernetes Infrastructure (Tiger) and GlideinWMS Factory.