Data Center loss of power affecting many services
Incident Report for OSG Consortium
Resolved
All services have been restored at the affected data center.
Posted Apr 16, 2024 - 20:03 UTC
Update
We are continuing to investigate this issue.
Posted Apr 16, 2024 - 17:08 UTC
Investigating
The data center that hosts many of the OSG's services, including the Tiger Kubernetes cluster, lost power. Some notable service outages:

- ap40.uw.osg-htc.org is not available at all.
- The OSPool central manager is down. Although ap21 and ap20 are operational, no new jobs will start; jobs that are already running will continue to run.
- The OSDF federation will not work. Some OSDF accesses may still succeed if the data is already held in a cache, but the pelican client and the older stashcp will not work.
- OSG software repositories are offline.
- The OSG accounting webpage gracc.opensciencegrid.org and all associated services are offline.

No data loss on any services is expected at this time, only a disruption of service.
Posted Apr 16, 2024 - 17:06 UTC
This incident affected: Software Repositories (Yum Repos, GridCF Repo, OSG Hub), Open Science Data Federation (StashCache Redirector, Data Federation Accounting Service, Caches), Accounting (GRACC Frontend, GRACC Backend, GRACC APEL Reporting), Websites (Display, Topology), Hosted GlideinWMS (IGWN GWMS Frontend, JLAB GWMS Frontend, GLUEX GWMS Frontend), OSPool (AP 40, AP 21, AP 20, OSPool GlideinWMS Frontend, Jupyter Notebooks), Kubernetes Infrastructure (Tiger), PATh Facility (AP 1, AP 1 Origin), Hosted CEs (Hosted CE Infrastructure), and GlideinWMS Factory.