Infrastructure Operations Manager

Era4 develops, owns and operates AI infrastructure across the UK, powered by renewable energy. By converting legacy industrial and energy sites into modern data-centre facilities, Era4 combines brownfield regeneration with cleaner, more efficient, scalable compute capacity for healthcare, research, finance, enterprise, and public-sector organisations.

Role Summary:

We are hiring a hands-on Infrastructure Operations Manager to support and lead a Technical Operations Centre across SRE, NOC and Service Desk. This is a people leadership role with technical depth: you must be able to lead teams, step into P1/P2 incidents, interpret infrastructure signals (Linux, containers, observability) and drive resolution. The ideal candidate will combine operational leadership, incident command, customer communication and enough hands-on technical depth to lead confidently in an AI/HPC production environment.

Key Responsibilities:

Support & Lead the Function:

Support and run a 24x7 Technical Operations Centre.
Lead and develop a team of engineers across SRE, NOC and Service Desk.
Define processes, escalation paths, rotas and operating rhythms.

Incident Command & Operations:

Own P1/P2 incident response and act as escalation lead.
Run incident bridges and coordinate internal teams and vendors.
Drive post-incident reviews and continuous improvement.

Service Performance & Reliability:

Own SLA/SLO performance, MTTR, incident trends and reporting.
Embed SRE principles (error budgets, observability, automation).
Align operations with customer impact and engineering priorities.

Technical Leadership:

Remain hands-on enough to guide triage and challenge assumptions.
Interpret logs, dashboards and alerts across infrastructure.
Translate technical issues into clear customer communication.

Essential Experience:

Experience within AI/HPC, GPU infrastructure, data centre or cloud environments.
Experience leading technical operations / SRE / NOC / infrastructure teams.
Proven incident command / P1-P2 escalation leadership.
Strong technical grounding: Linux, containers, infrastructure, observability.
Experience with SLA/SLO, MTTR, ITSM (incident/problem/change).
Customer-facing communication (from tickets through to service reviews).

Preferred Experience:

NVIDIA / InfiniBand / high-performance compute exposure.
Tools: Grafana, Prometheus, ServiceNow (or similar).
IaC / automation (Terraform, Ansible, GitOps).
AIOps, alerting, self-healing systems.

Why Join Era4:

You’ll be joining a mission-driven start-up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next-generation company operates at scale.

Diversity & Inclusion:

Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Executive & Operations

United Kingdom (Hybrid; visits to office and site locations required)
