Era4

Technical Product Manager - AI Cloud Infrastructure

Era4 develops, owns and operates AI infrastructure across the UK, powered by renewable energy. Converting legacy industrial and energy sites into modern data-centre facilities, Era4 is combining brownfield regeneration opportunities with cleaner, efficient, scalable compute capacity for healthcare, research, finance, enterprise, and public-sector organisations.


Role Summary:

We are seeking a Technical Product Manager – AI Cloud Infrastructure to join our fast-scaling team. In this role, you will embed with engineering to act as the "First Customer," owning the continuous validation, reliability strategy, and technical documentation for our bare-metal, VM, Kubernetes, and ML infrastructure. By treating testability as a core feature and shadowing real-world workflows, you will ensure our compute platform handles the demands of advanced AI training and engineering workloads. This is an opportunity to join a mission-led AI business that is redefining infrastructure, intelligence, and impact for enterprise customers.

 

Key Responsibilities:

  • Execute integration testing in staging environments, work closely with the platform engineers to build repeatable test frameworks, and shadow internal and external AI infrastructure engineers to translate their real-world usage patterns into automated in-house test cases.
  • Establish strict quality gates, performance SLOs, and scheduling benchmarks that our compute and orchestration services must pass before production deployment.
  • Review, refine, and author technical guides, API documentation, and CLI guides, using them as the blueprint to test the platform exactly as an external engineer would.
  • Partner with software and platform engineers to design robust validation suites, anticipating complex edge cases and structural failure modes across bare-metal provisioning and Kubernetes cluster lifecycles.


Essential Experience:

  • Technical familiarity with bare-metal infrastructure (e.g., PXE booting, IPMI/Redfish), virtualisation layers (e.g., KVM), and container orchestration (Kubernetes or similar).
  • Track record designing comprehensive test strategies, validation frameworks, and acceptance criteria for highly technical cloud-native, API, or infrastructure-as-a-service (IaaS) products.
  • Ability to analyse infrastructure services, CLIs, and APIs from a developer’s perspective to identify friction points, usability gaps, and reliability risks.
  • Working knowledge of modern CI/CD pipelines, automated testing, and automation tooling (e.g., GitLab CI, GitHub Actions, Terraform, Ansible) to help engineering shape automated quality gates.
  • Proven experience in a highly technical role embedded directly within a core infrastructure or platform engineering team.


One or more would be an advantage:

  • Direct exposure to high-performance computing (HPC) setups, large-scale cluster scheduling (e.g., Slurm), or infrastructure optimised for heavy AI/ML training workloads.
  • Experience using cloud observability, telemetry, and monitoring tools (e.g., Prometheus, Grafana, Datadog) to track and improve system reliability metrics.
  • Experience writing or structuring technical documentation, API reference guides, and developer tutorials from scratch.

 

Why Join Era4:

You’ll be joining a mission-driven start-up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next-generation company operates at scale.

 

Diversity & Inclusion:

Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Executive & Operations

London, United Kingdom (Visit to office required)
