Senior Site Reliability Engineer

About Latitude.sh

Latitude.sh's global computing platform launched in 2019, enabling businesses to programmatically deploy single-tenant bare metal instances around the world.

We are a team of individuals passionate about hardware, software, and network infrastructure, looking to build the fastest, easiest-to-use, developer-centric single-tenant cloud infrastructure. If you share this passion, join our growing team of talented people and help build the future of the Internet.

Summary

At Latitude.sh, the Reliability team is responsible for the health and resilience of the infrastructure that powers our global bare metal cloud. As a Senior Site Reliability Engineer (SRE), you’ll focus on building reliable, observable, and self-healing systems at scale.

SREs at Latitude.sh work at the intersection of software engineering and infrastructure. You’ll design and implement tools that automate operations, improve incident response, and enhance system observability, ensuring our platform is always ready for our customers’ workloads.

This might be a good opportunity if you’re passionate about reliability, automation, and creating cloud-like experiences for bare metal infrastructure.

Key Responsibilities

Continuously improve Latitude.sh’s platform reliability and performance.
Design, build, and maintain tools to automate operational tasks and incident response.
Implement and improve observability solutions, including monitoring, alerting, and tracing.
Collaborate with engineering and platform teams to design scalable and resilient systems.
Participate in on-call rotations and lead post-incident reviews with a focus on learning.
Develop and document processes and runbooks that ensure operational excellence.
Contribute to defining SLOs and SLIs and to the adoption of reliability metrics across teams.

Skills and Qualifications

Strong verbal and written English communication skills.
Advanced knowledge of Linux/Unix systems in production environments.
Experience with Kubernetes and container orchestration.
Proficiency with infrastructure automation tools (e.g., Terraform, Ansible).
Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK).
Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby.
Working knowledge of Git and CI/CD pipelines.
Solid understanding of incident management and root cause analysis processes.
Knowledge of cloud-native reliability and security best practices.

What do we offer?

Contractor (PJ);
Paid Time Off;
Competitive Compensation;
Annual Bonus based on company and team performance;
Flexible work hours;
Opportunities for professional growth and development.

Why Latitude.sh?

We're a lean, agile team of passionate professionals who believe in the power of innovation and creative problem-solving. As part of our team, you won't be lost in the crowd – you'll be an essential contributor, making a real impact from day one.

Our values at Latitude.sh guide us in all our work and partnerships. We're proud to be an inclusive company, and we welcome all applicants for our open positions, regardless of their background, religion, sexual orientation, gender identity, age, nationality, or disability. If these values speak to you, we'd love for you to become a part of our team.

Engineering

Global
