Site Reliability Engineer II

At Advisor360°, we build technology that transforms how wealth management firms operate, scale, and serve their clients. As a leading SaaS platform in the fintech space, we’re trusted by some of the largest independent broker-dealers and RIAs to power the full advisor and client experience—from portfolio construction and reporting to compliance and client engagement.

What sets us apart? It's not just the tech (though it's best-in-class). It's the people, the purpose, and the passion behind everything we do. We’re a team of builders, thinkers, and doers who believe that great companies are defined by the stories they tell and the experiences they create—internally and externally. We bring deep industry expertise, a collaborative spirit, and a commitment to innovation as we reshape what’s possible in wealth management. 

As we grow, we’re looking for teammates who are ready to roll up their sleeves, think big, and help elevate our brand in a way that reflects the bold ambitions we have for our company and the clients we serve. 

Join us, and be part of a company that's not only moving fast—but making it count. 



We are seeking a highly motivated Site Reliability Engineer (SRE) to join our team and drive operational excellence across our systems. In this role, you will serve as a key steward of reliability, scalability, and performance for our mission-critical SaaS platform.

You will operate at the intersection of software engineering and operations, applying SRE principles to improve system reliability, reduce operational toil through automation, and enhance observability across the platform. As an SRE, you will play a critical role in maintaining the health of production environments, proactively identifying risks, and ensuring rapid and effective incident response. If you are passionate about automation, operational excellence, and building highly reliable systems at scale, and you thrive in fast-paced, high-impact environments, this role is for you.


Key Responsibilities

  • Implement, maintain, and support infrastructure and system environments across cloud and hybrid platforms
  • Design and implement monitoring, alerting, and observability solutions (e.g., Dynatrace, Grafana, Datadog)
  • Build dashboards and alerting frameworks that provide actionable insights and reduce mean time to detection (MTTD)
  • Define and manage SLIs, SLOs, and error budgets to establish measurable reliability targets and drive data-driven decisions
  • Lead incident response efforts, including troubleshooting, root cause analysis (RCA), and post-incident improvements
  • Implement, automate, and manage effective OS patching and upgrade processes to ensure security, stability, and compliance
  • Develop and maintain automation for deployment, scaling, recovery, and operational tasks using Python, Go, Bash, or PowerShell
  • Proactively identify risks, bottlenecks, and reliability gaps, and drive remediation efforts
  • Collaborate with engineering teams to improve application reliability, performance, and scalability
  • Partner with cross-functional teams (Engineering, Security, Platform, Support) to embed reliability practices
  • Create and maintain runbooks, playbooks, and operational documentation with an automation-first mindset
  • Participate in on-call rotations and contribute to a sustainable, balanced on-call culture
  • Mentor junior engineers and advocate for SRE best practices across the organization


Required Skills & Qualifications

  • 5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Systems Engineering
  • Proven experience operating and supporting large-scale distributed systems in production environments
  • Strong understanding of SRE principles and practices, including SLIs/SLOs, error budgets, and reliability engineering methodologies
  • Strong programming and automation skills in Python, Go, Bash, or similar scripting languages
  • Demonstrated experience automating and executing OS patching and upgrade processes
  • Deep expertise in monitoring and observability platforms such as Dynatrace, Splunk, Prometheus, Grafana, and ELK stack
  • Well versed in all aspects of incident management, including troubleshooting, root cause analysis (RCA), and post-incident improvements
  • Experience working with relational and NoSQL databases in high-availability and production-grade environments
  • Understanding of networking fundamentals, including TCP/IP, DNS, load balancing, CDN, and firewalls
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP) and associated managed services

 

Why You’ll Love Working Here: 

It’s not just about work—it’s about building a career and enjoying the ride! Here’s what you can expect:

We believe in recognizing and rewarding performance. Our compensation package includes competitive base salaries, annual performance-based bonuses, and the chance to share in the equity value you and your colleagues create during your time with the company. We offer comprehensive health benefits, including dental, life, and disability insurance. We also trust our employees to manage their time effectively, which is why we offer an unlimited paid time off program to help you perform at your best every day.  

Join us on this journey. Advisor360° is an equal opportunity employer committed to a diverse workforce. We believe diversity drives innovation and are therefore building a company where people of all backgrounds are truly welcome and included. Everyone is encouraged to bring their unique, authentic selves to work each and every day. The way we see it, we are here to learn from each other.    

Client Ops & Services

Bangalore, India
