Cloud Reliability Engineer II

The Role:

We are seeking a dedicated Cloud Reliability Engineer to champion the reliability, availability, and security of our production SaaS platform. In this role, you will act as the first line of defense for cloud infrastructure. You will divide your time between core day-to-day production operations (such as incident management, change management, monitoring, and triage) and automation that reduces operational toil. You will play a pivotal role in maintaining customer trust by strictly adhering to SLAs and compliance processes while driving continuous improvement through code.

What You'll Do:

Operational Excellence & Incident Management

  • Monitoring & Triage: Proactively monitor cloud infrastructure health to ensure high availability and performance. Act as the primary owner for production alert monitoring, triage, and swift resolution.
  • Incident Response: Manage critical incidents and escalations from identification to resolution. Lead root cause analysis (RCA) and post-incident reviews to minimize Mean Time To Recovery (MTTR) and prevent recurrence.
  • Change & Release Management: Execute and track production upgrades, multi-tenant deployments, and change requests within defined SLAs, ensuring zero-downtime maintenance where possible.
  • Escalation Support: Handle escalated Support cases and provide infrastructure support for field teams and other environments.
  • 24/7 Availability: Participate in a shift-based schedule and on-call rotation to provide round-the-clock support for critical production systems.
     

Automation & Continuous Improvement

  • Task Automation: Utilize Python and Jenkins to script and automate repetitive operational tasks, reducing manual intervention and increasing efficiency.
  • Tooling Optimization: Assist in maintaining and optimizing monitoring, alerting, and CI/CD tools to streamline workflows.
  • Process Evolution: Identify opportunities to shift left on operations, transforming manual runbooks into automated self-healing mechanisms over time.

What You Bring:

  • 2–5 years of professional experience in Cloud Operations, Site Reliability Engineering (SRE), or Kubernetes administration.
  • Hands-on experience with public cloud platforms (AWS, GCP, or Azure) in a production environment.
  • Operational knowledge of Kubernetes (EKS, GKE, or AKS) including troubleshooting and cluster management.
  • Moderate proficiency in scripting and automation, specifically using Python and Jenkins.
  • Strong understanding of ITIL processes (Incident, Change, and Problem Management).
  • Demonstrated ability to prioritize tasks under pressure while maintaining strict SLAs.
  • Excellent collaboration skills to work effectively with Engineering, Product, and Support teams.
  • Bachelor’s degree in Computer Science, Information Technology, or equivalent work experience.


Preferred Skills:

  • Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.
  • Familiarity with cloud-native observability tools (e.g., CloudWatch, Google Cloud Monitoring (formerly Stackdriver), Prometheus, Grafana).
  • Strong Linux system administration and network troubleshooting skills.
  • Background in supporting enterprise-grade SaaS platforms with strict compliance and security requirements.


Working Conditions:

  • Shift-Based Role: This position requires working in defined shifts to ensure global coverage.
  • On-Call: Regular participation in an on-call rotation is required.
  • Environment: Fast-paced, collaborative, and process-oriented environment with a strong focus on production stability.



Remote Work at ThoughtSpot

This role is available as a remote position.



About ThoughtSpot

The world’s most innovative companies turn to ThoughtSpot’s AI-Powered Analytics to put data in the hands of everyone, from the C-suite to the frontline. With simple, natural language search and AI, anyone can ask questions, discover insights, and act with confidence. Unlike legacy tools that force a trade-off between performance and simplicity, ThoughtSpot is intuitively designed for every business user while being built to handle the most complex, large-scale data, wherever it resides. This unique combination of speed and simplicity is why enterprise leaders trust ThoughtSpot to build a truly data-driven decision-making culture.


At ThoughtSpot, we’re a curious, data-driven bunch. We believe the world works better when everyone has access to facts. That’s why we build products that make asking and answering data questions as natural as having a conversation.


Mandatory and Required Skills for All ThoughtSpot Roles

Spotters are expected to demonstrate AI literacy and workflow integration, including the ability to:

  • Comfortably and confidently integrate artificial intelligence into their daily workflow to increase productivity and quality.
  • Use AI tools, including industry-leading LLMs, hands-on to increase productivity, automate routine tasks, and improve work quality.
  • Speak to their experience using AI for research, content creation, and document summarization while maintaining ownership of judgment and final decisions.
  • Write effective prompts to get the most accurate and creative results from AI tools.


Spotters are expected to exemplify these key traits and AI Mindset:

  • Curiosity in exploring new AI tools
  • Adaptability to quickly learn and implement new, emerging AI technologies
  • Critical thinking to identify when AI should be used and when human judgment is necessary


This combination of curiosity, adaptability, and discernment defines the AI mindset, and it’s required for every role at ThoughtSpot.


AI Mindset for All Spotters

At ThoughtSpot, we believe AI is an essential part of how we work. Every role, across every team, is expected to be fluent and comfortable with using AI to do their best work.


All Spotters are expected to experiment with ThoughtSpot’s AI tools (like Spotter and SpotterViz) and leading industry LLMs to streamline workflows, enhance output, and uncover new insights. Whether drafting content, analyzing data, or summarizing documents, AI is a daily partner. We value curiosity, openness to learning, and thoughtful application of AI to create real value. Training and resources are provided so every Spotter can confidently create with AI.


ThoughtSpot for All

At ThoughtSpot, diverse teams build better products. Complex data problems need many perspectives, not just one. We welcome different backgrounds, identities, and experiences, and we work to create a place where everyone can be themselves and do their best work. If this role excites you and you believe you’re a strong match, we encourage you to apply.


What Makes ThoughtSpot a Great Place to Work?

ThoughtSpot is the Agentic Analytics Platform that empowers every enterprise to transform insights into action, on a mission to make the world more fact-driven. We hire people with unique identities, backgrounds, and perspectives; this balance-for-the-better philosophy is key to our success. Paired with our values of Trust, Customer Obsession, Innovation, and Intensity, it creates a respectful culture that pushes norms to build world-class products. If you’re excited by the opportunity to work with some of the brightest minds in the business and make your mark on a truly innovative company, we invite you to read more about our mission and apply to the role that’s right for you.

Engineering

Bengaluru, India