Senior Site Reliability Engineer (DevOps)

About Azira LLC

Azira is a data-first media and insights company on a mission to reinvent how brands use data to make smarter decisions, from where to open their next location to how they connect with customers in the real world. We blend marketing, location analytics, and strategy into a single platform, helping leading brands take action with confidence. We move fast, think boldly, and care deeply about building things that matter.

Why This Role Matters

Azira is seeking a Senior Site Reliability Engineer to perform day-to-day activities that support the company’s data centers, software, and application platforms that service the entire business. It is a demanding role that requires the candidate to be capable of working with cross-functional teams and diagnosing complex issues on various platforms. At Azira, an SRE is essentially a cloud infrastructure engineer, focusing on ensuring the reliability, scalability, and efficiency of our systems.

The ideal candidate should have extensive experience with cloud infrastructure, as well as superior troubleshooting skills and knowledge of monitoring and alerting mechanisms.

What You’ll Do

Manage a large-scale production environment and mission-critical cloud infrastructure.
Own stability, automation, scalability, deployment, monitoring, alerting, and security to ensure maximum availability of Azira’s tech infrastructure.
Manage distributed big data systems composed of Kafka, EMR, Spark, MongoDB, Elasticsearch, Redis/Valkey, Google App Engine, and other cloud services.
Work closely with big data, data science, and software engineering teams to ensure the infrastructure is capable of serving current and future needs, and work independently when needed.
Set up monitoring, and create and maintain operational runbooks.
Participate in 24/7 on-call support roles on a rotational basis as needed.
Influence, create, and contribute to the automation platform.
Take complete ownership of assigned modules, working independently and collaborating with other teams as needed.

What You Bring

Bachelor’s or Master’s degree (B.Tech/M.Tech)
6-8 years of experience as a Site Reliability Engineer
Experience with RHEL, CentOS, or Ubuntu system administration
3+ years of hands-on experience with essential Google Cloud and AWS services, including IAM, S3/buckets, and VPCs
2+ years of experience working with automation tools such as Terraform and Ansible
4+ years of experience applying DevOps principles and using CI/CD tools such as GitHub Actions, Jenkins, Artifactory, Nexus, Bitbucket, etc.
1+ years of experience with observability/monitoring tools such as Prometheus or Grafana
High-level technical experience with front-end web technologies, CDN, and web server configuration (Apache/Nginx)
2+ years of experience with containers and container orchestration, such as Docker or Kubernetes
2+ years of experience defining and deploying monitoring, metrics, and logging systems
4+ years of experience with source code versioning and pull requests with Git
Proficiency in documenting processes and monitoring performance metrics.
Good interpersonal and communication skills are necessary to work effectively with other team members.

Nice to have

Solid experience in Bash scripting; proficiency in Python or other languages is a plus.

Why You’ll Love It Here

Competitive compensation package
Comprehensive health and wellness benefits
Flexible, hybrid work culture with a supportive environment
High-impact opportunities with low ego and big ambition

Meaningful mission with a fun, collaborative team atmosphere.

Engineering

Bengaluru, India
