Site Reliability Engineer

Company Description:

Camlin is a global technology leader that operates with the vision of bringing revolutionary products to life for a wide range of industries, including power and rail, and also has interests in a number of R&D projects in a variety of scientific sectors.


At Camlin, we believe in high-quality engineering and design, allowing us to develop market-leading products and services. In short, we love creating value for our customers by solving difficult problems. As of today, the Camlin operation spans over 20 countries across the globe.

We are seeking a dedicated and experienced Site Reliability Engineer (SRE) to join our dynamic team. The SRE will be responsible for ensuring the reliability, performance, and availability of our critical systems and services. This role requires a blend of software engineering and operations skills to build and run large-scale, distributed, fault-tolerant systems. 


Key Responsibilities: 

  1. System Reliability and Performance: 

  • Design, implement, and maintain scalable and reliable infrastructure. 

  • Monitor system performance, detect issues, and ensure maximum uptime. 

  • Develop and implement strategies for disaster recovery and data backup. 

  2. Automation and Tooling: 

  • Automate repetitive tasks to improve efficiency and reduce human error. 

  • Build and maintain tools for deployment, monitoring, and operations. 

  • Create and maintain CI/CD pipelines to streamline application delivery. 

  3. Incident Management: 

  • Respond to and resolve incidents, minimizing impact on customers. 

  • Conduct post-incident reviews to identify root causes and prevent recurrence. 

  • Develop and maintain incident response protocols and playbooks. 

  4. Collaboration and Communication: 

  • Work closely with development teams to integrate reliability into the software development lifecycle. 

  • Communicate effectively with stakeholders about system status and health. 

  • Provide guidance and mentorship to junior team members. 

  5. Security and Compliance: 

  • Ensure systems comply with security standards and best practices. 

  • Implement and maintain security measures, including patch management and vulnerability assessments. 

  • Assist in audits and compliance initiatives as required. 

 

Required Qualifications: 

  • Bachelor's degree in Computer Science, Engineering, or a related field. 

  • 4+ years of hands-on experience in a Site Reliability Engineering or DevOps role. 

  • Strong experience in maintaining cloud platforms (e.g., AWS, Azure). 

  • Proficiency in programming and scripting languages (e.g., Python, Go, Bash). 

  • Experience with infrastructure automation and container orchestration tools (e.g., Docker, Kubernetes, Terraform, Ansible, Helm). 

  • Familiarity with continuous integration and deployment tools (e.g., GitLab CI, Argo Workflows, Argo CD). 

  • Experience managing distributed systems such as Kafka. 

  • Experience with monitoring/logging solutions (e.g., Datadog, ELK, Prometheus). 

  • Good understanding of concepts related to computer architecture, data structures and programming practices.  

  • Solid understanding of networking, databases, and security principles. 

  • Excellent problem-solving skills and attention to detail. Strong debugging / troubleshooting skills. 

  • Strong communication and collaboration skills.  

  • Success at Camlin demands demonstrable cultural traits: being a fast learner, being adaptable to a changing landscape, and, most importantly, being a strong believer in hands-on work. 

 

Nice to Haves: 

  • Kubernetes Certification. 

  • AWS Certifications. 

  • Linux Certifications (e.g., RHCE). 

  • Open Source Contributions. 

  • Data Platform Operations. 

Engineering

Beograd, Serbia
