Stratus is the leading cloud-based platform for MEP contractors. We’re on a mission to revolutionize the construction industry by providing innovative, data-driven solutions that layer seamlessly across a contractor’s entire workflow, from design to fabrication to installation.
We are seeking a Senior Platform/DevOps Engineer to join our growing Platform Engineering team. This role will focus on building and maintaining automation, infrastructure as code, and platform tooling that enables our development teams to ship reliable software quickly across our multi-cloud infrastructure (Azure/AKS and AWS/EKS).
Core Responsibilities
Automation & Infrastructure
- Design, implement, and maintain infrastructure automation using Terraform/OpenTofu
- Build, optimize, and improve CI/CD pipelines and processes using GitHub Actions for .NET, Python, and Go applications
- Improve developer experience through workflow automation and tooling enhancements
- Develop processes to enable better testing and debugging of production issues in lower environments
- Develop Infrastructure as Code patterns using Kustomize and Helm for Kubernetes deployments
- Implement GitOps workflows using Flux for declarative infrastructure management
- Create self-service platform capabilities that empower development teams
- Automate operational tasks to reduce manual overhead and improve reliability
Platform Engineering
- Manage and optimize Kubernetes clusters (Azure AKS) across multiple environments (CI, QA, RC, Production)
- Contribute to maintaining and upgrading existing Azure infrastructure
- Contribute to the Azure B2C authentication replacement/upgrade initiative planned for early 2026
- Contribute to AWS/EKS infrastructure research, planning, and buildout initiatives for the 2026 expansion
- Design and implement platform services and tools that improve developer productivity
- Build and maintain observability infrastructure (Grafana, Prometheus, Loki, Tempo)
- Establish platform engineering best practices and standards across both cloud providers
- Collaborate with application teams to understand platform requirements
- Optimize resource utilization and cost efficiency across Azure and AWS infrastructure
Documentation & Knowledge Sharing
- Create comprehensive documentation for platform services, tools, and processes
- Develop runbooks and troubleshooting guides for operational procedures
- Build a knowledge base for platform operations and best practices
- Conduct knowledge sharing sessions with team members and application developers
- Document architecture decisions and infrastructure patterns
- Maintain up-to-date system diagrams and technical documentation
Operations & Reliability
- Participate in on-call rotation for platform infrastructure support (required)
- Investigate and resolve infrastructure incidents
- Perform root cause analysis and implement preventive measures
- Monitor platform health and proactively address issues
- Contribute to incident response and post-mortem processes
Technical Skills
Cloud & Infrastructure (Required):
- 5+ years of experience with Azure cloud services (Azure primary focus)
- 5+ years of hands-on experience with Kubernetes
- Experience with AWS services and willingness to lead AWS/EKS expansion initiatives
- Deep understanding of Kubernetes architecture, networking, storage, and security
- Production experience with container orchestration and microservices architectures
- Multi-cloud architecture understanding and cross-cloud portability considerations
Infrastructure as Code (Required):
- Expert proficiency with Terraform or OpenTofu
- Strong experience with Kustomize and Helm for Kubernetes deployments
- Experience with GitOps methodologies and tools (Flux, ArgoCD, or similar)
- Understanding of declarative infrastructure management
CI/CD & Automation (Required):
- Strong experience with GitHub and GitHub Actions
- Proven track record of building and optimizing CI/CD pipelines
- Experience automating operational tasks using scripting (Bash, Python, or Go)
- Understanding of automated testing strategies and deployment patterns
Application Support (Required):
- Experience supporting .NET applications in production environments
- Experience with JavaScript services and deployment patterns
- Familiarity with Python application deployment and runtime requirements
- Understanding of application observability and monitoring needs
DevOps Practices (Required):
- Strong understanding of DevOps principles and methodologies
- Experience with monitoring and observability tools (Prometheus, Grafana, or similar)
- Knowledge of logging aggregation systems (Loki, ELK, or similar)
- Understanding of distributed tracing concepts and tools
Professional Skills
Documentation & Communication (Critical):
- Exceptional technical writing skills with ability to create clear, comprehensive documentation
- Strong verbal communication skills for knowledge sharing and collaboration
- Experience creating runbooks, architecture diagrams, and technical specifications
- Ability to explain complex technical concepts to various audiences
Problem Solving & Discovery:
- Strong analytical and troubleshooting skills
- Proactive approach to identifying and solving problems
- Curiosity-driven mindset for discovering better solutions and practices
- Ability to balance pragmatic solutions with long-term architectural considerations
Collaboration & Leadership:
- Experience mentoring junior engineers and sharing knowledge
- Collaborative working style with ability to work independently
- Strong stakeholder management skills
- Experience working in cross-functional teams
Operational Excellence:
- Experience with on-call rotations and incident response
- Understanding of SRE principles and practices
- Focus on reliability, availability, and performance
- Experience with capacity planning and performance optimization
Preferred Qualifications
Advanced Technical Experience:
- Experience with service mesh technologies (Istio, Linkerd)
- Knowledge of Kubernetes operators and custom resource definitions (CRDs)
- Experience with distributed tracing systems (Tempo, Jaeger)
- Familiarity with policy enforcement tools (OPA, Kyverno)
- Experience with secrets management (Azure Key Vault, Vault, Sealed Secrets)
- Experience with advanced deployment strategies (Blue/Green, Canary, automated rollbacks)
Additional Skills:
- Certifications: Azure Administrator Associate, Azure DevOps Engineer Expert, CKA/CKAD
- Experience with Active Directory and Azure Active Directory (now Microsoft Entra ID)
- Experience with Spacelift or similar infrastructure orchestration platforms
- Cloud cost optimization experience and financial operations (FinOps) practices
- Experience with security scanning and compliance tooling
- Background in software development or site reliability engineering
- Experience with AI-powered tooling and workflow automation platforms
- Technical writing and standards documentation experience
Domain Knowledge:
- Experience in [relevant industry vertical]
- Understanding of compliance requirements (SOC 2, FedRAMP, etc.)
- Experience with multi-region deployments and disaster recovery
- Knowledge of networking fundamentals and Azure networking services