Site Reliability Engineer

Site Reliability Engineer - Remote

About Fullcast, Inc

Fullcast empowers Go-To-Market leaders to make revenue predictable and attainable through innovative technologies in territory design, capacity planning, target management, and enhanced data routing & hygiene.

Our team is composed of industry experts with an aspiration to revolutionize the Revenue Operations landscape by building the go-to-market cloud. Our core values are Stewardship, Impact, Simplicity, and Grit.

Summary

We are seeking a Senior Site Reliability Engineer (SRE) with expertise in CI/CD automation, GitOps practices, cloud infrastructure, and software-driven automation, along with experience maintaining compliance and an aggressive security posture for a SaaS platform. You will play a critical role in maintaining the high availability and performance of Fullcast’s platform while ensuring security and compliance standards are met.

This role requires deep expertise in GitHub Actions, Terraform, AWS, SAST/DAST security tools, and automation using Python or similar programming languages. The ideal candidate will write and maintain automation scripts and infrastructure as code (IaC), build custom tooling for system reliability, and integrate security best practices into the deployment process.

Key Responsibilities

Automation & Infrastructure Management:

Automate and Optimize CI/CD Pipelines: Design, implement, and maintain GitHub Actions workflows, leveraging Python or Go to build automation tools.
Infrastructure as Code (IaC): Manage and scale cloud infrastructure using Terraform, ensuring full automation and compliance.
Build Custom Automation Tools: Develop internal tools and scripts for automating infrastructure provisioning, monitoring, scaling, and self-healing.

Security & Compliance:

Security Posture Management: Integrate SAST (Static Application Security Testing) and DAST (Dynamic Application Security Testing) tools into CI/CD pipelines to proactively detect vulnerabilities.
Secure CI/CD Workflows: Implement role-based access control (RBAC) and automated security scanning for code, dependencies, and infrastructure.

Cloud Reliability & Performance:

Cloud Architecture & Scaling: Deploy, monitor, and optimize AWS-based services for scalability, performance, and cost efficiency.
Observability & Incident Response: Implement and refine monitoring, logging, and alerting solutions using Prometheus, Grafana, CloudWatch, or similar tools.
Disaster Recovery & High Availability: Architect fault-tolerant solutions with automated failover, backups, and incident response strategies.

Support & Collaboration:

On-Call Support (Tier 3): Participate in a 24/7 on-call rotation, diagnosing and resolving production incidents with a focus on root cause analysis and post-mortem improvements.
Cross-Functional Collaboration: Work with development and security teams to streamline the DevSecOps pipeline, improve deployment automation, and enhance system resilience.
Documentation & Best Practices: Maintain clear and up-to-date documentation on automation processes, infrastructure, and CI/CD workflows.

Qualifications & Experience

Required Skills & Qualifications

5+ years of experience in Site Reliability Engineering (SRE), DevOps, or Platform Engineering roles.
Strong programming skills in Python, Go, or Bash scripting to develop automation solutions and tooling.
Expertise in CI/CD automation using GitHub Actions (workflow automation, self-hosted runners, security best practices).
Strong knowledge of Terraform for infrastructure provisioning and lifecycle management.
Deep experience with AWS (EC2, S3, RDS, Lambda, IAM, EKS, etc.) and cloud security best practices.
Experience with SAST & DAST security tools (e.g., SonarQube, Snyk, Checkmarx, ZAP, Burp Suite, etc.).
Hands-on experience managing deployments using GitOps principles (ArgoCD, Flux, or similar).
Strong Linux administration and system troubleshooting skills.
Experience with monitoring & observability tools (Datadog, Prometheus, Grafana, CloudWatch).
Ability to handle Tier 3 on-call incidents with a focus on proactive issue resolution.

Preferred Skills & Qualifications

Experience with Kubernetes (EKS, GKE, or OpenShift) and container orchestration.
Knowledge of service mesh technologies (Istio, Linkerd, Consul).
Familiarity with compliance frameworks like SOC 2, ISO 27001, or HIPAA.
Familiarity with privacy and PII compliance regulations such as GDPR and CCPA, as well as other US privacy laws.
Experience with database performance tuning (MySQL and NoSQL solutions).

Why Join Fullcast?

Innovative Work: Help shape the future of our cloud-native, secure, and high-performance platform.
Tech-Forward Culture: Work with modern tools and methodologies (GitOps, Terraform, AWS).
Autonomy & Impact: Own critical reliability and security initiatives with a direct impact on business continuity.
Competitive Benefits: Flexible remote work, health insurance, professional development budget, and more.

The pay range for this role is:

142,000.00 - 157,500.00 USD per year (Remote - United States)

40,000 - 65,000 USD per year (Remote - Costa Rica)


142,000 - 175,000 USD per year (Remote (United States))


Product

Remote (Costa Rica)

Remote (United States)
