Staff Software Engineer: Microservice Infrastructure & Real-Time ML Inference

Sanas.ai is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world’s first real-time speech transformation platform capable of accent translation, noise elimination, speech enhancement, and cross-language communication.

Sanas makes conversations clearer, more inclusive, and more effective, removing barriers that prevent people from being understood, regardless of accent, background noise, or native language.

Since going to market in 2023, Sanas has scaled at an extraordinary pace, growing from $0 to $32M ARR in under two years, with a projected >$50M ARR by the end of 2025. The company recently recorded its first $10M quarter and is on track to achieve $120M in ARR next year. With a SaaS-based model, Sanas serves some of the world’s largest enterprises, including Comcast, UPS, and UHG. Today, Sanas technology is deployed at more than 17 Fortune 500 companies, and growth continues to accelerate.

The company’s valuation has a clear trajectory toward multi-billion-dollar market capitalization as it continues to expand into new verticals and product categories. With a TAM that spans all human-in-the-loop communications and beyond, Sanas has the potential to impact every industry and every global interaction.

Sanas is revolutionizing the way we communicate with the world’s first real-time algorithm, designed to modulate accents, eliminate background noises, and magnify speech clarity. Pioneered by seasoned startup founders with a proven track record of creating and steering multiple unicorn companies, our groundbreaking GDP-shifting technology sets a gold standard.

Sanas is a 200-strong team, established in 2020. In this short span, we’ve successfully secured over $100 million in funding. Our innovation has been supported by the industry’s leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, Quiet Capital, and other influential investors. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you’re not just adopting a product; you’re investing in the future of communication.

About the role

We're looking for a Staff Software Engineer (Backend) to design and build the next generation of our real-time translation infrastructure. You'll architect mission-critical microservices that power low-latency audio/video processing pipelines, working with cutting-edge speech recognition, translation, and voice synthesis technologies. You'll be instrumental in scaling our platform to handle millions of concurrent streaming sessions while maintaining sub-100 ms latency requirements. This role combines deep systems programming, distributed systems architecture, and cloud infrastructure expertise.

Mission & Scope

Own Sanas’ microservice and streaming architecture, which powers sub-100 ms, real-time language translation in both B2B and B2C environments. Define technical strategy, align multiple teams, and raise the bar on reliability and performance across regions.

What you'll do

Lead the design for high-throughput, low-latency microservices that enable bidirectional streaming in Sanas’ audio/video pipelines.
Build event/telemetry/feature pipelines (Kafka/Redis/DynamoDB) that support near-real-time decisions and model features at scale.
Productionize model serving (Triton/vLLM/TorchServe), implement autoscaling/batching/shadow-deploys, and enforce p99/p999 budgets.
Establish SLOs/error budgets, graceful degradation (keep call quality first), idempotency, circuit breakers, retries with jitter, and chaos drills.
Lead Sanas-wide logging/metrics/tracing (OpenTelemetry), RED/USE dashboards, and symptom-based alerting.
Drive cross-team designs, mentor seniors, lead postmortems/design reviews, and lay the foundation for shared libraries and patterns (auth, interceptors, tracing, schema rollout).

Qualifications

7+ years of Software Engineering experience, with a focus on distributed architecture and technical leadership.
Strong proficiency in Python or Go; strong async/concurrency (asyncio/futures), profiling, and GC/heap tuning.
Strong proficiency in containerization and orchestration: AWS/Azure, Terraform, Kubernetes, IaC patterns, and node pools (CPU/GPU).
Experience in ML Inference: Triton/vLLM/TorchServe; GPU scheduling/packing, batching, A/B and shadow traffic.
Experience with gRPC/protobuf at scale (versioning, interceptors, performance tuning, and compatibility testing).
Nice-to-have: Experience with WebRTC/SRTP, RTP/RTCP, NAT traversal (STUN/TURN), SIP interop; FFmpeg/codec tradeoffs.
Nice-to-have: Experience in data streaming with Kafka, Redis, DynamoDB; exactly-once/at-least-once patterns; stream-batch bridges.

Science

Palo Alto, CA
