About FlexAI
Build and Deploy AI the right way, anywhere.
The FlexAI Compute Infrastructure Platform provides an end-to-end AI compute layer for running and managing workloads across any cloud, any GPU, and any deployment model (public, hybrid, or on-prem). It combines 1-click simplicity for users with enterprise-grade orchestration, security, and automation under the hood.
Founded by Brijesh Tripathi, who brings experience from Nvidia, Apple, Tesla, Intel, and Zoox, FlexAI is not just building a product – we’re shaping the future of AI. Our teams are strategically distributed across Silicon Valley and Bengaluru, united by a shared mission: to deliver more compute with less complexity.
If you're passionate about shaping the future of artificial intelligence, driving innovation, and contributing to a sustainable and inclusive AI ecosystem, FlexAI is the place for you!
Role Overview
FlexAI is looking for a Senior Backend Engineer (Infrastructure & AI Platform) with deep Golang expertise to architect and build the core backend systems powering our next-generation AI compute and PaaS platform. This role sits at the intersection of distributed systems, cloud infrastructure, and AI platform engineering — enabling large-scale model training, inference, and orchestration across heterogeneous compute. This is not a traditional backend role; you will be building platform-grade systems that support AI runtimes, scheduling, resource orchestration, and multi-tenant cloud infrastructure.
As a Senior Backend Engineer, you'll drive backend architecture, scale platform services, and build high-performance infrastructure components that power AI workloads in production environments — influencing how the platform evolves from Beta to enterprise-grade deployment. Expect high ownership and technical autonomy in a research-driven, deep-tech environment — not SaaS CRUD apps.
This position is in-person at our San Jose, CA office.
What You'll Do
Core Platform & Infrastructure Backend:
- Architect and develop high-performance Golang services for FlexAI's AI PaaS and infrastructure platform
- Build internal APIs powering model deployment, job scheduling, and compute lifecycle management
- Develop components interfacing with GPU/compute infrastructure and AI runtimes
Distributed Systems & Scalability:
- Design and scale microservices and event-driven systems for high-throughput AI workloads
- Optimize for low latency, high concurrency, and fault tolerance
- Implement service-to-service communication (gRPC/REST, message queues, async pipelines)
- Drive reliability, observability, and resilience across services
AI Platform Integration:
- Collaborate with AI/ML and Runtime teams to integrate systems with training pipelines, inference infrastructure, experimentation workflows, and dataset/artifact management
- Enable orchestration across cloud and on-prem environments
- Build abstractions that simplify AI infrastructure consumption
Cloud-Native & Platform Engineering:
- Design cloud-native, Kubernetes-native services
- Work with DevOps/SRE on CI/CD, deployment automation, and scalability
- Contribute to architecture decisions for multi-region, multi-cloud infrastructure
- Improve monitoring, logging, and diagnostics
Technical Leadership:
- Lead architecture reviews and set engineering standards
- Mentor engineers and guide complex problem-solving
- Drive long-term roadmap for backend infrastructure and AI platform capabilities
- Partner with Product, Runtime, and Infra leadership to translate requirements into scalable systems
Tech Stack (Indicative):
- Languages: Golang (Primary), Python (Secondary)
- Infrastructure: Kubernetes, Docker, Cloud (AWS/GCP/Azure)
- Architecture: Microservices, gRPC, Event-driven systems
- Data: SQL + NoSQL databases, caching, streaming systems
- Observability: Prometheus, Grafana, OpenTelemetry (or similar)
What You'll Need to Be Successful
Core Engineering:
- 5+ years of Backend or Infrastructure Engineering experience
- Expert-level proficiency in Golang (must-have; extensive hands-on experience)
- Strong experience building production-grade distributed systems
- Proven track record building infrastructure platforms, PaaS, or deep-tech systems
Infrastructure & Systems:
- Deep understanding of cloud-native architectures and containerized environments
- Strong experience with Kubernetes, Docker, and cluster orchestration
- Familiarity with compute scheduling, resource management, or platform runtimes is a strong plus
Databases & Data Systems:
- Experience with relational and distributed databases (PostgreSQL, Cassandra, DynamoDB, etc.)
- Strong understanding of caching, queues, and streaming systems (Redis, Kafka, etc.)
AI / Platform Exposure (Highly Preferred):
- Experience on AI/ML platforms, model infrastructure, or data platforms
- Familiarity with ML pipelines, inference systems, or GPU-backed workloads
- Exposure to PyTorch, TensorFlow infrastructure, or model serving systems is a plus
Ideal Candidate Profile (Who Will Thrive Here)
- Infra-first backend engineers (not just API developers)
- Background in AI infra, cloud platforms, developer platforms, or deep-tech systems
- Strong systems thinkers who enjoy low-level performance, scalability, and architecture challenges
- Startup-minded builders comfortable in ambiguous, high-ownership environments
What We Offer
- Competitive salary and benefits package
- Work on cutting-edge AI infrastructure
- Build products used by developers and enterprises
- High ownership, fast execution, real impact
- Collaborative, high-caliber team