About FlexAI
Build and Deploy AI the right way, anywhere.
The FlexAI Compute Infrastructure Platform provides an end-to-end AI compute layer for running and managing workloads across any cloud, any GPU, and any deployment model (public, hybrid, or on-prem). It combines 1-click simplicity for users with enterprise-grade orchestration, security, and automation under the hood.
Founded by Brijesh Tripathi, who brings experience from Nvidia, Apple, Tesla, Intel, and Zoox, FlexAI is not just building a product – we’re shaping the future of AI. Our teams are strategically distributed across Silicon Valley and Bengaluru, united by a shared mission: to deliver more compute with less complexity.
If you're passionate about shaping the future of artificial intelligence, driving innovation, and contributing to a sustainable and inclusive AI ecosystem, FlexAI is the place for you!
Role Overview
FlexAI is looking for a Senior Backend Engineer (Infrastructure & AI Platform) with deep Golang expertise to architect and build the core backend systems powering our next-generation AI compute and PaaS platform. This role sits at the intersection of distributed systems, cloud infrastructure, and AI platform engineering — enabling large-scale model training, inference, and orchestration across heterogeneous compute. This is not a traditional backend role; you will be building platform-grade systems that support AI runtimes, scheduling, resource orchestration, and multi-tenant cloud infrastructure.
As a Senior Backend Engineer, you'll drive backend architecture, scale platform services, and build high-performance infrastructure components that power AI workloads in production environments — influencing how the platform evolves from Beta to enterprise-grade deployment. Expect high ownership and technical autonomy in a research-driven, deep-tech environment — not SaaS CRUD apps.
This position is in-person at our San Jose, CA office.
What You'll Do
Core Platform & Infrastructure Backend:
- Architect and develop high-performance Golang services for FlexAI's AI PaaS and infrastructure platform
- Build internal APIs powering model deployment, job scheduling, and compute lifecycle management
- Develop components interfacing with GPU/compute infrastructure and AI runtimes
Distributed Systems & Scalability:
- Design and scale microservices and event-driven systems for high-throughput AI workloads
- Optimize for low latency, high concurrency, and fault tolerance
- Implement service-to-service communication (gRPC/REST, message queues, async pipelines)
- Drive reliability, observability, and resilience across services
AI Platform Integration:
- Collaborate with AI/ML and Runtime teams to integrate systems with training pipelines, inference infrastructure, experimentation workflows, and dataset/artifact management
- Enable orchestration across cloud and on-prem environments
- Build abstractions that simplify AI infrastructure consumption
Cloud-Native & Platform Engineering:
- Design cloud-native, Kubernetes-native services
- Work with DevOps/SRE on CI/CD, deployment automation, and scalability
- Contribute to architecture decisions for multi-region, multi-cloud infrastructure
- Improve monitoring, logging, and diagnostics
Technical Leadership:
- Lead architecture reviews and set engineering standards
- Mentor engineers and guide complex problem-solving
- Drive long-term roadmap for backend infrastructure and AI platform capabilities
- Partner with Product, Runtime, and Infra leadership to translate requirements into scalable systems
Tech Stack (Indicative):
- Languages: Golang (Primary), Python (Secondary)
- Infrastructure: Kubernetes, Docker, Cloud (AWS/GCP/Azure)
- Architecture: Microservices, gRPC, Event-driven systems
- Data: SQL + NoSQL databases, caching, streaming systems
- Observability: Prometheus, Grafana, OpenTelemetry (or similar)
What You'll Need to Be Successful
Core Engineering:
- 5+ years of Backend or Infrastructure Engineering experience
- Expert-level proficiency in Golang (must-have; extensive hands-on experience)
- Strong experience building production-grade distributed systems
- Proven track record building infrastructure platforms, PaaS, or deep-tech systems
Infrastructure & Systems:
- Deep understanding of cloud-native architectures and containerized environments
- Strong experience with Kubernetes, Docker, and cluster orchestration
- Familiarity with compute scheduling, resource management, or platform runtimes is a strong plus
Databases & Data Systems:
- Experience with relational and distributed databases (PostgreSQL, Cassandra, DynamoDB, etc.)
- Strong understanding of caching, queues, and streaming systems (Redis, Kafka, etc.)
AI / Platform Exposure (Highly Preferred):
- Experience on AI/ML platforms, model infrastructure, or data platforms
- Familiarity with ML pipelines, inference systems, or GPU-backed workloads
- Exposure to PyTorch, TensorFlow infrastructure, or model serving systems is a plus
Ideal Candidate Profile (Who Will Thrive Here)
- Infra-first backend engineers (not just API developers)
- Background in AI infra, cloud platforms, developer platforms, or deep-tech systems
- Strong systems thinkers who enjoy low-level performance, scalability, and architecture challenges
- Startup-minded builders comfortable in ambiguous, high-ownership environments
What We Offer
- Competitive salary and benefits package
- Work on cutting-edge AI infrastructure
- Build products used by developers and enterprises
- High ownership, fast execution, real impact
- Collaborative, high-caliber team