About Harell Data
AI has transformed digital industries, but progress in the physical sciences — drug discovery, materials science, climate modeling — is stuck. The bottleneck isn't compute or algorithms. It's data. The most valuable scientific datasets are locked away in silos, unstructured, and hard to access.
We're fixing that. Harell Data is a managed platform where organizations can securely share proprietary datasets and train models on high-performance GPUs — all in one place. Dataset owners can share their data for model training without giving up control of it — trainers get access to compute against the data, not the data itself. Think of it as the infrastructure layer that turns scattered scientific data into domain-specific foundation models.
Software Engineer, AI Infrastructure
About the Role
As a founding AI Infrastructure Engineer, you will report directly to the CTO and lead the development of our core compute and orchestration layer. This is a high-impact role where you will hold a significant ownership stake in the company and lead the 0-to-1 build of our infrastructure. You will work closely with our customers to translate their needs into a world-class platform, while simultaneously shaping our engineering culture and technical direction from the ground up.
What You Will Do
- Architect GPU Compute Fabric: Build and manage the orchestration layer for GPU workloads, ensuring efficient resource allocation and cost management for large-scale training, fine-tuning, and inference.
- Design Developer Interfaces: Build developer-centric SDKs and APIs that transform complex ML workflows into intuitive experiences for researchers and data scientists.
- Operationalize the ML Lifecycle: Develop robust, end-to-end pipelines, from data ingestion and preprocessing to secure model serving and monitoring.
- Client Success & Observability: Work closely with customers to debug fine-tuning jobs and build the observability tools required to track model performance and resource health in real time.
- Define Systems & Culture Strategy: Lead the technical roadmap by making critical "build vs. buy" decisions on infrastructure and security, while directly shaping the team’s engineering standards and hiring processes.
Qualifications
- 5+ years of software engineering experience, with a focus on ML infrastructure or backend systems supporting ML workloads
- Experience deploying and operating ML/DL training or inference pipelines in production (PyTorch, Hugging Face, or similar)
- Hands-on experience with Kubernetes on AWS/GCP, ideally for GPU workloads
- Strong CS fundamentals and system design skills
- Ability to thrive in fast-paced, dynamic environments and navigate ambiguity