ML Engineer

About Shipium

Shipium is on a mission to solve the ‘Prime problem’ that modern retailers face today: how to make fast, free, and on-time delivery promises a cornerstone of the customer’s shopping experience. We work with both fast-growing companies and well-known large retailers to cut costs, speed deliveries, and increase flexibility through great APIs, easy integrations into existing infrastructure, and powerful services optimizing the full retail supply chain.


We’re building tech that connects previously fragmented systems and automates complex supply chain decisions to deliver speed and value across operations. 


Founded in 2019 by supply chain technology experts from Amazon and Zulily, the company is on a mission to help every eCommerce company provide its customers a great delivery experience while simultaneously reducing their costs to fulfill orders.


About the role

This position is responsible for solving complex problems related to the design, deployment, and continuous optimization of scalable machine learning platforms and production workflows. You will be responsible for architecting and scaling our ML systems to support a growing number of machine learning models and an increasing volume of real-time predictions. This role will also spearhead our initiatives in Generative AI, designing and implementing systems that leverage Large Language Models (LLMs) to translate model predictions into powerful user-facing insights and agents. You will have a significant impact on the performance, reliability, and scalability of our machine learning and AI solutions, directly contributing to the success of the Shipium platform. The ideal candidate has a proven record of building and managing large-scale ML platforms and leveraging expertise in machine learning, software engineering, Generative AI, and cloud technologies to optimize performance while collaborating effectively across teams.


What you'll do

  • Architect and implement a scalable, high-performance machine learning platform to support model development, deployment, monitoring, and analysis for both predictive and Generative AI models.
  • Lead the technical strategy and evaluation for our Generative AI infrastructure. This includes assessing the trade-offs between managed services and self-hosted open-source models, defining our LLM hosting strategy, and validating the end-to-end architectural approach for scalable, reliable AI features.
  • Design and implement robust LLM orchestration for advanced applications, enabling the integration of our proprietary predictive models with LLMs to power new insights and workflows.
  • Ensure the platform supports a wide range of ML use cases, including real-time prediction serving, batch processing, model experimentation, and advanced Generative AI applications.
  • Optimize system performance and model latency to ensure robust, low‑latency inference across distributed systems, with a specific focus on the unique challenges of LLM serving.
  • Identify bottlenecks, evaluate, and integrate new technologies and tools.
  • Collaborate closely with data scientists to productionize both traditional predictive models and novel Generative AI solutions, focusing on systems that combine proprietary model outputs with LLMs to create actionable insights.
  • Contribute to the overall quality of the codebase, ensuring maintainability and best practices.
  • Drive ML/DS best practices and provide technical recommendations on challenging problems.

Qualifications

  • Core Programming & Machine Learning:
    • Proficiency in Python and deep experience with its data science and ML ecosystem (e.g., PyTorch, TensorFlow, scikit-learn, Pandas, NumPy).
    • Hands-on experience with Generative AI frameworks and libraries such as LangChain, LlamaIndex, or Hugging Face Transformers.
  • MLOps & Infrastructure:
    • Expertise in building and maintaining MLOps infrastructure, including containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines for both traditional ML models and LLM-based applications.
    • Proven skill in managing cloud resources using Infrastructure as Code (Terraform).
  • Cloud Platforms & Services:
    • Extensive hands-on experience with cloud platforms, particularly AWS. Required experience with core services (S3, EC2, Lambda) and ML services (SageMaker).
    • Direct experience with or deep knowledge of managed Generative AI services like Amazon Bedrock, Amazon Titan, or equivalents (e.g., Google Vertex AI, Azure OpenAI Service).
  • Data Systems & Storage:
    • Advanced proficiency in SQL for complex data extraction and transformation.
    • Experience with a variety of data storage solutions, including relational databases, NoSQL databases, and vector databases (e.g., Pinecone, Weaviate, ChromaDB).
  • Education & Experience:
    • Master’s Degree in Computer Science, Software Engineering, or a related field and 4-5 years of experience building and managing production-level machine learning platforms and infrastructure, with a focus on model deployment, optimization, and scalability; or an equivalent combination of education and experience, such as a PhD in Computer Science, Data Science, Mathematics, Statistics, or a related quantitative field with strong knowledge of machine learning.
    • Demonstrated ability to improve the performance, reliability, and cost-efficiency of ML systems.
    • Strong experience with cloud-based ML infrastructure (AWS, GCP, Azure) and MLOps practices.


At Shipium, employees enjoy full medical, dental & vision coverage (with 50% coverage for dependents), optional life insurance and long-term disability coverage, a 401(k) retirement plan, fully remote work-from-home options in 25* states, 8 paid weeks of parental leave, 12 paid holidays annually, self-managed vacation time, sick & safety leave, and volunteer time off.


Shipium is committed to creating a diverse environment and is proud to be an equal opportunity employer. Women, people of color, people with disabilities, and veterans are strongly encouraged to apply. We prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, national origin, disability, genetic information, pregnancy, or any other protected characteristic as outlined by federal, state, or local laws. If you need reasonable accommodation because of a disability for any part of the employment process, please email Human Resources (hr@shipium.com) and let us know the nature of your request and your contact information.


This applies to all employment practices within our organization, including hiring, recruiting, promotion, termination, layoff, recall, leave of absence, compensation, benefits, training, and mentorships. Shipium makes hiring decisions based solely on qualifications, merit, and business needs.


*Although based out of Seattle, WA, Shipium is 100% remote in the following states: Arizona, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Illinois, Indiana, Maryland, Massachusetts, Michigan, Montana, Missouri, Nevada, New Jersey, New York, North Carolina, Ohio, Oregon, South Carolina, Tennessee, Texas, Vermont, Washington & Wisconsin. 


Shipium participates in E-Verify.

Must have a green card or be a U.S. citizen.

We do not work with OPT candidates or H-1B transfers.


The pay range for this role is:

$130,000 - $150,000 USD per year (Remote, United States)

Machine Learning / Data Science

Remote (United States)
