Senior Data Scientist

Supplier.io is the market leader in supplier intelligence, trusted by over half of the Fortune 100 to power smarter, more responsible sourcing decisions. Our platform helps corporate procurement teams discover, evaluate, and engage with over 11 million suppliers with a focus on local, small, diverse, and sustainable businesses. This helps organizations build supply chains that are resilient, inclusive, and built for impact.

Our solutions empower today’s procurement teams with accurate data, actionable insights, and measurable impact, which helps them mitigate risk, expand sourcing options, achieve ESG goals, and advance economic inclusion. Whether tracking spend, sourcing alternate suppliers, or measuring program results, Supplier.io transforms complexity into clarity, empowering teams to lead with confidence and build supply chains that deliver for both business and community.

Join a company committed to innovation, inclusion, and making a difference one sourcing decision at a time. For more information, visit www.supplier.io.

The Opportunity

Supplier.io is expanding its data team and is seeking a Senior Data Scientist to play a critical role in scaling and modernizing our supplier intelligence platform. This role is weighted approximately 80% toward data science and 20% toward data engineering, making it ideal for someone who has deep, hands-on experience building and training ML and NLP models and is equally comfortable operationalizing those models within production data pipelines. You will bring strong architectural thinking, thrive in complex environments, and enjoy mentoring others while collaborating across teams, geographies, and disciplines.

A central focus of this role is Entity Resolution, which is the process of identifying, linking, and merging records across disparate data sources that refer to the same real-world entity (suppliers in our case). This involves resolving inconsistencies, handling missing data, and eliminating duplicates to create a single, accurate, and trustworthy supplier profile, often referred to as a “golden record” or 360-degree view. Our current systems leverage Lucene-based search and XGBoost ML models, and we are exploring the use of LLMs to further enhance these capabilities. The ideal candidate will improve and reimagine our existing legacy entity resolution systems, bringing experience with ML-based approaches to matching and deduplication at scale.

As a Senior Data Scientist, you will drive, shape, and execute our long-term data and data science strategy, design resilient and scalable data architectures, and champion technical excellence across our data ecosystem. You will work closely with the Product and Engineering teams to ensure our data systems support business growth, advance our matching capabilities, and enable data-driven decision-making.

To support Supplier.io's growth, we are investing heavily in cloud-native technologies. This role will be instrumental in leveraging modern data services and ML capabilities, optimizing cost, and ensuring our data platform is secure, reliable, and scalable.

What You Will Do

Design, build, and iterate on ML-based entity resolution systems that match, link, and deduplicate supplier records across disparate data sources to produce trusted golden records.
Build, train, and refine NLP and ML models (e.g., XGBoost, search ranking models) for supplier matching, classification, and data enrichment, with a focus on improving accuracy and recall.
Evaluate and integrate emerging approaches, including LLMs, into our entity resolution and data intelligence workflows.
Own the full ML model lifecycle: feature engineering, training, evaluation, monitoring, feedback loops, and iterative tuning in partnership with data engineering and product teams.
Translate model results into business impact and clearly communicate tradeoffs, performance metrics, and recommendations to non-technical stakeholders.
Build and maintain data products end-to-end, operationalize them within production data pipelines, and ensure they deliver reliable, scalable results.
Execute and influence a cohesive data strategy that aligns with company objectives and supports analytics, reporting, and downstream product use cases.
Own complex data modeling initiatives, including dimensional and analytical models that support business intelligence and advanced analytics.
Drive continuous improvement by optimizing data pipelines, query performance, reliability, observability, and cost efficiency.
Partner with Infrastructure, Product, and Engineering teams to ensure data systems meet best practices, security standards, and business needs.
Create and maintain comprehensive technical documentation, including architecture diagrams, data flow maps, runbooks, and operations procedures.
Troubleshoot and resolve complex, cross-system data issues and incidents.

What You Will Need to Succeed

Bachelor’s degree in Data Science, Computer Science, Machine Learning, Statistics, Engineering, or a related field.
7+ years of progressive experience in data science and/or data engineering, with demonstrated ownership of ML-based systems in production environments. At least 2 years in a senior or lead capacity preferred.
Hands-on experience building NLP and LLM-based models in Python for real-world data science applications.
Strong understanding of ML model lifecycle considerations, including evaluation, monitoring, feedback loops, and iterative tuning in partnership with data engineering and product teams.
Strong ability to translate model results into business impact and communicate tradeoffs to non-technical stakeholders.
Direct experience building or significantly improving entity resolution or search ranking systems, including ML-based approaches to record matching, linking, and deduplication at scale.
Proficiency with ML frameworks and tools such as XGBoost, scikit-learn, PyTorch, or TensorFlow, and familiarity with search technologies such as Lucene/Elasticsearch.
Demonstrated ability to build and maintain data products end-to-end by operationalizing models within production data pipelines, not solely tuning them.
Advanced proficiency with Python and SQL for both data science and data engineering workflows.
Experience with Snowflake and cloud-native data platforms (Azure, AWS, GCP, or multi-cloud environments).
Familiarity with data modeling, ETL/ELT processes, and modern data warehousing principles.
Experience working in an agile development environment and collaborating through tools such as Jira and GitHub.
Ability to communicate technical concepts clearly to technical and non-technical teams and influence decision-making.
Strong problem-solving skills with the ability to troubleshoot and resolve ambiguous, high-impact issues.
A results-oriented mindset with a demonstrated history of driving process improvements and technical excellence.
Ability to work independently while also serving as a trusted technical partner and mentor to others.
Ability to take vague requirements and turn them into technical roadmaps.

We do not accept unsolicited resumes from recruitment/search firms.

Supplier.io participates in E-Verify. We will provide the Social Security Administration and, if necessary, the Department of Homeland Security, with information from each new employee’s Form I-9 to confirm work authorization.

Supplier.io is an Equal Employment Opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.

Supplier.io is unable to sponsor work visas (e.g., H-1B, TN, OPT) for US positions.

If you require reasonable accommodation to complete the application or interview process, please contact the Human Resources department at hr@supplier.io or 978-843-5747.

Product & Engineering

Remote (United States)
