Sanas

AI Data Ops Lead

Sanas.ai is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world’s first real-time speech transformation platform capable of accent translation, noise elimination, speech enhancement, and cross-language communication.

Sanas makes conversations clearer, more inclusive, and more effective, removing barriers that prevent people from being understood, regardless of accent, background noise, or native language.


Since going to market in 2023, Sanas has scaled at an extraordinary pace, growing from $0 to $32M ARR in under two years, with a projected >$50M ARR by the end of 2025. The company recently recorded its first $10M quarter and is on track to achieve $120M in ARR next year. With a SaaS-based model, Sanas serves some of the world’s largest enterprises, including Comcast, UPS, and UHG. Today, Sanas technology is deployed across more than 17 of the Fortune 500, and growth continues to accelerate.


The company’s valuation has a clear trajectory toward multi-billion-dollar market capitalization as it continues to expand into new verticals and product categories. With a TAM that spans all human-in-the-loop communications and beyond, Sanas has the potential to impact every industry and every global interaction.


Sanas is revolutionizing the way we communicate with the world’s first real-time algorithm, designed to modulate accents, eliminate background noises, and magnify speech clarity. Pioneered by seasoned startup founders with a proven track record of creating and steering multiple unicorn companies, our groundbreaking GDP-shifting technology sets a gold standard.


Sanas is a 200-strong team, established in 2020. In this short span, we’ve successfully secured over $100 million in funding. Our innovation has been supported by the industry’s leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, Quiet Capital, and other influential investors. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you’re not just adopting a product; you’re investing in the future of communication. 

About the role

We’re looking for a hands-on AI Data Ops Lead to own the datasets that power our speech and language models, and the associated analytics. You’ll design and maintain data pipelines, labeling workflows, and dashboards that transform raw multimodal data into actionable insights. This role blends data engineering with analytical depth and is ideal for someone who can write production-grade Python, evaluate dataset quality, and surface trends that shape model development.
You’ll collaborate with and support Scientists, Data Collection teams, Executives, and external vendors to bring new data sources online, run data collection and labeling, automate data ingestion, and deliver transparent reporting across the AI data lifecycle.


What you'll do

Data Collection

  • Build and maintain internal tools for data collection, labeling, and ingestion.
  • Discover new data sources and prepare them as unified data frames for downstream consumption.
  • Coordinate with multiple stakeholders to ensure timely delivery of high-quality data.
  • Operate and design ETL data pipelines for large-scale audio, text, and metadata.

Data Quality, Analytics, Insights

  • Own data quality: build tooling for quality assurance across all dimensions, discover and fix inaccuracies, and feed the findings back into improving the QA tooling.
  • Analyze dataset coverage, diversity, and quality; monitor bias and data drift.
  • Create dashboards and visual reports tracking data distribution, collection throughput, and collection quality.
  • Work cross-functionally to ensure that the data being made available meets our continuously evolving needs.
  • Run a monthly newsletter reporting changes to the data and newly available data sources.

Experimentation & Data Quality

  • Design validation experiments for labeled datasets.
  • Implement automated checks for consistency, completeness, and noise reduction.
  • Support research teams with well-documented, high-integrity datasets.

Qualifications

  • 3–6 years of experience in data science, data operations, or ML data workflows.
  • Strong programming skills in Python and SQL (pandas, NumPy, FastAPI, or similar).
  • Proven experience building and maintaining data dashboards (Gradio, Streamlit, Plotly, Dash, Power BI, or similar).
  • Strong data analysis and visualization skills; comfort working with large, complex datasets.
  • Familiarity with databases and cloud data infrastructure (SQL, DynamoDB, AWS Glue, S3, BigQuery, etc.).
  • Excellent communication and documentation skills; ability to thrive in a fast-moving AI environment.

Preferred Experience:

  • Experience with speech or audio datasets (e.g., ASR, TTS, voice embeddings, or diarization).
  • Familiarity with data labeling workflows for audio or text.
  • Knowledge of signal processing, spectrogram analysis, or acoustic feature extraction.
  • Experience with data orchestration tools (Dagster, Airflow, etc.).
  • Experience building custom tooling on an as-needed basis (Retool, Replit, etc.).
  • Exposure to dataset versioning, evaluation pipelines, and MLOps principles.
  • Interest in advancing the data foundations of AI research.

Science

Palo Alto, CA
