Winston Artory Group Careers

Staff Data Engineer

About Winston Artory Group


WAG is transforming how art and collectibles are valued, managed, and traded. Born from the merger of Winston Art Group (the largest independent appraisal and advisory firm in the U.S.) and Artory (a pioneer in art tokenization), we combine deep industry expertise with technologies like AI and blockchain to modernize a $2.9 trillion global asset class.

We're already generating significant revenue and recently raised our Series A from top-tier VCs. Now, we're building the next-generation platform to unlock liquidity, trust, and intelligence in the art market—and we're looking for exceptional engineers to help us do it.

Why Join Us

  • Meaningful equity and competitive compensation
  • High-impact role at a growing company with revenue, funding, and a compelling vision
  • Build at the intersection of art, fintech, AI, and blockchain
  • A collaborative, pragmatic team that values speed, clarity, and technical quality
  • Remote-flexible culture with an HQ in NYC
  • Backed by top VCs and trusted by leading collectors, advisors, and institutions


About the role

Location

US Remote or Hybrid (East Coast or Central time zone required).
For candidates in NY or Miami, an interview may be conducted in person.

The Role

We’re hiring a Staff Data Engineer to be the founding technical leader of our data team. You’ll build the data platform from the ground up—the engine that powers WAG’s AI-driven valuations, market analytics, and collector intelligence. As the first dedicated data engineering hire, you’ll own the entire data architecture—from ingestion and scraping infrastructure to enrichment pipelines, data warehousing, and the datasets that feed our machine learning models and public indices. This is a team-founding, tech lead role: you’ll lay the technical foundation, establish data engineering standards and culture, hire your team, and scale the data platform alongside the company.


Reporting & Collaboration:

You’ll report directly to our Head of Engineering, a highly hands-on technical leader with whom you’ll partner closely on data architecture decisions and technical strategy. Together, you’ll co-own the data platform vision—balancing immediate pipeline needs with long-term scalability, quality, and governance. Our Head of Engineering is deeply involved in code, design reviews, and technical discussions, so you’ll have a close working relationship focused on building world-class data systems.

You’ll also work closely with our CPO (Chief Product Officer), domain experts, and company leadership to turn fragmented, messy real-world data into a durable competitive advantage. This isn’t just about building pipelines—it’s about making architectural decisions on storage, processing, and quality; evaluating tradeoffs between speed and rigor; and building a data platform that can evolve from early-stage to enterprise-scale.

This is an AI-native environment. We move fast using tools like Cursor and Claude Code, build with LLM APIs from OpenAI and others, and actively leverage AI in our product and development workflows. If you’re excited about being a founding technical leader who ships production data systems with AI as a core tool, you’ll thrive here.

What You'll Do

Team Building & Technical Leadership:

  • Build the data team from scratch—define the hiring roadmap, recruit and onboard your first 2–3 data engineers, and establish the team’s culture, standards, and ways of working
  • Own the entire data platform architecture from day one—make the critical decisions on storage layers, processing frameworks, orchestration, and data modeling patterns
  • Define technical standards and best practices for data quality, testing, documentation, lineage, and governance
  • Lead system design for complex problems involving large-scale ingestion, entity resolution, LLM-powered data extraction, and real-time analytics
  • Evaluate and adopt new technologies that improve data velocity, quality, reliability, or capabilities
  • Establish data governance frameworks including versioning, reproducibility, validation, and compliance


Hands-On Development:

  • Design and operate scalable data ingestion and web scraping systems, including best practices around retries, proxies, rate limiting, and anti-bot strategies
  • Build batch and real-time pipelines to normalize, enrich, deduplicate, and version data across structured and unstructured sources
  • Architect systems to support LLM- and ML-based document parsing, OCR, entity extraction, and classification at scale
  • Own the data storage and processing stack, including PostgreSQL, data lakes, data warehouses, and vector databases
  • Operationalize AI/ML workflows by preparing clean training and inference datasets with robust lineage, validation, and error handling
  • Design and maintain data models that serve backend APIs, valuation services, analytics dashboards, and public indices
  • Contribute to infrastructure tooling, including CI/CD, IaC (Terraform), data observability, and cost management

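As a flavor of the ingestion work above, here is a minimal, illustrative sketch of the retry and rate-limiting pattern the role calls for. This is not WAG's actual stack; the function name, parameters, and defaults are hypothetical and chosen only to show exponential backoff with jitter and polite request pacing.

```python
import time
import random

def fetch_with_retries(fetch, url, max_retries=4, base_delay=1.0, rate_limit_s=0.5):
    """Call `fetch(url)`, retrying transient failures with exponential backoff.

    `fetch` is any callable that raises on failure (e.g. a wrapped HTTP GET).
    A fixed sleep before each call acts as a crude per-worker rate limit.
    NOTE: illustrative sketch only — names and defaults are assumptions.
    """
    for attempt in range(max_retries):
        time.sleep(rate_limit_s)  # polite pacing between requests
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # exhausted retries; surface the error
            # exponential backoff plus jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

In practice this logic usually lives in library code (e.g. an HTTP adapter's retry policy) rather than hand-rolled loops, but the tradeoffs it embodies (backoff, jitter, pacing) are exactly the ones this role owns.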

Cross-Functional Collaboration:

  • Co-own data platform vision with Head of Engineering: collaborate daily on architecture, technical roadmap, and engineering standards
  • Partner with backend engineers to define API contracts, data serving patterns, and integration points between pipelines and application services
  • Collaborate with product and domain experts to translate business requirements into reliable, well-modeled datasets
  • Work with company leadership (Head of Engineering, CPO, President) on data strategy, hiring, and long-term platform vision
  • Communicate technical decisions clearly to non-technical stakeholders

You Might Be a Fit If You

Required:

  • Education: B.S. in Computer Science or equivalent
  • Experience: 7+ years of data engineering experience, including at least two years in a technical lead, staff, or principal role at a high-growth startup or product company
  • Leadership: Proven track record of building or scaling data teams, mentoring engineers, and making foundational architectural decisions that set the direction for an entire data organization
  • Technical Skills:
    • Expert in Python and SQL, with deep understanding of performance, data modeling, and processing patterns
    • Strong database expertise (PostgreSQL or similar) including query optimization, schema design, indexing, and partitioning strategies
    • Deep experience with pipeline orchestration tools like Airflow, Dagster, Prefect, or Temporal
    • Hands-on experience designing and maintaining web scraping systems at scale, including retries, proxies, and anti-bot strategies
    • Production experience integrating structured and unstructured sources, with a track record of resolving messy, real-world data challenges
    • Hands-on experience with LLM/AI integration in data workflows—you’ve built pipelines using OpenAI, Anthropic, or open-source models for document understanding, NLP, entity extraction, or classification
    • Deep knowledge of data architecture patterns including ETL vs. ELT, data lakes vs. warehouses, batch vs. streaming, and schema evolution
    • Production experience with AWS (or GCP/Azure) including compute, storage, networking, and managed data services
    • Strong DevOps fundamentals: Docker, Terraform, CI/CD, and data observability/monitoring
  • Mindset:
    • You balance pragmatism with data quality—you know when to move fast and when to invest in governance and reliability
    • You have a bias toward clean architecture and can articulate tradeoffs between speed, cost, and correctness
    • You’re excited about AI tooling and actively use tools like Cursor, Claude, or Copilot to increase velocity
    • You thrive in ambiguity and can chart technical direction for the data platform with incomplete information
    • You’re energized by building from zero—you want to lay the foundation, not inherit someone else’s


Preferred:

  • Experience building 0→1 data platforms at early-stage startups (seed through Series B)
  • Prior experience founding or building a data team from scratch—hiring, onboarding, and establishing team processes
  • Prior tech lead or staff engineer experience at a Series A+ company
  • Experience with data warehousing (Snowflake, BigQuery, Redshift) and modern data stack tools (dbt, Fivetran, etc.)
  • Familiarity with vector databases and semantic search (Pinecone, Weaviate, pgvector)
  • Experience with ML/AI model deployment and managing inference costs/latency in data pipelines
  • Domain knowledge in art, collectibles, fintech, or fragmented asset classes where clean data is rare but valuable
  • Experience with data governance and compliance requirements


We offer great benefits, including:


A Welcoming Team

  • A friendly, international, agile team working with cutting-edge technologies
  • Generous paid time off, including vacation, sick days, and holidays
  • Paid parental leave (maternity, paternity, and adoption leave)
  • Paid volunteer days to encourage community involvement
  • Collaborative, innovative, and inclusive company culture
  • Employee recognition and appreciation programs
  • Team-building activities and social events
  • Transparent communication and feedback channels

Competitive Compensation

  • Competitive salary based on experience and skills
  • Discretionary performance-based bonuses
  • Equity option grants, aligning you with the company's success

Health, Wellness, and Benefits

  • Comprehensive health insurance (medical, dental, and vision) with 100% of employee premiums covered
  • On-site fitness center and sauna at our NY HQ
  • Generous leave policies, including bereavement and reproductive loss leave
  • Opportunities for continuous learning and training
  • Mentorship programs and leadership development initiatives
  • 401(k)
  • Life insurance and disability coverage

Engineering · New York, NY · East Coast Timezone · Hybrid
