
Read this before applying
We are not posting a traditional quality role. We are posting a builder role. If your quality background is primarily pre-release test coverage, test automation management, or QA team leadership — this is not the right fit, and the application questions below will make that clear immediately.
If your background includes governing AI systems in production — agents making autonomous decisions in live enterprise environments — and you have implemented (not managed the implementation of) frameworks to make those systems trustworthy, keep reading.
About 7SG
Enterprise AI doesn't fail in the lab — it fails in production. Over 95% of AI pilots stall before delivering business impact, not because models are weak, but because enterprises lack an operating layer built for AI's speed, volatility, and real-world constraints.
7SG is the production execution layer for enterprise AI. We govern cost, compliance, and performance at runtime — enabling AI systems to operate reliably at scale across cloud, on-prem, and sovereign environments. We sit after pilots, before business impact.
We are a small, fast-moving team. Our engineers run weekly sprints, generate code at roughly 4x industry baseline using AI agents, and operate with a 10:1 agent-to-human ratio internally. This is not a 9-to-5 environment. It is a high-risk, high-reward startup for people who want to build the future of production AI — not manage its periphery.
The Role
The Senior Director / VP of AI Production Reliability & Trust is accountable for two things:
1. Production reliability: ensuring HAP (our Hybrid AI Platform) and our customer deployments operate dependably under real traffic, real data, and real regulatory constraints.
2. Agent trust: building the frameworks — technical and operational — that allow enterprise customers to trust autonomous AI agents doing work on their behalf.
This is not a pre-release testing role. It is not a test automation role. It is not a QA team management role.
This is a runtime governance role for non-deterministic, agentic AI systems. The systems you govern make decisions autonomously. The customers who depend on them are sovereign governments and large enterprises with no tolerance for unpredictable agent behavior.
What you will actually build
• A production quality operating system: quality gates, phase transition criteria, incident taxonomy, observability spec across our 6-layer Reference Architecture
• A continuous validation framework for agentic workflows — not test scripts run by humans, but autonomous evaluation pipelines that catch regression without human intervention
• An agent decision qualification framework: risk-tiered oversight for autonomous agent decisions, from ephemeral actions that need no review to high-stakes decisions that require multi-model consensus
• A trust evidence system: the observable signals — audit trails, behavioral consistency records, policy compliance evidence — that enterprise customers use to extend trust to agents operating on their behalf
• Production observability: instrumentation across Ingest, Prepare, Serve, Orchestrate, Monitor, and Optimize layers of the Reference Architecture
• A post-mortem and CAPA system: every production incident produces a root cause, a corrective action, and a new test that prevents recurrence
What we are not looking for
We want to be direct so you don't waste your time:
• Leaders who will hire a team first and direct them to build — we need someone who builds first and delegates second
• Candidates whose answer to "how would you do X?" is "I'd talk to my network" or "I'd evaluate vendors" — we need someone who already has answers
• Enterprise QA professionals whose toolkit begins and ends with Selenium, LoadRunner, Datadog, or similar legacy and AI-washed tooling — we use open-source and next-generation frameworks, and we expect you to know them
• Candidates whose production AI experience means "I oversaw a team monitoring an AI model" — we need someone who has implemented governance for autonomous agent systems
• People who need defined scope and predictable hours to do their best work
What we are looking for
• 10+ years in quality, reliability, or production operations for complex distributed systems — including meaningful time governing AI or ML systems in live production
• Direct implementation experience with AI quality frameworks — you built it, not just led a team that built it
• Familiarity with the agentic AI quality problem: non-deterministic systems, hallucination detection, behavioral drift, autonomous decision governance
• Working knowledge of open-source and next-generation evaluation and observability frameworks (Arize Phoenix, RAGAS, Promptflow, LangSmith, Weights & Biases, or similar) — not just legacy commercial alternatives
• Background in regulated industries (financial services, telecom, healthcare, government) where AI quality failures have real contractual and commercial consequences
• Startup orientation: comfortable with ambiguity, iterative scope, and a team that moves faster than most people expect
Product / R&D
Remote (United States)