Kai is the AI company rebuilding cybersecurity for the machine-speed era. Founded by second-time entrepreneurs and trusted by Fortune 500 enterprises, Kai is building a future where security has no categories, no silos, and no human-speed bottlenecks. The Kai Agentic AI Platform replaces fragmented, human-limited workflows with agentic AI systems that continuously contextualize, assess, reason, and execute security work at machine speed, making human defenders superhuman.
Why Join Kai
- Well-funded: With $125M raised, we have the capital, runway, and resolve to rebuild cybersecurity from first principles.
- Proven: We've earned the trust of Fortune 500 and Global 1000 companies, and we're just getting started. Their confidence in Kai reflects what we've built: an AI-powered cybersecurity platform that performs at the scale and speed enterprises demand.
- Experienced founders: Our founding team consists of second-time entrepreneurs, each with over 20 years of experience in the cybersecurity industry. Their proven expertise and vision drive our ambitious goals.
- World-class leadership team: Our Heads of AI, Engineering, and Product bring extensive experience from some of the world’s most influential companies, ensuring top-tier mentorship, direction, and vision.
- Frontier AI Applied Research Team: Our researchers operate at the leading edge of agentic AI systems, translating breakthrough capabilities into real-world cybersecurity applications.
- Generous compensation: We offer highly competitive salaries and equity, along with a supportive work environment. Your contributions will be valued and rewarded as we grow together.
About the Role
We are looking for a Senior Data Engineer (AI Platform) to design and build scalable data systems that power next-generation AI and Generative AI applications.
This is a senior, hands-on technical role for someone who can operate across both classical data engineering and modern AI data infrastructure — including large-scale data pipelines, vector databases, and retrieval systems for LLM-powered applications.
You will work at the intersection of data engineering, AI infrastructure, and LLM systems, enabling high-quality data flow, retrieval, and storage for production-grade intelligence systems.
Key Responsibilities
- Design and build scalable data pipelines for batch and real-time processing
- Develop and maintain data infrastructure supporting AI/ML and Generative AI systems
- Build and optimize retrieval pipelines for RAG and LLM-based applications
- Design and manage vector data pipelines (embedding generation, indexing, storage, retrieval)
- Implement hybrid retrieval systems (BM25 + vector search)
- Work closely with AI/ML teams to enable training, evaluation, and inference workflows
- Develop data models and storage systems optimized for large-scale AI applications
- Ensure data quality, consistency, and reliability across pipelines
- Optimize systems for performance, latency, scalability, and cost
- Collaborate with product, engineering, and AI teams to translate requirements into data solutions
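To give a flavor of the vector data pipeline work above (embedding generation, indexing, retrieval), here is a minimal, self-contained sketch in pure Python. The sparse bag-of-words `embed` function is a toy stand-in for a real embedding model, and all names (`VectorIndex`, the document IDs) are illustrative, not part of any Kai system:

```python
import math
from collections import Counter

def embed(text: str) -> dict[str, float]:
    # Toy stand-in for a real embedding model: an L2-normalized
    # bag-of-words vector, stored sparsely as {token: weight}.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Dot product of two unit-normalized sparse vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

class VectorIndex:
    """Minimal index: embed on ingest, brute-force cosine search on query.
    A production system would use an ANN index (e.g. HNSW or IVF) instead."""

    def __init__(self) -> None:
        self._items: list[tuple[str, dict[str, float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]
```

In practice the brute-force scan is the piece you would swap for one of the ANN systems listed below, while the embed-index-retrieve shape of the pipeline stays the same.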
Required Qualifications
- 4+ years of experience in Data Engineering or related fields
- Strong experience building large-scale distributed data pipelines
- Proficiency in Python and SQL; experience with Spark or similar frameworks
- Experience with both batch and streaming systems (e.g., Kafka, Flink, Spark Streaming)
- Experience working with cloud data platforms (AWS, GCP, Azure)
- Solid understanding of data modeling, storage systems, and distributed systems
- Experience supporting AI/ML workloads through data infrastructure
- Strong ownership mindset and ability to operate in fast-paced environments
Preferred Qualifications
- Experience working with LLM-powered systems and RAG pipelines
- Familiarity with vector databases and ANN search systems
- Experience in data systems for AI platforms or ML infrastructure
- Background in search, recommendation systems, or information retrieval
Core Technical Expertise
Data Engineering & Pipelines
- Batch and streaming pipelines (Spark, Flink, Kafka)
- ETL/ELT design, data modeling, and data warehousing
- Data quality, validation, and observability
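As a sketch of the data quality and validation work listed above, the snippet below counts rule failures per batch. The rules and field names are hypothetical examples for a toy security-event table; production pipelines typically express such checks in a framework like Great Expectations or dbt tests:

```python
from typing import Callable

# Hypothetical rules for a toy security-event table.
Rule = Callable[[dict], bool]

RULES: dict[str, Rule] = {
    "event_id is present": lambda r: bool(r.get("event_id")),
    "severity in range": lambda r: r.get("severity") in {"low", "medium", "high"},
    "timestamp is int": lambda r: isinstance(r.get("ts"), int),
}

def validate(rows: list[dict]) -> dict[str, int]:
    # Count failures per rule: a minimal per-batch data-quality report
    # that a pipeline could emit as observability metrics.
    failures = {name: 0 for name in RULES}
    for row in rows:
        for name, rule in RULES.items():
            if not rule(row):
                failures[name] += 1
    return failures
```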
AI Data Infrastructure
- Data pipelines for ML training and inference
- Feature stores and dataset versioning
- Data preparation for LLM and GenAI systems
Vector Databases & Retrieval Systems
- Milvus, Pinecone, Databricks Vector Search, FAISS
- ANN algorithms (HNSW, IVF, PQ)
- Hybrid retrieval (BM25 + vector search)
- Embedding pipelines (text, code, image)
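One common way to combine the BM25 and vector results mentioned above is Reciprocal Rank Fusion (RRF). The sketch below fuses already-ranked result lists by document ID; the rankings themselves (and the choice of `k = 60`, a conventional default) are illustrative assumptions, and the retrieval calls that produce them are out of scope:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank),
    # so documents ranked highly by several retrievers rise to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF works on ranks rather than raw scores, it sidesteps the calibration problem of mixing BM25 scores with cosine similarities directly.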
RAG & LLM Data Systems
- Retrieval pipelines for LLM applications
- Context construction and ranking
- Data indexing and chunking strategies
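To illustrate one of the chunking strategies mentioned above, here is a minimal fixed-size, overlapping word chunker for preparing documents before embedding and indexing. The sizes are illustrative defaults; real pipelines often chunk by tokens, sentences, or document structure instead of raw words:

```python
def chunk_text(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    # Fixed-size word windows with overlap, so context that straddles a
    # chunk boundary still appears intact in at least one chunk.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap trades index size for recall: larger overlaps duplicate more text in the store but reduce the chance that a relevant passage is split across two retrieval units.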
Storage & Distributed Systems
- Data lakes (S3, GCS, ADLS), Parquet, Delta Lake, Iceberg
- Distributed systems design and scalability
- Caching and low-latency data access
Platforms & Infrastructure
- AWS, GCP, Azure
- Databricks, BigQuery, Snowflake
- Kubernetes, Ray (nice to have)
Performance & Optimization
- Query optimization and indexing strategies
- Cost optimization for large-scale data systems
- Latency optimization for real-time retrieval