About Us:
Positron.ai specializes in developing custom hardware systems to accelerate AI inference. These inference systems offer significant performance and efficiency gains over traditional GPU-based systems, delivering advantages in both performance per dollar and performance per watt. Positron exists to create the world's best AI inference systems.
Principal Software Engineer – High-Performance LLM Inference on Custom FPGA & x86 Hardware
We are seeking an experienced Principal Software Engineer to help develop the high-performance software that executes open-source large language models (LLMs) on our custom appliance. The appliance combines FPGAs and x86 CPUs to accelerate transformer-based models. The software stack is written primarily in modern C++ (C++17/20) and relies heavily on templates, SIMD optimizations, and efficient parallel computing techniques.
Example Projects:
- Model wrangling: abstracting behavior from model names, templating over fewer parameters, and composing the inference schedule from model features (moving inference-schedule composition from compile time to runtime).
Key Areas of Focus & Responsibilities
- Design, implement, and optimize high-performance inference software for LLMs on custom hardware.
- Develop and fine-tune C++-based libraries that efficiently utilize SIMD instructions, threading, and memory hierarchy.
- Work closely with FPGA and systems engineers to design efficient data movement and computational offloading between x86 CPUs and FPGAs.
- Optimize transformer model execution via low-level optimizations, including vectorization, cache efficiency, and hardware-aware scheduling.
- Develop performance profiling tools and methodologies to analyze execution bottlenecks at the instruction and data flow levels.
- Implement NUMA-aware memory management techniques to optimize memory access patterns for large-scale inference workloads.
- Ensure all code contributions include unit, performance, acceptance, and regression tests as part of a continuous integration-based development process.
- Provide technical leadership and mentorship for a growing team of high-performance computing (HPC) and ML systems engineers.
Required Skills & Experience
- 10+ years of professional experience in C++ software development, with a focus on performance-critical applications.
- Deep understanding of C++ templates and modern memory management.
- Strong experience with SIMD programming (AVX-512, SSE, or equivalent).
- Experience in high-performance computing (HPC), numerical computing, or ML inference optimization.
- Knowledge of multi-threading, NUMA architectures, and low-level CPU optimization.
- Strong background in systems-level software development, profiling tools (Perfetto, VTune, Valgrind), and benchmarking.
- Experience working with hardware accelerators (FPGAs, GPUs, or custom ASICs) and designing efficient software-hardware interfaces.
Preferred Skills (Nice to Have)
- Hands-on experience with LLVM/Clang or GCC compiler optimizations.
- Experience in LLM quantization, sparsity optimizations, and mixed-precision computation.
- Knowledge of networking optimizations in distributed inference settings.
Why Join Us?
- Work on a cutting-edge ML inference platform that redefines performance and efficiency for LLMs.
- Tackle some of the most challenging low-level performance engineering problems in AI today.
- Collaborate with a team of hardware, software, and ML experts building an industry-first product.
- Opportunity to contribute to and shape the future of open-source AI inference software.