Principal Software Engineer

About Us:

Positron.ai specializes in developing custom hardware systems to accelerate AI inference. These inference systems offer significant performance and efficiency gains over traditional GPU-based systems, delivering advantages in both performance per dollar and performance per watt. Positron exists to create the world's best AI inference systems.


Principal Software Engineer – High-Performance LLM Inference on Custom FPGA & x86 Hardware

We are seeking an experienced Principal Software Engineer to contribute to the development of high-performance software that powers execution of open-source large language models (LLMs) on our custom appliance. This appliance leverages a combination of FPGAs and x86 CPUs to accelerate transformer-based models. The software stack is written primarily in modern C++ (C++17/20) and heavily relies on templates, SIMD optimizations, and efficient parallel computing techniques.

Example Projects:

  • Model Wrangling – abstracting behavior from model names
  • Templating over fewer parameters
  • Composing the inference schedule from model features (moving inference schedule composition from compile time to runtime)
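As a flavor of the last example project, here is a minimal sketch of composing an inference schedule at runtime from model features instead of templating per model name. All type and step names (`ModelFeatures`, `compose_schedule`, the step strings) are illustrative assumptions, not Positron's actual API:

```cpp
#include <string>
#include <vector>

// Hypothetical feature flags describing an open-source LLM;
// field names are illustrative only.
struct ModelFeatures {
    bool gated_mlp;          // e.g. SwiGLU-style feed-forward
    bool rotary_embeddings;  // RoPE positional encoding
    int  num_layers;
};

// Build the per-token execution schedule at runtime from the
// model's features, rather than instantiating a separate
// compile-time template specialization per model name.
std::vector<std::string> compose_schedule(const ModelFeatures& f) {
    std::vector<std::string> schedule;
    schedule.push_back("embed_tokens");
    for (int layer = 0; layer < f.num_layers; ++layer) {
        if (f.rotary_embeddings) schedule.push_back("rope");
        schedule.push_back("attention");
        schedule.push_back(f.gated_mlp ? "gated_mlp" : "mlp");
    }
    schedule.push_back("lm_head");
    return schedule;
}
```

The trade-off this sketch illustrates: runtime composition costs a branch per step but collapses the combinatorial explosion of template instantiations across model variants.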

Key Areas of Focus & Responsibilities

  • Design, implement, and optimize high-performance inference software for LLMs on custom hardware.
  • Develop and fine-tune C++-based libraries that efficiently utilize SIMD instructions, threading, and memory hierarchy.
  • Work closely with FPGA and systems engineers to design efficient data movement and computational offloading between x86 CPUs and FPGAs.
  • Optimize transformer model execution via low-level optimizations, including vectorization, cache efficiency, and hardware-aware scheduling.
  • Develop performance profiling tools and methodologies to analyze execution bottlenecks at the instruction and data flow levels.
  • Implement NUMA-aware memory management techniques to optimize memory access patterns for large-scale inference workloads.
  • Ensure all code contributions include unit, performance, acceptance, and regression tests as part of a continuous integration-based development process.
  • Provide technical leadership and mentorship for a growing team of high-performance computing (HPC) and ML systems engineers.
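To give a concrete flavor of the cache-efficiency work listed above, here is a minimal loop-tiling sketch (an illustrative textbook technique, not Positron's code; the function name and tile size are assumptions):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache-blocked matrix multiply: C (n x n) += A (n x n) * B (n x n),
// all row-major. Tiling keeps sub-blocks of A and B resident in
// cache across inner iterations -- one of the memory-hierarchy
// optimizations this role involves.
void matmul_tiled(const std::vector<float>& A,
                  const std::vector<float>& B,
                  std::vector<float>& C,
                  std::size_t n, std::size_t tile = 64) {
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile)
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + tile, n); ++k) {
                        const float a = A[i * n + k];  // reused across the j loop
                        for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The inner `j` loop also streams contiguously through `B` and `C`, which is the access pattern a compiler can auto-vectorize with SIMD.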

Required Skills & Experience

  • 10+ years of professional experience in C++ software development, with a focus on performance-critical applications.
  • Deep understanding of C++ templates and modern memory management.
  • Strong experience with SIMD programming (AVX-512, SSE, or equivalent).
  • Experience in high-performance computing (HPC), numerical computing, or ML inference optimization.
  • Knowledge of multi-threading, NUMA architectures, and low-level CPU optimization.
  • Strong background in systems-level software development, profiling tools (Perfetto, VTune, Valgrind), and benchmarking.
  • Experience working with hardware accelerators (FPGAs, GPUs, or custom ASICs) and designing efficient software-hardware interfaces.

Preferred Skills (Nice to Have)

  • Hands-on experience with LLVM/Clang or GCC compiler optimizations.
  • Experience in LLM quantization, sparsity optimizations, and mixed-precision computation.
  • Knowledge of networking optimizations in distributed inference settings.

Why Join Us?

  • Work on a cutting-edge ML inference platform that redefines performance and efficiency for LLMs.
  • Tackle some of the most challenging low-level performance engineering problems in AI today.
  • Collaborate with a team of hardware, software, and ML experts building an industry-first product.
  • Opportunity to contribute to and shape the future of open-source AI inference software.


Engineering

Liberty Lake, WA

Remote (United States)
