About TensorWave Inc.
At TensorWave, we're revolutionizing AI computing by offering the most advanced cloud services, highlighted by our deployment of AMD Instinct MI300X GPUs. Our mission is to accelerate AI innovation by removing hardware limitations and ensuring scalable, efficient solutions for AI workloads. To support our rapid growth, we're seeking a Network Engineer with experience in High-Performance Computing to join our team.
About the role
We are looking for an HPC Network Engineer with a passion for AI and advanced networking technologies. The ideal candidate will support our vision by developing and managing the networking infrastructure that underpins our innovative AI cloud services. This role involves exploring and integrating new types of network fabrics to enhance our platform's performance and scalability, ensuring optimal operation for our clients' AI projects.
What you'll do
- Collaborate with a dynamic engineering team to design and implement innovative networking solutions that meet the demands of high-performance AI workloads.
- Lead initiatives to explore and integrate new types of network fabrics, enhancing the scalability and efficiency of our AI infrastructure.
- Ensure network reliability, performance, and security for cloud services, optimizing for both AMD and NVIDIA GPU technologies.
- Work closely with the AI development team to align networking strategies with the overall goals of TensorWave's cloud platform.
- Troubleshoot and resolve complex networking issues, providing expert guidance and solutions to maintain high service levels.
Essential Skills and Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- At least 5 years of relevant experience in network engineering, with a focus on supporting high-performance computing (HPC) and AI applications.
- Strong knowledge of RCCL/NCCL, MPI and/or HPC.
- Experience with or keen interest in exploring new network fabrics and technologies, particularly in the context of AI and cloud computing.
- Familiarity with AMD and NVIDIA GPU ecosystems and their impact on network performance and configuration.
- Exceptional problem-solving abilities and a commitment to innovation in networking for AI applications.
Benefits
We offer a competitive salary and benefits, including:
- Stock Options
- 100% paid Medical, Dental and Vision Benefits for employees
- Life and voluntary supplemental life insurance
- Short-term disability insurance
- Flexible Spending Account
- 401(k)
- Flexible PTO
- Paid Holidays
- Parental Leave
- Mental Health Benefits through Spring Health