About TensorWave Inc.
At TensorWave, we're revolutionizing AI computing by offering the most advanced cloud services, highlighted by our deployment of AMD Instinct MI300X GPUs. Our mission is to accelerate AI innovation by removing hardware limitations and ensuring scalable, efficient solutions for AI workloads.
About the role:
TensorWave is seeking a driven Machine Learning Engineer with expertise in model training and fine-tuning, broad knowledge of open-source AI libraries, exposure to GPU kernel development, and a passion for pushing the boundaries of GPU acceleration. In this pivotal role, you will empower our customers by supporting cutting-edge tools and techniques for fine-tuning and training deep learning models on AMD GPUs. Your work will directly contribute to the growth of the ROCm ecosystem and the advancement of PyTorch on AMD hardware, enabling users to harness the full potential of our AI cloud services.
Responsibilities:
- Contribute to open-source deep learning libraries like PyTorch, advocating for and implementing ROCm support and enhancements.
- Develop in-house frameworks and tools that simplify and streamline model fine-tuning and training for our customers.
- Identify and debug compatibility issues with libraries, and collaborate with internal and external teams to resolve them.
- Design and develop optimization strategies to accelerate fine-tuning and training of deep learning models on AMD GPUs.
- Conduct in-depth research and performance analysis to identify and address bottlenecks in the AMD GPU acceleration pipeline.
- Stay at the forefront of advancements in deep learning, GPU acceleration, and model optimization techniques, particularly those related to ROCm and AMD hardware.
Essential Skills & Qualifications:
- Bachelor's degree in Computer Science, Artificial Intelligence, or a related field, or equivalent experience.
- 3+ years of hands-on experience training and fine-tuning deep learning models with PyTorch.
- Strong understanding of GPU architecture, memory management, and optimization techniques.
- Proficiency in Python and C/C++ for implementing high-performance deep learning models.
- Extensive experience with LLM and transformer architectures and a deep understanding of their internals.
- Experience with GPU kernel development (CUDA or ROCm) for deep learning applications.
- Excellent communication and collaboration skills, with the ability to effectively engage with both technical and non-technical audiences.
Preferred Qualifications:
- Experience with Triton or other model deployment frameworks.
- Experience with distributed training across GPU clusters.
- Familiarity with networking protocols and technologies, especially in the context of HPC and AI.
- Contributions to open-source deep learning projects, particularly those focused on training, fine-tuning, or optimizing LLMs.
- Familiarity with Python profiling and benchmarking tools.
- Familiarity with ROCm and its ecosystem, including libraries like hipBLAS and MIOpen.
- Familiarity with debugging and performance-analysis workflows, including GDB, benchmarking, profiling, and stack-trace analysis.
Benefits:
We offer a competitive salary and benefits, including:
- Stock Options
- 100% paid Medical, Dental, and Vision benefits for employees
- Life and voluntary supplemental life insurance
- Short-term disability insurance
- Flexible Spending Account
- 401(k)
- Flexible PTO
- Paid Holidays
- Parental Leave
- Mental Health Benefits through Spring Health