Summer Intern, AI Evaluation

About Armilla AI

Armilla AI is a cutting-edge startup at the intersection of artificial intelligence and insurance. Based in Toronto, Ontario, Canada, we're building innovative solutions to manage, underwrite, and insure the rapidly evolving risks associated with AI systems. We're a dynamic team passionate about pioneering the future of AI risk management.

The Role: AI Evaluation & Testing

We're looking for a motivated summer intern to join our team and gain hands-on experience developing AI evaluation and testing frameworks. As a Summer Intern focused on AI Evaluation & Testing, you'll work directly with our AI assessment teams to build the tools, methodologies, and infrastructure needed to systematically evaluate and test AI systems, particularly Large Language Models (LLMs). This is a unique opportunity to work at the forefront of AI risk management while developing valuable technical skills in a startup environment.

Role Responsibilities

In this exciting internship, you'll:

Assist in designing and implementing evaluation frameworks to test AI models, with a focus on Large Language Models (LLMs).
Conduct experiments to identify potential failure modes, biases, and vulnerabilities in AI systems.
Develop automated testing scripts and tools in Python to streamline AI evaluation processes.
Contribute to in-depth quantitative analysis and research that helps the team stay ahead of emerging AI risks and industry trends.
Support the team in building datasets, benchmarks, and evaluation metrics for AI risk assessment.
Gain exposure to the insurance industry and how AI systems are assessed for insurability.

What We're Looking For

We're seeking a candidate who brings:

A degree, in progress or recently completed, in Computer Science, Data Science, Machine Learning, Statistics, Mathematics, or a related field.
Strong programming skills in Python, with experience in libraries such as Pandas, NumPy, or similar tools.
Familiarity with machine learning concepts and interest in AI/LLM technologies (hands-on experience is a plus but not required).
A curious, analytical mindset with strong attention to detail and problem-solving abilities.
Ability to work both independently and collaboratively in a fast-paced startup environment.
Excellent communication skills and the ability to present technical findings clearly.
Enthusiasm for learning about AI safety, evaluation methodologies, and emerging risks in AI systems.

What's In It For You

Joining Armilla AI means:

Cutting-Edge Experience: Work hands-on with the latest AI technologies, particularly LLMs, in a real-world business context.
Meaningful Impact: Your work will directly contribute to how we assess and understand AI risks, shaping our product development and risk frameworks.
Mentorship & Learning: Learn from experienced AI and insurance professionals and gain insight into an emerging industry at the intersection of technology and risk management.
Startup Culture: Experience the dynamic, collaborative environment of a growing startup where your ideas and contributions are valued.

How to Apply

Excited about the opportunity to work on AI evaluation and testing at the frontier of AI risk management? We'd love to hear from you! Please send your resume and a brief note outlining:

Your relevant coursework, projects, or experience with AI/ML and Python
Any examples of technical work (e.g., GitHub repositories, course projects, Kaggle competitions, or personal projects)
What excites you about AI evaluation and why you're interested in joining Armilla AI this summer

Technology

Toronto, Canada