Senior AI & LLM Test Automation Engineer

AI Engineering

We are seeking an experienced Senior AI & LLM Test Automation Engineer to optimize and expand an automated testing framework for a Retrieval-Augmented Generation (RAG) Knowledge Base deployed in AWS. This role involves leveraging RAGAS metrics and LLM-as-a-judge techniques to ensure the accuracy, relevance, safety, and scalability of AI systems, with a hands-on approach to designing, implementing, and monitoring effective test solutions.

Responsibilities

Review existing LLM/RAG test automation workflows, identify gaps, and design an improved testing architecture
Implement automated test pipelines that utilize RAGAS for retrieval/generation evaluation and LLM-as-a-judge for subjective quality and safety checks
Integrate testing pipelines with AWS services such as S3, Lambda, CloudWatch, OpenSearch, RDS, and SQS
Define and manage evaluation rubrics, set metric thresholds, implement regression alerting, and build reporting dashboards
Ensure scalability, reproducibility, and continuous quality monitoring within CI/CD pipeline environments
Collaborate with AI/ML engineers, DevOps teams, and product stakeholders to align testing metrics with project goals
Stay updated on advancements in LLM and RAG evaluation frameworks, integrating relevant improvements into the testing processes
Troubleshoot, document, and resolve automation framework issues to ensure robust performance and reliability
Contribute to knowledge sharing, documentation, and training sessions related to the RAG testing framework

Requirements

3+ years of proven experience with LLM and RAG evaluation frameworks, including RAGAS and prompt-based judging automation
Knowledge of AWS cloud services, particularly compute, storage, orchestration, and monitoring solutions
Familiarity with vector databases and principles of semantic similarity metrics
Background in Python or Java, with the capability to apply these in automated testing environments
Understanding of KPI formulation and the ability to translate KPIs into actionable, automated testing logic
Skills in defining and improving testing pipelines within complex AI systems
Excellent command of written and spoken English (B2+ level)

Nice to have

Proficiency in LangChain/LlamaIndex frameworks or comparable RAG frameworks
Understanding of CI/CD workflows and pipelines
