We are seeking an experienced Senior AI & LLM Test Automation Engineer to optimize and expand an automated testing framework for a Retrieval-Augmented Generation (RAG) Knowledge Base deployed in AWS. The role is hands-on: you will design, implement, and monitor test solutions that use RAGAS metrics and LLM-as-a-judge techniques to ensure the accuracy, relevance, safety, and scalability of AI systems.
Responsibilities
- Review existing LLM/RAG test automation workflows, identify gaps, and design an improved testing architecture
- Implement automated test pipelines that utilize RAGAS for retrieval/generation evaluation and LLM-as-a-judge for subjective quality and safety checks
- Integrate testing pipelines with AWS services such as S3, Lambda, CloudWatch, OpenSearch, RDS, and SQS
- Define and manage evaluation rubrics, set metric thresholds, implement regression alerting systems, and generate reporting dashboards
- Ensure scalability, reproducibility, and continuous quality monitoring within CI/CD pipeline environments
- Collaborate with AI/ML engineers, DevOps teams, and product stakeholders to align testing metrics with project goals
- Stay updated on advancements in LLM and RAG evaluation frameworks, integrating relevant improvements into the testing processes
- Troubleshoot, document, and resolve automation framework issues to ensure robust performance and reliability
- Contribute to knowledge sharing, documentation, and training sessions related to the RAG testing framework
Requirements
- 3+ years of proven experience with LLM and RAG evaluation frameworks, including RAGAS and automated prompt-based judging
- Knowledge of AWS cloud services, particularly compute, storage, orchestration, and monitoring solutions
- Familiarity with vector databases and principles of semantic similarity metrics
- Background in Python or Java, with the ability to apply either language in automated testing environments
- Understanding of KPI formulation and the ability to translate KPIs into actionable, automated testing logic
- Skills in defining and improving testing pipelines within complex AI systems
- Excellent command of written and spoken English (B2+ level)
Nice to have
- Proficiency in LangChain/LlamaIndex frameworks or comparable RAG frameworks
- Understanding of CI/CD workflows and pipelines