
Senior AI & LLM Test Automation Engineer

Hybrid in Ukraine
AI Engineering

We are seeking an experienced Senior AI & LLM Test Automation Engineer to optimize and expand an automated testing framework for a Retrieval-Augmented Generation (RAG) Knowledge Base deployed in AWS. This role involves leveraging RAGAS metrics and LLM-as-a-judge techniques to ensure the accuracy, relevance, safety, and scalability of AI systems, with a hands-on approach to designing, implementing, and monitoring effective test solutions.

Responsibilities
  • Review existing LLM/RAG test automation workflows, identify gaps, and design an improved testing architecture
  • Implement automated test pipelines that utilize RAGAS for retrieval/generation evaluation and LLM-as-a-judge for subjective quality and safety checks
  • Integrate testing pipelines with AWS services such as S3, Lambda, CloudWatch, OpenSearch, RDS, and SQS
  • Define and manage evaluation rubrics, set metric thresholds, implement regression alerting systems, and generate reporting dashboards
  • Ensure scalability, reproducibility, and continuous quality monitoring within CI/CD pipeline environments
  • Collaborate with AI/ML engineers, DevOps teams, and product stakeholders to align testing metrics with project goals
  • Stay current with advances in LLM and RAG evaluation frameworks and integrate relevant improvements into the testing process
  • Troubleshoot, document, and resolve automation framework issues to ensure robust performance and reliability
  • Contribute to knowledge sharing, documentation, and training sessions related to the RAG testing framework

Requirements
  • 3+ years of proven experience with LLM and RAG evaluation frameworks, including RAGAS and automated prompt-based judging
  • Knowledge of AWS cloud services, particularly compute, storage, orchestration, and monitoring solutions
  • Familiarity with vector databases and principles of semantic similarity metrics
  • Background in Python or Java, with the capability to apply these in automated testing environments
  • Understanding of KPI formulation and the ability to translate KPIs into actionable, automated testing logic
  • Skills in defining and improving testing pipelines within complex AI systems
  • Excellent command of written and spoken English (B2+ level)

Nice to have
  • Proficiency in LangChain/LlamaIndex frameworks or comparable RAG frameworks
  • Understanding of CI/CD workflows and pipelines