Senior AI QA Engineer (Automation & Manual, AI-based Applications Testing)
Remote in Poland
We are seeking a skilled Senior AI QA Engineer with strong experience in both manual and automated testing and extensive exposure to AI-based application testing. The ideal candidate will test a variety of applications, including projects involving AI agents and integrations with APIs and databases. You will help ensure our solutions are reliable and accurate and meet business requirements, while also contributing to the development of our automation capabilities.
This is a fully remote position. Working hours are 13:00 to 21:00 Polish time, due to the client team's location.
Responsibilities
- Research and evolve automation frameworks in line with Gen AI tooling and best practices
- Design and automate evaluation of Gen AI features — grounding, answer accuracy, determinism/reproducibility, precision, recall, and criteria recall
- Build automated LLM test harnesses that scale evaluation beyond human-in-the-loop
- Select and apply Gen AI evaluation frameworks to measure answer quality and pipeline efficiency
- Perform manual testing as needed to validate new features, integrations, and user stories
- Build and maintain test cases from requirements and user stories
- Test applications that may include AI agents, APIs, databases, and other integrations
- Collaborate with product, engineering, and operations teams to understand requirements and deployment environments
- Track and report test results, defects, and quality metrics
- Assist with troubleshooting production issues and escalate risks as needed
- Guide and support team members, including onshore and offshore consultants
Requirements
- 5+ years of experience in software QA, with at least 1 year focused on testing AI agents, agentic solutions or LLM-based systems
- Hands-on experience with both manual and automated testing of AI agents, including prompt/instruction testing and evaluation of agentic workflows
- Strong programming skills in Python test automation — pytest or equivalent, scripting and AI/ML library integration
- Expertise in AI agent frameworks, prompt engineering and evaluation metrics for LLM-based systems
- Demonstrated experience testing and evaluating Gen AI / LLM applications — grounding, answer accuracy and hallucination/determinism checks
- Applied knowledge of Gen AI / LLM evaluation frameworks and metrics — precision, recall, criteria recall and efficiency
- Familiarity with issue and test management tools such as Jira, QMetry and TestRail
- Experience with version control systems and integrating tests into CI/CD pipelines
- Flexibility to use AI-powered tools for QA such as GitHub Copilot and LLM-based test generation
- Understanding of cloud environments, particularly AWS
- Excellent communication, collaboration and leadership skills
Nice to have
- Experience with agentic AI platforms such as LangChain, OpenAI Function Calling or similar
- Experience with AI safety, bias and reliability testing
- Experience with test data generation for AI/ML systems
