Senior AI Engineer, Agentic and RAG Systems

Remote in Georgia, & 4 others
AI Engineering

We are seeking a hands-on Senior AI Engineer who designs, builds, and operates production GenAI systems – agentic workflows, RAG pipelines, and LLM-backed services with real users and real SLAs. This is an engineering role, not a research role. The bar is reliability, latency, cost, observability, and safe deployment at scale, with end-to-end ownership from architecture through on-call. Typical workloads include enterprise knowledge platforms, conversational analytics, agentic automation, and LLM-augmented data products.

Responsibilities
  • Design agent orchestration (graph/state, conditional routing, tool calling, memory, checkpointing) in LangGraph / LangChain or equivalent
  • Build production RAG end-to-end: chunking, embeddings, vector stores, hybrid retrieval, reranking, caching, and grounded synthesis
  • Own Python / FastAPI services – async, SSE streaming, session handling, and structured error contracts
  • Instrument services with tracing and evaluation harnesses (MLflow, OpenTelemetry, or equivalent) to track accuracy, cost, and regressions
  • Ship on Docker + Kubernetes (EKS/AKS/GKE) via CI/CD with test, eval, and canary gates
  • Drive LLM cost engineering – model routing, prompt optimization, caching, token accounting, and build-vs-buy decisions
  • Apply GenAI safety & governance: hallucination control, prompt-injection defense, PII handling, and HITL where required
  • Partner with data engineering on semantic layers and pipelines (PySpark / SQL where applicable)

Requirements
  • 5+ years in software engineering, with 2+ years shipping production LLM / agentic systems (not POCs or research)
  • Proficiency in Python and FastAPI (async, REST, SSE)
  • Production expertise in LangChain and LangGraph (or equivalent production experience with LlamaIndex, AutoGen, or MCP stacks)
  • Background in production RAG: embeddings, chunking, and hybrid retrieval with reranking and caching
  • Skills in vector databases such as Pinecone, Weaviate, pgvector, OpenSearch, or Databricks Vector Search
  • Production experience with at least one major LLM provider – AWS Bedrock (preferred), OpenAI / Azure OpenAI, or Anthropic – including model selection and routing trade-offs
  • Competency in Kubernetes and Docker in real production environments (EKS/AKS/GKE)
  • Expertise in cloud engineering on AWS
  • Familiarity with observability and tracing tools (MLflow, LangSmith, OpenTelemetry), evaluation harnesses, and latency/cost ownership
  • Capability to build CI/CD for AI systems (GitHub Actions, Jenkins, or equivalent) with test/eval gates
  • Strong written and spoken English (B2 level); able to own design discussions with engineering and business stakeholders independently

Nice to have
  • Databricks depth – MLflow (tracking & serving), Vector Search, Unity Catalog / Metric Views, PySpark / SQL
  • Experience with LLM fine-tuning – PEFT, LoRA, QLoRA
  • Understanding of MCP servers and tool integration
  • Qualifications in GenAI governance & FinOps – auditability, prompt-injection hardening, PII handling, and token cost control in regulated environments
  • Background in classical ML / DL – NLP, BERT-family, time-series, and CV