Lead Data Software Engineer

Remote in Colombia, and 2 other locations
Data Software Engineering & 10 others

We are seeking a Lead Data Software Engineer to join our team.

This role focuses on building and maintaining the data infrastructure behind AI-powered products and intelligent agent systems. You will work with state-of-the-art technologies and help shape scalable, reliable platforms in a collaborative environment.

Responsibilities
  • Design, build, and support data ingestion and processing pipelines that feed retrieval-augmented generation (RAG) systems, handling unstructured data, images, videos, metadata, and permissions
  • Manage and tune vector database infrastructure, including Amazon Kendra and an ongoing migration to OpenSearch
  • Build evaluation datasets and agent-specific performance measurement frameworks
  • Establish monitoring and observability pipelines for AI workloads, including dashboards for latency, quality, and cost
  • Implement data governance, privacy guardrails, and quality controls for AI training and inference data
  • Support the A/B testing and experimentation infrastructure used to evaluate agent versions
  • Collaborate with backend AI engineers on data schemas and embedding strategies
Requirements
  • At least 5 years of data engineering experience, including hands-on work with AI/ML data infrastructure
  • At least 1 year of experience leading and managing development teams
  • Strong Python skills for building data pipelines, ETL workflows, and backend automation scripts
  • Production experience with vector databases, including schema design and index management in Amazon Kendra or OpenSearch
  • Deep understanding of search and retrieval concepts, including embedding models, chunking strategies, and retrieval optimization
  • Working knowledge of AWS services such as S3, Glue, Athena, and Kinesis (or equivalents), as well as Docker and distributed data systems
  • Experience making data quality practices such as monitoring, validation, and lineage tracking part of standard operations
  • Experience defining AI/ML evaluation metrics and tracking them systematically with evaluation frameworks
  • Written and spoken English proficiency at B2+ level or above
Nice to have
  • Exposure to LangSmith, RAGAS, or custom-built evaluation frameworks
  • Experience processing multimodal data (unstructured text, images, and videos) and applying the related governance
  • Hands-on experience preparing data for LLM fine-tuning
  • Familiarity with observability tools that integrate directly with AI calls, such as Langfuse or Arize
  • Experience building streaming data pipelines with technologies such as Kafka or Kinesis