We are seeking a Lead Data Software Engineer to drive the evolution and maturation of data solutions in a dynamic GCP environment.
The role involves implementing orchestration with Airflow, optimizing data ingestion pipelines, and driving the delivery of scalable, high-performance systems.
Responsibilities
- Improve the architecture to enhance the scalability and maturity of data solutions
- Orchestrate workflows with Airflow to keep processing efficient (see the DAG sketch after this list)
- Build secure REST services in Python 3.x using FastAPI and asynchronous patterns (see the FastAPI sketch after this list)
- Optimize data pipelines and workflows by leveraging BigQuery's advanced features like partitioning, clustering, and query tuning
- Design high-load systems with a focus on throughput, backpressure, and scalability
- Utilize GCP services such as Cloud Functions, Pub/Sub, and Cloud Storage to build robust data solutions
- Develop end-to-end ETL pipelines with job orchestration frameworks like Airflow
- Drive the design process from RFCs to full operational runbooks with clear documentation and ownership
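For illustration, a minimal Airflow DAG in the spirit of the orchestration responsibilities might look like the sketch below; the DAG id, schedule, and extract/transform/load callables are hypothetical placeholders, not this team's actual pipeline.

```python
# Minimal sketch of a daily ETL DAG (Airflow 2.x). All names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw records from the source system (placeholder logic).
    print("extracting...")


def transform():
    # Clean and reshape the extracted data (placeholder logic).
    print("transforming...")


def load():
    # Write the transformed data to the warehouse (placeholder logic).
    print("loading...")


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow >= 2.4; older versions use schedule_interval
    catchup=False,
) as dag:
    # Chain the stages so each runs only after the previous one succeeds.
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```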
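Likewise, a minimal sketch of a secure async REST endpoint with FastAPI and Pydantic, assuming a hypothetical /ingest route and a placeholder bearer-token check:

```python
# Minimal sketch of a secure async FastAPI endpoint. The route, model fields,
# and token check are illustrative assumptions.
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from pydantic import BaseModel

app = FastAPI()
bearer = HTTPBearer()


class IngestRequest(BaseModel):
    source: str
    batch_size: int = 100


def check_token(credentials: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Placeholder check; a real service would verify a signed token.
    if credentials.credentials != "expected-token":
        raise HTTPException(status_code=401, detail="Invalid token")


@app.post("/ingest")
async def ingest(req: IngestRequest, _: None = Depends(check_token)) -> dict:
    # An async handler frees the event loop to serve other requests while
    # I/O-bound work (e.g. a downstream call) is awaited.
    return {"source": req.source, "accepted": req.batch_size}
```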
Requirements
- 7+ years of professional software experience, including 4+ years of Python 3.x in production environments
- Knowledge of FastAPI and Pydantic, with proficiency in async patterns and secure REST APIs
- Competency in SQL and data modeling, and hands-on experience with BigQuery (partitioning, clustering, query optimization; see the sketch after this list)
- Production expertise in GCP services including Cloud Functions, Pub/Sub, Cloud Storage, IAM/Secret Manager, Cloud Build, or equivalent AWS/Azure stacks
- Background in designing high-load systems with solutions for backpressure, idempotency, and scalability
- Experience working on ETL pipelines using tools like Airflow for job orchestration
- Excellent communication, ownership, and problem-solving skills
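As a sketch of the BigQuery features referenced above, the snippet below creates a partitioned, clustered table through the google-cloud-bigquery client; the dataset, table, and column names are hypothetical, and application-default credentials are assumed.

```python
# Minimal sketch: create a partitioned, clustered BigQuery table.
# Dataset/table/column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials

# Partitioning by date and clustering on a frequent filter column prunes
# scanned bytes, which is the main lever for BigQuery query tuning.
ddl = """
CREATE TABLE IF NOT EXISTS analytics.events (
    event_ts    TIMESTAMP,
    customer_id STRING,
    payload     JSON
)
PARTITION BY DATE(event_ts)
CLUSTER BY customer_id
"""
client.query(ddl).result()  # blocks until the DDL job completes
```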
Nice to have
- Familiarity with parsing file formats such as CSV, Excel, or XML at scale, along with schema evolution/versioning techniques
- Understanding of data quality frameworks (assertions/expectations) and tools like dbt or Dataform
- Background in event-driven architectures or streaming technologies; experience with Beam/Dataflow is a plus
- Expertise in concurrency and performance profiling in Python using asyncio or multiprocessing (see the sketch after this list)
- Knowledge of IaC practices using Terraform or SRE methodologies
- Awareness of security concepts such as authentication/authorization, service accounts, and principles of least privilege
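For the concurrency point above, a minimal asyncio sketch (placeholder names and delays) showing why gathered coroutines complete in roughly the slowest task's time rather than the sum:

```python
# Minimal sketch of I/O-bound concurrency with asyncio. Names and delays are
# placeholders for real network calls.
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # Simulate a network call; real code would await an HTTP client here.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> None:
    start = time.perf_counter()
    results = await asyncio.gather(
        fetch("source_a", 1.0),
        fetch("source_b", 1.0),
        fetch("source_c", 1.0),
    )
    # ~1s total, not 3s: the coroutines wait concurrently on one thread.
    print(results, f"in {time.perf_counter() - start:.1f}s")


asyncio.run(main())
```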