ML Data Pipeline Engineer
3 weeks ago
We're seeking a Data Pipeline Engineer to own and evolve our exercise recognition training data infrastructure. You'll manage the end-to-end pipeline that collects, synchronizes, validates, and prepares IMU sensor and video data for ML model training.
This role combines systems engineering, data quality automation, and hands-on problem-solving in a production environment.
What You’ll Do
Pipeline Operations & Improvement
- Maintain and enhance our multi-source data collection system: IMU sensors (via mobile app) and synchronized video streams from gym-based cameras.
- Improve the robustness of our video capture software, particularly around handling network interruptions and operational monitoring.
- Deploy and monitor services in remote Linux environments with appropriate DevOps practices.
Data Quality & Validation
- Evolve our Python-based QC engine that validates data pre- and post-annotation.
- Implement checks for IMU-video time synchronization, sensor health, and measurement consistency.
- Apply digital signal processing techniques to identify sensor failures, connectivity issues, and measurement irregularities.
- Develop validation logic comparing annotations against sensor data to ensure temporal alignment.
Analysis & Troubleshooting
- Perform ad-hoc analysis on 1,200+ workout tasks to classify failure modes.
- Identify whether issues stem from pipeline bugs, sensor problems, or annotation errors.
- Prioritize engineering work based on data quality impact and coordinate with the annotation team on fixes.
Tooling & Visualization
- Maintain and extend our NextJS UI serving annotators, data scientists, and stakeholders.
- Create visualizations (Chart.js) for QC metrics and signal analysis.
- Integrate with the LabelStudio annotation interface.
What You Bring
Required
- Strong Python programming skills, particularly for data processing pipelines
- Experience with time-series data and digital signal processing
- Comfort working in Linux environments and deploying/monitoring remote services
- Ability to debug complex multi-component systems (sensors, video, networks, sync)
- Data quality mindset: designing validation rules, tracking metrics, investigating anomalies
- SQL/database experience for managing pipeline metadata
Highly Valued
- Video processing experience (RTSP streams, encoding, OCR)
- Working with sensor/IoT data and handling connectivity challenges
- NextJS or modern web frameworks for data tooling
- DevOps practices: containerization, monitoring, logging, alerting
- Experience with annotation pipelines and ML training data workflows
- Background in biomechanics, sports science, or wearable sensors
Tech Stack
- Languages: Python (primary), JavaScript/TypeScript (NextJS UI)
- Data: IMU sensor streams, video (RTSP), time-series analysis, DSP
- Tools: LabelStudio, Chart.js, Linux/bash, OCR libraries
- Infrastructure: Remote deployment, monitoring systems
You'll Thrive Here If You
- Enjoy detective work: diagnosing why data doesn't match expectations
- Balance pragmatism with quality: shipping improvements while maintaining reliability
- Communicate well across technical and non-technical stakeholders
- Can work autonomously in a small, mission-driven team